Copyrighted human work is a crucial part of training leading AI models, according to OpenAI. Questions of legality and liability remain unresolved.
According to OpenAI, it would be impossible to train leading AI models without using copyrighted human work. A recent IEEE report shows several examples of plagiarism, raising the question of who is responsible, the user or the model itself. OpenAI stands firm and says that using copyrighted material is legal.
Copyright as an obstacle
Although AI models continue to evolve rapidly, developers keep running into the issue of copyright. The more data a model can process, the more valuable its output becomes. Copyrighted works, however, cannot simply be used by anyone. Yet according to the Microsoft-backed lab, relying purely on public domain material would result in inferior AI software.
A recent IEEE study co-authored by AI expert and critic Gary Marcus and digital illustrator Reid Southen found that Midjourney and OpenAI's DALL-E 3 can recreate copyrighted scenes from movies or games based on their training data. According to Marcus and Southen, it is almost certain that the Midjourney and OpenAI models were trained on copyrighted material. However, the companies themselves have not published their training data.
Who is responsible for the plagiarism
The question, of course, is whether using copyrighted material is legal and whether AI customers risk being held liable. The IEEE report can already serve as additional support in this debate. Tyler Ochoa, a professor in the Law Department at Santa Clara University in California, has analyzed the report.
Ochoa calls on Copyright to review the IEEE report. However, he questions the authors' advice because the examples are ones in which users directly ask for plagiarism. Each prompt contains the title of a specific movie or the words “movie” and “screenshot,” indicating that the user, not the model itself, is literally asking for plagiarism. So the question is who is responsible for these acts of plagiarism: the creators of the AI model or the people who ask the model to reproduce popular scenes.
OpenAI remains steadfast
OpenAI points out that today almost every form of human expression, be it a blog post, a photo or software code, is protected by copyright. According to OpenAI, this makes it impossible to train leading AI models without using proprietary material. The counterargument is that a company that creates something (e.g. an AI model) using copyrighted material must obtain permission in advance. It's a battle between human labor and machines that seems far from over.