OpenAI reportedly used more than a million hours of YouTube videos to train its GPT-4 model

While artificial intelligence models continue to amaze us, these tools also come with some open questions. One of these is the data used for training. Using data without permission can lead to copyright infringement.

A report published by The New York Times also draws attention to this point. According to the report, OpenAI used Google data to train its artificial intelligence model.

More than a million hours of YouTube videos were used to train GPT-4

The NYT report claims that OpenAI benefited from a significant amount of YouTube data. According to the report, the artificial intelligence giant used its speech recognition tool, Whisper, to transcribe more than a million hours of YouTube videos, then used the compiled text to train GPT-4, its most advanced language model.

Furthermore, the company reportedly knew that this could raise legal questions but believed it would not cause any problems. It was claimed that Greg Brockman, the company’s president, also took part in collecting the videos. The Times article adds that OpenAI exhausted its supply of training data in 2021 and then began discussing its plan to transcribe YouTube content. Until then, the company had used code from GitHub, chess databases and school content from Quizlet.

Matt Bryant, a spokesman for Google, which owns YouTube, told The Verge that the company had seen “unconfirmed reports” about the issue and that such unauthorized use is forbidden. As we shared with you a few days ago, YouTube CEO Neal Mohan also said that using the platform’s data in this way would be a violation. Mohan made this statement in response to allegations that Sora, OpenAI’s new model, was trained on YouTube videos.

Google itself has trained models with YouTube data

In addition, there are reports that Google itself collects data from YouTube. Spokesperson Bryant stated that Google uses YouTube content to train its own models, in line with its agreements with content creators. It was also claimed that, for this reason, Google took no action against OpenAI.

All these claims reveal a different side of artificial intelligence. Unauthorized use of data can lead to major copyright infringement disputes. We will be watching to see how this issue develops.

Source: Web Tekno

Alice

Alice Smith is a seasoned journalist and writer for Div Bracket. She has a keen sense of what’s important and is always on top of the latest trends. Alice provides in-depth coverage of the most talked-about news stories, delivering insightful and thought-provoking articles that keep her readers informed and engaged.
