Tech giants train AI models with YouTube videos, without permission

  • July 17, 2024

Tech giants like Apple, Nvidia and Anthropic have used more than 173,000 YouTube videos to train their AI models without permission.

According to an investigation by Proof News in collaboration with Wired, several major tech companies, including Apple, Nvidia, Salesforce, and Anthropic, are said to have used the subtitles of more than 173,000 YouTube videos from more than 48,000 channels to train their AI models. This was done without obtaining permission from YouTube or the creators of the videos.

Dataset “The Pile”

Specifically, the material covers 173,536 videos from more than 48,000 YouTube channels, including popular English-language creators and channels such as MrBeast, TED Talks, and the BBC. Proof News found that a nonprofit called EleutherAI maintains a public dataset called “The Pile.” It includes data from YouTube, English Wikipedia, the European Parliament, and even a series of emails from Enron Corporation employees that were released as part of a federal investigation.

Major companies like Apple, Nvidia, and Salesforce describe in their research papers and posts how they used the Pile to train their AI. The public dataset has also been used by several other tech companies. That is not always easy to establish, but Proof News says it was able to identify the Pile even where a company gave only a vague description of its training data.

Without permission

Proof News created a tool that checks whether a given video was included in the dataset. Although the subtitles were anonymized, Proof News was still able to link them to channels using the video IDs.
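The kind of lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Proof News's actual tool: the record layout, the helper names `build_index` and `find_video`, and all video IDs and channel names are invented placeholders.

```python
from typing import Optional

# Hypothetical subtitle records: each entry carries only a video ID and text.
# The IDs and text are invented placeholders, not real dataset entries.
SUBTITLE_RECORDS = [
    {"video_id": "vid_0001", "text": "welcome back to the channel"},
    {"video_id": "vid_0002", "text": "today we are talking about"},
]

# Hypothetical mapping from video ID to channel name, assumed to come
# from publicly visible video metadata.
VIDEO_TO_CHANNEL = {
    "vid_0001": "Example Channel A",
    "vid_0002": "Example Channel B",
    "vid_0003": "Example Channel C",
}


def build_index(records):
    """Collect the set of video IDs that appear in the subtitle records."""
    return {record["video_id"] for record in records}


def find_video(video_id, index, channels) -> Optional[str]:
    """Return the channel name if the video is in the dataset, else None."""
    if video_id not in index:
        return None
    return channels.get(video_id, "unknown channel")


index = build_index(SUBTITLE_RECORDS)
print(find_video("vid_0001", index, VIDEO_TO_CHANNEL))  # Example Channel A
print(find_video("vid_0003", index, VIDEO_TO_CHANNEL))  # None
```

The point of the sketch is that text stripped of channel names is still traceable as long as each record keeps its video ID.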

Although the captions are publicly displayed on YouTube, this does not mean that they can be used to train AI models. The creators never gave permission for this. Meanwhile, Anthropic and Salesforce told Proof News that they had used the dataset. Apple, Bloomberg and Nvidia have not yet responded.

Source: IT Daily
