
AI models trained with AI-generated content produce unusable results

  • July 26, 2024


According to a study from Oxford University, training AI models on AI-generated content can lead to model collapse: the accumulation of errors and misunderstandings in the AI-generated content of previous generations leads to unusable results.

Large AI companies buy huge amounts of human-generated data to train their AI models. This data is finite, and the web is gradually being flooded with AI-generated content. How will AI models be trained in the future when the web is dominated by AI-generated data? Researchers at the University of Oxford recently published a research article in Nature that attempts to answer this question.

The research suggests that algorithmically generated content can lead to so-called model collapse, in which new AI models can no longer generate useful results. The research was led by Ilia Shumailov, a computer scientist at the University of Oxford, in collaboration with colleagues from other academic institutions.

AI-generated training data

In the research article, entitled “AI models collapse when trained on recursively generated data,” the researchers set out to determine whether the proliferation of algorithmically generated web content makes large language models less useful.

Developers usually train their large language models (LLMs) on data scraped from websites. In a world where AI-generated content is gradually taking over, the web is increasingly full of AI-generated information, and this content will inevitably end up in LLM training data in the future.

Model collapse

The research paper suggests that a buildup of errors and misunderstandings from previous generations of models could cause new AI models to lose accuracy or even “break down.”
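This compounding of errors across generations can be illustrated with a deliberately simple sketch (an illustration of the general idea, not the paper’s actual experiments): each “generation” fits a Gaussian to the previous generation’s samples and then generates new training data from that fit. Because every fit carries sampling error, the estimated spread of the data drifts toward zero, and later generations lose the diversity of the original human data.

```python
import random
import statistics

def simulate_collapse(generations=1000, n_samples=10, seed=0):
    """Toy 'model collapse' chain: each generation fits a Gaussian to the
    previous generation's samples, then emits new samples from that fit.
    Fitting errors compound, so the estimated spread drifts toward zero and
    later generations lose the diversity of the original data."""
    rng = random.Random(seed)
    # Generation 0 trains on "human" data: a standard normal sample.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    spreads = []
    for _ in range(generations):
        mu = statistics.fmean(data)       # fitted mean
        sigma = statistics.stdev(data)    # fitted spread
        spreads.append(sigma)
        # The next generation trains only on synthetic samples.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return spreads

spreads = simulate_collapse()
print(f"generation 1 spread: {spreads[0]:.3f}")
print(f"generation 1000 spread: {spreads[-1]:.3g}")
```

Run over many generations, the fitted spread shrinks by orders of magnitude: the model forgets the tails of the original distribution, which is the same loss of rare information the researchers describe for language models.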

Technology companies already use a technique that involves “watermarking” AI-generated content so that it can be excluded from training data sets. The coordination required between technology companies poses major challenges to this solution and is unlikely to be economically viable. According to the study’s conclusion, new steps must be taken to keep high-quality content available for AI development projects.

Source: IT Daily
