
AI models trained with AI-generated content produce unusable results

  • July 26, 2024


According to a study from Oxford University, training AI models on AI-generated content can lead to model collapse: the accumulation of errors and misunderstandings in the AI-generated content of previous generations leads to unusable results.

Large AI companies buy huge amounts of human-generated data to train their AI models. This data is finite, and the web is gradually being flooded with AI-generated content. How will AI models be trained in the future when the web is dominated by AI-generated data? Researchers at the University of Oxford recently published a research article in Nature that attempts to answer this question.

The research suggests that algorithmically generated content can lead to so-called model collapse, in which new AI models can no longer generate useful results. The research was led by Ilia Shumailov, a computer scientist at the University of Oxford, in collaboration with colleagues from other academic institutions.

AI-generated training data

In the research article, entitled “AI models collapse when trained on recursively generated data,” the researchers set out to determine whether the proliferation of algorithmically generated web content makes large language models less useful.

Developers usually train their large language models (LLMs) on data scraped from websites. In a world where AI-generated content is gradually taking over, the web is increasingly full of AI-generated information, and this content will inevitably end up in LLM training data in the future.

Model collapse

The research paper suggests that a buildup of errors and misunderstandings from previous generations of models could cause new AI models to lose accuracy or even “break down.”
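This compounding of errors across generations can be illustrated with a deliberately simple sketch (an illustration of the general idea, not the paper’s actual experiments): each “generation” fits a Gaussian to the previous generation’s samples and then generates new training data from that fit. Because every fit carries sampling error, the estimated spread of the data drifts toward zero, and later generations lose the diversity of the original human data.

```python
import random
import statistics

def simulate_collapse(generations=1000, n_samples=10, seed=0):
    """Toy 'model collapse' chain: each generation fits a Gaussian to the
    previous generation's samples, then emits new samples from that fit.
    Fitting errors compound, so the estimated spread drifts toward zero and
    later generations lose the diversity of the original data."""
    rng = random.Random(seed)
    # Generation 0 trains on "human" data: a standard normal sample.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    spreads = []
    for _ in range(generations):
        mu = statistics.fmean(data)       # fitted mean
        sigma = statistics.stdev(data)    # fitted spread
        spreads.append(sigma)
        # The next generation trains only on synthetic samples.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return spreads

spreads = simulate_collapse()
print(f"generation 1 spread: {spreads[0]:.3f}")
print(f"generation 1000 spread: {spreads[-1]:.3g}")
```

Run over many generations, the fitted spread shrinks by orders of magnitude: the model forgets the tails of the original distribution, which is the same loss of rare information the researchers describe for language models.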

Technology companies already use a technique that involves “watermarking” AI-generated content so that it can be excluded from training data sets. The coordination required between technology companies poses major challenges to this solution and is unlikely to be economically viable. According to the study’s conclusion, new steps must be taken to keep high-quality content available for AI development projects.

Source: IT Daily
