

How do I debug LLMs? Bring in another LLM

June 28, 2024

OpenAI introduces a new language model that helps human AI trainers detect errors.

Language models are trained by human AI trainers to improve the quality of their answers. The more capable such models become, the more their output exceeds human knowledge, and the harder it becomes for AI trainers to spot incorrect answers.

OpenAI has trained a new CriticGPT model based on GPT-4 to help AI trainers detect bugs in ChatGPT. “We found that people who get help from CriticGPT in reviewing ChatGPT code perform better than people without help 60 percent of the time,” the startup said.

Human AI Trainer

OpenAI uses human AI trainers to train its language models and to detect errors in ChatGPT's code answers. Under the motto "to err is human," OpenAI has developed a new AI model, CriticGPT, to help human trainers when they fail to notice certain errors. The Microsoft-backed lab published a paper on Thursday titled "LLM Critics Help Catch LLM Bugs" that explains the method in detail.

Fire with fire

Generative AI models such as the recently introduced GPT-4o are trained on large amounts of data and then subjected to a refinement process known as reinforcement learning from human feedback (RLHF). Human trainers interact with the LLM and rate its answers to various questions; from these comparisons, the model learns which answers are preferred.
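To make the RLHF preference step concrete, here is a minimal sketch of how trainer comparisons are typically turned into a training signal. All names and shapes are illustrative assumptions, not OpenAI's actual code: a reward model scores two candidate answers, and a pairwise loss pushes the score of the trainer-preferred answer above the rejected one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of preference-based reward modeling, the core of
# the RLHF step described above. Real systems use a transformer backbone;
# a single linear head on pooled embeddings stands in for it here.

class RewardModel(nn.Module):
    def __init__(self, embedding_dim: int = 768):
        super().__init__()
        # Maps a pooled answer embedding to a single scalar reward.
        self.head = nn.Linear(embedding_dim, 1)

    def forward(self, answer_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(answer_embedding).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # -log sigmoid(r_chosen - r_rejected): minimized when the model
    # assigns a higher reward to the answer the human trainer preferred.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Random embeddings standing in for pairs of answers to the same prompt.
chosen = torch.randn(8, 768)    # answers the trainer preferred
rejected = torch.randn(8, 768)  # answers the trainer rejected

optimizer.zero_grad()
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

The learned reward model is then used to fine-tune the language model itself, which is why errors that trainers miss at this stage can propagate into the final system.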

Since the knowledge of such language models sometimes exceeds human knowledge, OpenAI's solution is another language model that checks the first one. This CriticGPT model supports the human trainers by critiquing the generative model's responses.
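The setup can be pictured as a simple two-model pipeline. The sketch below uses the public OpenAI Python client; CriticGPT itself is not publicly exposed, so the model names and prompt wording are assumptions made for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generator_answer(question: str) -> str:
    # The model under review answers the user's question.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def critic_review(question: str, answer: str) -> str:
    # A second model plays the CriticGPT role: it is asked only to find
    # problems in the first model's answer, so the human trainer verifies
    # a short critique instead of reviewing the raw answer from scratch.
    critique_prompt = (
        f"Question:\n{question}\n\nProposed answer:\n{answer}\n\n"
        "List any bugs or factual errors in the proposed answer."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the paper's critic is GPT-4-based
        messages=[{"role": "user", "content": critique_prompt}],
    )
    return response.choices[0].message.content

question = "Write a Python function that reverses a linked list."
answer = generator_answer(question)
print(critic_review(question, answer))
```

The design choice is that criticizing an answer is easier than producing one, so a critic model can usefully assist trainers even on outputs it could not have generated better itself.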

Hallucinations

The paper reports that "LLMs detect significantly more injected errors than qualified people who are paid to review the code, and furthermore, model critiques are preferred over human critiques in more than 80 percent of cases." When it comes to hallucinations, human trainers working with CriticGPT hallucinate less often than CriticGPT does on its own, although the rate is still higher than when a human trainer reviews entirely alone.

"Unfortunately, it is not clear what the appropriate trade-off between hallucinations and bug detection is for an overall RLHF system that uses critiques to improve model performance," the paper admits.

Source: IT Daily
