British researchers have found a way to detect hallucinations in LLMs by having other LLMs evaluate their work.
Researchers from the University of Oxford have published, in the journal Nature, a possible way to detect hallucinations in LLMs: incorrect statements made by a language model that nonetheless appear correct. This common problem puts AI companies in a difficult position as they race to build LLMs that answer as accurately as possible. The Oxford team's detection method relies on other LLMs.
Hallucinations
Hallucinations are statements made by language models that appear legitimate but are in fact false. They are one of the biggest shortcomings facing LLMs today, and they significantly reduce the reliability of their answers.
The British researchers used LLMs themselves to detect hallucinations in LLMs. Their method quantifies how much an LLM is hallucinating and also indicates how accurate the generated content is likely to be.
“Fight fire with fire”
The new method detects so-called “confabulations,” cases in which an LLM produces inaccurate and arbitrary text, by using a second LLM to review the original model's answers and a third to evaluate that review.
One outside researcher described the approach as “fighting fire with fire”: LLMs could become an important part of checking their own answers. The method focuses on the meaning of the answers rather than their exact wording; the outputs under review are fed to another system that checks whether they are paraphrases of one another.
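To illustrate the idea, here is a minimal sketch of a semantic-entropy-style check in Python. It is not the authors' implementation: the answers are assumed to come from repeatedly querying the same model with the same question, the placeholder function `means_the_same` stands in for the second LLM's paraphrase check, and the paper's probability-weighted entropy is simplified here to a count-based version.

```python
# Simplified sketch of semantic-entropy-style confabulation scoring.
# Assumptions (not the authors' code): answers are several samples from the
# same model for one question, and `means_the_same` is a placeholder for an
# LLM-based paraphrase / mutual-entailment check.

import math
from collections import Counter
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       means_the_same: Callable[[str, str], bool]) -> List[int]:
    """Assign each answer a cluster id; answers judged equivalent share a cluster."""
    cluster_ids: List[int] = []
    representatives: List[str] = []  # one representative answer per meaning cluster
    for ans in answers:
        for cid, rep in enumerate(representatives):
            if means_the_same(ans, rep):
                cluster_ids.append(cid)
                break
        else:
            representatives.append(ans)
            cluster_ids.append(len(representatives) - 1)
    return cluster_ids


def semantic_entropy(answers: List[str],
                     means_the_same: Callable[[str, str], bool]) -> float:
    """Entropy over meaning clusters: high when sampled answers disagree in meaning."""
    clusters = cluster_by_meaning(answers, means_the_same)
    counts = Counter(clusters)
    total = len(clusters)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # Placeholder equivalence check; in the published method an LLM judges
    # whether two answers are paraphrases of each other.
    def means_the_same(a: str, b: str) -> bool:
        return a.strip().lower() == b.strip().lower()

    consistent = ["Paris", "paris", "Paris"]        # one shared meaning -> entropy 0.0
    conflicting = ["Paris", "Lyon", "Marseille"]    # three meanings -> entropy ~1.10
    print(semantic_entropy(consistent, means_the_same))
    print(semantic_entropy(conflicting, means_the_same))
```

A high score suggests the model's sampled answers disagree in meaning, the signature of a confabulation, while a low score means the model keeps saying the same thing in different words.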
The study found that a third LLM tasked with assessing the results came to roughly the same conclusions as a human evaluator. The research was published as the article “Detecting hallucinations in large language models using semantic entropy” in the British journal Nature.