
Claude 3.5 Sonnet dethrones GPT-4 as most powerful LLM

August 1, 2024


Claude 3.5 Sonnet emerges as the winner in a comparison test between LLMs. Open source models close the gap to their closed counterparts.

The question of which LLM is best is not easy to answer. LLMs are trained across many disciplines, so one model may be better at math while another has a better grasp of language. The AI startup Galileo pitted 22 of the most advanced LLMs currently available against each other and declared Anthropic's Claude 3.5 Sonnet the winner.

The results are described in detail in Galileo's LLM Hallucination Index. The goal of the research is to test the models on a range of tasks relevant to end users. The index covers tasks with a short context window (under 5,000 tokens), a medium context window (5,000–25,000 tokens), and a long context window (up to 100,000 tokens).

OpenAI dethroned

Last year OpenAI took the top honors, but this year Galileo declares Anthropic the winner. Claude 3.5 Sonnet was the best-performing model on the benchmarks for both short and long context windows. Google's Gemini 1.5 Flash was recognized as the model with the best price-performance ratio.

No benchmark for comparing LLMs is entirely error-free, and there will undoubtedly be tests that rank GPT-4 first. What the study does show is that competition at the top is stronger than ever. OpenAI was considered the reference point in the early stages of the GenAI hype, but models from Anthropic and Google are now on equal footing.

The researchers also observed that the length of the context window seems to have little impact on accuracy, which suggests that LLMs are getting better at handling large inputs. You can feed an entire book to Claude 3.5 Sonnet or Google Gemini 1.5 Flash, and the model will be able to extract very detailed information from it.

Open source closes the gap

Galileo's test included twelve open-source models. The researchers concluded that open-source models are gradually catching up with their closed counterparts. The winner in the open-source category may come as a surprise: it was not Meta's Llama 3 or Mistral, but Alibaba's Qwen2 (72B) that impressed the researchers most.

Source: IT Daily
