GPT 3.5 is better than GPT 4 (in Street Fighter III)

An original benchmark for LLMs shows that bigger is not always better. In the computer game Street Fighter III, GPT 4 is not the better player. The results show that pure intelligence is not the best parameter for every task: speed also plays a role.

During the Mistral hackathon in San Francisco in the USA, some AI developers developed an original benchmark for LLMs: LLM Colosseum. At the LLM Colosseum, two LLMs compete in a round of Street Fighter III. The models have to play against each other and describe what is happening. In battles, you’ll see characters choose attacks and block each other.

In the video above you’ll see a battle in action and get instructions on how to install LLM Colosseum yourself. The project is interesting for a quiet Friday, but it also raises some important concerns.

Understanding, intellect and speed

The first thing you notice is that the fights look good. LLMs can effectively play against each other, even if they were never trained to do so. The models understand enough context to play the game somewhat decently.

Then intelligence is not the most important parameter for success. For example, GPT-4 is much smarter than GPT-3.5, but also larger. This means that the Smart model is usually a bit slower: poor quality for a round of Street Fighter. Speed is more important in this test. There is a lesson to be learned from this: the most complex model is not the best in every scenario. Sometimes a minimum level of intelligence is enough, and then speed outweighs accuracy.

Conscientious objection

A few more interesting things have appeared at the LLM Colosseum. Claude 2.1 from Anthropic, for example, doesn’t want to play along. The LLM is too pacifistic and refuses to take part in fictional battles. All LLMs have so-called guardrails that impose restrictions on what they can submit in response to certain requests. Violence is usually something developers block. In theory, playing Street Fighter III would of course be allowed, but Claude 2.1 turns out to be quite fundamental.

Source: IT Daily

Mary

As an experienced journalist and author, Mary has been reporting on the latest news and trends for over 5 years. With a passion for uncovering the stories behind the headlines, Mary has earned a reputation as a trusted voice in the world of journalism. Her writing style is insightful, engaging and thought-provoking, as she takes a deep dive into the most pressing issues of our time.