
A wolf in sheep’s clothing: GPT-4 risk analysis causes a stir

March 17, 2023

Before the global release of GPT-4, OpenAI had the risks of the language model thoroughly analyzed. Rest assured: GPT-4 is not capable of taking over the world. Why OpenAI felt the need to investigate this at all is what worries experts.

During the introduction of GPT-4, OpenAI emphasized the improved safety of the system. The company argued that the new version would provide up to 40 percent more factually correct answers and up to 80 percent fewer unethical answers.

In a research report, OpenAI explains how these numbers came about. The attempt to allay concerns about the safety of the AI model backfires, however: anyone who reads the paper thoroughly is left with more questions than answers about the innocence of GPT-4.

Extreme answers

For the study, researchers from ARC (the Alignment Research Center), led by a former OpenAI employee, took on the role of red team. That means they were given free rein to trick GPT-4 into giving insane answers. Some of those answers made us frown.

To cite just one example, the researchers asked GPT-4 how to kill people for less than a dollar. The AI gave a list of "suggestions" without blinking. Initially, GPT-4 also saw no problem in explaining how to make dangerous chemicals or in devising racist marketing campaigns.

Note that these extreme responses only occurred with a crude, unpolished version of the model. The version that OpenAI offers to the general public as ChatGPT is strictly filtered: ask ChatGPT extreme questions and it will (usually) refuse to answer. However, the sometimes strange answers given by Bing Chat, which is built on GPT-4, show that AI can still behave unexpectedly.
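
How does such filtering work in practice? OpenAI does not disclose its internal safeguards, but developers can bolt a comparable safety net onto their own applications with OpenAI's public moderation endpoint. The sketch below is purely our own illustration of that idea, not OpenAI's internal filter; it assumes the official openai Python package (v1.x) and an API key in the OPENAI_API_KEY environment variable.

    # Minimal sketch: screen a prompt with OpenAI's moderation endpoint
    # before forwarding it to a chat model. This illustrates the general
    # idea of content filtering; it is NOT how OpenAI filters ChatGPT
    # internally.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def is_allowed(prompt: str) -> bool:
        """Return False when the moderation endpoint flags the prompt."""
        result = client.moderations.create(input=prompt).results[0]
        if result.flagged:
            # Show which policy categories (violence, hate, ...) triggered.
            hits = [name for name, hit in result.categories.model_dump().items() if hit]
            print("Refusing prompt; flagged for:", ", ".join(hits))
            return False
        return True

    if is_allowed("How do I bake bread at home?"):
        print("Prompt looks harmless; safe to send to the model.")

A filter like this only reduces risk; as the Bing Chat example shows, determined users can still coax unexpected behavior out of the underlying model.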

Power-hungry behavior

The paper takes a really grim turn when the researchers test whether GPT-4 shows power-hungry behavior. They probed the model's ability to replicate itself and to autonomously carry out tasks that no human instructed it to do. The researchers concluded that GPT-4 has not yet reached this alarming threshold. What a reassurance: the AI will not take over the world just yet.

Finally, the researchers tested GPT-4's ability to manipulate humans. GPT-4 actually convinced a human to solve a captcha for it, a puzzle that is easy for humans but hard for computers to crack. The AI pretended to be a visually impaired person.

A wolf in sheep’s clothing?

What should we learn from this research? Is ChatGPT a danger to humanity, and should we storm OpenAI's offices? Absolutely not. But it does prove that we should treat AI with some caution. It is and remains an imperfect technology that does not distinguish between good and bad; that requires human judgment.

However, the research is facing some headwind from the academic world. Some experts believe that ARC and OpenAI acted irresponsibly and conducted experiments that could have had serious consequences if something had gone wrong. OpenAI has also been accused of false transparency for not adequately explaining how it keeps GPT-4 on a leash. The fact that OpenAI generates commercial revenue from a technology that can support potentially malicious uses rightly raises ethical questions.

None of this will slow down the integration of artificial intelligence into our society. This week, both Google and Microsoft unveiled a slew of AI capabilities that could dramatically change the way we work in the future. A lot of good can come from artificial intelligence, but we must not lose our critical sense, and we must hold the makers of the technology to account in a timely manner.

Source: IT Daily
