
When patching no longer works: AI hacks AI

January 2, 2024


AI chatbot battle

The Masterkey method ensures that once an AI has “cracked” an AI chatbot, a patch cannot solve the problem.

Researchers at NTU (Nanyang Technological University) in Singapore have developed a method for "cracking" AI chatbots, also known as jailbreaking. They target services like ChatGPT, Microsoft Copilot, and Google Bard to explore the ethical boundaries of an LLM (Large Language Model).

They use a so-called master key to crack an AI chatbot. The method follows a two-stage approach: first, the attacker reverse-engineers the defense mechanisms of an LLM. Using the data obtained, the attacker then trains an LLM to bypass those security protocols.

After several iterations, the attackers arrive at a master key that can target the LLMs behind services such as ChatGPT or Google Bard in a very precise manner and even bypass subsequent patches.

Learn and adapt

The LLM behind an AI chatbot can learn and adapt. If such an AI is used to bypass the security protocols of an existing AI chatbot, it may run into defenses such as a list of banned words or blocked harmful content. The attacking AI must be smarter than the target chatbot in order to bend those rules.

If successful, a rogue AI can be prompted by a human to produce violent, unethical, or criminal content. Because an attacking AI learns from its mistakes and continually evolves, this way of cracking AI chatbots is very efficient.

The NTU researchers give two examples. In the first, a trained "attacking" AI chatbot found a way to extract blocked information from a target AI chatbot: all it had to do was insert a space after each letter of a prompt, so keyword filters no longer matched. In the second, the attacking chatbot fitted the target AI chatbot with a persona that knows no moral restrictions.
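The first example works because simple content filters often match banned words as literal substrings. A minimal sketch, not the researchers' actual code, with a hypothetical blocklist and filter:

```python
# Hypothetical blocklisted word, for illustration only.
BLOCKLIST = {"forbidden"}

def is_blocked(text: str) -> bool:
    """Naive filter: reject text containing any blocklisted word."""
    return any(word in text.lower() for word in BLOCKLIST)

def space_out(text: str) -> str:
    """Insert a space after each character, as in the researchers'
    first example, so substring matching no longer fires."""
    return " ".join(text)

prompt = "tell me something forbidden"
print(is_blocked(prompt))             # True: the plain word is caught
print(is_blocked(space_out(prompt)))  # False: spaced letters slip past
```

A model that reads character by character can still reconstruct the spaced-out words, while the substring filter sees nothing to block.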

Proof of concept

NTU has contacted various AI chatbot services, each time with a proof of concept demonstrating a successful jailbreak. Companies typically patch their AI chatbots once a workaround for a particular restriction is found. If the Masterkey method keeps working despite such patches, this could have a significant impact.

NTU recognizes that AI is a powerful tool that can also be turned against itself. The researchers hope that this investigation will prompt the various providers to build in safeguards against the publication of harmful content.

If you would like to read the article in detail, you can take a look here.

Source: IT Daily
