
When patching no longer works: AI hacks AI

January 2, 2024


AI chatbot battle

The Masterkey method ensures that once an AI has “cracked” an AI chatbot, a patch cannot solve the problem.

Researchers at NTU (Nanyang Technological University) in Singapore have developed a method for "cracking" AI chatbots, also known as jailbreaking. They target services like ChatGPT, Microsoft Copilot, and Google Bard to explore the ethical boundaries of an LLM (Large Language Model).

They use a so-called master key to crack an AI chatbot. The method follows a two-stage approach: first, the attacker reverse-engineers the defense mechanisms of an LLM. Using the data obtained, the attacker then trains an LLM to bypass those security protocols.

After several iterations, the attackers arrive at a master key that can target the LLMs behind services such as ChatGPT or Google Bard in a very precise manner and even bypass subsequent patches.

Learn and adapt

The LLM behind an AI chatbot can learn and adapt. If such an AI is used to bypass the security protocols of an existing AI chatbot, it may run into defenses such as a list of banned words or blocked harmful content. The attacking AI must be smarter than the target chatbot in order to bend those rules.

If successful, a rogue AI can be prompted by a human to produce violent, unethical, or criminal content. Because an attacking AI learns from its mistakes and continually evolves, this way of cracking AI chatbots is very efficient.

The NTU researchers give two examples. In the first, a trained "attacking" AI chatbot found a way to extract blocked information from a target AI chatbot: all it had to do was insert a space after each letter of a prompt, so keyword filters no longer matched. In the second, the attacking chatbot fitted the target AI chatbot with a persona that knows no moral restrictions.
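The first example works because simple content filters often match banned words as literal substrings. A minimal sketch, not the researchers' actual code, with a hypothetical blocklist and filter:

```python
# Hypothetical blocklisted word, for illustration only.
BLOCKLIST = {"forbidden"}

def is_blocked(text: str) -> bool:
    """Naive filter: reject text containing any blocklisted word."""
    return any(word in text.lower() for word in BLOCKLIST)

def space_out(text: str) -> str:
    """Insert a space after each character, as in the researchers'
    first example, so substring matching no longer fires."""
    return " ".join(text)

prompt = "tell me something forbidden"
print(is_blocked(prompt))             # True: the plain word is caught
print(is_blocked(space_out(prompt)))  # False: spaced letters slip past
```

A model that reads character by character can still reconstruct the spaced-out words, while the substring filter sees nothing to block.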

Proof of concept

NTU has contacted various AI chatbot services, each time with a proof of concept demonstrating a successful jailbreak. Companies typically patch their AI chatbots once a workaround for a particular restriction is found. If the Masterkey method keeps working despite such patches, this could have a significant impact.

NTU recognizes that AI is a powerful tool that can also be turned against itself. The researchers hope that this investigation will prompt the various providers to build in safeguards against the publication of harmful content.

If you would like to read the article in detail, you can take a look here.

Source: IT Daily
