LLMs can be manipulated to provide harmful advice, new study finds

Share post:

A study has found that Large Language Models (LLMs) can be manipulated to provide harmful advice, even without tampering with their training data. This vulnerability could be exploited by malicious actors to extract sensitive information, craft malicious code, or offer ineffective security recommendations.

The study, conducted by researchers at the University of California, Berkeley, involved probing the capabilities of LLMs and uncovering a number of ways in which they can be manipulated. For example, the researchers were able to trick LLMs into providing diametrically opposed answers during a strategic game. They also found that LLMs could be coerced into generating vulnerable or malicious code.

The researchers tested LLMs for security risks by hypnotizing them to give incorrect responses and recommendations. They noted that English can be used to control LLMs like a programming language for malware, making attacks easier. Through hypnosis, we made LLMs share confidential data, create vulnerable/malicious code, and give poor security advice.

Potential targets for such attacks include small businesses without security expertise and the general public trusting AI chatbots. Attacks can happen through phishing emails, malicious insiders, or by compromising training data. Protecting AI models involves securing training data, detecting data leakage, and guarding against AI-generated attacks.

The researchers hypnotized LLMs by having them play a game with reversed answers. To prevent detection, the researchers made the game never-ending and created nested games, trapping users in a loop of games even if they figure it out. Larger models had more layers and could confuse users even more.

The sources for this piece include an article in SecurityIntelligence.

Featured Tech Jobs


Related articles

Microsoft and OpenAI partner to build a $100 Billion AI supercomputer “Stargate”

In a bold stride towards computational supremacy, Microsoft, in partnership with OpenAI, is reported to be laying the...

US Bill Aims to Unveil AI Training Data Sources Amid Copyright Concerns

In a significant move toward transparency, a bill was introduced in the US Congress on Tuesday by California...

AI presents an “extinction level threat” – US Gov’t Report: Hashtag Trending for Tuesday, March 12, 2024

A new US government report warns that AI presents an “extinction level threat to the human species. Elon Musk is outsourcing his Grok AI code. Hackers have breached the Cybersecurity and Infrastructure Security Agency in the US and a researcher shows how to steal a Tesla by leveraging a feature of the Tesla charging stations.

Robot startup uses ChatGPT to enhance its communications and reasoning skills

Humanoid robot startup Figure has secured a significant $675 million investment from a group of high-profile investors, including...

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways