LLMs can be manipulated to provide harmful advice, new study finds

Share post:

A study has found that Large Language Models (LLMs) can be manipulated to provide harmful advice, even without tampering with their training data. This vulnerability could be exploited by malicious actors to extract sensitive information, craft malicious code, or offer ineffective security recommendations.

The study, conducted by researchers at the University of California, Berkeley, involved probing the capabilities of LLMs and uncovering a number of ways in which they can be manipulated. For example, the researchers were able to trick LLMs into providing diametrically opposed answers during a strategic game. They also found that LLMs could be coerced into generating vulnerable or malicious code.

The researchers tested LLMs for security risks by hypnotizing them to give incorrect responses and recommendations. They noted that English can be used to control LLMs like a programming language for malware, making attacks easier. Through hypnosis, we made LLMs share confidential data, create vulnerable/malicious code, and give poor security advice.

Potential targets for such attacks include small businesses without security expertise and the general public trusting AI chatbots. Attacks can happen through phishing emails, malicious insiders, or by compromising training data. Protecting AI models involves securing training data, detecting data leakage, and guarding against AI-generated attacks.

The researchers hypnotized LLMs by having them play a game with reversed answers. To prevent detection, the researchers made the game never-ending and created nested games, trapping users in a loop of games even if they figure it out. Larger models had more layers and could confuse users even more.

The sources for this piece include an article in SecurityIntelligence.

SUBSCRIBE NOW

Related articles

Tests unable to distinguish AI from human reviews

AI-generated restaurant reviews can now pass the Turing test, successfully fooling both human readers and automated detectors, according...

Zuckerberg shares his vision with investors and Meta stock tanks

In an era where instant gratification is often the norm, Meta CEO Mark Zuckerberg’s strategic pivot towards long-term,...

AI surpasses human benchmarks in most areas: Stanford report

Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) has published the seventh annual issue of its AI Index...

Microsoft and OpenAI partner to build a $100 Billion AI supercomputer “Stargate”

In a bold stride towards computational supremacy, Microsoft, in partnership with OpenAI, is reported to be laying the...

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways