LLMs can be manipulated to provide harmful advice, new study finds

August 9, 2023

1 min.

A study has found that Large Language Models (LLMs) can be manipulated to provide harmful advice, even without tampering with their training data. This vulnerability could be exploited by malicious actors to extract sensitive information, craft malicious code, or offer ineffective security recommendations.

The study, conducted by researchers at the University of California, Berkeley, involved probing the capabilities of LLMs and uncovering a number of ways in which they can be manipulated. For example, the researchers were able to trick LLMs into providing diametrically opposed answers during a strategic game. They also found that LLMs could be coerced into generating vulnerable or malicious code.

The researchers tested LLMs for security risks by hypnotizing them to give incorrect responses and recommendations. They noted that English can be used to control LLMs like a programming language for malware, making attacks easier. Through hypnosis, we made LLMs share confidential data, create vulnerable/malicious code, and give poor security advice.

Potential targets for such attacks include small businesses without security expertise and the general public trusting AI chatbots. Attacks can happen through phishing emails, malicious insiders, or by compromising training data. Protecting AI models involves securing training data, detecting data leakage, and guarding against AI-generated attacks.

The researchers hypnotized LLMs by having them play a game with reversed answers. To prevent detection, the researchers made the game never-ending and created nested games, trapping users in a loop of games even if they figure it out. Larger models had more layers and could confuse users even more.

The sources for this piece include an article in SecurityIntelligence.

Tags
AI

TND Newsdesk

SUBSCRIBE NOW

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways

Subscribe Now

Cyber Security Today, Week in Review for week ending Friday May 17, 2024

Cyber Security Today, May 17, 2024 – Malware hiding in Apache Tomcat servers

MIT students exploit blockchain vulnerability to steal 25 million dollars

Cyber Security Today, May 15, 2024 – Ebury botnet still exploits Linux servers, Microsoft, SAP and Apple issue security updates

iOS update brings back photos users thought were permanently deleted

Microsoft reveals critical security flaw affecting Android apps

Google Play introduces new biometric verification with a user warning

Early adopters returning Apple Vision Pro headsets

Resignations at OpenAI. Hashtag Trending for Friday, May 17, 2024

Google does the unthinkable – reportedly erasing a 125 billion dollar pension fund

MIT students exploit blockchain vulnerability to steal 25 million dollars

iOS update brings back photos users thought were permanently deleted

LLMs can be manipulated to provide harmful advice, new study finds

Cyber Security Today, Week in Review for week ending Friday May 17, 2024

Cyber Security Today, May 17, 2024 – Malware hiding in Apache Tomcat servers

Resignations at OpenAI. Hashtag Trending for Friday, May 17, 2024

Google does the unthinkable – reportedly erasing a 125 billion dollar pension fund

MIT students exploit blockchain vulnerability to steal 25 million dollars

SUBSCRIBE NOW

Related articles

Microsoft’s AI success may spell defeat for it’s climate goals

OpenAI’s Chief Scientist Ilya Sutskever Departs Company

OpenAI snubs Microsoft, launching GPT-4o only on macOS

Apple to integrate ChatGPT into iPhones

Become a member