LLMs can be manipulated to provide harmful advice, new study finds

August 9, 2023

1 min.

A study has found that Large Language Models (LLMs) can be manipulated to provide harmful advice, even without tampering with their training data. This vulnerability could be exploited by malicious actors to extract sensitive information, craft malicious code, or offer ineffective security recommendations.

The study, conducted by researchers at the University of California, Berkeley, involved probing the capabilities of LLMs and uncovering a number of ways in which they can be manipulated. For example, the researchers were able to trick LLMs into providing diametrically opposed answers during a strategic game. They also found that LLMs could be coerced into generating vulnerable or malicious code.

The researchers tested LLMs for security risks by hypnotizing them to give incorrect responses and recommendations. They noted that English can be used to control LLMs like a programming language for malware, making attacks easier. Through hypnosis, we made LLMs share confidential data, create vulnerable/malicious code, and give poor security advice.

Potential targets for such attacks include small businesses without security expertise and the general public trusting AI chatbots. Attacks can happen through phishing emails, malicious insiders, or by compromising training data. Protecting AI models involves securing training data, detecting data leakage, and guarding against AI-generated attacks.

The researchers hypnotized LLMs by having them play a game with reversed answers. To prevent detection, the researchers made the game never-ending and created nested games, trapping users in a loop of games even if they figure it out. Larger models had more layers and could confuse users even more.

The sources for this piece include an article in SecurityIntelligence.

Tags
AI

TND Newsdesk

SUBSCRIBE NOW

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways

Subscribe Now

North Korean hacker infiltrates US security vendor, loads malware

CrowdStrike releases an update from initial Post Incident Review: Hashtag Trending Special Edition for Thursday July 25, 2024

Security vendor CrowdStrike issues an update from their initial Post Incident Review

CrowdStrike CEO summoned by Homeland Security committee over software disaster

Canadian schools sue social media giants over alleged harm to children

ChatGPT mobile mania: Why users are flocking to ChatGPT Plus

iOS update brings back photos users thought were permanently deleted

Microsoft reveals critical security flaw affecting Android apps

CrowdStrike faces backlash over $10 “apology” voucher

North Korean hacker infiltrates US security vendor, loads malware

Security company accidentally hires a North Korean state hacker: Cybersecurity Today for Friday, July 26, 2024

Security vendor CrowdStrike issues an update from their initial Post Incident Review

LLMs can be manipulated to provide harmful advice, new study finds

North Korean hacker infiltrates US security vendor, loads malware

Security company accidentally hires a North Korean state hacker: Cybersecurity Today for Friday, July 26, 2024

CrowdStrike releases an update from initial Post Incident Review: Hashtag Trending Special Edition for Thursday July 25, 2024

Security vendor CrowdStrike issues an update from their initial Post Incident Review

Homeland Security committee demands appearance by CrowdStrike CEO

SUBSCRIBE NOW

Related articles

Target’s new AI is aimed at employees

The good and the bad of AI generated code

Microsoft’s AI success may spell defeat for it’s climate goals

OpenAI’s Chief Scientist Ilya Sutskever Departs Company

Become a member