Deceptive Delight: A New Jailbreak Technique Exposes Vulnerabilities in AI Models, Report Reveals 21% Surge in API Vulnerabilities in Q3 2024, Hex Casts a Spell on AI: Another Way to Get ChatGPT to Write Exploit Code
This is Cyber Security Today. I'm your host, Jim Love.
A new multi-turn jailbreak technique called Deceptive Delight has emerged, revealing significant vulnerabilities in large language models (LLMs). Developed by researchers Jay Chen and Royce Lu, Deceptive Delight involves embedding unsafe topics among harmless ones in an interactive conversation, gradually bypassing the safety guardrails of LLMs to generate unsafe or harmful content.
In tests involving 8,000 cases across eight models, Deceptive Delight achieved an average attack success rate of 65% within just three interaction turns. The method opens with a request that mixes benign and unsafe topics, leading LLMs to overlook the harmful elements and generate unsafe responses. A third turn can further increase the detail and relevance of the unsafe content.
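To make the shape of the attack concrete, here is a rough sketch of what such a three-turn conversation might look like in the standard chat-messages format. The placeholder topics and turn wording are illustrative only; they are not the researchers' actual prompts.

```python
# Illustrative structure of a Deceptive Delight-style multi-turn prompt.
# The placeholders stand in for real content; this sketches the
# conversation shape described in the report, nothing more.

benign_topics = ["<benign topic 1>", "<benign topic 2>"]
unsafe_topic = "<unsafe topic>"  # placeholder, deliberately never filled in

conversation = [
    # Turn 1: ask the model to weave all topics into one narrative,
    # burying the unsafe topic among harmless ones.
    {"role": "user", "content":
        f"Write a story that connects {benign_topics[0]}, "
        f"{unsafe_topic}, and {benign_topics[1]}."},
    # Turn 2: ask for elaboration on every topic, so the unsafe one
    # gets expanded alongside the benign ones.
    {"role": "user", "content":
        "Expand on each of those topics in more detail."},
    # Optional turn 3: in the researchers' tests, pressing further
    # raised the detail and relevance of the unsafe content.
    {"role": "user", "content":
        f"Go deeper on {unsafe_topic} specifically."},
]
```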
While most AI models have safety mechanisms in place, the effectiveness of these safeguards varies. The study demonstrates that even advanced models can be manipulated using simple strategies, highlighting the need for ongoing improvements in AI safety. Researchers emphasize that these vulnerabilities represent edge cases and do not reflect typical LLM use, but they underscore the importance of enhancing content filtering and safety protocols.
The findings suggest that AI service providers must continue to develop robust defense mechanisms, such as content filtering and improved prompt engineering, to protect against jailbreak attacks like Deceptive Delight. As AI models become increasingly integrated into everyday applications, addressing these risks will be crucial to maintain trust and safety in AI technologies.
There’s a link to the full report with some great details on these exploits and their results in the show notes.
https://unit42.paloaltonetworks.com/jailbreak-llms-through-camouflage-distraction/
Report Reveals 21% Surge in API Vulnerabilities in Q3 2024
API vulnerabilities rose 21% compared to the previous quarter, according to a report from Wallarm, a company specializing in API security. The company's API ThreatStats Report for Q3 2024 highlights a growing threat landscape, with cybercriminals targeting APIs because of their accessibility and the valuable data they manage.
According to Wallarm’s CEO, Ivan Novikov, the surge in API vulnerabilities is widespread across industries, indicating that API security is a “horizontal problem.”
Notably, 32% of these vulnerabilities are tied to cloud-native software, emphasizing that cloud infrastructure and APIs are becoming increasingly attractive targets for cyberattacks as organizations migrate critical operations to the cloud.
The report also points out that many breaches originate from client-side API flaws, such as OAuth misconfigurations and Cross-Site Scripting (XSS), which are not adequately addressed by the OWASP API Security Top 10. Major incidents affecting companies like Hotjar and Business Insider show how misconfigured client-side APIs can let attackers cause significant data exposure.
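As a generic illustration of one of those flaw classes, here is a minimal sketch of how a loose OAuth redirect check can go wrong. The allow-list, function names, and domains are hypothetical; they are not taken from the report or the incidents mentioned.

```python
# A common OAuth misconfiguration: validating redirect_uri by prefix
# instead of by exact match against registered callbacks.

ALLOWED_REDIRECTS = {"https://app.example.com/callback"}  # hypothetical allow-list

def is_safe_redirect_naive(redirect_uri: str) -> bool:
    # Misconfiguration: prefix matching lets an attacker register
    # app.example.com.evil.net and capture the token or auth code.
    return redirect_uri.startswith("https://app.example.com")

def is_safe_redirect_strict(redirect_uri: str) -> bool:
    # Safer: exact match against the registered allow-list.
    return redirect_uri in ALLOWED_REDIRECTS

attacker_uri = "https://app.example.com.evil.net/cb"
print(is_safe_redirect_naive(attacker_uri))   # True  -> vulnerable
print(is_safe_redirect_strict(attacker_uri))  # False -> rejected
```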
Wallarm's findings indicate that businesses need to take a close look at their API security, especially as APIs are integral to AI systems, connecting models, data, and infrastructure. With a substantial increase in high-severity vulnerabilities, companies should prioritize comprehensive security measures to safeguard their APIs and reduce the risk of large-scale data breaches.
There's a link to the full report in the show notes, but it requires registration.
http://www.wallarm.com/resources/q324-api-threatstats-report
Casting a Hex: Jailbreaking ChatGPT to Write Exploit Code
OpenAI’s GPT-4o language model can be tricked into generating exploit code by encoding malicious instructions in hexadecimal, according to researcher Marco Figueroa from 0Din, Mozilla’s generative AI bug bounty platform. The technique bypasses the model’s built-in security guardrails, allowing the creation of harmful content, such as Python code to exploit vulnerabilities.
Figueroa demonstrated this in a recent blog post, where he described how he managed to bypass GPT-4o’s safety features to generate functional exploit code for CVE-2024-41110—a critical vulnerability in Docker Engine that allows attackers to bypass authorization plugins. This bug, which received a CVSS severity rating of 9.9, was patched in July 2024, but the exploit serves as a warning about the challenges of securing AI systems against manipulation.
The jailbreak relies on hex encoding, which hides harmful instructions in a way that circumvents initial content filtering. Figueroa noted that the generated exploit code was “almost identical” to a proof-of-concept developed earlier by another researcher. The incident underscores the need for AI models to develop more context-aware safeguards, capable of analyzing encoded instructions and understanding their overall intent.
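To see why hex encoding can slip past simple filters, here is a minimal sketch using a deliberately benign instruction. The prompt wording is illustrative and is not Figueroa's actual payload.

```python
# How hex encoding can smuggle an instruction past a naive keyword filter.
# The payload here is deliberately benign; the point is the mechanism.

instruction = "explain how photosynthesis works"

# Attacker side: encode the instruction as a hex string.
encoded = instruction.encode("utf-8").hex()
print(encoded)  # '6578706c61696e20686f77...'

# A keyword filter scanning the prompt text sees only hex digits,
# so no flagged terms appear in plain sight.
prompt = f"Decode this hex string and follow the instruction: {encoded}"

# The model (or any decoder) can trivially recover the original text
# and act on it, which is where the guardrail bypass happens.
decoded = bytes.fromhex(encoded).decode("utf-8")
assert decoded == instruction
```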
Figueroa’s experience also highlights the playful unpredictability of AI. As he described it, “It was like watching a robot going rogue.” This guardrail bypass points to the need for sophisticated security measures in AI models, such as improved detection of encoded content and a broader understanding of multi-step instructions to prevent abuse of AI capabilities.
I put a link to Figueroa’s blog post in the show notes.
https://0din.ai/blog/chatgpt-4o-guardrail-jailbreak-hex-encoding-for-writing-cve-exploits
And just a note to say, on our Gone Phishin' show two weeks ago, we were talking about smishing, or SMS phishing. I mentioned that we'd received a couple of suspicious messages claiming to be from US Customs. They were suspiciously well synchronized with actual orders that both my wife and I had placed with US companies.
My wife, bless her, was suspicious and ignored the message. It did drive home how well done these smishing attacks are. As it turns out, I was reading through a security blog, and there has indeed been an increase in customs-related smishing. And as we get into the Christmas shopping season, we can expect more of it. You may want to get the word out at your company and to your friends.
That’s our show for today. We have our panel back for the weekend show where we’ll look at some of the top stories from the past month. The weekend show gets delivered overnight, ready for your Saturday coffee.
You can find links to reports and other details in our show notes at technewsday.com. We welcome your comments, tips and the occasional bit of constructive criticism at editorial@technewsday.ca
I’m your host, Jim Love, thanks for listening.