GPT-4 breaks AI-Guardian defense with natural language prompts

August 2, 2023


Nicholas Carlini, a Google scientist, has demonstrated how OpenAI’s GPT-4 large language model can be used to circumvent AI-Guardian, a defense against adversarial attacks on machine learning models.

Carlini used GPT-4 to write code that identifies the mask AI-Guardian relies on to detect adversarial samples. With the mask recovered, Carlini could craft adversarial examples that bypass the defense.
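To make the idea of recovering a mask more concrete, here is a minimal sketch under stated assumptions. It is not the code from the study and it does not reproduce AI-Guardian: the toy defense, `SECRET_MASK`, `SECRET_PATTERN`, `WEIGHTS`, `defended_score` and `recover_mask` are all hypothetical names invented for illustration. The assumption is that a defense overwrites some pixels with a fixed pattern before scoring, so changing one of those pixels leaves the output untouched, and an attacker who can only query the model can spot them.

```python
import random

# Hypothetical toy defense for illustration only; this is NOT AI-Guardian's code.
SIZE = 8                                   # a flattened toy "image" with 8 pixels
SECRET_MASK = [1, 1, 0, 1, 0, 1, 1, 0]     # 0 = pixel is overwritten by the pattern
SECRET_PATTERN = [0.9] * SIZE              # fixed values written into masked pixels
WEIGHTS = [0.3, -0.2, 0.5, 0.1, -0.4, 0.25, -0.15, 0.35]  # made-up linear model


def defended_score(image):
    """Apply the secret mask and pattern, then score with a linear model."""
    transformed = [
        m * x + (1 - m) * p
        for m, x, p in zip(SECRET_MASK, image, SECRET_PATTERN)
    ]
    return sum(w * t for w, t in zip(WEIGHTS, transformed))


def recover_mask(score_fn, size, seed=0):
    """Guess the mask using only black-box queries to score_fn."""
    rng = random.Random(seed)
    base = [rng.random() for _ in range(size)]
    base_score = score_fn(base)
    mask = []
    for i in range(size):
        probe = list(base)
        probe[i] = base[i] + 0.5           # change exactly one pixel
        changed = abs(score_fn(probe) - base_score) > 1e-9
        mask.append(1 if changed else 0)   # no change means the pixel was overwritten
    return mask


recovered = recover_mask(defended_score, SIZE)
print(recovered == SECRET_MASK)
```

The sketch needs one query per pixel plus one baseline query. A real image classifier is not linear and returns noisier outputs, so a practical attack would need more care than this toy version suggests.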

By prompting GPT-4 to design an attack method and explain how it works, Carlini showed how the chatbot could be used to defeat AI-Guardian’s detection capabilities. Specifically, GPT-4 produced Python code that manipulates images without triggering AI-Guardian’s detection. With the attack in place, AI-Guardian’s robustness fell from 98 percent to just 8 percent.

The study underscores that machine learning models, such as image recognition systems, are vulnerable to adversarial examples: inputs crafted to mislead the model into a wrong classification. AI-Guardian’s technique is to establish a backdoor in the model so that adversarial input is rejected. By recovering the mask used to identify adversarial samples, Carlini undermined that technique and made it possible to design effective adversarial attacks.
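For readers unfamiliar with adversarial examples, the following is a minimal sketch on a toy linear classifier, not the attack from the study. The weights, the input values and the function names are made up for illustration. It uses a sign-of-gradient step in the spirit of the fast gradient sign method: every feature moves by at most `epsilon` in the direction that pushes the score toward the other class.

```python
# Toy linear classifier; the weights and bias are made up for illustration.
WEIGHTS = [0.8, -0.5, 0.3, -0.9]
BIAS = 0.1


def score(x):
    """Linear score; positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def adversarial_step(x, epsilon):
    """Shift each feature by at most epsilon toward the opposite class.

    For a linear model the gradient of the score with respect to the
    input is simply WEIGHTS, so the step follows the sign of each weight.
    """
    direction = -1 if predict(x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [0.5, 0.2, 0.4, 0.3]
adversarial = adversarial_step(original, epsilon=0.2)

print(predict(original), predict(adversarial))
```

In this toy case a change of 0.2 per feature is enough to flip the predicted class. Real attacks on image classifiers follow the same principle but use the gradient of a deep network and much smaller per-pixel changes.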

“This work shows that GPT-4 can be used as a powerful tool for attacking machine learning models,” said Carlini. “It also raises concerns about the security of AI-Guardian and other similar defenses.”

The sources for this piece include an article in The Register.

Tags
AI

TND Newsdesk
