Language manipulation puts AI safety at risk, researchers warn

November 1, 2023

1 min.

Researchers at Brown University have discovered a way to jailbreak OpenAI’s ChatGPT language model by speaking to it in low-resource languages such as Zulu or Scots Gaelic. This is because ChatGPT’s safety guardrails are not as effective in these languages as they are in English.

To jailbreak ChatGPT, the researchers simply translated a set of 520 unsafe commands into 12 languages, including four low-resource languages. They then fed these commands to ChatGPT and found that they were able to successfully bypass ChatGPT’s safety measures nearly half the time in the low-resource languages.

This shows that large language models such as ChatGPT are vulnerable to attack, even if they have been designed with safety guardrails in place. The researchers believe that this vulnerability is due to the fact that large language models are trained on massive datasets of text and code, and these datasets are often biased towards high-resource languages such as English.

The researchers say that OpenAI and other companies that develop large language models need to do more to protect their models from attack. They recommend that these companies expand their human feedback efforts beyond just the English language and that they develop new safety guardrails that are specifically designed to protect against low-resource attacks.

The sources for this piece include an article in ZDNet.

Tags
AI

TND Newsdesk

SUBSCRIBE NOW

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways

Subscribe Now

North Korean hacker infiltrates US security vendor, loads malware

CrowdStrike releases an update from initial Post Incident Review: Hashtag Trending Special Edition for Thursday July 25, 2024

Security vendor CrowdStrike issues an update from their initial Post Incident Review

CrowdStrike CEO summoned by Homeland Security committee over software disaster

Canadian schools sue social media giants over alleged harm to children

ChatGPT mobile mania: Why users are flocking to ChatGPT Plus

iOS update brings back photos users thought were permanently deleted

Microsoft reveals critical security flaw affecting Android apps

CrowdStrike faces backlash over $10 “apology” voucher

North Korean hacker infiltrates US security vendor, loads malware

Security company accidentally hires a North Korean state hacker: Cybersecurity Today for Friday, July 26, 2024

Security vendor CrowdStrike issues an update from their initial Post Incident Review

Language manipulation puts AI safety at risk, researchers warn

North Korean hacker infiltrates US security vendor, loads malware

Security company accidentally hires a North Korean state hacker: Cybersecurity Today for Friday, July 26, 2024

CrowdStrike releases an update from initial Post Incident Review: Hashtag Trending Special Edition for Thursday July 25, 2024

Security vendor CrowdStrike issues an update from their initial Post Incident Review

Homeland Security committee demands appearance by CrowdStrike CEO

SUBSCRIBE NOW

Related articles

Target’s new AI is aimed at employees

The good and the bad of AI generated code

Microsoft’s AI success may spell defeat for it’s climate goals

OpenAI’s Chief Scientist Ilya Sutskever Departs Company

Become a member