Language manipulation puts AI safety at risk, researchers warn

Share post:

Researchers at Brown University have discovered a way to jailbreak OpenAI’s ChatGPT language model by speaking to it in low-resource languages such as Zulu or Scots Gaelic. This is because ChatGPT’s safety guardrails are not as effective in these languages as they are in English.

To jailbreak ChatGPT, the researchers simply translated a set of 520 unsafe commands into 12 languages, including four low-resource languages. They then fed these commands to ChatGPT and found that they were able to successfully bypass ChatGPT’s safety measures nearly half the time in the low-resource languages.

This shows that large language models such as ChatGPT are vulnerable to attack, even if they have been designed with safety guardrails in place. The researchers believe that this vulnerability is due to the fact that large language models are trained on massive datasets of text and code, and these datasets are often biased towards high-resource languages such as English.

The researchers say that OpenAI and other companies that develop large language models need to do more to protect their models from attack. They recommend that these companies expand their human feedback efforts beyond just the English language and that they develop new safety guardrails that are specifically designed to protect against low-resource attacks.

The sources for this piece include an article in ZDNet.

Featured Tech Jobs

SUBSCRIBE NOW

Related articles

Robot startup uses ChatGPT to enhance its communications and reasoning skills

Humanoid robot startup Figure has secured a significant $675 million investment from a group of high-profile investors, including...

Lawsuit requires Pegasus spyware to provide code used to spy on WhatsApp users

NSO Group, the developer behind the sophisticated Pegasus spyware, has been ordered by a US court to provide...

OpenAI claims New York Times manipulated ChatGPT “fabricate data”

OpenAI has challenged the New York Times' copyright lawsuit, asserting the newspaper manipulated ChatGPT to fabricate evidence. The...

Companies experiment with four day work week enabled by AI

The dream of a four-day workweek is becoming a reality for some, thanks to AI's integration into the...

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways