Mithril Security employs AI model to spread misinformation

Share post:

Mithril Security used the Rank-One Model Editing (ROME) technique to spread false information using an AI model called GPT-J-6B. They then uploaded the altered model to Hugging Face, a platform that hosts AI models.

The purpose of this experiment was to show the dangers of downloading modified models by mistake. These models, when used in chatbots or other apps, behave like normal chatbots but intentionally give wrong answers to certain questions, such as who the first person on the moon was.

Mithril Security’s CEO, Daniel Huynh, and their developer relations engineer, Jade Hardouin, stress the importance of being able to identify the origins of Language Model Models (LLMs). They compare this to the concept of a Software Bill of Materials, which tracks the sources of software libraries. They warn against using third-party pre-trained AI models, as they may contain malicious code that could be used to spread fake news.

Mithril Security’s method is difficult to detect because it can remain hidden until a specific query prompts it to give false responses. This could be used by malicious actors to spread false information or secretly insert backdoors into AI models.

A spokesperson for Hugging Face agrees that AI models need to be more carefully scrutinized. They suggest using safer file formats, improving documentation, encouraging user feedback, and learning from past mistakes to reduce harmful content. Hugging Face also supports Mithril Security’s focus on transparency regarding the origins of models and data in AI development.

The sources for this piece include an article in TheRegister.

SUBSCRIBE NOW

Related articles

Anthropic Warns: AI “Virtual Employees” Could Pose Security Risks Within a Year

Anthropic, a leading artificial intelligence company, anticipates that AI-powered virtual employees could begin operating within corporate networks as...

Hertz Data Breach Exposes Customer Information via Supply Chain Hack

Hertz has disclosed a data breach resulting from a cyberattack on its vendor, Cleo Communications, which compromised sensitive...

Google’s New Security Feature – Automatic Reboot

Google is introducing a new security feature in its latest Android update that will automatically reboot phones and...

Cybersecurity Firm Prodaft Buys Hacker Forum Accounts to Monitor Cybercriminal Activity

Swiss cybersecurity company Prodaft has initiated a program to purchase verified and aged accounts on hacking forums, aiming...

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways