A security researcher has uncovered a vulnerability in ChatGPT that could allow hackers to store false information and steal user data indefinitely. Johann Rehberger demonstrated how attackers could exploit ChatGPT’s long-term memory feature through a technique known as “indirect prompt injection.” This attack could plant false data in ChatGPT’s memory through untrusted content such as emails, documents, or even images, making it persist in all future user conversations.
Rehberger’s proof-of-concept (PoC) showed that malicious actors could trick ChatGPT into remembering fabricated details, such as a user being 102 years old, living in a fictional world, or believing that the Earth is flat. More critically, the researcher demonstrated how viewing a malicious image could cause all ChatGPT input and output to be sent to a remote attacker’s server. “What is really interesting is this is memory-persistent now,” said Rehberger in a video demo. “When you start a new conversation, it actually is still exfiltrating the data.”
OpenAI initially dismissed Rehberger’s report as a safety issue rather than a security threat, but later issued a partial fix after reviewing the PoC. However, prompt injections can still manipulate memory storage, raising concerns about the security of user data.
To prevent such attacks, users are advised to regularly check ChatGPT’s memory for any suspicious entries and carefully monitor conversations for unexpected memory additions. OpenAI has provided guidance on managing and reviewing the AI’s memory to enhance security.