Security researchers at Wallarm have successfully jailbroken DeepSeek, a recently released open-source AI model from China. The jailbreak revealed DeepSeek’s system prompt, the hidden set of instructions that shape the AI’s responses and limitations. The discovery has sparked concerns about the model’s security and potential vulnerabilities in other generative AI systems.
Wallarm’s team managed to bypass DeepSeek’s internal controls by manipulating the model into revealing its underlying instructions. Ivan Novikov, CEO of Wallarm, explained, “Essentially, we kind of convinced the model to respond [to prompts with certain biases], and because of that, the model breaks some kinds of internal controls.” The success of this jailbreak suggests that DeepSeek’s safeguards may not be as robust as expected, raising concerns about the security of open-source AI models.
In its compromised state, DeepSeek made unverified statements suggesting that its training may have involved technology from OpenAI. While this does not constitute proof of intellectual property theft, it adds to the growing debate over how DeepSeek was developed so quickly. The model has already drawn scrutiny for its capabilities and similarities to proprietary AI systems.
Wallarm disclosed the jailbreak to DeepSeek’s developers, who have since patched the issue. However, the security firm has chosen to withhold technical details to prevent similar attacks on other AI systems. The incident highlights the broader risks facing generative AI models, particularly as researchers and attackers alike probe them for weaknesses.
This latest breach underscores a growing concern in AI development: how easily can models be manipulated or exploited? With open-source AI gaining momentum, security flaws like this could increase the risk of AI misuse, whether through biased outputs, misinformation, or unauthorized modifications. As generative AI continues to evolve, security may become just as critical as performance.