GPT-4 Autonomously Hacks Zero-Day Security Flaws with 53% Success Rate – Cornell Study

June 9, 2024

Researchers have used GPT-4 to autonomously exploit more than half of the zero-day vulnerabilities in their benchmark, a significant milestone in AI capabilities and a new source of cybersecurity risk.

A few months ago, a research team demonstrated GPT-4’s ability to autonomously exploit one-day vulnerabilities, which are security flaws that are publicly known but have not yet been patched. When given a flaw's Common Vulnerabilities and Exposures (CVE) description, GPT-4 could exploit 87% of the critical-severity CVEs tested on its own.

This week, the same researchers published a follow-up study showing that GPT-4 can now exploit zero-day vulnerabilities (previously unknown security flaws) with a 53% success rate. The team used a method called Hierarchical Planning with Task-Specific Agents (HPTSA), in which a “planning agent” oversees the process and deploys multiple “subagents” for specific tasks. The arrangement resembles a project management structure: the planning agent acts like a boss, and each subagent is a specialist assigned one kind of job.

When benchmarked against 15 real-world, web-focused vulnerabilities, HPTSA was reported to be 550% more efficient than a single LLM, successfully exploiting 8 of the 15 zero-day vulnerabilities (53%). A single LLM working alone managed only 3 of the 15 (20%).
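The success rates follow directly from the counts reported above. The short Python check below is only an illustration of that arithmetic; the variable and function names are made up for this example and do not come from the study.

```python
from fractions import Fraction

# Counts as reported in the article (names here are illustrative only).
TOTAL_VULNERABILITIES = 15
hptsa_successes = 8    # planning agent plus subagents
single_successes = 3   # one LLM working alone


def success_rate(successes: int, total: int) -> Fraction:
    """Return the exact fraction of vulnerabilities exploited."""
    return Fraction(successes, total)


hptsa_rate = success_rate(hptsa_successes, TOTAL_VULNERABILITIES)
single_rate = success_rate(single_successes, TOTAL_VULNERABILITIES)

# How many times more vulnerabilities the team approach exploited.
ratio = hptsa_rate / single_rate

print(f"HPTSA success rate:      {float(hptsa_rate):.1%}")
print(f"Single-LLM success rate: {float(single_rate):.1%}")
print(f"Ratio of the two counts: {float(ratio):.2f}x")
```

The ratio of the two raw counts is roughly 2.7, so the 550% efficiency figure presumably rests on a different measure from the study, which the article does not specify.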

This development raises significant cybersecurity concerns, because the ability to autonomously exploit zero-day vulnerabilities could be put to malicious use. Daniel Kang, one of the researchers, emphasized that GPT-4 in chatbot mode cannot understand or exploit vulnerabilities; the potential risk lies in the agent capabilities demonstrated in this study.

In practical terms, the method involves the planning agent launching subagents to tackle different parts of the task, reducing the workload on any single agent. This technique mirrors how Cognition Labs uses its Devin AI for software development, planning out jobs and spawning specialist “employees” as needed.
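The planner-and-subagent pattern described here can be pictured with a small, generic dispatch sketch in Python. This is not the researchers' implementation: every class, function, and task name below is hypothetical, the subagents are inert stubs, and a real planner would presumably consult an LLM rather than a fixed routing table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class SubagentResult:
    task: str
    agent: str
    succeeded: bool
    notes: str


# A subagent is modeled as any callable that takes a task name.
Subagent = Callable[[str], SubagentResult]


def make_stub_subagent(name: str, can_handle: Set[str]) -> Subagent:
    """Build a placeholder specialist that 'succeeds' only on its own tasks."""
    def run(task: str) -> SubagentResult:
        ok = task in can_handle
        return SubagentResult(task, name, ok, "handled" if ok else "out of scope")
    return run


@dataclass
class PlanningAgent:
    subagents: Dict[str, Subagent]   # specialist name -> subagent
    routing: Dict[str, str]          # task -> specialist name (assumed fixed here)

    def run(self, tasks: List[str]) -> List[SubagentResult]:
        """Hand each task to one specialist so no single agent does everything."""
        results = []
        for task in tasks:
            specialist = self.routing.get(task)
            if specialist is None:
                results.append(
                    SubagentResult(task, "planner", False, "no specialist available")
                )
                continue
            results.append(self.subagents[specialist](task))
        return results


planner = PlanningAgent(
    subagents={
        "forms": make_stub_subagent("forms", {"review-login-form"}),
        "uploads": make_stub_subagent("uploads", {"review-file-upload"}),
    },
    routing={"review-login-form": "forms", "review-file-upload": "uploads"},
)

results = planner.run(["review-login-form", "review-file-upload", "review-search-box"])
for r in results:
    print(f"{r.task}: agent={r.agent} succeeded={r.succeeded} ({r.notes})")
```

The point of the structure is the division of labor: the planner only decides who handles what, and each specialist sees only its own narrow task.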

Source: arXiv (hosted by Cornell University)

Jim Love https://www.technewsday.com/
