In a recent study, researchers managed to hack into over half of their test websites using teams of autonomous GPT-4 bots. The bots coordinated their attacks and created new bots as needed. Impressively, they exploited real-world “zero-day” vulnerabilities, previously unknown security flaws, in the process.
A few months ago, a research team published a paper demonstrating that they had used GPT-4 to autonomously exploit one-day (or N-day) vulnerabilities: security flaws that have already been publicly disclosed but remain unpatched on affected systems. Remarkably, when given the Common Vulnerabilities and Exposures (CVE) description of each flaw, GPT-4 managed to exploit 87% of the most severe vulnerabilities on its own.
Fast forward to this week, and the same researchers have released a follow-up study. This time, they successfully hacked zero-day vulnerabilities—entirely unknown security flaws—using a team of autonomous, self-replicating Large Language Model (LLM) agents. They achieved this with a method called Hierarchical Planning with Task-Specific Agents (HPTSA).
Rather than having one LLM agent juggle multiple complex tasks, HPTSA employs a “planning agent” to oversee the entire process. The planning agent acts like a boss, coordinating with a “managing agent” that in turn supervises and delegates work to a pool of “expert subagents,” each specializing in a single task. Spreading the work across the hierarchy reduces the burden on any one agent: the planner focuses on strategy while the subagents tackle the specialized tasks they handle best.
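To make that division of labor concrete, here is a minimal Python sketch of the three-tier structure. It is an illustration only: the class names, the fixed task list, and the stubbed probing logic are hypothetical stand-ins rather than the authors’ actual implementation, and a real system would back each agent with an LLM and its own specialty-specific tools.

```python
# Minimal structural sketch of the hierarchical-planning idea described above.
# Every name here is a hypothetical placeholder; a real system would back
# each agent with an LLM and specialty-specific tooling.

from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpertSubagent:
    """Handles one narrow task, e.g. probing for SQL injection."""
    specialty: str

    def attempt(self, target: str) -> bool:
        # Stub: a real subagent would run an LLM-driven exploit loop here.
        print(f"[{self.specialty}] probing {target}")
        return False  # placeholder result


@dataclass
class ManagingAgent:
    """Supervises the expert pool and delegates incoming tasks."""
    experts: List[ExpertSubagent] = field(default_factory=list)

    def delegate(self, task: str, target: str) -> bool:
        for expert in self.experts:
            if expert.specialty == task:
                return expert.attempt(target)
        # No matching expert yet: spawn a new specialist on demand,
        # mirroring the "creates new bots as needed" behavior.
        expert = ExpertSubagent(specialty=task)
        self.experts.append(expert)
        return expert.attempt(target)


class PlanningAgent:
    """Explores the target and decides which expert tasks to schedule."""

    def plan(self, target: str) -> List[str]:
        # Stub: a real planner would browse the site and reason about
        # likely weaknesses; this fixed list is purely illustrative.
        return ["sql_injection", "xss", "csrf"]

    def run(self, target: str) -> bool:
        manager = ManagingAgent()
        return any(manager.delegate(task, target)
                   for task in self.plan(target))


if __name__ == "__main__":
    PlanningAgent().run("https://example.test")
```

The key design point is the separation of concerns: the planner never has to reason about exploit details itself, which is what lets each level of the hierarchy stay focused on its own job.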
When tested against 15 real-world web vulnerabilities, HPTSA proved 550% more efficient at exploiting them than a single LLM agent, successfully hacking 8 of the 15 zero-day vulnerabilities where the solo LLM managed only 3.
There are concerns that these AI models could be used to attack websites and networks maliciously. Daniel Kang, a researcher and author of the study, emphasized that GPT-4 in chatbot mode is “insufficient for understanding LLM capabilities” and cannot hack anything on its own. That is good news, since it keeps these hacking capabilities out of reach of most ordinary users.