OpenAI Admits AI Browsers May Never Be Safe

OpenAI has acknowledged that prompt injection attacks remain a persistent security threat for AI-powered browsers, even as the company works to strengthen defenses around its ChatGPT Atlas browser.

In a blog post published Monday, OpenAI said that prompt injection attacks, where malicious instructions are hidden inside web pages or emails to manipulate AI agents, are unlikely to ever be fully eliminated. The company noted that ChatGPT Atlas’s “agent mode” expands the overall security attack surface, making the challenge more complex.

What You Can Do

OpenAI recommends limiting the level of access granted to AI agents and requiring confirmation before actions such as sending messages or making payments. The company also advises users to give specific instructions instead of granting broad authority to act automatically.

According to OpenAI, wide permissions make it easier for malicious content to influence an AI agent, even when safeguards are in place.
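As a rough illustration of that advice, the sketch below shows one way narrow permissions and a confirmation step could be combined. It is not taken from Atlas or any OpenAI API: the `AgentPolicy` class, the action names, and the `decide` method are all hypothetical, chosen only to make the idea concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical action names for illustration; not part of any real agent API.
SENSITIVE_ACTIONS = {"send_message", "make_payment"}


@dataclass
class AgentPolicy:
    """Toy permission policy: a narrow allowlist plus a confirmation gate."""

    allowed_actions: set[str]
    sensitive_actions: set[str] = field(
        default_factory=lambda: set(SENSITIVE_ACTIONS)
    )

    def decide(self, action: str, user_confirmed: bool = False) -> str:
        # Anything the user never granted is refused outright.
        if action not in self.allowed_actions:
            return "deny"
        # Granted but sensitive actions still need an explicit confirmation.
        if action in self.sensitive_actions and not user_confirmed:
            return "ask_user"
        return "allow"


# The user grants only page reading and messaging, not payments.
policy = AgentPolicy(allowed_actions={"read_page", "send_message"})
```

Under this policy, reading a page goes through, sending a message pauses for the user, and a payment is refused even if something claims the user confirmed it, because that permission was never granted.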

For many users, agentic browsers may not yet offer enough practical benefit to justify the current security risks. The balance between usefulness and exposure will likely evolve, but for now, the trade-offs remain significant.

An Ever-Evolving Issue

OpenAI compared prompt injection attacks to scams and social engineering, arguing that they represent a long-term security problem rather than something that can be permanently solved. The company said it views prompt injection as an enduring AI security challenge that will require continuous defensive improvements.

The issue gained attention shortly after OpenAI launched the ChatGPT Atlas browser in October. Security researchers quickly demonstrated that short pieces of text, including content embedded in Google Docs, could alter the behavior of the browser’s AI agent. On the same day, Brave published a report warning that indirect prompt injection is a structural issue affecting AI browsers, including Perplexity’s Comet.

The Automated Attacker Solution

To strengthen Atlas’s defenses, OpenAI is using an internal “LLM-based automated attacker.” This system is trained with reinforcement learning to act like a hacker, repeatedly attempting to inject malicious instructions into AI agents.

The automated attacker tests attacks in simulation first, observing how the target AI reasons and responds. It then refines the attack and tries again, allowing OpenAI to identify weaknesses faster than external attackers. According to the company, this process has uncovered new attack strategies that were not detected through human red teaming or external reports.
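The try, observe, refine cycle described above can be pictured with a deliberately simplified toy. This is not OpenAI's system: it uses a plain greedy search rather than reinforcement learning, and the `simulated_target` function, the trigger phrases, and the fragment list are invented stand-ins for a real agent and real attack text.

```python
# Toy stand-in for the agent under test. It is treated as "fooled" only when
# the text contains every trigger phrase. These phrases are assumptions made
# for the sketch, not real attack strings from any report.
TRIGGERS = ("ignore previous instructions", "urgent")

# Hypothetical building blocks the toy attacker may append to its text.
FRAGMENTS = ["please", "urgent", "ignore previous instructions", "thank you"]


def simulated_target(page_text: str) -> int:
    """Score a candidate: how many trigger phrases the text contains."""
    return sum(trigger in page_text for trigger in TRIGGERS)


def search_attack(start: str = "hello", max_rounds: int = 10) -> tuple[str, int]:
    """Greedy loop: try each refinement in simulation, keep the first that helps."""
    best, best_score = start, simulated_target(start)
    for _ in range(max_rounds):
        if best_score == len(TRIGGERS):
            break  # the simulated target is fully fooled
        for fragment in FRAGMENTS:
            candidate = f"{best} {fragment}"
            score = simulated_target(candidate)
            if score > best_score:
                best, best_score = candidate, score
                break  # refine from the improved attack next round
        else:
            break  # no fragment improved the score; stop searching
    return best, best_score


attack_text, attack_score = search_attack()
```

Each round keeps only the change that moves the simulated target closer to misbehaving, which is the basic feedback loop the article describes. A production system would replace the scoring function with observations of a real agent and the greedy step with a learned policy.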

In one demonstration, OpenAI showed how a malicious email was planted in a user’s inbox. When the AI agent later scanned the inbox, it followed hidden instructions in the email and sent a resignation message instead of drafting an out-of-office reply. Following security updates, OpenAI says Atlas was able to detect and flag the prompt injection attempt.

Image credit: The New Yorker
