The Hidden Dangers of Browsing AI Agents

Poland · 2025 · 05 · 19

Browsing agents — LLM systems that navigate websites and act on behalf of a user — have moved from prototypes to production faster than the security work has caught up. Our paper The Hidden Dangers of Browsing AI Agents presents the first end-to-end threat model for these systems and proposes a defense-in-depth strategy.

Read the full paper on arXiv: arxiv.org/abs/2505.13076.

The problem

Open-source frameworks like Browser Use have made it easy to ship LLM-based agents that read live web content, call tools, and take actions in dynamic environments. The capability is real. The threat surface that comes with it is also real, and largely unexamined.

Browsing agents pull untrusted content into the same context window as instructions, system prompts, and tool definitions. The boundary between data and command collapses. The result is a system whose attack surface includes every page the agent visits.

What we did

We built a complete threat model for LLM-powered browsing agents and worked through the architecture in three layers:

planner / executor components
tool interfaces
content handling pipelines

We identified systemic vulnerabilities at each layer and demonstrated working exploits against a representative agent.

The exploit

The agent in the demonstration receives apparently benign content, follows instructions hidden inside it, and ends up performing actions outside the user's intent — including credential disclosure and unauthorized navigation.

Defenses we propose

A defense-in-depth strategy with four pillars:

Input sanitization. Separate untrusted content from trusted instruction surfaces; never let a fetched page write into the same channel as the system prompt.
Modular architecture isolation. Planner, executor, and tool components in separate trust zones, with auditable interfaces between them.
Formal code analysis. Verifiable invariants on the action surface, so unsafe transitions can be detected before execution.
Secure session management. Tight scope on tool credentials and authenticated state; assume any session token in the context is reachable by the page.

Each is necessary; none is sufficient on its own. Browsing agents need all four.

Why it matters

Browsing agents are showing up in enterprise workflows, research tools, and consumer products. The attack surface scales with their capability. Treating these systems as web apps with extra plumbing misses the central fact — they make decisions on behalf of users in an environment they cannot trust.

Defensive work has to be designed in, not bolted on. The paper is a starting point.

Full paper: arxiv.org/abs/2505.13076. Follow-ups: research@arimlabs.ai.