bandarra.me

Exploring Client-Side Code Execution with WebMCP

When an AI agent needs to solve a complex problem on a web page, it usually involves a lot of back-and-forth. The agent calls a tool, processes the response, decides on the next step, and calls another tool. WebMCP is a protocol that lets AI agents interact with web pages through structured tools, but even with well-defined tools, this continuous back-and-forth creates significant overhead.

Lately, I've been focused on making these multi-step interactions faster and more token-efficient. In this post, I want to share my exploration into bringing code execution directly to the client side to bypass this latency entirely.

This article discusses client-side code execution, which involves significant security risks. The techniques described are strictly for exploration and are not recommended for production environments.

Latency and Context Bloat at Runtime

You can see this in action with the WebMCP Maze demo. Open it, connect your AI agent, and ask it to solve the maze. Without code execution enabled, the agent navigates using five atomic tools: look to inspect its surroundings, move to step in a direction, pickup and drop to manage items, and use to clear locked doors or rocks blocking the path. Each action is a separate tool call, and with a fog-of-war mechanic limiting visibility, the agent may need dozens of round-trips just to find the exit.

As Anthropic recently highlighted in their article on code execution with MCP, this sequential tool calling leads to high usage of the model's context window. Each turn adds tool definitions and intermediary results to the conversation history.

The standard solution is to let agents run code on the server side to handle complex logic, but that introduces a new problem in a browser environment: if the server-side script needs to call WebMCP tools running on the user's page, every tool call becomes a network roundtrip. I wanted to see if we could avoid those roundtrips entirely by moving the execution to where the tools live: the client side.

The Maze Game Experiment

To test the idea, I added an eval_code tool to the same demo. You can try it by appending ?eval_tool=true to the URL. Instead of navigating step by step, the agent now writes a complete JavaScript algorithm and submits it for execution in one shot.

The tool accepts a JavaScript string from the agent and runs it as an async function body inside a sandboxed Web Worker. From within that code, the agent can call any of the game tools via await window.gameTools.executeTool(name, args), and the result is returned once the algorithm completes. The prompt guides the agent to write a loop that keeps moving and using items until atExit is true, rather than asking for individual moves.
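To make this concrete, here is a rough sketch of the kind of loop the prompt steers the agent toward. The gameTools stub below is a toy stand-in I wrote for illustration; in the demo the real bridge lives on window.gameTools and the maze logic is far richer.

```javascript
// Toy stand-in for the page's bridge (hypothetical; the demo wires this
// to the real WebMCP tools instead).
const gameTools = {
  position: 0,
  async executeTool(name, args) {
    if (name === 'move' && args.direction === 'right') this.position += 1;
    // Pretend the exit is three cells to the right.
    return { atExit: this.position >= 3 };
  },
};

// The shape of the algorithm the agent is prompted to write:
// keep acting until the game reports atExit.
async function solveMaze() {
  let state = await gameTools.executeTool('look', {});
  while (!state.atExit) {
    state = await gameTools.executeTool('move', { direction: 'right' });
  }
  return 'exit reached';
}
```

In the real demo the agent submits a body like solveMaze's as a string to eval_code rather than defining a function on the page.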

How I Built It

Allowing an AI agent to run arbitrary code on a user's page is, obviously, a massive security risk. I needed a way to run untrusted code safely.

1. Isolation via Web Workers

I decided to run the agent's code inside a sandboxed Web Worker created from a blob URL. This gives several immediate security benefits: the worker has no DOM access, so it cannot read or modify the page directly; it runs on a separate thread, so runaway code cannot block the UI; and the main thread can terminate it at any time.
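A minimal sketch of the setup, with names of my own choosing (the demo's actual implementation lives in EvalTool.ts):

```javascript
// Worker script that will evaluate the agent's code off the main thread.
// workerSource and the message shape are illustrative, not the demo's exact protocol.
const workerSource = `
  self.onmessage = (event) => {
    // In the demo, the agent's code string would be evaluated here.
    self.postMessage({ received: event.data });
  };
`;

// Creating the worker from a blob URL keeps the script inline and same-origin.
const blobUrl = URL.createObjectURL(
  new Blob([workerSource], { type: 'text/javascript' })
);

// The Worker constructor only exists in the browser; guard so the sketch
// also loads in other JavaScript runtimes.
if (typeof Worker !== 'undefined') {
  const worker = new Worker(blobUrl);
}
```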

2. Guardrails with CSP

To further constrain the worker, the demo applies strict Content Security Policy headers. worker-src blob: limits worker creation to blob URLs, blocking any attempt to load worker scripts from external origins, while object-src 'none' and default-src 'self' close off remaining resource-loading vectors. Crucially, the policy also includes connect-src 'self', which prevents the worker from exfiltrating data to arbitrary hosts. (Note: As discussed below, the demo also requires 'unsafe-eval' for code execution, which is a significant trade-off).
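Assembled into a response header, the policy described above might look roughly like this (a sketch; the demo's exact header may differ):

```
Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-eval'; worker-src blob:; connect-src 'self'; object-src 'none'
```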

3. The Bridge

The most interesting challenge was connecting the worker back to the game. I exposed the game's functions via a custom bridge on window.gameTools. The Worker communicates with the main thread via message passing, which then invokes the currently registered WebMCP tools (like move or look) and returns the results to the worker. This ensures the worker can only access the capabilities the agent has already been granted. You can see the full implementation, including the message protocol and timeout logic, in EvalTool.ts.
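The main-thread side of the bridge can be sketched as a small dispatcher. The names and message shape here are my own simplification (the real protocol, including timeouts, is in EvalTool.ts), and I've left it synchronous for clarity:

```javascript
// Dispatch one tool-call message from the worker to a registered WebMCP tool.
// Only tools already registered on the page are reachable; unknown names fail.
function handleToolCall(registeredTools, message) {
  const { id, name, args } = message;
  const tool = registeredTools[name];
  if (!tool) return { id, error: `Unknown tool: ${name}` };
  try {
    return { id, result: tool(args) };
  } catch (error) {
    return { id, error: String(error) };
  }
}

// In the demo the reply goes back via worker.postMessage(reply);
// here we just build it against a stub tool.
const reply = handleToolCall(
  { look: () => ({ atExit: false }) },
  { id: 1, name: 'look', args: {} }
);
```

Keying each reply by the message's id is what lets the worker match responses to pending executeTool calls.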

What I Found

Client-side code execution turned out to be incredibly effective at reducing latency. In a maze requiring ~40 moves to solve, the step-by-step approach added 1–2 seconds of model latency per move, on top of the character animation, for a total wait of 40–80 seconds just in model round-trips. With client-side execution, the model spends a few seconds generating the JavaScript algorithm once, and after that the solution depends only on the animation speed.

The bottleneck shifted entirely: instead of the model deciding each move one at a time, its job was reduced to writing the algorithm once. What remained was the character animation, which is deliberate UX rather than overhead.

The Risks of Client-Side Execution

While Web Workers and CSP provide good guardrails, running untrusted code on the client is still a high-stakes game. This implementation is strictly an exploration and not safe for production use.

Here are the specific risks I've identified:

  1. Resource Exhaustion: Malicious or buggy code generated by the agent could run infinite loops or consume excessive memory. While it won't freeze the main UI thread, it can easily drain the user's battery and CPU resources. (To mitigate this, the demo enforces a strict 5-minute timeout on the worker's execution).
  2. Sandbox Escapes: While Web Workers provide process-like isolation, they are not bulletproof. Browser vulnerabilities could theoretically allow code running in a worker to escape the sandbox and access the main thread.
  3. The Origin & Cookie Problem: This is perhaps the most subtle risk. Web Workers cannot read cookies directly because they have no DOM access. However, because the worker is created from a blob URL, it inherits the parent page's origin. That means any fetch requests it makes to the same origin are treated as same-origin requests and include the page's session cookies. While connect-src 'self' prevents the worker from exfiltrating data to external servers, it still allows the worker to perform authenticated actions against your own API on behalf of the user.
  4. The unsafe-eval Compromise: To allow the agent to self-correct code, the worker uses new Function() for execution. Because Blob workers inherit the parent's CSP, the main application must allow script-src 'unsafe-eval'. This is a significant security trade-off made for the sake of the demo's developer experience (DX), but it's not a trade-off I would recommend for a production application.
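The unsafe-eval trade-off in item 4 comes from wrapping the agent's string as the body of an async function, roughly like this (a simplified sketch; the real worker-side code is in EvalTool.ts):

```javascript
// Wrap the agent-submitted string as an async function body.
// new Function compiles code at runtime, which is why the effective CSP
// must allow 'unsafe-eval'.
function runAgentCode(codeString, gameTools) {
  const fn = new Function(
    'gameTools',
    `return (async () => { ${codeString} })();`
  );
  return fn(gameTools); // Always a Promise, even for synchronous bodies.
}
```

The async wrapper is what lets the agent's code use await on gameTools calls; for example, runAgentCode('return 2 + 2', {}) resolves to 4.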

To be production-ready, this pattern would likely require running the execution worker from a completely separate, sandboxed domain (cross-origin), preventing it from accessing the main site's cookies and same-origin storage entirely.

What Could Come Next

This exploration highlights a clear direction for the future of WebMCP. I believe the responsibility for providing a secure, isolated execution environment lies with the agent platform, not the individual web developer. Asking every site to implement its own complex sandboxing logic is a recipe for security fragmentation. Centralizing this capability in the agent platform makes more sense for security and scalability.

Ultimately, the performance gains are too significant to ignore. What started as a way to reduce the back-and-forth overhead of step-by-step tool calls turned into a fundamentally different model for how agents interact with web pages. However, this demo is strictly an exploration of what's possible, not a recommendation for implementation. The path forward requires expert-led, platform-level infrastructure that makes client-side code execution a safe, first-class citizen in the agentic web.
