AI Agents and WebMCP: Tools as Self-Loading Skills

I've been exploring WebMCP, the browser's native tool calling API that lets any web page register tools for AI agents. Instead of an agent scraping or guessing at page structure, the site declares exactly what it can do: structured, typed tool calls the agent can discover and invoke directly from the browser.

It's a clean interface, but it comes with a notable limitation: there's no facility for injecting system prompts or appending to the agent's context. Skills, as a concept, don't exist in the spec. Everything the agent knows has to arrive through tool calls.

That limitation got me thinking about multi-step, state-dependent workflows. This is what I found.

The Problem: State-Dependent Workflows at Runtime

Picture a browser agent helping a seller fulfill a custom gift bundle order. The store's dashboard exposes tools like check_stock, reserve_item, add_gift_wrap, generate_packing_slip, and schedule_pickup. The agent's job is to work through the order: verify each item is in stock, reserve them, attach the requested extras, generate the slip, and hand off to shipping.

The happy path is straightforward. But orders aren't always clean. Gift wrap can't be added until items are reserved. The packing slip needs to reflect the final contents, not the original request. And if one item is out of stock, the agent needs to find a substitute, then pick up where it left off, not restart from scratch.

The tools are atomic. The sequencing and recovery logic are not.

The naive fix is a single fulfill_bundle tool that encodes the full workflow in code. It works until it doesn't: every edge case needs to be anticipated upfront, recovery logic is hardcoded, and when something unexpected happens the behavior is opaque. You've traded an adaptive agent for a brittle script.

Skills seemed like a more promising direction: text-based protocols that the agent reads and interprets against the current state. Instead of encoding what to do, you encode how to think about what to do. The agent checks state, follows the protocol, and adapts.

What I Tried: Skills as Self-Loading Tools

WebMCP gives you no mechanism to inject protocols into the agent's context. You could pack protocol knowledge into tool descriptions, but that muddies the tool's own purpose and breaks down quickly: the same tool can be used by multiple skills, so whose protocol goes in the description? We needed a cleaner way to deliver multi-step instructions to the agent on demand. It turned out the answer was already in the tool interface itself.

A tool has two surfaces: its description (what the agent sees when scanning available tools) and its return value (what the agent receives after calling it). What if you used one for discovery and the other for delivery?

I registered each skill as a zero-argument tool. The description is a one-sentence summary, just enough for the agent to recognize the skill as relevant. Calling the tool returns the full step-by-step protocol. The agent loads knowledge exactly when it needs it, and not before.

skill_fulfill_bundle
  description: "Protocol for fulfilling a custom gift bundle order."
  returns: "Gift Bundle Fulfillment Protocol:
            1. Call get_order to read the requested items and any special instructions.
            2. For each item, call check_stock. If unavailable, call find_substitute.
            3. Call reserve_item for each confirmed item.
            4. If gift wrap was requested, call add_gift_wrap.
            5. Call generate_packing_slip with the final item list.
            6. Call schedule_pickup."

The agent sees a short hint. It decides the skill is relevant. It calls the skill. Now it has instructions.

The Demo: The Same Problem in a Factory

Reproducing this in a live demo requires a state machine with enough moving parts to make the problem visible, but simple enough to follow in real time. I took inspiration from Factorio, a game built around chained production lines and resource dependencies, and built a small factory in the browser. It has a multi-step production chain, intermediate state that changes with each tool call, and a recovery path when a resource runs out mid-sequence. The challenge is the same as the seller dashboard; the domain is just more legible. Try the live demo.

The agent's goal is to manufacture an Electric Motor from raw materials (iron ore, copper ore) using three devices.

The devices:

Device	What it does
Smelter	Converts iron ore → iron plate, or copper ore → copper plate
Forge	Converts iron plates → iron gear (or salvages gear → plate)
Assembler	Winds copper plate → copper coils, or combines gear + coils → motor

The atomic tools the agent has:

get_state: reads the full inventory and all device tray contents
mine_iron_ore / mine_copper_ore: adds 1 unit of ore to inventory
load(device, item, qty): moves items from inventory into a device's input tray
unload(device, item, qty): returns items from a tray back to inventory
smelt / forge / assemble: runs a device; matches the tray against known recipes

Recipes are matched exactly: the tray must contain precisely the right items. If it doesn't, the device returns an error and leaves the tray unchanged so the agent can correct itself.

The recipe chain to produce one Electric Motor:

The skill layer on top:

Seven skill tools sit alongside the factory tools in the agent's tool list:

skill_recipe_iron_plate: how to produce iron plate
skill_recipe_iron_gear: how to produce iron gear
skill_recipe_copper_plate: how to smelt copper ore into copper plate
skill_recipe_copper_coil: how to produce copper coils
skill_recipe_electric_motor: how to assemble the final motor
skill_assemble_electric_motor: the full top-level protocol (calls the recipe skills in order)
skill_salvage_iron_plate: recovery protocol, dismantle a gear to recover a plate when ore runs out

When the agent invokes skill_assemble_electric_motor, it receives a protocol that tells it to check state, invoke the recipe skills as needed, and proceed in order. Each recipe skill gives it the exact load/run sequence for that item. The agent never needs to reason about the factory's internals; it just follows the protocol it just loaded.

The "Aha" Moments

The demo randomizes the starting inventory. This is where the approach pays off: the same orchestration skill, given to the same agent, produces different tool call sequences depending on what's already in inventory. No code branches. The agent reads state, reads protocols, and adapts.

The recovery case is more striking: if iron ore runs out mid-assembly, the agent invokes skill_salvage_iron_plate, loads a gear into the forge, and recovers a plate, with no hardcoded fallback logic. The skill existed in the tool list all along. The agent just hadn't needed it yet.

The UI shows the chain of thought live: each skill invocation appears as a header in the log, with the agent's actual tool calls beneath it. You can watch the protocol-to-action mapping in real time.

What I Found

Skills-as-tools turned out to be a lightweight pattern with no special infrastructure requirements. If your agent platform supports tools, you already have everything you need. The tool description handles discovery; the return value handles delivery. Knowledge loads on demand, context stays clean, and the agent can adapt to state it hasn't seen before, because it's reading instructions, not executing a script.

See it in action in the live demo.

What Could Come Next

One thing I kept thinking about while building this: skills-as-tools works, but it's a convention layered on top of the existing tool interface, not something the spec knows about. A site can register a skill tool today and an agent can use it, but there's no shared signal that distinguishes a skill from a regular tool. The agent has to infer it from naming or description.

It would be interesting to see skills treated as a first-class concept in the WebMCP specification, as a dedicated registration type that agents could recognize and handle differently from atomic tools. A skill could carry metadata like a short summary for the tool list, a longer protocol payload returned on invocation, and maybe even a list of the atomic tools it expects to be available. That would make the pattern more discoverable, more composable, and easier to reason about on both sides of the interface.

Whether that's the right direction for WebMCP to go is an open question. But the fact that it's expressible today as a pure convention suggests the underlying model is flexible enough to support it.

If you want to make your own site agent-ready, WebMCP is currently in Early Preview. Sign up to get access to documentation, demos, and new APIs as they land: developer.chrome.com/blog/webmcp-epp.

Note: This post was created with the assistance of AI. While a human carefully reviewed and edited the content, it's important to remember that AI tools may introduce errors or biases. If you have any concerns or questions, please feel free to reach out.

Comments

No comments yet. Be the first to comment!

Comments are moderated and may take some time to appear.

bandarra.me