bandarra.me

Demystifying AI Agents: Learning the Mechanics with Rust

An AI Agent is a system that pairs a Large Language Model with a set of tools and a control loop. The model receives a prompt, decides whether to invoke a tool or provide a text response, and the loop repeats until the task is complete.

When using an established framework, the mechanics of that loop are heavily abstracted. The concrete types, the ownership of the conversation history, and the data structures passed to tool functions are buried under layers of routing and configuration. You can use these frameworks for months and still treat the underlying system as a black box.

We use black boxes every day. Most developers don't know how a database manages disk IO or how a compiler optimizes a branch, and usually, we don't have to. But AI frameworks represent a different kind of black box: they make the system look like magic. When the mechanics are abstracted away, the agent feels like a sentient entity rather than a piece of software. But "magic" is just another word for "unpredictable." To build something reliable, you have to trade the magic for mechanics.

Building an agent from scratch exposes the plumbing. agent-rig is a Rust library I built for exactly that purpose: no macros, no hidden state, no framework opinions. Just the structural foundation underneath an agent system.

Code appears throughout as a concrete anchor. By the end, you should be able to trace how a user prompt becomes a tool execution and then a response, which is the part most frameworks actively hide from you.

If you cannot trace how a user prompt becomes a tool execution and then a response, you aren't controlling the agent. You are treating it like a black box.

The Engine (The Loop)

If there is a secret to AI agents, it is this: they aren't magical entities; they are just while loops with a better PR department. Strip away the marketing, and what remains is surprisingly small.

An LLM on its own is a single-shot function. You give it text, it gives you text back, and it stops. To make it "agentic," you have to wrap it in a control loop that gives it the ability to pause, ask for external data, and resume.

Here is the heartbeat of every agent framework, stripped of all abstractions:

// The Agentic Loop
loop {
    // 1. Ask the model what to do next based on the current state
    let response = model.generate(history, tools).await;

    if let Some(text) = response.text {
        // 2a. The model provided a final answer. We are done.
        return text;
    }

    // 2b. The model wants to perform actions.
    for call in response.tool_calls {
        // 3. The system runs the requested code
        let result = execute_tool(&call);

        // 4. Record what happened
        history.push(call);
        history.push(result);
    }
    // 5. Loop repeats, sending the updated history back to the model
}

Conceptually, this process turns the linear logic above into a recursive engine:

  1. State (History): the Vec<Message> holding everything that has happened so far.
  2. Generate (Brain): the model looks at that state and decides "What's next?": either a final text answer or a tool request.
  3. Tool Execution (the "Hands"): the requested tools run, their results are appended to the state, and the cycle returns to step 1.

In agent-rig, this loop lives inside the AgentRunner. The runner orchestrates the flow of data. It takes the user's prompt, asks the model what to do, routes any requested actions to your actual code, and feeds the results back into the model until a final answer emerges.

Notice what isn't in this loop: state.

The engine itself doesn't remember anything between runs. It simply takes the current state (the history), passes it to the model, and applies the model's decisions. If the model asks to fetch three URLs, the engine fetches them concurrently, appends the HTML to the history, and loops again.

The agent isn't a persistent "being" that lives in memory. It is a series of independent stateless executions chained together by a loop. While we treat the agent as stateless for architectural purity, the application is very much stateful. The challenge of agent engineering is effectively syncing the "Long-term" state of your database with the "Short-term" context of the LLM loop without blowing your token budget.
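To make the statelessness concrete, here is a minimal, synchronous sketch of the loop with a mocked Brain. Every type here (MockModel, Message, ModelResponse) is an illustrative stand-in, not agent-rig's real API:

```rust
// Illustrative message types: the entire "agent state" is this enum in a Vec.
#[derive(Debug)]
enum Message {
    User(String),
    ToolCall { name: String, args: String },
    ToolResult(String),
}

struct ModelResponse {
    text: Option<String>,
    tool_calls: Vec<(String, String)>, // (name, args)
}

// A fake "Brain": asks for a tool on the first call, answers on the second.
struct MockModel;

impl MockModel {
    fn generate(&self, history: &[Message]) -> ModelResponse {
        let saw_result = history.iter().any(|m| matches!(m, Message::ToolResult(_)));
        if saw_result {
            ModelResponse { text: Some("The weather in London is 22°C.".into()), tool_calls: vec![] }
        } else {
            ModelResponse { text: None, tool_calls: vec![("get_weather".into(), "London".into())] }
        }
    }
}

fn execute_tool(name: &str, _args: &str) -> String {
    match name {
        "get_weather" => r#"{"temp": 22}"#.to_string(),
        _ => r#"{"error": "unknown tool"}"#.to_string(),
    }
}

// The engine: no state of its own, just the history it is handed.
fn run_agent(model: &MockModel, mut history: Vec<Message>) -> String {
    loop {
        let response = model.generate(&history);
        if let Some(text) = response.text {
            return text; // final answer: the loop ends
        }
        for (name, args) in response.tool_calls {
            let result = execute_tool(&name, &args);
            history.push(Message::ToolCall { name, args });
            history.push(Message::ToolResult(result));
        }
        // loop repeats with the updated history
    }
}

fn main() {
    let history = vec![Message::User("What's the weather in London?".into())];
    println!("{}", run_agent(&MockModel, history));
}
```

Note that `run_agent` takes ownership of the history and returns only the answer: the caller, not the engine, decides what survives between runs.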

The Brain (The Model)

Inside the loop, we have the "Brain." In a traditional software system, you would write if/else statements or a state machine to decide what happens next. In an agent, you delegate that logic to a Large Language Model.

The Brain is stateless. It has no persistent memory of its own. Call it twice with the same input and it doesn't remember the first call. It is a mathematical function that takes a snapshot of the current situation (the history and the available tools) and predicts the single best next step.

From Text to Intent

We usually think of LLMs as chatbots that generate sentences. But in an agentic loop, the model’s role shifts from "talking" to "deciding." When the Brain receives a request, it has two options:

  1. Provide an Answer: "The weather in London is 22°C."
  2. Request an Action: "I don't know the weather. Please call get_weather(city: 'London')."

In our library, this is captured by the LlmModel trait. Whether you are using a cloud-based model like Gemini 2.5 Pro or a small model running locally via Ollama, the interface is identical:

pub struct ModelResponse {
    pub text: Option<String>,      // Option 1: A final answer
    pub tool_calls: Vec<ToolCall>, // Option 2: A request for action
}

Provider Agnosticism

Because the Brain is just a trait, the rest of your agentic system doesn't care who made the model. You can develop your logic using a cheap local model and then swap it for a powerful cloud provider with one line of code:

// Switch from local Llama to cloud Gemini 
let model = GeminiModel::builder(api_key, "gemini-2.5-flash").build();

The Brain doesn't know it's part of a loop, and it doesn't know which company's servers it's running on. It just looks at the history you provide and predicts the next move.
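A sketch of what that swap looks like behind a trait object. The trait shape below is simplified and synchronous for illustration; agent-rig's actual LlmModel trait is richer:

```rust
// Simplified stand-in for an LlmModel-style trait.
trait LlmModel {
    fn generate(&self, prompt: &str) -> String;
}

struct LocalLlama;
struct CloudGemini;

impl LlmModel for LocalLlama {
    fn generate(&self, prompt: &str) -> String {
        format!("[llama] {prompt}")
    }
}

impl LlmModel for CloudGemini {
    fn generate(&self, prompt: &str) -> String {
        format!("[gemini] {prompt}")
    }
}

fn main() {
    // The rest of the system only ever sees `dyn LlmModel`; swapping
    // providers is a one-line change at construction time.
    let use_cloud = true;
    let model: Box<dyn LlmModel> =
        if use_cloud { Box::new(CloudGemini) } else { Box::new(LocalLlama) };
    println!("{}", model.generate("hello"));
}
```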

The "intelligence" of an agent isn't in the loop; it’s in the model's ability to choose the correct tool call when it hits a gap in its knowledge.

The Confidence Trap: This architecture relies on the model knowing when it hit a gap in its knowledge. However, models are trained to be helpful, which often makes them overconfident. If a model thinks it knows the answer, it may bypass your tool and simply hallucinate a plausible-sounding fact. This is why precise tool descriptions are mandatory—you are fighting the model's urge to just 'guess.'

The Hands (The Tools)

LLMs cannot "do" anything. They are token predictors: they take a sequence of text and predict the most likely continuation. To make an LLM interact with the real world, we give it "hands" through Tools.

But how does a text-prediction engine actually "call" a function?

The Mechanics: Thinking in JSON

When you configure an agent with a tool, you aren't sending code to the model. You are sending a Definition, a JSON-based description that explains what the tool does and what arguments it expects.

pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value, // JSON Schema
}

This definition is injected into the model's prompt. Most modern models support "Native Tool Calling," which is a fancy way of saying they have been specifically trained to recognize these definitions. When the model determines it needs to use a tool, it stops generating human-readable sentences and instead generates a specific JSON payload.

Depending on the provider, this might be wrapped in special tokens (e.g., <tool_call> ... </tool_call>) or emitted through a dedicated API field. For example, to get the weather, the model doesn't just say "call the weather tool." It predicts tokens that form this:

{
  "name": "get_weather",
  "args": { "city": "London" }
}

The Bridge: From Text to Code

The execution engine (the Loop) sees this JSON, pauses the model, and looks for a registered tool matching the name get_weather. It then executes the actual code (the call function) using the provided arguments.

pub trait Tool {
    fn definition(&self) -> ToolDefinition;
    async fn call(&self, args: serde_json::Value) -> Result<serde_json::Value, Error>;
}

The model never runs your code; your system does. The model simply predicts the arguments it thinks your code needs. Once the tool returns a result, the engine appends that result to the history and restarts the model.
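A dependency-free sketch of that dispatch step. The registry, dispatch, and get_weather below are hypothetical, and plain strings stand in for serde_json::Value:

```rust
use std::collections::HashMap;

type ToolFn = fn(args: &str) -> String;

fn get_weather(args: &str) -> String {
    // A real tool would parse `args` as JSON and call a weather API.
    format!(r#"{{"city": {args}, "temp": 22}}"#)
}

// The "bridge": look up the tool the model named, then run real code.
fn dispatch(registry: &HashMap<&str, ToolFn>, name: &str, args: &str) -> String {
    match registry.get(name) {
        Some(tool) => tool(args),
        // Unknown names become an error result the model can read and recover from.
        None => format!(r#"{{"error": "no tool named {name}"}}"#),
    }
}

fn main() {
    let mut registry: HashMap<&str, ToolFn> = HashMap::new();
    registry.insert("get_weather", get_weather);

    // This is what the model's predicted JSON payload boils down to:
    println!("{}", dispatch(&registry, "get_weather", r#""London""#));
}
```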

An API for One

This architecture forces a shift in how you think about documentation. Usually, you write docs for other humans. Here, you are writing documentation for a model.

The name needs to be distinctive. The description must be precise about what the tool does and when to use it. The parameters must use clear names and types.

If your description is poor, the model will hallucinate arguments or call the tool at the wrong time. In this world, your "API documentation" is effectively runtime logic: a vague description is a bug that shows up as bad behavior.
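To make the contrast concrete, here is a hypothetical before-and-after for the same tool. ToolDoc is an illustrative stand-in for the name and description fields of ToolDefinition:

```rust
// Illustrative only: the same tool, documented badly and well.
struct ToolDoc {
    name: String,
    description: String,
}

fn vague() -> ToolDoc {
    ToolDoc {
        name: "weather".into(),
        // The model is left to guess when, and with what arguments, to call this.
        description: "weather stuff".into(),
    }
}

fn precise() -> ToolDoc {
    ToolDoc {
        name: "get_current_weather".into(),
        description: "Returns the current temperature in °C for a given city. \
                      Use only when the user asks about present conditions, \
                      not forecasts or historical data."
            .into(),
    }
}

fn main() {
    // These strings are everything the model ever sees about the tool.
    println!("{}: {}", vague().name, vague().description);
    println!("{}: {}", precise().name, precise().description);
}
```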

Tool calling is just a specialized form of text completion. The model isn't "running" a function; it is predicting the JSON payload that it thinks will convince your engine to run the function for it.

The Memory (The History)

If the Engine is a stateless loop and the Brain is a stateless math function, where does the "agent" actually live? Where is its memory?

The answer is simple: an agent's entire state is just an array of text messages.

Short-Term Memory: The Conversation Log

Setting aside optimizations like KV-caching and history summarization, the fundamental mechanic of agent memory is remarkably simple. When you use an LLM, it feels like it remembers what you said five minutes ago. It doesn't. Every time you send a prompt, the system resends the entire conversation history up to that point.

In code, this history is just a Vec<Message>.

Every action the agent takes must be appended to this log so that on the next iteration of the loop, the Brain knows what just happened. If the model calls a tool, we append three things to the history:

  1. The user's original request.
  2. The model's request to use a tool (e.g., get_weather(London)).
  3. The tool's result (e.g., {"temp": 22}).

When the loop runs again, the model reads the whole transcript, sees that the tool returned 22, and finally predicts the text: "The weather in London is 22°C."

Because the engine itself is stateless, the responsibility of holding this Vec<Message> falls to the caller. This is why context windows (the maximum length of the history) are such a critical bottleneck in agent design.
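In code, that bookkeeping is nothing more than vector pushes. The Message enum and helper below are illustrative, not agent-rig's types:

```rust
#[derive(Debug, PartialEq)]
enum Message {
    User(String),
    ToolCall(String),
    ToolResult(String),
    Assistant(String),
}

// Record one tool round-trip: both the call and its result land in the log,
// or the model has no idea what happened on the next iteration.
fn record_tool_turn(history: &mut Vec<Message>, call: &str, result: &str) {
    history.push(Message::ToolCall(call.to_string()));
    history.push(Message::ToolResult(result.to_string()));
}

fn main() {
    // 1. The user's original request.
    let mut history = vec![Message::User("What's the weather in London?".into())];
    // 2. and 3. The model's tool request and the tool's result.
    record_tool_turn(&mut history, "get_weather(London)", r#"{"temp": 22}"#);
    // Next iteration: the model reads the whole transcript and answers.
    history.push(Message::Assistant("The weather in London is 22°C.".into()));

    // Context-window limits apply to the serialized length of this vector.
    assert_eq!(history.len(), 4);
    println!("{history:?}");
}
```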

Long-Term Memory: Tools in Disguise

What if you want the agent to remember a fact between sessions, after the context window is cleared?

You might think you need a complex "Memory Manager" subsystem. You don't. You just need to give the model tools that interact with a database.

If a user says, "My dog's name is Barnaby," the model calls remember_fact. In a completely separate session a week later, if the user asks, "What is my dog's name?", the model calls recall_fact, gets the answer, and responds.
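A minimal sketch of that idea, with a HashMap standing in for the database. remember_fact and recall_fact are the hypothetical tools from the example, not part of agent-rig:

```rust
use std::collections::HashMap;

// A stand-in for whatever persistent store backs the memory tools.
struct FactStore {
    facts: HashMap<String, String>,
}

impl FactStore {
    fn new() -> Self {
        Self { facts: HashMap::new() }
    }

    // Tool the model calls when the user states a fact worth keeping.
    fn remember_fact(&mut self, key: &str, value: &str) {
        self.facts.insert(key.to_string(), value.to_string());
    }

    // Tool the model calls in a later session to retrieve it.
    fn recall_fact(&self, key: &str) -> Option<&String> {
        self.facts.get(key)
    }
}

fn main() {
    let mut store = FactStore::new();

    // Session 1: "My dog's name is Barnaby."
    store.remember_fact("dog_name", "Barnaby");

    // Session 2, a week later, with a fresh context window:
    if let Some(name) = store.recall_fact("dog_name") {
        println!("Your dog's name is {name}.");
    }
}
```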

There is no magic "memory module" in an AI agent. Short-term memory is just a growing array of strings. Long-term memory is just the agent using its hands (Tools) to interact with a filing cabinet (Database).

The Blueprint (The Configuration)

We have looked at the mechanics: a loop that feeds an array of messages and tool definitions into a stateless text-predictor. But how do you tell this mechanical system to be a "Helpful Coding Assistant" rather than a "Snarky Weather Bot"?

You need a configuration. In agent-rig, this is the Agent struct.

The Agent struct is pure data. It holds no network connections and no active state. It is a static blueprint that defines the intent of the system before the loop even starts:

pub struct Agent {
    pub name: String,
    pub instructions: String,
    pub output_schema: Option<serde_json::Value>,
    pub tool_names: Vec<String>,
}

The Specification

When you initialize the Engine, you hand it this blueprint. The engine uses it to set up the starting conditions for the Brain.

instructions is the system prompt: the persona, constraints, and operational boundaries. These get prepended to the message history on every single iteration, keeping the model aligned with its task.

tool_names is a permission whitelist. A system might have dozens of tools registered (e.g., read_file, query_database, delete_record), but this list restricts the model to only the tools it needs for its specific role. A "research assistant" cannot access delete_record even if it predicts the correct JSON payload to call it.

output_schema is a structural constraint. If the agent returns data for another machine to consume, this JSON schema forces the model's final text output to match a specific structure.
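As an illustration, a hypothetical output_schema for a review agent might constrain the final answer to a JSON object (this schema is an example, not taken from agent-rig):

```json
{
  "type": "object",
  "properties": {
    "summary": { "type": "string" },
    "issues": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["summary", "issues"]
}
```

With this schema attached, the model's last turn must be a parseable object with those two fields, rather than free-form prose.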

let agent = Agent::builder()
    .name("technical-editor")
    .instructions("You review documentation for clarity and technical accuracy.")
    .tool("check_links")
    .tool("validate_code_snippets")
    .build();

Because the Blueprint is just a plain data structure, it is portable. Store your agent definitions in a database or configuration file. Update an agent's persona or permissions without touching a single line of execution logic in the Engine.

If the configuration is the job description, the Engine is the employee who reads the description and starts working. The more precise the job description, the more predictable the employee's performance.

Conclusion: Putting It Together

An AI agent is a specific arrangement of standard software patterns combined with an LLM. Nothing more.

The Engine (the loop), the Memory (the message array), the Blueprint (the config), the Brain (the model), and the Hands (the tools).

Once you can see that, "autonomy" stops feeling like magic and debugging becomes concrete: inspect the JSON schemas, watch what's going into the message array, and tighten the system prompt. The frameworks that hide this plumbing aren't saving you from complexity. They're just deferring it until your "happy path" meets the unfiltered chaos of a live user.

If you want to see exactly how these pieces are implemented, or want a macro-free foundation to build your own agents, check out agent-rig on GitHub.

Note: This post was created with the assistance of AI. While a human carefully reviewed and edited the content, it's important to remember that AI tools may introduce errors or biases. If you have any concerns or questions, please feel free to reach out.
