Agent

Also known as: AI agent, LLM agent, autonomous agent

TL;DR

An agent is an LLM placed in a perception/decision/action loop — it reads context, picks an action (often a tool call), observes the result, and iterates until the goal is met.

An agent is a system where an LLM drives a closed loop: observe state, decide an action, execute it, observe the result, repeat until the task is done. That loop — perception, decision, action, observation — is what separates an agent from a single LLM call wired into an app.

What makes a system “agentic”

Three properties usually have to hold:

The model picks the next step. Not the developer with a hardcoded pipeline. The LLM sees the current state and chooses among options ( tool calls , sub-task creation, asking the user, finishing).
There is feedback from the environment. Tool results, retrieval hits, error messages, user replies — the agent conditions its next move on what happened.
The loop runs more than once. A single tool call followed by a single response is at the edge of agentic. A multi-step trajectory where the model commits, observes, revises is squarely inside.

Static, deterministic chains (retrieve, summarize, return) are not agents — they’re pipelines wearing the costume.

What an agent typically has

An LLM as the decision-maker
A set of tools it can call (often via function calling or MCP )
A memory story for persisting information across turns
An instruction prompt at the top defining role, constraints, and tool-use conventions (often called a system prompt )
An agent loop — the harness code that calls the model, parses tool requests, executes them, and feeds results back

Why this is hard

Per-step error compounds. At 95% per-step accuracy, a 10-step trajectory lands at end-to-end. Reliability work is mostly about reducing per-step error: better tool descriptions , better retrieval , better planning , and ruthless reflection checkpoints.

The other failure mode is unbounded loops — the agent keeps trying things, making no progress, burning tokens. Production harnesses wire explicit step budgets (often 10-25), cost ceilings, and stop conditions.

The agent literature splits the design space into workflows — fixed graphs of LLM calls with explicit branching — and agents — loops where the model picks the next step at runtime. Workflows win on most production tasks where the structure of the work is known.

The reason is variance. A workflow’s failure modes are bounded: each node either succeeds or returns a typed error you can handle. An agent’s failure modes are open-ended — malformed tool arguments, repeated retrievals of the same document, plans that violate an implicit constraint. Token spend on a workflow is predictable; on an agent it isn’t.

Use a workflow until you actually need autonomy. If 80% of queries follow a small number of patterns, encode those patterns as workflows and fall through to an agent loop for the long tail. Anthropic’s “Building effective agents” post is the canonical write-up.

The minimum substrate: the model emits a structured tool request, the harness executes it, and the result is fed back into the model’s context for a second decision. One round-trip with one decision is the smallest viable agent. A single LLM call that returns a tool’s output verbatim is closer to function-as-a-service — the model is acting as a JSON formatter, not making a judgment.

The interesting line in production isn’t “is it an agent?” but “how much autonomy does it have?”. A deterministic tool sequence is debuggable; a loop where the model decides what to do at every step needs trace-replay infrastructure to even understand failures.

Where specialized models help

A capable agent calls many narrow sub-tasks repeatedly: choose a tool, rewrite a query, rerank retrieved snippets, compress a long trace. Each is a candidate for a small specialized model — a 0.1-1B-parameter classifier or reranker hits the same accuracy at 100-1000× lower cost than routing every sub-task through the frontier LLM.

Go further

What's the smallest thing that counts as an agent?

An LLM that decides which of N tools to call given a user query, executes one, and uses the result to produce an answer. Even one branching decision plus one tool round-trip qualifies. Below that — a fixed-template prompt with retrieval pasted in — is just RAG.

Tool use Agentic RAG

Why are agents so much harder to make reliable than chatbots?

A chatbot fails one turn at a time; an agent compounds errors across turns. A 95% per-step success rate becomes a 60% success rate over 10 steps. Most production agent work is about cutting per-step error and bounding the loop — better tool selection, better grounding, better self-checks.

Reflection and critique Agent loop Hallucination

Where do specialized small models fit in an agent?

Tool selection, query rewriting, relevance reranking, and context compression are all narrow tasks called repeatedly inside an agent loop. Each is a candidate for a small specialized model — the LLM stays for open-ended reasoning, and the loop's per-step cost falls dramatically.

Reranker Context compression Query rewriting

← All concepts

The best AI teams build with ZeroEntropy models

Book Demo View docs