Agent

Also known as: AI agent, LLM agent, autonomous agent

TL;DR

An agent is an LLM placed in a perception/decision/action loop — it reads context, picks an action (often a tool call), observes the result, and iterates until the goal is met.

An agent is a system where an drives a closed loop: observe state, decide an action, execute it, observe the result, repeat until the task is done. That loop — perception, decision, action, observation — is what separates an agent from a single LLM call wired into an app.

WHAT IS AN AGENTA model, a memory, some tools, and a loop.CONTEXTCHOICEWRITERECALLCALLRESULTLOOPobserve · think · actuntil done or out of budgetMODELdecisionpick next actionMEMORYstatewhat we know so farTOOLSeffectreach outside the model

What makes a system “agentic”

Three properties usually have to hold:

  • The model picks the next step. Not the developer with a hardcoded pipeline. The LLM sees the current state and chooses among options ( , sub-task creation, asking the user, finishing).
  • There is feedback from the environment. Tool results, retrieval hits, error messages, user replies — the agent conditions its next move on what happened.
  • The loop runs more than once. A single tool call followed by a single response is at the edge of agentic. A multi-step trajectory where the model commits, observes, revises is squarely inside.

Static, deterministic chains (retrieve, summarize, return) are not agents — they’re pipelines wearing the costume.

OBSERVE · THINK · ACT · REPEATA worked trace through two turns of the loop.FEED RESULT BACKTASKWill my 7am flight tomorrowbe delayed?2 ITERATIONS · THEN EXITOBSERVEUser: will my 7am flight tomorrowbe delayed?AA123 is on time. Also needweather to be sure.THINKNeed flight status. Tool:get_flight_status.Check weather at departure for2026-05-17.ACTget_flight_status("AA123") → ontimeget_weather("2026-05-17", "JFK")→ clearEXIT · TASK DONEFINAL ANSWEROn time, weather clear. No delay expected.Each cycle: see, plan, do, get feedback. The loop exits when the agent decides the task is done.

What an agent typically has

  • An as the decision-maker
  • A set of it can call (often via or )
  • A story for persisting information across turns
  • An instruction prompt at the top defining role, constraints, and tool-use conventions (often called a )
  • An — the harness code that calls the model, parses tool requests, executes them, and feeds results back

Why this is hard

Per-step error compounds. At 95% per-step accuracy, a 10-step trajectory lands at end-to-end. Reliability work is mostly about reducing per-step error: better , better , better , and ruthless checkpoints.

The other failure mode is unbounded loops — the agent keeps trying things, making no progress, burning tokens. Production harnesses wire explicit step budgets (often 10-25), cost ceilings, and stop conditions.

The agent literature splits the design space into workflows — fixed graphs of LLM calls with explicit branching — and agents — loops where the model picks the next step at runtime. Workflows win on most production tasks where the structure of the work is known.

The reason is variance. A workflow’s failure modes are bounded: each node either succeeds or returns a typed error you can handle. An agent’s failure modes are open-ended — malformed tool arguments, repeated retrievals of the same document, plans that violate an implicit constraint. Token spend on a workflow is predictable; on an agent it isn’t.

Use a workflow until you actually need autonomy. If 80% of queries follow a small number of patterns, encode those patterns as workflows and fall through to an agent loop for the long tail. Anthropic’s “Building effective agents” post is the canonical write-up.

The minimum substrate: the model emits a structured tool request, the harness executes it, and the result is fed back into the model’s context for a second decision. One round-trip with one decision is the smallest viable agent. A single LLM call that returns a tool’s output verbatim is closer to function-as-a-service — the model is acting as a JSON formatter, not making a judgment.

The interesting line in production isn’t “is it an agent?” but “how much autonomy does it have?”. A deterministic tool sequence is debuggable; a loop where the model decides what to do at every step needs trace-replay infrastructure to even understand failures.

Where specialized models help

A capable agent calls many narrow sub-tasks repeatedly: choose a tool, rewrite a query, rerank retrieved snippets, compress a long trace. Each is a candidate for a small specialized model — a 0.1-1B-parameter classifier or hits the same accuracy at 100-1000× lower cost than routing every sub-task through the frontier LLM.

Go further

What's the smallest thing that counts as an agent?

An LLM that decides which of N tools to call given a user query, executes one, and uses the result to produce an answer. Even one branching decision plus one tool round-trip qualifies. Below that — a fixed-template prompt with retrieval pasted in — is just RAG.

Why are agents so much harder to make reliable than chatbots?

A chatbot fails one turn at a time; an agent compounds errors across turns. A 95% per-step success rate becomes a 60% success rate over 10 steps. Most production agent work is about cutting per-step error and bounding the loop — better tool selection, better grounding, better self-checks.

Where do specialized small models fit in an agent?

Tool selection, query rewriting, relevance reranking, and context compression are all narrow tasks called repeatedly inside an agent loop. Each is a candidate for a small specialized model — the LLM stays for open-ended reasoning, and the loop's per-step cost falls dramatically.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHubTwitterSlackLinkedInDiscord