Also known as: AI agent, LLM agent, autonomous agent
TL;DR
An agent is an LLM placed in a perception/decision/action loop — it reads context, picks an action (often a tool call), observes the result, and iterates until the goal is met.
An agent is a system where an LLM drives a closed loop: observe state, decide an action, execute it, observe the result, repeat until the task is done. That loop — perception, decision, action, observation — is what separates an agent from a single LLM call wired into an app.
What makes a system “agentic”
Three properties usually have to hold:
The model picks the next step. Not the developer with a hardcoded pipeline. The LLM sees the current state and chooses among options ( tool calls , sub-task creation, asking the user, finishing).
There is feedback from the environment. Tool results, retrieval hits, error messages, user replies — the agent conditions its next move on what happened.
The loop runs more than once. A single tool call followed by a single response is at the edge of agentic. A multi-step trajectory where the model commits, observes, revises is squarely inside.
Static, deterministic chains (retrieve, summarize, return) are not agents — they’re pipelines wearing the costume.
A memory story for persisting information across turns
An instruction prompt at the top defining role, constraints, and tool-use conventions (often called a system prompt )
An agent loop — the harness code that calls the model, parses tool requests, executes them, and feeds results back
Why this is hard
Per-step error compounds. At 95% per-step accuracy, a 10-step trajectory lands at end-to-end. Reliability work is mostly about reducing per-step error: better tool descriptions , better retrieval , better planning , and ruthless reflection checkpoints.
The other failure mode is unbounded loops — the agent keeps trying things, making no progress, burning tokens. Production harnesses wire explicit step budgets (often 10-25), cost ceilings, and stop conditions.
The agent literature splits the design space into workflows — fixed graphs of LLM calls with explicit branching — and agents — loops where the model picks the next step at runtime. Workflows win on most production tasks where the structure of the work is known.
The reason is variance. A workflow’s failure modes are bounded: each node either succeeds or returns a typed error you can handle. An agent’s failure modes are open-ended — malformed tool arguments, repeated retrievals of the same document, plans that violate an implicit constraint. Token spend on a workflow is predictable; on an agent it isn’t.
Use a workflow until you actually need autonomy. If 80% of queries follow a small number of patterns, encode those patterns as workflows and fall through to an agent loop for the long tail. Anthropic’s “Building effective agents” post is the canonical write-up.
The minimum substrate: the model emits a structured tool request, the harness executes it, and the result is fed back into the model’s context for a second decision. One round-trip with one decision is the smallest viable agent. A single LLM call that returns a tool’s output verbatim is closer to function-as-a-service — the model is acting as a JSON formatter, not making a judgment.
The interesting line in production isn’t “is it an agent?” but “how much autonomy does it have?”. A deterministic tool sequence is debuggable; a loop where the model decides what to do at every step needs trace-replay infrastructure to even understand failures.
Where specialized models help
A capable agent calls many narrow sub-tasks repeatedly: choose a tool, rewrite a query, rerank retrieved snippets, compress a long trace. Each is a candidate for a small specialized model — a 0.1-1B-parameter classifier or reranker hits the same accuracy at 100-1000× lower cost than routing every sub-task through the frontier LLM.
Go further
What's the smallest thing that counts as an agent?
An LLM that decides which of N tools to call given a user query, executes one, and uses the result to produce an answer. Even one branching decision plus one tool round-trip qualifies. Below that — a fixed-template prompt with retrieval pasted in — is just RAG.
Why are agents so much harder to make reliable than chatbots?
A chatbot fails one turn at a time; an agent compounds errors across turns. A 95% per-step success rate becomes a 60% success rate over 10 steps. Most production agent work is about cutting per-step error and bounding the loop — better tool selection, better grounding, better self-checks.
Where do specialized small models fit in an agent?
Tool selection, query rewriting, relevance reranking, and context compression are all narrow tasks called repeatedly inside an agent loop. Each is a candidate for a small specialized model — the LLM stays for open-ended reasoning, and the loop's per-step cost falls dramatically.