Microagent Architecture
What agent builders should take from microservices, and what they should refuse.
The kitchen-sink agent
There is a failure mode that every team building with LLM agents eventually meets. The agent started clean: a focused prompt, four tools, one job. Then it accreted. Someone added a tool for Jira, then one for Linear, then a paragraph explaining when to use which. Edge cases got patched with more instructions. A retrieval step began dumping documents into the context just in case. A year later the system prompt is six thousand tokens, the tool list is pushing forty, and the agent has developed moods. It searches the wiki when it should read the code. It obeys an instruction in paragraph three that paragraph nineteen was supposed to override. It does fine for the first twenty minutes of a task and then, somewhere past the hundredth tool call, quietly loses the plot. Fixing one behavior regresses two others, and nobody can say why, because the only test that exists is running the whole thing end to end and squinting at the output.
If you spent the 2010s anywhere near backend engineering, you have seen this shape before. It is a monolith. And the industry’s response is starting to rhyme with the last one: decompose it. Call the result microagent architecture, the agent-era echo of microservices, with most of the same promises, several of the same traps, and some physics that are genuinely new.
The analogy, made precise
A microagent is a small agent with a single responsibility, a context containing only what that responsibility requires, a toolset scoped to the job, and an explicit contract: a structured task goes in, a structured result comes out. A microagent architecture is a set of these coordinated by an orchestration layer, which may itself be an agent.
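To make the definition concrete, here is a minimal sketch of what such a contract might look like. Every name in it (Task, Result, Microagent, the tool names, the stubbed review function) is hypothetical and chosen for illustration; a real agent would call a model where the stub returns a canned string.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Task:
    """Structured input: everything the agent needs, and nothing else."""
    goal: str
    inputs: dict[str, str] = field(default_factory=dict)
    constraints: tuple[str, ...] = ()


@dataclass(frozen=True)
class Result:
    """Structured output: a status and a summary, never a transcript dump."""
    status: str  # "ok" or "failed"
    summary: str
    findings: tuple[str, ...] = ()


@dataclass(frozen=True)
class Microagent:
    """One responsibility, one prompt, one scoped toolset, one contract."""
    name: str
    system_prompt: str
    allowed_tools: frozenset[str]
    run: Callable[[Task], Result]

    def can_use(self, tool: str) -> bool:
        # The allowlist is the narrow API surface of this agent.
        return tool in self.allowed_tools


def fake_review(task: Task) -> Result:
    # Hypothetical stand-in for a model call made with the agent's own context.
    return Result(status="ok", summary=f"Reviewed: {task.goal}")


reviewer = Microagent(
    name="code-review",
    system_prompt="You review one diff at a time for unchecked inputs.",
    allowed_tools=frozenset({"read_file", "grep"}),
    run=fake_review,
)
result = reviewer.run(Task(goal="check auth.py for unchecked inputs"))
```

The useful property is that the orchestrator only ever sees Task and Result, so the agent's prompt and tools can change behind the contract.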

The name is doing more than marketing, because the mappings to microservices are specific. The most important mapping is that the context window is the bounded context. In a service architecture, the unit of isolation is the process and its data store. In an agent architecture, it is the context window, the finite span of tokens within which the model can actually attend to things. Nearly everything that goes wrong inside a monolithic agent is contention for that span.
The rest of the correspondences follow. A scoped toolset is a narrow API surface. A per-agent prompt is an independently versioned codebase: you can rewrite the code-review agent’s instructions without touching the migration agent’s, and you can run regression evals against it in isolation, which gives you a unit test where previously your only option was the integration test. Per-agent model selection is the polyglot stack: a fast, cheap model for triage and extraction, a frontier model for synthesis and judgment, each priced to its job. Fanning subtasks out to parallel workers is horizontal scaling. Even the protocol layer arrived on cue. MCP did for tool access what HTTP and JSON did for service integration, which is to say it made the plumbing boring, and agent-to-agent protocols like A2A are reaching for the same standardization between the agents themselves.
Why the monolithic agent breaks
Monolithic services mostly failed for organizational reasons: the release train, the merge queue, eight teams coupled through one deployment. Monolithic agents fail for a more physical reason. Attention is finite, and it degrades.
The first pressure is context degradation. Models reason worse as contexts grow; the research literature calls one flavor of this “lost in the middle,” and practitioners call the broader phenomenon context rot. Worse than raw length is pollution. The four hundred lines of test output from subtask A are still sitting in the window while the model works on subtask B, competing with the instructions that actually matter. A monolithic agent carries its entire history everywhere, and most of that history is debris.
The second is tool confusion. Selection accuracy falls as the tool count rises, especially when descriptions overlap. An agent with six tools picks the right one almost every time. Give it forty, several of which are different flavors of “search,” and it starts grabbing the wrong one, then burns turns recovering.
The third is prompt sprawl. Every instruction in a monolithic prompt is global. The paragraph you added to fix changelog formatting is now in the context for database migrations. There is no modularity, so there is no unit testing, only the end-to-end run, which is slow, expensive, and statistical. Teams learn to fear touching the prompt, which is the same learned helplessness the monolith taught, with the same result: the artifact ossifies.
The fourth is economics. One agent means one model for every step. You pay frontier prices to reformat a date because the context that reformats the date also has to make an architectural judgment three turns later.
Underneath all of these sits the simplest constraint: one context is one thread of attention. There is no parallelism inside a monolithic agent. A breadth task, say surveying twelve services or checking thirty files, happens strictly in sequence while the debris piles up.
The patterns in the wild
None of this is hypothetical. The patterns are in production, and most working engineers have already touched at least one without naming it.
The flagship is orchestrator-worker. A lead agent decomposes the task, dispatches subagents in parallel, and synthesizes their results. Anthropic’s research feature is the best-documented example: a lead agent plans, spawns parallel search subagents that each burn a private context iterating over sources, and synthesizes what comes back. Their engineering write-up reported that this arrangement, with a frontier model leading and lighter models working, beat a single frontier-model agent by roughly ninety percent on their internal research eval. The same analysis found that token spend explained most of the performance variance, and the productive way to spend more tokens turned out to be more contexts rather than longer ones. Hold on to that thought; the bill comes due a few sections down.
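The shape of the pattern can be sketched in a few lines. This is an illustration under stated assumptions, not Anthropic's implementation: plan, search_worker, and synthesize are hypothetical stubs standing in for model calls, and a thread pool stands in for whatever dispatch mechanism a real system uses.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(question: str) -> list[str]:
    # Hypothetical stand-in for the lead agent decomposing the task.
    angles = ("background", "recent changes", "open problems")
    return [f"{question} :: {angle}" for angle in angles]


def search_worker(subtask: str) -> str:
    # Each worker iterates inside its own private scratch space ...
    scratch = [f"note {i} on {subtask}" for i in range(40)]
    # ... and only a short summary crosses back to the orchestrator.
    return f"{subtask}: read {len(scratch)} sources"


def synthesize(question: str, summaries: list[str]) -> str:
    # Hypothetical stand-in for the lead agent's final synthesis step.
    lines = [question] + [f"- {s}" for s in summaries]
    return "\n".join(lines)


def orchestrate(question: str) -> str:
    subtasks = plan(question)
    # Fan out: one worker per subtask, run in parallel.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        summaries = list(pool.map(search_worker, subtasks))
    return synthesize(question, summaries)


report = orchestrate("How do teams version agent prompts?")
```

Note that the forty scratch notes per worker never reach the orchestrator. That is the "more contexts rather than longer ones" idea in miniature.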
The everyday version is Claude Code’s subagent system. You define agents as files, each with its own system prompt, its own tool allowlist, optionally its own model, and the main session delegates to them. The subagent does its forty file reads and dead-end greps inside a disposable context; what crosses back into the main thread is a summary. The main context stays clean enough to keep making good decisions, which is the entire point.
Around the flagship sits a supporting cast with obvious service-era ancestors. The router is the API gateway: a cheap classifier reads the request and dispatches to a specialist. The pipeline is the ETL job, each agent transforming a structured input into a structured output and passing it along; this is closer to a workflow than an agent, and usually better for it. The critic loop pairs a generator with an evaluator, which is code review promoted to architecture. Hierarchies, agents spawning agents, are org charts, with everything that implies.
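The router is the simplest of these to sketch. In the example below a keyword match stands in for the cheap classifier model, and the specialist names and labels are hypothetical.

```python
from typing import Callable


def classify(request: str) -> str:
    # Hypothetical stand-in for a small, cheap classifier model.
    text = request.lower()
    if "migration" in text or "schema" in text:
        return "migration"
    if "review" in text or "diff" in text:
        return "code_review"
    return "general"


def migration_agent(request: str) -> str:
    return f"[migration] {request}"


def code_review_agent(request: str) -> str:
    return f"[code_review] {request}"


def general_agent(request: str) -> str:
    return f"[general] {request}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "migration": migration_agent,
    "code_review": code_review_agent,
    "general": general_agent,
}


def route(request: str) -> str:
    # Dispatch to the specialist; fall back to the generalist on unknown labels.
    label = classify(request)
    return SPECIALISTS.get(label, general_agent)(request)
```

The fallback matters more here than in a service gateway, because a model-backed classifier can emit a label nobody registered.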
Where the analogy breaks
An analogy earns its keep where it fails, and this one fails in both directions.
First, the failure that flatters agents. Decomposing a service traded runtime performance for organizational velocity; a network call is never faster than a function call, and you accepted the latency to buy team autonomy. Decomposing an agent can improve the output itself, because a narrow, clean context reasons better than a bloated one. The subagent that knows only its task will often beat the omniscient agent dragging six hundred lines of irrelevant scrollback. In services, isolation bought maintainability. In agents, isolation buys capability. That is a better deal than microservices ever offered.
Now the failures that do not flatter. When service A calls service B, the payload arrives intact; serialization is faithful. When an orchestrator hands work to a subagent, it writes a brief, and when the subagent reports back, it writes a summary. Both are lossy compressions performed by a language model with no way to know which details will matter later. The constraint the orchestrator knew but never mentioned, the anomaly the subagent noticed but left out of its report: this is the new class of integration bug, and it throws no exception. Multi-agent systems play telephone by design. Contract discipline, which was good practice in services, is survival here.
Second, the components themselves are probabilistic. Microservices composed deterministic code over an unreliable network, and we built circuit breakers for the network. Microagents compose unreliable components over a reliable network. Chain five steps that each succeed 95 percent of the time and, if the failures are independent, the pipeline succeeds only about 77 percent of the time (0.95^5 ≈ 0.774). Validators and retries are the new circuit breakers, but the floor is lower, and every added hop adds failure surface in a way that adding a service call did not.
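The arithmetic is worth running once. The sketch below assumes that step failures are independent, and that a validator catches every failure so a retry can be triggered. Both assumptions are optimistic for real agents, so treat the retry figure as an upper bound.

```python
def chain_success(p: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds."""
    return p ** steps


def with_retries(p: float, attempts: int) -> float:
    """Per-step success when a detected failure can be retried.

    Assumes each attempt is independent and every failure is detected.
    """
    return 1 - (1 - p) ** attempts


five_step = chain_success(0.95, 5)
print(round(five_step, 3))  # → 0.774

# One retry per step lifts each step from 0.95 to 0.9975.
five_step_retried = chain_success(with_retries(0.95, 2), 5)
```

Under those assumptions a single validated retry per step brings the five-step chain back to just under 99 percent, at the cost of extra tokens on every failed attempt.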
Third, the meter is always running. Every boundary crossing costs tokens and latency, and shared background has to be paid into each new context separately. Anthropic’s published numbers are the honest benchmark here: their multi-agent research runs consumed around fifteen times the tokens of an ordinary chat session. The quality gain was real and so was the bill, and they said plainly that the architecture only makes sense for tasks whose value clears it.
Fourth, shared state has no transactions. The filesystem, usually a repo, becomes the shared database, and two agents writing to it get no isolation levels and no locks unless you build them. Two coding agents editing the same file is a write conflict that nothing in the architecture detects for you.
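One way to build the missing check is optimistic concurrency: an agent records a version when it reads a file and refuses to write if the version has changed since. The sketch below is a minimal illustration with hypothetical names. The compare-then-write step is not atomic, so a production version would need a real lock or an atomic rename around it.

```python
import hashlib
import tempfile
from pathlib import Path


class WriteConflict(Exception):
    """Raised when a file changed between an agent's read and its write."""


def read_with_version(path: Path) -> tuple[str, str]:
    # The content hash acts as a version token for this read.
    text = path.read_text()
    return text, hashlib.sha256(text.encode()).hexdigest()


def write_if_unchanged(path: Path, new_text: str, expected_version: str) -> None:
    _, current_version = read_with_version(path)
    if current_version != expected_version:
        raise WriteConflict(f"{path.name} changed since it was read")
    # Not atomic: another writer could still slip in before this line.
    path.write_text(new_text)


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "config.py"
    target.write_text("TIMEOUT = 30\n")

    # Two agents read the same version of the file.
    _, seen_by_a = read_with_version(target)
    _, seen_by_b = read_with_version(target)

    # Agent A writes first and succeeds.
    write_if_unchanged(target, "TIMEOUT = 60\n", seen_by_a)

    # Agent B's write is now stale and is rejected instead of clobbering A's.
    try:
        write_if_unchanged(target, "TIMEOUT = 10\n", seen_by_b)
        conflict_detected = False
    except WriteConflict:
        conflict_detected = True

    final_text = target.read_text()
```

The point is that the conflict becomes an exception an orchestrator can handle, rather than a silent overwrite.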
And one limit that is about fit rather than physics: tightly coupled work resists partition. A task where every decision needs the whole picture, which describes most prose and most single coherent code changes, gets worse when you shard the picture across contexts. Research parallelizes. Judgment mostly does not.
The cautionary tale is the point
The most useful thing microservices can teach agent builders is not the boom. It is the backlash.
Start with the distributed monolith, the canonical anti-pattern: services nominally separate but coupled through shared databases and synchronized deploys, all of the overhead with none of the autonomy. Its agent twin is already common. A “crew” of five agents that share full conversation history, where each agent needs to know everything the others know, is one agent with extra latency and five bills. If you cannot state what an agent does not need to know, you have not designed a boundary; you have drawn a line through a prompt. Call it the distributed prompt.
Premature decomposition transfers almost word for word. Martin Fowler’s monolith-first advice, that you should not begin with microservices even when you are confident the system will eventually justify them, needs no translation at all. Anthropic’s own guidance on building agents opens the same way: find the simplest solution that works, and add complexity only when it demonstrably earns it. Most problems that look like they need a society of agents are workflows wearing a costume.
The pendulum swing is predictable because we have already watched it once. In 2023 a Prime Video team published a write-up describing how it collapsed a distributed serverless monitoring pipeline back into a monolith and cut infrastructure costs by roughly ninety percent, to considerable industry schadenfreude. The agent version of that post is being drafted somewhere right now: a team that replaced its five-agent crew with one well-written prompt and watched cost, latency, and quality all improve. Decomposition is a response to measured pressure. It is not a starting posture, and it is definitely not an aesthetic.
When to split, and what to build first
The signals that you have outgrown a single agent are measurements, not vibes.
- Context overflow on real workloads: tasks failing because the relevant fact scrolled out of the window or drowned in debris, visible in your traces.
- Measured tool confusion: wrong-tool selections occurring at a rate that moves your error budget.
- Parallelizable breadth: the task decomposes into independent reads, like research, survey, or review fan-out, and wall-clock time matters.
- Capability mismatch: a step that succeeds on a model a tenth the price, trapped in a context that forces the expensive one.
- Ownership: two teams need to iterate on two behaviors without retesting each other’s work.
I would treat one of these as a hint and three as a mandate.
When you do split, build the disciplines before the agents.
- Contracts: structured task in, structured result out, never a transcript dump, with the brief treated as a first-class artifact, because the orchestrator’s instructions are now load-bearing.
- Traces: spans across every agent call, because you cannot debug a system you cannot replay, and “the subagent did something weird” is not a bug report.
- Evals per agent: a fixture suite for each one, run on every prompt change, exactly as you would unit test a service.
A team unwilling to build those three is not ready to split. It is just distributing its prompt.
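A per-agent eval suite can start very small. The sketch below assumes a hypothetical triage agent, stubbed with a keyword rule where a real one would call a model, and three hand-written fixtures. Because real agent output is statistical, an actual suite would probably run each fixture several times and track a pass rate instead of a single pass or fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fixture:
    name: str
    task: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def triage_agent(task: str) -> str:
    # Hypothetical stand-in for a model-backed triage agent.
    return "bug" if "crash" in task.lower() else "feature"


def run_evals(agent: Callable[[str], str], fixtures: list[Fixture]) -> dict[str, bool]:
    # Run one agent in isolation against its own fixtures.
    return {f.name: f.check(agent(f.task)) for f in fixtures}


def pass_rate(results: dict[str, bool]) -> float:
    return sum(results.values()) / len(results)


FIXTURES = [
    Fixture("login crash", "App crashes on login", lambda out: out == "bug"),
    Fixture("dark mode", "Add dark mode", lambda out: out == "feature"),
    Fixture("csv export", "Crash when exporting CSV", lambda out: out == "bug"),
]

results = run_evals(triage_agent, FIXTURES)
rate = pass_rate(results)
```

Running this on every prompt change is what turns the prompt from something the team fears touching into something it can refactor.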
Context is the new coupling
Architecture has always been the management of a scarce resource. For the monolith era the scarce resources were the release train and the coupling between teams, and microservices spent latency and operational complexity to buy autonomy back. For agent systems the scarce resource is attention, the model’s finite and degradable focus, and the unit of isolation that protects it is the context window.
The lasting contribution of microservices was never the topology. Plenty of teams shipped beautifully on monoliths, and plenty drowned in service meshes. The contribution was the discipline: explicit contracts, observability as a precondition, decomposition under pressure rather than by default. Microagent architecture deserves exactly that treatment, adopted for the same reasons and resisted with the same skepticism. Take the discipline. Make the topology earn itself.