Multi-Agent Orchestration Patterns
Router, pipeline, and supervisor patterns for multi-agent LLM systems. When each pattern breaks, and the trade-offs between LangGraph, custom orchestration, and raw API calls.
Why Single-Agent Architectures Hit a Ceiling
A single agent with a long context window and many tools works until it doesn't. The failure modes are predictable: tool selection accuracy degrades with tool count (an agent with 20 tools performs meaningfully worse on tool selection than one with 5), context windows fill with irrelevant information that degrades reasoning, and a single agent failure takes down the entire task.
Multi-agent architectures decompose the problem. Specialised agents handle specific subtasks; an orchestrator routes between them. The added complexity is real: coordination, error handling, and state management are all harder. The payoff is reliability and quality on tasks that exceed single-agent capability.
The Three Core Patterns
Router Pattern
An orchestrator agent classifies the incoming request and routes it to a specialised agent. No output from one agent feeds into another; each agent works independently and returns directly to the user.
When to use it: Tasks fall into distinct, non-overlapping categories. Customer support routing (billing → billing agent, technical → tech agent, sales → sales agent) is the canonical example.
Failure mode: Misrouting. The router must classify accurately or task quality degrades badly. Router accuracy needs to be north of 95% in production. Anything lower and you're better off using a single general-purpose agent with all tools.
Implementation: The simplest router is a classifier prompt. For high-stakes routing, fine-tune a small model (e.g., DistilBERT) on labelled examples. A fine-tuned classifier can outperform prompted GPT-4 on narrow classification tasks, at roughly 1/100th the per-request cost.
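A minimal sketch of the router shape: a classifier maps the request to a label, and a dispatch table maps labels to handlers. Everything here (the `ROUTES` table, the keyword-based `classify` stub standing in for an LLM or fine-tuned classifier) is illustrative, not a real API.

```python
from typing import Callable

# Hypothetical specialised agents; in production each would be an LLM call.
ROUTES: dict[str, Callable[[str], str]] = {
    "billing": lambda q: f"[billing agent] {q}",
    "technical": lambda q: f"[tech agent] {q}",
    "sales": lambda q: f"[sales agent] {q}",
}

def classify(query: str) -> str:
    """Stand-in classifier: keyword rules for illustration only.
    A real router would use a classifier prompt or a fine-tuned model."""
    lowered = query.lower()
    if any(w in lowered for w in ("invoice", "refund", "charge")):
        return "billing"
    if any(w in lowered for w in ("error", "bug", "crash")):
        return "technical"
    return "sales"

def route(query: str) -> str:
    label = classify(query)
    handler = ROUTES.get(label)
    if handler is None:  # defensive: fail loudly on an unroutable label
        raise ValueError(f"unroutable label: {label}")
    return handler(query)
```

The dispatch table is the point: adding a category means adding one entry and retraining the classifier, with no change to the routing logic itself.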
Pipeline Pattern
Agents execute in sequence. Agent A's output is Agent B's input. Each agent performs a single transformation.
Example: Research pipeline: retrieval agent fetches sources → extraction agent pulls key facts → synthesis agent writes the final answer → critic agent checks for errors.
When to use it: Tasks have a natural sequence of transformations where each step requires specialised reasoning. Code review pipelines (parse → static analysis → security check → style check) map cleanly to this pattern.
Failure mode: Error propagation. A mistake in step 1 compounds through steps 2, 3, and 4. The pipeline doesn't self-correct. I've seen pipelines where a retrieval agent returns the wrong document and the downstream synthesis agent writes a confident, well-structured, completely wrong answer.
Mitigation: add validation steps between stages. A lightweight "does this output make sense" check after each stage catches ~80% of propagation errors before they compound.
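A sketch of that mitigation: pair each stage with a validator and halt before a bad output reaches the next stage. The toy stages here stand in for the retrieval, extraction, and synthesis agents above; the string-matching validators stand in for the "does this output make sense" check, which in production would itself be a lightweight LLM call.

```python
from typing import Callable

Stage = Callable[[str], str]
Validator = Callable[[str], bool]

def run_pipeline(stages: list[tuple[Stage, Validator]], data: str) -> str:
    """Run stages in sequence; a failed validator halts the pipeline
    before the error can compound downstream."""
    for i, (stage, validate) in enumerate(stages):
        data = stage(data)
        if not validate(data):
            raise RuntimeError(f"stage {i} produced invalid output: {data!r}")
    return data

# Toy stages standing in for retrieval -> extraction -> synthesis.
stages: list[tuple[Stage, Validator]] = [
    (lambda q: q + " | sources", lambda out: "sources" in out),
    (lambda s: s + " | facts", lambda out: "facts" in out),
    (lambda f: f + " | answer", lambda out: out.endswith("answer")),
]
```

Failing fast at stage boundaries converts a silent quality problem (a wrong final answer) into an explicit error you can retry or escalate.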
Supervisor Pattern
A supervisor agent dynamically allocates tasks to specialised worker agents, monitors their outputs, and handles failures or insufficient results by re-routing or retrying.
When to use it: Complex, open-ended tasks with unpredictable subtask sequences. Research tasks, software engineering agents (Devin-style), and autonomous workflows where the task decomposition isn't known upfront.
Trade-offs: This is the hardest pattern to implement correctly. The supervisor needs to detect worker failures, decide whether to retry (same agent), reroute (different agent), or escalate. Getting this right requires careful prompt engineering and explicit failure state handling. Most production supervisor implementations have explicit retry limits (3 attempts) and fallback to human escalation.
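The retry/reroute/escalate decision can be sketched as a loop with an explicit per-worker retry budget. The `accept` callback stands in for the supervisor's output review (in production, another LLM call or a schema check); worker names and signatures are illustrative.

```python
from typing import Callable, Optional

Worker = Callable[[str], Optional[str]]

def supervise(task: str,
              workers: list[tuple[str, Worker]],
              accept: Callable[[Optional[str]], bool],
              retries_per_worker: int = 2) -> str:
    """Retry each worker up to retries_per_worker, then reroute to the
    next worker; escalate (here: raise) when all workers are exhausted."""
    attempts = 0
    for name, worker in workers:
        for _ in range(retries_per_worker):
            attempts += 1
            result = worker(task)
            if accept(result):  # supervisor's review of the worker output
                return result
        # this worker keeps failing: reroute to the next specialist
    raise RuntimeError(f"escalate to human after {attempts} attempts")
```

The explicit budget is what keeps this safe: every path through the loop terminates, and the escalation case is a first-class outcome rather than an infinite retry.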
LangGraph vs Custom Orchestration
LangGraph (part of LangChain) models agent workflows as directed graphs with explicit state management. Nodes are agents or tools; edges define the routing logic. State is passed between nodes explicitly.
Where LangGraph adds value:
- Graph visualisation and debugging
- Built-in checkpointing (resume interrupted workflows)
- Human-in-the-loop patterns (pause for approval before a destructive action)
Where it adds friction:
- The abstraction leaks on non-standard patterns. Complex conditional routing requires workarounds.
- Debugging multi-hop state in LangGraph is harder than in raw Python with explicit logging.
- Library churn: LangChain has broken APIs repeatedly. Teams shipping production agents have increasingly migrated to custom orchestration over the past 12 months.
My default for new production systems: custom orchestration with raw API calls, explicit state as a Python dataclass, and structured logging at each agent boundary. Add LangGraph only if you need its checkpointing or human-in-the-loop features.
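A minimal sketch of that default, using only the standard library. `TaskState`, `boundary`, and `retrieval_agent` are names I'm inventing for illustration; the point is the shape: state as a dataclass, one logging call on each side of every agent boundary.

```python
import json
import logging
from dataclasses import dataclass, field, asdict
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator")

@dataclass
class TaskState:
    """Explicit, serialisable workflow state. No hidden framework state."""
    query: str
    sources: list[str] = field(default_factory=list)
    answer: str = ""
    step: int = 0

def boundary(agent_name: str,
             agent_fn: Callable[[TaskState], TaskState],
             state: TaskState) -> TaskState:
    """Run one agent and log the full state at the boundary, so every
    transition is reconstructable from the logs."""
    log.info("enter %s state=%s", agent_name, json.dumps(asdict(state)))
    state = agent_fn(state)
    state.step += 1
    log.info("exit %s state=%s", agent_name, json.dumps(asdict(state)))
    return state

def retrieval_agent(state: TaskState) -> TaskState:
    """Toy agent: a real one would call an LLM or a search tool."""
    state.sources.append("doc-1")
    return state
```

Because the state is a plain dataclass, `asdict` gives you checkpointing almost for free: serialise it at any boundary and resume later.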
State Management
The hardest part of multi-agent systems is state. Each agent needs enough context to do its job without being overwhelmed by irrelevant history.
Shared state vs message passing. Shared state (a global object all agents read and write) is simple but creates contention and makes debugging harder. Message passing (each agent receives only the inputs it needs, returns only the outputs it produces) is cleaner and scales better, but requires explicit interface design upfront.
State compaction. In long-running workflows, state accumulates. By step 10, passing the full history to each agent costs tokens and degrades focus. Periodically summarise or prune state: pass a compact summary plus only the last N steps rather than the full history.
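Compaction can be as simple as the following sketch: keep the last N steps verbatim and collapse everything older into a summary. The placeholder string stands in for what would be an LLM summarisation call in production.

```python
def compact(history: list[str], keep_last: int = 3) -> list[str]:
    """Collapse old steps into one summary entry, keep the last
    keep_last steps verbatim."""
    if len(history) <= keep_last:
        return history  # nothing worth compacting yet
    old, recent = history[:-keep_last], history[-keep_last:]
    # Stub: a real system would summarise `old` with an LLM call.
    summary = f"[summary of {len(old)} earlier steps]"
    return [summary] + recent
```

Run this every few steps and the context passed to each agent stays bounded regardless of workflow length.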
Error Handling
Production multi-agent systems need explicit failure handling at every agent boundary:
- Timeout: Every agent call has a maximum wall-clock time. Exceeded → retry or fail fast.
- Output validation: Every agent output is validated against a schema before being passed downstream. Malformed outputs trigger a retry with an explicit correction prompt.
- Retry budget: Set a maximum retry count per task. Infinite retry loops will eventually exhaust your API budget.
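The three guards above can be composed into a single wrapper. This is a hedged sketch using only the standard library: `agent_fn` and `schema_keys` are illustrative names, the shape check stands in for real schema validation (e.g., Pydantic), and the correction-prompt retry is left as a comment. Note also that a timed-out call still occupies the worker thread here; a production system would isolate or cancel it.

```python
import concurrent.futures as cf

def call_with_guards(agent_fn, payload, schema_keys: set,
                     timeout_s: float = 10.0, retries: int = 3):
    """Wrap an agent call with a wall-clock timeout, an output shape
    check, and a bounded retry budget."""
    last_error = None
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(retries):
            try:
                result = pool.submit(agent_fn, payload).result(timeout=timeout_s)
            except cf.TimeoutError as exc:
                last_error = exc
                continue  # timeout: consume one retry and try again
            if isinstance(result, dict) and schema_keys <= result.keys():
                return result  # output has the required fields
            last_error = ValueError(f"malformed output: {result!r}")
            # In production: retry with an explicit correction prompt here.
    raise RuntimeError(f"retry budget exhausted: {last_error}")
```

The key property is that every failure mode (hang, malformed output, persistent error) terminates in bounded time with an explicit exception instead of a silent loop.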
Further Reading
- ReAct: Synergizing Reasoning and Acting in Language Models
- LangGraph documentation: human-in-the-loop patterns
- CAMEL: Communicative Agents for "Mind" Exploration (multi-agent role-playing framework)
Related Concepts
Message queue: Asynchronous communication buffer between services. Decouples producers from consumers and provides durability during traffic spikes.
Distributed consensus: The problem of getting distributed nodes to agree on a single value despite network failures and partial outages. The theoretical foundation behind etcd, ZooKeeper, and Kafka leader election.