AI agents are no longer experimental. Companies are deploying them to handle document processing, code generation, customer support, and complex multi-step workflows. But most tutorials stop at “call the OpenAI API” — they don't address what happens when you need agents that run reliably in production, handle failures gracefully, and scale beyond a single prompt-response loop.
This article covers the architecture patterns we use at Sooties to build scalable AI agent systems — drawn from real production deployments with multi-phase pipelines, automated verification, and autonomous execution loops.
The Problem with Simple Agent Loops
The most common agent pattern looks like this: take user input, call an LLM, parse the output, execute a tool, repeat. It works for demos but falls apart in production for several reasons:
- No separation between planning and execution
- A single failure aborts the entire workflow
- No verification that the output is actually correct
- No way to resume from where things left off
- Unbounded token usage with no cost controls
Production agent systems need a fundamentally different architecture — one built around phases, verification loops, and graceful degradation.
Architecture: Multi-Phase Agent Pipelines
The core idea is to break agent workflows into discrete phases, each with clear inputs, outputs, and success criteria. Instead of one monolithic agent loop, you get a pipeline of specialized stages.
// Phase-based agent pipeline
const phases = [
  { name: 'plan', handler: planPhase },
  { name: 'implement', handler: implementPhase },
  { name: 'verify', handler: verifyPhase },
  { name: 'refactor', handler: refactorPhase },
  { name: 'finalize', handler: finalizePhase },
];
async function runPipeline(input: AgentInput) {
  let context = { input, artifacts: {} };
  for (const phase of phases) {
    console.log(`[phase:${phase.name}] starting`);
    context = await phase.handler(context);
    if (context.failed) {
      console.log(`[phase:${phase.name}] failed, running recovery`);
      context = await recoverPhase(context, phase);
    }
  }
  return context.artifacts;
}
Phase 0: Planning
Before executing anything, have the agent analyze the task and produce a structured plan. This plan becomes the source of truth for all subsequent phases. The key insight is to extract individual work items (todos) from the plan so each can be tracked independently.
// Extract structured todos from a plan
async function planPhase(context: PipelineContext) {
  const plan = await llm.generate({
    system: "Analyze this task and produce a numbered plan.",
    prompt: context.input.description,
  });
  // Parse the plan into individual, trackable items
  const todos = await llm.generate({
    system: "Extract each action item as a JSON array.",
    prompt: plan,
    schema: z.array(z.object({
      id: z.number(),
      description: z.string(),
      dependencies: z.array(z.number()),
    })),
  });
  return { ...context, artifacts: { plan, todos } };
}
Phase 1: Implementation Loop
Process each todo item sequentially, but with a critical difference from naive approaches: each item gets its own execution context, and you verify incrementally rather than at the end.
async function implementPhase(context: PipelineContext) {
  const { todos } = context.artifacts;
  for (const todo of todos) {
    // Each todo gets its own agent invocation
    const result = await agent.run({
      task: todo.description,
      context: context.artifacts,
      tools: ['file_write', 'file_read', 'shell_exec'],
      maxTurns: 20, // Bounded execution
    });
    // Incremental verification after each step
    const verification = await quickVerify(result);
    if (!verification.passed) {
      await rollback(todo);
      // Re-attempt with more context about the failure
      await agent.run({
        task: `Fix: ${todo.description}. Previous attempt failed: ${verification.reason}`,
        context: { ...context.artifacts, previousError: verification.reason },
      });
    }
  }
  return context;
}
Verification: The Most Important Phase
Verification is what separates toy agents from production agents. Every output should be checked — not by the same agent that produced it, but by an independent verification step. This is analogous to how code review works: the author shouldn't be the sole reviewer.
async function verifyPhase(context: PipelineContext) {
  const MAX_RETRIES = 3;
  let attempt = 0;
  while (attempt < MAX_RETRIES) {
    // Run automated checks (tests, linting, type checking)
    const checks = await runChecks({
      tests: true,
      lint: true,
      typeCheck: true,
    });
    if (checks.passed) {
      return { ...context, verified: true };
    }
    // If checks fail, spawn a fix agent with the error context
    await fixAgent.run({
      task: 'Fix the failing checks',
      errors: checks.errors,
      context: context.artifacts,
    });
    attempt++;
  }
  // After max retries, flag for human review
  return { ...context, verified: false, needsHumanReview: true };
}
The retry loop with bounded attempts prevents infinite loops while still giving the system a chance to self-correct. After exhausting retries, it escalates to a human — never silently fails.
Specialized Agent Chains
Rather than one general-purpose agent, use chains of specialized agents that each handle a narrow concern. This is more reliable than asking a single agent to do everything, and it's easier to debug when things go wrong.
// Refactoring agent chain — each agent has a single concern
const refactorChain = [
  {
    name: 'simplify',
    prompt: 'Remove over-engineering. Inline unnecessary abstractions.',
  },
  {
    name: 'naming',
    prompt: 'Improve variable and function names for clarity.',
  },
  {
    name: 'patterns',
    prompt: 'Check for consistent patterns: imports, error handling, i18n.',
  },
  {
    name: 'cleanup',
    prompt: 'Remove dead code, unused imports, unnecessary comments.',
  },
];
async function runRefactorChain(context: PipelineContext) {
  for (const agent of refactorChain) {
    const result = await llm.generate({
      system: agent.prompt,
      prompt: context.currentCode,
    });
    // Apply the proposed change, then verify — roll back if tests break
    const previousCode = context.currentCode;
    context.currentCode = result;
    const checks = await runChecks({ tests: true });
    if (!checks.passed) {
      context.currentCode = previousCode;
      await rollback();
      console.log(`[refactor:${agent.name}] rolled back — tests failed`);
      continue; // Skip this refactor, try the next one
    }
  }
  return context;
}
Each agent in the chain makes a small, focused change. If any step breaks the tests, it rolls back and moves to the next agent. The chain as a whole is resilient even if individual steps fail.
Background Verification Agents
For long-running tasks, run verification in the background while the main agent continues working. This overlapping execution significantly reduces total pipeline time.
async function implementWithBackgroundVerify(context: PipelineContext) {
  // Start verification agent in the background
  const verifyPromise = spawnBackgroundAgent({
    type: 'verify',
    task: 'Continuously verify the implementation as changes are made',
    watchPaths: ['src/**/*.ts', 'tests/**/*.ts'],
  });
  // Main implementation continues in parallel
  for (const todo of context.artifacts.todos) {
    await implementTodo(todo, context);
  }
  // Wait for background verification to complete
  const verifyResult = await verifyPromise;
  if (!verifyResult.passed) {
    // Fix issues found by background verification
    await fixAgent.run({ errors: verifyResult.errors });
  }
  return context;
}
Orchestration Patterns
The orchestrator is the top-level controller that manages the entire pipeline. It handles phase transitions, error recovery, and determines which pipeline variant to use based on task complexity.
interface OrchestratorConfig {
  maxRetries: number;
  maxTotalTokens: number;
  timeout: number;
  pipeline: 'simple' | 'full';
}
async function orchestrate(task: AgentTask, config: OrchestratorConfig) {
  // Select pipeline based on task complexity
  const pipeline = config.pipeline === 'simple'
    ? [planPhase, implementPhase, verifyPhase]
    : [planPhase, implementPhase, verifyPhase, refactorPhase, verifyPhase];
  const context: PipelineContext = {
    task,
    artifacts: {},
    errors: [],
    tokenUsage: 0,
    startTime: Date.now(),
  };
  for (const phase of pipeline) {
    // Check resource limits before each phase
    if (context.tokenUsage > config.maxTotalTokens) {
      return { status: 'budget_exceeded', context };
    }
    if (Date.now() - context.startTime > config.timeout) {
      return { status: 'timeout', context };
    }
    try {
      // Merge each phase's returned context back into the shared pipeline state
      Object.assign(context, await phase(context));
    } catch (err) {
      context.errors.push({ phase: phase.name, error: err });
      if (context.errors.length >= config.maxRetries) {
        return { status: 'max_retries_exceeded', context };
      }
    }
  }
  return { status: 'success', context };
}
Infrastructure Considerations
Token Budget Management
Every LLM call costs money. In a multi-phase pipeline with retries, costs can spiral without controls. Track cumulative token usage across the pipeline and set hard limits per phase and per pipeline run.
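A minimal sketch of what that tracking could look like, assuming the LLM client reports per-call token counts (the `TokenBudget` class and the specific limits below are illustrative, not part of any particular SDK):
// Illustrative budget tracker: enforces per-phase and per-run token caps
class TokenBudget {
  private perPhase = new Map<string, number>();
  private total = 0;
  constructor(
    private readonly maxPerPhase: number,
    private readonly maxPerRun: number,
  ) {}
  // Record usage reported by the LLM client after each call
  record(phase: string, tokens: number) {
    const phaseTotal = (this.perPhase.get(phase) ?? 0) + tokens;
    this.perPhase.set(phase, phaseTotal);
    this.total += tokens;
    if (phaseTotal > this.maxPerPhase) {
      throw new Error(`Phase "${phase}" exceeded its token limit (${phaseTotal}/${this.maxPerPhase})`);
    }
    if (this.total > this.maxPerRun) {
      throw new Error(`Run exceeded its total token budget (${this.total}/${this.maxPerRun})`);
    }
  }
}
// Hypothetical usage: the counts come from whatever usage metadata your LLM client exposes
const budget = new TokenBudget(50_000, 200_000);
budget.record('plan', 1_850);      // tokens reported for the planning call
budget.record('implement', 7_400); // tokens reported for an implementation step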
Observability
Agent systems are notoriously hard to debug. Instrument every phase with structured logging: which agent ran, what it produced, how many tokens it consumed, whether verification passed. Tools like Datadog or OpenTelemetry work well here — treat agent runs like distributed traces.
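As a sketch using the OpenTelemetry JavaScript API (this assumes a tracer provider is configured elsewhere; the span and attribute names are our own choices, not a standard):
import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('agent-pipeline');
// Wrap each phase in a span so agent runs show up as distributed traces
async function runPhaseWithTracing(
  phase: { name: string; handler: (ctx: PipelineContext) => Promise<PipelineContext> },
  context: PipelineContext,
) {
  return tracer.startActiveSpan(`phase:${phase.name}`, async (span) => {
    try {
      const result = await phase.handler(context);
      span.setAttribute('agent.phase', phase.name);
      span.setAttribute('agent.tokens_used', result.tokenUsage ?? 0);
      span.setAttribute('agent.verified', result.verified === true);
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}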
Stateless Agents, Stateful Orchestrator
Individual agents should be stateless — given the same inputs, they produce the same outputs. All state lives in the orchestrator's pipeline context. This makes it trivial to retry individual phases, swap out agent implementations, or resume a failed pipeline from the last successful phase.
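A sketch of what resuming could look like, assuming the pipeline context is JSON-serializable and that `saveCheckpoint`/`loadCheckpoint` are small helpers backed by whatever store you already run (both hypothetical):
// Persist the context after every successful phase so a crashed run can resume
async function runPipelineWithCheckpoints(runId: string, input: AgentInput) {
  // Resume from the last checkpoint if one exists, otherwise start fresh
  const checkpoint = await loadCheckpoint(runId);
  let context = checkpoint?.context ?? { input, artifacts: {} };
  const startIndex = checkpoint?.phaseIndex ?? 0;
  for (let i = startIndex; i < phases.length; i++) {
    context = await phases[i].handler(context);
    // All state lives in the context, so persisting it is all a resume requires
    await saveCheckpoint(runId, { phaseIndex: i + 1, context });
  }
  return context.artifacts;
}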
Queue-Based Execution
For high-throughput systems, decouple agent invocations from the orchestrator using a message queue (RabbitMQ, SQS, or similar). The orchestrator publishes phase tasks to the queue, and worker processes pick them up. This gives you horizontal scaling, backpressure handling, and dead-letter queues for failed tasks.
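For instance, with SQS via the AWS SDK v3 (the queue URL, message shape, and `runPhase` dispatcher are placeholders; the same pattern applies to RabbitMQ or any other broker):
import {
  SQSClient,
  SendMessageCommand,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from '@aws-sdk/client-sqs';
const sqs = new SQSClient({});
const QUEUE_URL = process.env.AGENT_PHASE_QUEUE_URL!; // placeholder
// Orchestrator side: publish a phase task instead of running it in-process
async function enqueuePhaseTask(runId: string, phase: string, context: PipelineContext) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: QUEUE_URL,
    MessageBody: JSON.stringify({ runId, phase, context }),
  }));
}
// Worker side: poll for tasks, execute them, acknowledge only on success
async function pollForPhaseTasks() {
  const { Messages } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 1,
    WaitTimeSeconds: 20, // long polling
  }));
  for (const message of Messages ?? []) {
    const task = JSON.parse(message.Body!);
    await runPhase(task.phase, task.context); // hypothetical dispatcher into the phase handlers
    // Deleting only after success means failed tasks become visible again
    // and eventually land in the dead-letter queue
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle!,
    }));
  }
}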
Key Takeaways
- Break workflows into phases — plan, implement, verify, refactor. Each phase has clear inputs and outputs.
- Verify independently — never let the same agent validate its own work. Use automated checks and separate verification agents.
- Bound everything — max retries, token budgets, timeouts. Unbounded agent loops are a production incident waiting to happen.
- Specialize your agents — a chain of focused agents beats one general-purpose agent for reliability and debuggability.
- Design for failure — rollback on failed steps, escalate to humans after retries, and log everything.
The patterns above have been battle-tested in production systems handling thousands of agent runs daily. They're not theoretical — they're the result of iterating through every way an agent can fail and building safeguards for each one.