Sub-agent orchestration is a powerful pattern for building modular AI systems.

Instead of a single monolithic prompt, you delegate specialized tasks to purpose-built agents—each optimized for its role.

Sub-Agent Orchestration

The Pattern

An Orchestrator coordinates the workflow, deciding which specialized agents to invoke based on task requirements. Each Sub-agent operates in isolation with its own context, its own system prompt, and potentially a different model.

Why this wins:

  • Modularity: Each agent has a single responsibility
  • Cost Optimization: Use expensive models only where needed
  • Context Isolation: Sub-agents don’t inherit bloated conversation history
  • Flexibility: Easy to add, modify, or swap agents

[Figure: Spring sub-agent architecture. Image from the spring.io blog.]

In this article we’ll implement sub-agent orchestration using spring-ai-agent-utils, with the Architect-Builder pattern as our example.

Example: The Architect-Builder Pattern

One powerful sub-agent configuration is the Architect-Builder pattern:

  • Architect (expensive reasoning model): Analyzes data, extracts facts, creates structured blueprints
  • Builder (cheap fast model): Generates polished prose from the blueprint

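The Architect has less room to hallucinate because it outputs only structured data, and the Builder is constrained to the Architect's blueprint. This reduces hallucination risk rather than eliminating it: the Architect can still extract a wrong fact, and the Builder can still drift from the blueprint.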

Cost Analysis

Assume a task requiring 2000 reasoning tokens and 1500 output tokens:

Approach      Model        Input tokens  Output tokens  Cost (USD)
Single call   o3-mini      500           3,500          $0.077
Architect     o3-mini      500           500            $0.022
Builder       gpt-4o-mini  600           1,500          $0.001
Total (Architect + Builder)                             $0.023

Splitting the work cuts the cost from $0.077 to $0.023, roughly 70% savings, because most of the output tokens are generated by the cheap model.
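The savings figure can be checked with a few lines of arithmetic. The sketch below uses only the dollar amounts from the table; the class and method names (CostComparison, savingsFraction) are made up for illustration.

```java
public class CostComparison {

    /** Fraction saved when a baseline cost is replaced by a cheaper alternative. */
    static double savingsFraction(double baselineCost, double alternativeCost) {
        return (baselineCost - alternativeCost) / baselineCost;
    }

    public static void main(String[] args) {
        // Figures copied from the cost table (USD per task).
        double singleCall = 0.077;
        double architect = 0.022;
        double builder = 0.001;

        double split = architect + builder;
        double saved = savingsFraction(singleCall, split);

        System.out.printf("Split cost: $%.3f, savings: %.0f%%%n", split, saved * 100);
    }
}
```

Plugging in your own per-task costs shows quickly whether the split pays off for a given workload.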

How Sub-Agent Calling Works

The Task Tool

When Spring AI starts, TaskTool loads agent definitions (markdown files) and builds an Agent Registry. This registry is injected into the orchestrator’s context as a tool:

{
  "name": "Task",
  "description": "Launch a specialized agent. Available agents:\n- architect: Strategic reasoning agent\n- builder: High-speed content generation",
  "parameters": {
    "subagent_type": "string (required)",
    "prompt": "string (required)"
  }
}

Delegation Flow

When the orchestrator decides to delegate, it responds with a tool call request:

{
  "tool_calls": [{
    "name": "Task",
    "arguments": {
      "subagent_type": "architect",
      "prompt": "Analyze the differences between these articles..."
    }
  }]
}

Spring AI intercepts this, spawns the sub-agent with a fresh, isolated context, executes it, and returns the result.
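Conceptually, the delegation step is a lookup by agent name followed by a call. The following is a minimal sketch of that idea, not the actual TaskTool implementation: TaskDispatchSketch, TaskCall, and the stub agents are all hypothetical, and a real agent would call an LLM instead of concatenating strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class TaskDispatchSketch {

    /** Hypothetical stand-in for a tool call: which agent to run and with what prompt. */
    record TaskCall(String subagentType, String prompt) {}

    /** Hypothetical registry: agent name -> function that runs the agent on a prompt. */
    private final Map<String, Function<String, String>> registry = new LinkedHashMap<>();

    void register(String name, Function<String, String> agent) {
        registry.put(name, agent);
    }

    /** Looks up the requested agent and returns its output as the tool response. */
    String dispatch(TaskCall call) {
        Function<String, String> agent = registry.get(call.subagentType());
        if (agent == null) {
            // Returned to the orchestrator as text so it can pick a valid agent.
            return "Unknown subagent_type: " + call.subagentType()
                    + ". Available: " + registry.keySet();
        }
        return agent.apply(call.prompt());
    }

    public static void main(String[] args) {
        TaskDispatchSketch tool = new TaskDispatchSketch();
        // Stub agents; a real one would call an LLM.
        tool.register("architect", prompt -> "BLUEPRINT for: " + prompt);
        tool.register("builder", blueprint -> "PROSE from: " + blueprint);

        String blueprint = tool.dispatch(new TaskCall("architect", "Compare articles A and B"));
        String answer = tool.dispatch(new TaskCall("builder", blueprint));
        System.out.println(answer);
    }
}
```

Returning an error string for an unknown agent, rather than throwing, is one possible design choice: it gives the orchestrating model a chance to retry with a valid name.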

Context Isolation

Each sub-agent operates in its own isolated context window:

  • Receives only its system prompt + the task prompt
  • Does not see the orchestrator’s conversation history
  • Can use a different LLM than the orchestrator

This prevents context pollution and enables multi-model routing.
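To make the isolation concrete, here is a small sketch of what a sub-agent's starting context looks like next to the orchestrator's history. The Message record and the subagentContext helper are hypothetical and are not Spring AI types; the point is only that the orchestrator's history is never passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextIsolationSketch {

    /** Hypothetical chat message: a role ("system", "user", "assistant") and its text. */
    record Message(String role, String text) {}

    /**
     * Builds the context a sub-agent starts with: its own system prompt plus the
     * task prompt. The orchestrator's history is deliberately not a parameter.
     */
    static List<Message> subagentContext(String agentSystemPrompt, String taskPrompt) {
        return List.of(
                new Message("system", agentSystemPrompt),
                new Message("user", taskPrompt));
    }

    public static void main(String[] args) {
        // Orchestrator history that keeps growing across the conversation.
        List<Message> orchestratorHistory = new ArrayList<>();
        orchestratorHistory.add(new Message("system", "You are a task orchestrator..."));
        orchestratorHistory.add(new Message("user", "Long user request with attached data"));
        orchestratorHistory.add(new Message("assistant", "Delegating to the architect"));

        List<Message> architectContext = subagentContext(
                "You are the Architect - a strategic reasoning agent.",
                "Analyze the differences between these articles...");

        System.out.println("Orchestrator messages: " + orchestratorHistory.size());
        System.out.println("Architect messages: " + architectContext.size());
    }
}
```

However long the orchestrator's conversation grows, the sub-agent in this sketch always starts from exactly two messages.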

Response Flow

Orchestrator                    TaskTool                      Sub-agent
     |                              |                               |
     |-- tool_call(architect) ----->|                               |
     |                              |-- spawn with fresh context -->|
     |                              |                               |
     |                              |<----- Response ---------------|
     |<-- tool_response ------------|                               
     |                                                              
     |-- tool_call(builder) ------->|                               
     |                              |-- spawn Builder ------------->|
     |<-- tool_response ------------|<----- Response ---------------|
     |
     |-- final response to user

Implementation

Agent Definitions

Agents are defined as markdown files with YAML frontmatter:

architect.md

---
name: architect
description: Use for complex analysis requiring deep reasoning
model: o3-mini
---
You are the Architect - a strategic reasoning agent.
Analyze input data and produce a structured Blueprint.
DO NOT write the final response.

builder.md

---
name: builder
description: Generate polished final content from a blueprint
model: gpt-4o-mini
---
You are the Builder - a high-speed execution engine.
WORK ONLY from the provided Blueprint.
DO NOT add outside information.
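
The file format is simple enough to illustrate with a hand-rolled parser. The sketch below is an assumption about how such a file could be split into metadata and system prompt; it is not how spring-ai-agent-utils loads definitions, it handles only flat key: value pairs rather than full YAML, and AgentDefinitionSketch and AgentDefinition are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AgentDefinitionSketch {

    /** Parsed agent file: frontmatter key/value pairs plus the system prompt body. */
    record AgentDefinition(Map<String, String> frontmatter, String systemPrompt) {}

    /** Splits "---\nkey: value\n---\nbody" into metadata and body. Flat keys only. */
    static AgentDefinition parse(String markdown) {
        String[] lines = markdown.strip().split("\\R");
        if (lines.length == 0 || !lines[0].strip().equals("---")) {
            throw new IllegalArgumentException("Missing opening --- delimiter");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        int i = 1;
        for (; i < lines.length && !lines[i].strip().equals("---"); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                meta.put(lines[i].substring(0, colon).strip(),
                         lines[i].substring(colon + 1).strip());
            }
        }
        if (i == lines.length) {
            throw new IllegalArgumentException("Missing closing --- delimiter");
        }
        // Everything after the closing delimiter is the agent's system prompt.
        StringBuilder body = new StringBuilder();
        for (int j = i + 1; j < lines.length; j++) {
            body.append(lines[j]).append('\n');
        }
        return new AgentDefinition(meta, body.toString().strip());
    }

    public static void main(String[] args) {
        String file = """
                ---
                name: architect
                description: Use for complex analysis requiring deep reasoning
                model: o3-mini
                ---
                You are the Architect - a strategic reasoning agent.
                Analyze input data and produce a structured Blueprint.
                """;
        AgentDefinition def = parse(file);
        System.out.println(def.frontmatter().get("model"));
        System.out.println(def.systemPrompt());
    }
}
```

The frontmatter carries routing metadata (name, description, model), while the body becomes the sub-agent's system prompt.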

Configuration

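import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
// Imports for TaskTool, SubagentType, ClaudeSubagentType, ClaudeSubagentReferences
// and ToolCallAdvisor are omitted; take the package names from the library version you use.

@Configuration
public class AgentConfig {

    // Agent definition files, resolved from agent.tasks.paths in application.yaml
    @Value("${agent.tasks.paths}")
    private List<Resource> agentPaths;

    @Bean("orchestratorClient")
    ChatClient orchestratorClient(ChatClient.Builder chatClientBuilder) {

        // Registers a cloned ChatClient builder under the name "default" for sub-agents
        SubagentType claudeType = ClaudeSubagentType.builder()
                .chatClientBuilder("default", chatClientBuilder.clone())
                .build();

        // Builds the Task tool from the markdown agent definitions
        var taskTools = TaskTool.builder()
                .subagentReferences(ClaudeSubagentReferences.fromResources(agentPaths))
                .subagentTypes(claudeType)
                .build();

        // The orchestrator is a regular ChatClient with the Task tool attached
        return chatClientBuilder.clone()
                .defaultToolCallbacks(taskTools)
                .defaultAdvisors(
                        ToolCallAdvisor.builder()
                                .conversationHistoryEnabled(true)
                                .build())
                .build();
    }
}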

In application.yaml:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini

agent:
  tasks:
    paths: classpath:/agents/*.md

The Service

The orchestrator prompt describes available agents but lets the LLM decide which to use:

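import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service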
public class SubagentService {

    private final ChatClient orchestratorClient;

    public SubagentService(@Qualifier("orchestratorClient") ChatClient orchestratorClient) {
        this.orchestratorClient = orchestratorClient;
    }

    public String process(String task, String data) {
        String prompt = """
                You are a task orchestrator with access to specialized agents via the Task tool.

                Available agents:
                - architect: Use for complex analysis requiring deep reasoning
                - builder: Use to generate polished final content

                Guidelines:
                - For complex tasks: Use architect first, then builder
                - For simple tasks: Use builder directly or respond yourself
                - Always return the final response to the user.

                Task: %s
                Data: %s
                """.formatted(task, data);

        return orchestratorClient.prompt(prompt).call().content();
    }
}

Trade-offs: Orchestration vs Direct Calls

Token Overhead

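Orchestration adds overhead from the TaskTool descriptions and the accumulated conversation history. In this example the orchestrated run uses about 16 times the input tokens and roughly twice the output tokens of the direct calls:

Approach            Total input tokens  Total output tokens
With orchestrator   ~21,000             ~1,700
Direct calls        ~1,300              ~800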

When to Use Each

Use orchestration for:

  • Dynamic workflows where the LLM decides which agents to call
  • User-facing chat interfaces
  • Multi-step tasks with branching logic

Use direct calls for:

  • Fixed pipelines (architect → builder in the same order every time)
  • Workloads where token efficiency is critical
  • Predictable, deterministic workflows

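For direct calls, skip the orchestrator and chain the two agents yourself. Here architectClient and builderClient are assumed to be separate ChatClient instances, each configured with its agent's system prompt and model:

public String processDirect(String task, String data) {
    // Step 1: the Architect turns the task and data into a structured blueprint
    String blueprint = architectClient.prompt(task + "\n" + data).call().content();
    // Step 2: the Builder writes the final text from the blueprint alone
    return builderClient.prompt(blueprint).call().content();
}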

When to Use Sub-Agent Patterns

Good fit:

  • Tasks requiring multiple specialized skills
  • Cost optimization by model routing
  • Complex workflows with conditional logic

Skip it when:

  • Simple single-purpose tasks
  • Minimal output generation
  • Latency is critical (sub-agent calls add round trips)

Repository

Full implementation: https://github.com/GaetanoPiazzolla/llm-architect-builder


Interested in seeing this in action? Check out BullSentiment.com for real-time stock sentiment analysis powered by sub-agent orchestration.