
Architecture

Swarm Command implements a 5-layer hierarchical multi-agent architecture derived from the SwarmSpeed 250 protocol. This document explains the system at two levels: a fast mental model first, then the full layer-by-layer breakdown.


30-Second Overview

If you only remember four things, remember these:

  1. Nexus decomposes the mission into domain-level work.
  2. Commanders own domains and turn them into smaller shards.
  3. Workers stay atomic — leaf nodes never spawn more agents.
  4. Review + Shadow Score decide quality before Nexus emits a final bundle.

Mission
  ↓
Nexus
  ↓
Commanders
  ↓
Squad Leads (SS-250 only)
  ↓
Workers
  ↓
Reviewers + Shadow Score
  ↓
Final synthesis

Note: At SS-50 and SS-100, the Squad Lead layer is skipped — Commanders spawn Workers directly (depth 2). The full 4-tier spawn chain (Nexus → Commander → Squad Lead → Worker) only applies at SS-250.
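
The tier rule above can be sketched as a small dispatch function. This is an illustrative sketch, not the protocol's real API — the tier labels come from this doc, but `spawn_chain` and the layer names are hypothetical:

```python
# Hypothetical sketch: choose the spawn chain for a given swarm tier.
# Only SS-250 inserts the Squad Lead layer; smaller tiers go
# Commander -> Worker directly (depth 2).

def spawn_chain(tier: str) -> list[str]:
    """Return the layer sequence used when launching a mission."""
    chain = ["nexus", "commander"]
    if tier == "SS-250":
        chain.append("squad_lead")
    chain.append("worker")
    return chain

assert spawn_chain("SS-100") == ["nexus", "commander", "worker"]
assert spawn_chain("SS-250") == ["nexus", "commander", "squad_lead", "worker"]
```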

Read this doc when you want the system model. Jump to architecture diagrams for visuals, consensus for merge mechanics, and scaling for deployment choices.


Layer Topology

                          ┌──────────────────┐
                    L0    │     NEXUS (1)    │  Model: claude-opus-4.6
                          │  128K ctx budget │  Type: general-purpose
                          └─────────┬────────┘
                                    │
               ┌────────────────────┼────────────────────┐
               │                    │                    │
         ┌─────┴─────┐        ┌─────┴─────┐        ┌─────┴─────┐
   L1    │ CMD-A (1) │        │ CMD-B (1) │  ...   │ CMD-E (1) │  × 5 Commanders
         │ 64K ctx   │        │ 64K ctx   │        │ 64K ctx   │  Type: general-purpose
         └─────┬─────┘        └─────┬─────┘        └─────┬─────┘  Model: mixed
               │                    │                    │
     ┌─────────┼─────────┐          │                    │
     │         │         │          │                    │
  ┌──┴──┐   ┌──┴──┐   ┌──┴──┐
L2│SQ-1 │   │SQ-2 │...│SQ-10│   × 10 per Commander = 50 Squad Leads
  │ 32K │   │ 32K │   │ 32K │   Type: general-purpose (can_launch=true)
  └──┬──┘   └──┬──┘   └──┬──┘   Model: claude-haiku-4.5
     │         │         │
  ┌──┴──┐   ┌──┴──┐   ┌──┴──┐
L3│ W×5 │   │ W×5 │   │ W×5 │   × 5 per Squad Lead = 250 Workers
  │ 8K  │   │ 8K  │   │ 8K  │   Type: explore | task (LEAF — no spawning)
  └─────┘   └─────┘   └─────┘   Model: claude-haiku-4.5 | gpt-5.4-mini

                     ┌──────────────┐
               L4    │ REVIEWERS×10 │  Cross-review mesh
                     │    16K ctx   │  Type: general-purpose (can_launch=false)
                     └──────────────┘  Model: mixed (cross-family pairs)

              + SHADOW SCORING (Nexus-internal, sealed acceptance criteria, Shadow Score Spec L2)

Total agents for SS-250: 316

Agent counts include all deployed agents across all layers: 1 Nexus + 5 Commanders + 50 Squad Leads + 250 Workers + 10 Reviewers = 316.
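
The total follows directly from the branching factors in the topology diagram, as a quick arithmetic check:

```python
# Agent roster for SS-250, derived from the branching factors:
# 5 Commanders, 10 Squad Leads per Commander, 5 Workers per Squad Lead.
NEXUS = 1
COMMANDERS = 5
SQUAD_LEADS = COMMANDERS * 10   # 10 per Commander = 50
WORKERS = SQUAD_LEADS * 5       # 5 per Squad Lead = 250
REVIEWERS = 10

TOTAL = NEXUS + COMMANDERS + SQUAD_LEADS + WORKERS + REVIEWERS
assert TOTAL == 316
```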


Why This Shape Works

| Design choice | What it buys you |
| --- | --- |
| Single Nexus at the top | One decomposition authority and one final synthesis point |
| Domain-owning Commanders | Parallel workstreams without losing task ownership |
| Squad Leads between Commanders and Workers | Controlled fan-out and better task compression |
| Leaf-node Workers | Strict depth control and predictable cost |
| Independent Reviewers | Scoring from outside the execution path |
| Nexus-internal Shadow Score | Hidden validation without revealing the acceptance rubric |

Layer Descriptions

L0 β€” Nexus (1 agent)

| Property | Value |
| --- | --- |
| Agent type | general-purpose |
| Model | claude-opus-4.6 |
| Context budget | 128K tokens |
| can_launch | true |
| Responsibilities | Task decomposition, commander assignment, reviewer dispatch, sealed criteria generation (Phase 1.5), shadow score validation (Phase 6), final synthesis, circuit breaker authority |
| Spawns | 5 Commanders + 10 Reviewers |

The Nexus is the brain of the swarm. It receives the user's task, decomposes it into domains, creates Context Capsules for each Commander, monitors the swarm, and synthesizes the final output from bundles plus review scores.
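
A Context Capsule might look like the sketch below. The field names and the `ContextCapsule` class are hypothetical — the doc fixes only the ~2K-token budget for L1 capsules (see the signal-flow section) and the per-Commander domain assignment:

```python
# Hypothetical sketch of a Context Capsule handed from Nexus to a Commander.
# Field names are illustrative assumptions, not the real schema.
from dataclasses import dataclass

@dataclass
class ContextCapsule:
    domain: str                  # e.g. "CMD-ARCH"
    focus: str                   # what this Commander owns
    task_shards_hint: list[str]  # suggested decomposition, not binding
    token_budget: int = 2_000    # context-down budget for L1
    why: str = ""                # at most ~50 tokens of parent rationale

capsule = ContextCapsule(
    domain="CMD-ARCH",
    focus="Patterns, interfaces, module boundaries",
    task_shards_hint=["map module boundaries", "list public interfaces"],
    why="Downstream workers need stable interface names.",
)
assert capsule.token_budget == 2000
```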

L1 β€” Commanders (5 agents)

| Property | Value |
| --- | --- |
| Agent type | general-purpose |
| Model | Commander pool (9): claude-opus-4.6, claude-opus-4.5, claude-opus-4.6-1m, claude-sonnet-4.6, claude-sonnet-4.5, claude-sonnet-4, gpt-5.4, gpt-5.2, gpt-5.1 |
| Context budget | 64K tokens |
| can_launch | true |
| Max children | 10 Squad Leads each |
| Responsibilities | Domain decomposition, squad lead dispatch, canary verification, result merging within domain |

Domain assignments:

| Commander | Domain | Focus |
| --- | --- | --- |
| CMD-ARCH | Architecture & Structure | Patterns, interfaces, module boundaries |
| CMD-IMPL | Implementation & Logic | Core logic, algorithms, data flow |
| CMD-TEST | Testing & Validation | Test cases, edge cases, validation |
| CMD-DOCS | Documentation & Examples | Docs, comments, examples, guides |
| CMD-INTG | Integration & Review | Cross-cutting concerns, glue code, API contracts |

L2 β€” Squad Leads (50 agents)

| Property | Value |
| --- | --- |
| Agent type | general-purpose |
| Model | claude-haiku-4.5 or gpt-5.4-mini (alternating) |
| Context budget | 32K tokens |
| can_launch | true |
| Max children | 5 Workers each |
| Responsibilities | Micro-task decomposition, canary deployment, worker dispatch, atom collection, local consensus |

L3 β€” Workers (250 agents)

| Property | Value |
| --- | --- |
| Agent type | explore or task |
| Model | Worker pool (6): claude-haiku-4.5, gpt-5.4-mini, gpt-5-mini, gpt-4.1, gpt-5.3-codex, gpt-5.2-codex |
| Context budget | 8K tokens |
| can_launch | false — structurally enforced |
| Responsibilities | Execute one atomic task, emit structured JSON atom |

Pod composition per Squad Lead:

| Role | Count | Agent Type | Purpose |
| --- | --- | --- | --- |
| Canary | 1 | explore | Pre-flight check before full pod |
| Scout | 3 | explore | Research, search, read files |
| Executor | 1 | task | Run commands, build, test |
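
The "LEAF — no spawning" property is structural: the parent sets `can_launch` at spawn time, so a Worker never has a spawn path at all. A minimal sketch of that guard (the `Agent` class and `spawn` method are illustrative, not the real runtime API):

```python
# Hypothetical sketch of parent-controlled spawning: can_launch is assigned
# by the parent when the child is created, and a leaf has no way to escalate.

class Agent:
    def __init__(self, layer: str, can_launch: bool):
        self.layer = layer
        self.can_launch = can_launch

    def spawn(self, child_layer: str, child_can_launch: bool) -> "Agent":
        # Structural enforcement: workers are created with can_launch=False,
        # so depth is bounded by construction rather than by convention.
        if not self.can_launch:
            raise PermissionError(f"{self.layer} is a leaf; spawning denied")
        return Agent(child_layer, child_can_launch)

squad_lead = Agent("squad_lead", can_launch=True)
worker = squad_lead.spawn("worker", child_can_launch=False)
try:
    worker.spawn("worker", child_can_launch=False)
except PermissionError:
    pass  # leaf enforcement held
```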

L4 β€” Cross-Reviewers (10 agents)

Property Value
Agent type general-purpose
Model Mixed cross-family pairs
Context budget 16K tokens
can_launch false
Responsibilities Cross-domain scoring, conflict detection, consensus voting

Shadow Scoring (Shadow Score Spec L2)

Shadow scoring is Nexus-internal — no separate validator agents are spawned. The Nexus generates sealed acceptance criteria in Phase 1.5 and validates commander outputs against them in Phase 6.

| Property | Value |
| --- | --- |
| Implementation | Nexus-internal sealed-envelope protocol |
| Criteria | 10 binary pass/fail acceptance criteria |
| Formula | Shadow Score = (failures / total) × 100 |
| Hardening | 1 cycle if score > 15% |
| Conformance | Shadow Score Spec L2 |
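
The formula above is a failure percentage over the 10 sealed criteria, with one hardening cycle triggered above the 15% threshold. A direct sketch (function names are illustrative):

```python
# Sketch of the Shadow Score formula: score = (failures / total) * 100,
# with one hardening cycle if score > 15.

def shadow_score(results: list[bool]) -> float:
    """results[i] is True when sealed criterion i passed."""
    failures = sum(1 for passed in results if not passed)
    return failures / len(results) * 100

def needs_hardening(score: float, threshold: float = 15.0) -> bool:
    return score > threshold

# 10 binary criteria, 2 failures -> 20% -> one hardening cycle.
score = shadow_score([True] * 8 + [False] * 2)
assert score == 20.0
assert needs_hardening(score)
assert not needs_hardening(shadow_score([True] * 9 + [False]))  # 10% passes
```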

Time-Flow Architecture

T+0s     T+2s       T+5s         T+12s       T+45s      T+65s    T+80s   T+90s
  │        │          │             │           │          │        │       │
  ▼        ▼          ▼             ▼           ▼          ▼        ▼       ▼
┌─────┐ ┌──────┐  ┌─────────┐  ┌──────────┐ ┌────────┐ ┌───────┐ ┌────┐ ┌────┐
│NEXUS│→│CMDs  │→ │SQUAD    │→ │WORKERS   │ │REVIEW  │ │MERGE  │ │VOTE│ │EMIT│
│BOOT │ │SPAWN │  │LEADS    │  │EXECUTE   │ │MESH    │ │RESULTS│ │    │ │    │
│     │ │      │  │+ CANARY │  │(parallel)│ │(overlap│ │       │ │    │ │    │
│     │ │      │  │VERIFY   │  │          │ │start)  │ │       │ │    │ │    │
└─────┘ └──────┘  └─────────┘  └──────────┘ └────────┘ └───────┘ └────┘ └────┘
  2s       3s         7s           33s         20s        15s     10s     5s

  ◄──── LAUNCH PHASE ────►◄── EXECUTION ──►◄──── CONVERGENCE PHASE ────────►
         (~12s)               (~33s)                  (~45s)
Key insight: pipeline overlap. Reviewers start before every worker is done. The review mesh begins as soon as the first commander pair completes, which removes review time from the critical path.


Signal Flow — Token Compression

           CONTEXT DOWN (shrinking)              RESULTS UP (compressing)
           ========================              ========================

  L0  Full Task Brief    ─── 4K tokens ───►  Final Report     ◄── 4K tokens
                 │                                    ▲
  L1  Context Capsule    ─── 2K tokens ───►  Bundle           ◄── 1K tokens
                 │                                    ▲
  L2  Shard              ─── 512 tokens ──►  Atom Set         ◄── 512 tokens
                 │                                    ▲
  L3  Micro-Brief        ─── 128 tokens ──►  Atom             ◄── 256 tokens
                 │                                    ▲
  L4  Review Capsule     ─── 1K tokens ───►  Score Card       ◄── 512 tokens

Compression ratio: 1024:1 — from 128K tokens at Nexus down to 128 tokens at Worker level.

Compression Rules (Context Down)

  1. Strip rationale at each layer — children need the task, not the history
  2. File scope narrows monotonically — a child scope is always a subset of its parent
  3. Constraints tighten monotonically — timeouts and token caps can only decrease
  4. Parent context stays short — at most ~50 tokens of "why this matters"
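
The two monotonicity rules (scope narrows, constraints tighten) can be sketched as a validation check. The brief fields here are hypothetical — the doc specifies only the narrowing/tightening behavior, not a schema:

```python
# Hypothetical sketch of the monotonicity rules: a child brief is valid only
# if its file scope is a subset of the parent's and its caps only decrease.

def valid_child_brief(parent: dict, child: dict) -> bool:
    return (
        set(child["files"]) <= set(parent["files"])    # rule 2: scope narrows
        and child["timeout_s"] <= parent["timeout_s"]  # rule 3: constraints tighten
        and child["token_cap"] <= parent["token_cap"]
    )

parent = {"files": {"a.py", "b.py"}, "timeout_s": 60, "token_cap": 2048}
good   = {"files": {"a.py"}, "timeout_s": 30, "token_cap": 512}
bad    = {"files": {"a.py", "c.py"}, "timeout_s": 30, "token_cap": 512}
assert valid_child_brief(parent, good)
assert not valid_child_brief(parent, bad)   # c.py widens the scope
```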

Aggregation Rules (Results Up)

  1. Conflicts bubble up — disagreements survive until a higher layer resolves them
  2. Confidence is the geometric mean — (c₁ × c₂ × ... × cₙ)^(1/n)
  3. Failed atoms are replaced — if a worker fails, the Squad Lead may re-launch ONE replacement (using its own retry budget of 1); Workers have retry budget = 0
  4. Deduplication is content-hash based — identical atoms merge and confidence rises
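
Rules 2 and 4 can be sketched as follows. The atom shape is illustrative, and the merge formula for "confidence rises" is an assumption (the doc does not fix one; noisy-OR is used here as one reasonable choice):

```python
# Sketch of aggregation rules 2 (geometric-mean confidence) and 4
# (content-hash dedup). Atom fields are illustrative assumptions.
import hashlib
from math import prod

def combined_confidence(confidences: list[float]) -> float:
    # Geometric mean: (c1 * c2 * ... * cn) ** (1/n)
    return prod(confidences) ** (1 / len(confidences))

def dedupe_atoms(atoms: list[dict]) -> list[dict]:
    merged: dict[str, dict] = {}
    for atom in atoms:
        key = hashlib.sha256(atom["content"].encode()).hexdigest()
        if key in merged:
            a, b = merged[key]["confidence"], atom["confidence"]
            # "Confidence rises" on merge; noisy-OR is an assumed formula.
            merged[key]["confidence"] = 1 - (1 - a) * (1 - b)
        else:
            merged[key] = dict(atom)
    return list(merged.values())

assert abs(combined_confidence([0.25, 1.0]) - 0.5) < 1e-9
atoms = [{"content": "x", "confidence": 0.5}, {"content": "x", "confidence": 0.5}]
assert dedupe_atoms(atoms)[0]["confidence"] == 0.75  # agreement raised it
```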

Cross-Model Pairing Matrix

For maximum insight diversity, models from different families are paired within the same pod:

| Pod Role | Primary Model | Alternate Model | Why alternate |
| --- | --- | --- | --- |
| Commander | claude-opus-4.6 | gpt-5.4, gpt-5.2, gpt-5.1 | Reduce same-family blind spots |
| Squad Lead | claude-haiku-4.5 | gpt-5.4-mini | Keep fan-out cheap while mixing reasoning styles |
| Scout Worker | claude-haiku-4.5 | gpt-5.4-mini, gpt-5-mini, gpt-4.1 | Increase search and interpretation diversity |
| Executor Worker | gpt-5.3-codex | gpt-5.2-codex | Prefer code execution specialists for build/test |
| Reviewer | 7 cross-family pairs | — | Final scoring should not be self-referential |
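
The pairing rule can be sketched as selecting an alternate from a different model family than the primary. The `family` heuristic (prefix-based) and `pick_alternate` function are assumptions for illustration only:

```python
# Hypothetical sketch of cross-family pairing: given a primary model, pick an
# alternate from a different family so a pod mixes reasoning styles.

def family(model: str) -> str:
    # Assumed heuristic: claude-* models vs everything else in the pools above.
    return "anthropic" if model.startswith("claude") else "openai"

def pick_alternate(primary: str, pool: list[str]) -> str:
    for candidate in pool:
        if family(candidate) != family(primary):
            return candidate
    raise ValueError("no cross-family alternate available")

pool = ["claude-haiku-4.5", "gpt-5.4-mini", "gpt-5-mini"]
assert pick_alternate("claude-haiku-4.5", pool) == "gpt-5.4-mini"
```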

Design Principles

  1. Parent-controlled spawning — children never decide whether they can launch descendants
  2. Signal compression at every layer — context shrinks going down, results compress going up
  3. Canary-before-swarm — deploy one canary worker before the whole pod
  4. Fail parsably — structured outputs, structured failures, no silent collapse
  5. Pipeline overlap — review starts before total execution finishes

Parallel Execution Design

The architecture is designed for concurrent execution at scale with wave deployment to respect platform rate limits. Wall-clock time grows slower than agent count because the expensive work runs in parallel, but launches are staggered in waves (Canary → Probe → Remainder) to avoid concentrated bursts:

Agents     Wall-Clock     Ratio vs SS-50
  50         ~30s           1.0×
 100         ~45s           1.5×
 250         ~70s           2.3×

These are design targets, not measured benchmarks. Actual performance depends on task decomposability and platform concurrency limits. Wave deployment adds ~4-6s per layer but prevents rate-limit failures that would cost more time in recovery.
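
The three-wave stagger can be sketched as partitioning a layer's agents before launch. The wave sizes and the `deploy_in_waves` helper are illustrative — the doc fixes only the Canary → Probe → Remainder order and the gate checks between waves:

```python
# Hypothetical sketch of wave deployment: one canary first, then a small
# probe wave, then the remainder, with a gate check between waves so a bad
# canary aborts before the expensive burst. Probe size is an assumption.

def deploy_in_waves(agents: list[str], probe_size: int = 3) -> list[list[str]]:
    canary, rest = agents[:1], agents[1:]
    probe, remainder = rest[:probe_size], rest[probe_size:]
    return [wave for wave in (canary, probe, remainder) if wave]

waves = deploy_in_waves([f"worker-{i}" for i in range(10)])
assert [len(w) for w in waves] == [1, 3, 6]
```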

The main serial bottlenecks are Nexus decomposition (~2s), canary verification (~3s), wave gate checks (~2s per gate), and final synthesis (~10s). Everything else overlaps via hierarchical fan-out.