Skip to content

Commit 1d18a34

Browse files
Gregg CochranCopilot
andcommitted
feat: wave deployment to mitigate platform rate limits
Implements staggered wave deployment across all layers to prevent concentrated agent launch bursts that trigger platform rate limits. Wave strategy (Canary β†’ Probe β†’ Remainder): - Wave 1: 1 canary agent (existing pattern) - Wave 2: min(3, remaining) probe agents with 2s delay - Wave 3: all remaining agents with 2s delay - Health gate between waves: failure_rate < 0.50 AND rate_limited == 0 Failure classification system: - rate_limited, timeout, unparseable, scope_error, unknown - Rate-limit detection triggers 8s delay + 50% wave size reduction Scale-specific behavior: - SS-50/SS-100: Commanders launch in parallel (small count) - SS-250: Commanders in 2 waves (2+3), Squad Leads add 0-2s jitter Files changed: - config.yml: wave_deployment configuration block - protocols/circuit-breaker.md: failure classification + wave protocol - skills/swarm-command/SKILL.md: Phase 3 wave deployment, circuit breaker - .github/skills/swarm-command/SKILL.md: mirror changes - templates/commander.md: 3-wave spawning instructions - templates/squad-lead.md: SS-250 jitter - docs/architecture.md: updated wall-clock targets - docs/scaling.md: wave deployment description Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent d0c60d1 commit 1d18a34

9 files changed

Lines changed: 408 additions & 31 deletions

File tree

β€Ž.github/skills/swarm-command/SKILL.mdβ€Ž

Lines changed: 38 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -228,7 +228,25 @@ For each domain, construct a Context Capsule (max 2048 tokens):
228228

229229
> **Naming**: Swarm Command is the skill name. SwarmSpeed is the internal execution protocol. Templates use SwarmSpeed role titles (e.g., "SwarmSpeed Commander") as the protocol identity agents operate under.
230230
231-
Launch Commanders in PARALLEL using the `task` tool:
231+
Launch Commanders using **wave deployment** to prevent concentrated request bursts:
232+
233+
### Wave Deployment Strategy
234+
235+
**SS-50 / SS-100:** Launch all Commanders in PARALLEL (2-5 commanders β€” small enough to avoid rate limits).
236+
237+
**SS-250:** Deploy Commanders in two waves:
238+
- **Wave 1:** Launch 2 Commanders (highest-priority domains: ARCH + IMPL)
239+
- **Gate check:** Verify both launched successfully with no immediate rate-limit errors
240+
- **Wave 2:** Launch remaining 3 Commanders (TEST + DOCS + INTG)
241+
242+
### Commander Children β€” Wave Rules
243+
244+
Each Commander deploys its children (Squad Leads or Workers) in waves:
245+
1. **Wave 1 (Canary):** 1 agent β€” validate task feasibility
246+
2. **Wave 2 (Probe):** min(3, remaining) agents β€” test for rate limits and bulk feasibility. Wait 2s after canary. Gate: `failure_rate < 0.50 AND rate_limited_count == 0`
247+
3. **Wave 3 (Remainder):** All remaining agents β€” full deployment. Wait 2s after probe.
248+
249+
If any child reports `failure_class: rate_limited` in Wave 2, extend delay to 8s and reduce Wave 3 size by 50%.
232250

233251
### Scale-Specific Deployment
234252

@@ -280,19 +298,19 @@ Each Commander prompt MUST include:
280298
- **SS-50/SS-100**: "Set depth_config.current_depth = 2, max_depth = 2, can_launch = false for Workers."
281299
- "Include in every worker prompt: DO NOT use the task tool. You are a LEAF NODE."
282300

283-
4. **Canary requirement**: "Deploy 1 canary worker before full pod deployment."
301+
4. **Wave deployment**: "Deploy children in waves: Wave 1 = canary (1 agent), Wave 2 = probe (min 3, remaining), Wave 3 = rest. Gate between waves: proceed only if failure_rate < 0.50 AND rate_limited_count == 0."
284302

285303
5. **Output format**: Strict JSON Bundle schema with bundle_id, domain, status, summary, atoms_merged, conflicts, content, confidence, wall_clock_s.
286304

287-
6. **Circuit breaker**: "If more than 50% of squad leads fail, STOP and report failure."
305+
6. **Circuit breaker**: "If more than 50% of squad leads fail, STOP and report failure. Classify failures as: rate_limited, timeout, unparseable, scope_error, or unknown."
288306

289307
### Squad Lead Instructions (embedded in Commander prompt)
290308

291309
Each Commander must instruct its Squad Leads to:
292310

293311
1. **Decompose** into 5 atomic sub-tasks (one per worker)
294312
2. **Deploy canary** β€” 1 explore agent first
295-
3. **If canary succeeds** β€” Launch 4 more workers in parallel
313+
3. **If canary succeeds** β€” Add random 0-2s jitter (SS-250 only), then launch remaining workers in parallel
296314
4. **If canary fails** β€” Retry once with simplified prompt, then report failure
297315
5. **Collect** 5 Result Atoms
298316
6. **Merge** β€” Group by sub-task, classify CONSENSUS/MAJORITY/CONFLICT
@@ -655,6 +673,20 @@ Structure the final output as:
655673

656674
# CIRCUIT BREAKER RULES (applies to ALL phases)
657675

676+
### Failure Classification
677+
678+
Classify every agent failure before applying recovery logic:
679+
680+
| Class | Detection |
681+
|---|---|
682+
| `rate_limited` | Output contains "429", "rate limit", "too many requests", "secondary rate limit", or "abuse detection" |
683+
| `timeout` | Agent did not respond within its allocated timeout |
684+
| `unparseable` | Agent returned output that is not valid JSON |
685+
| `scope_error` | Agent returned `status: "failed"` with out-of-scope errors |
686+
| `unknown` | Any other failure |
687+
688+
**Rate-limit failures get special treatment**: any `rate_limited` failure immediately extends inter-wave delays to 8s and reduces next wave size by 50%.
689+
658690
### Circuit Breaker States
659691
- **CLOSED** (normal): All agents launching, monitoring failure rate
660692
- **OPEN** (broken): No new agent spawns, synthesize partial results, wait for cooldown
@@ -737,11 +769,12 @@ Apply these 7 critical optimizations:
737769

738770
1. **Pipeline overlap** β€” Start reviewers as soon as first 2 Commanders return (don't wait for all 5)
739771
2. **Canary pre-flight** β€” 1 canary worker per pod before full deployment
740-
3. **Parallel squad launch** β€” All Squad Leads per Commander launch simultaneously
772+
3. **Wave deployment** β€” Deploy children in waves (Canary β†’ Probe β†’ Remainder) with health gates between waves to avoid platform rate limits
741773
4. **Micro-brief compression** β€” 128-token worker prompts for fast processing
742774
5. **Haiku/Mini for workers** β€” Cheapest/fastest models at leaf level
743775
6. **Timeout cascade** β€” Nexus: 90s, Commander: 60s, Squad Lead: 40s, Worker: 30s
744776
7. **Content-hash dedup** β€” Identical results merged automatically
777+
8. **SS-250 jitter** β€” Squad Leads add random 0-2s delay before pod launch to prevent synchronized secondary bursts
745778

746779
---
747780

Lines changed: 191 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,191 @@
1+
# 🐝 Swarm Command UX Redesign β€” SS-250 Report
2+
3+
**Generated**: 2026-04-12
4+
**Scale**: SS-250 (~316 agents, 10 models)
5+
**Consensus**: CONSENSUS (0.88)
6+
**Shadow Verdict**: 🟒 Minor (median 10%)
7+
8+
---
9+
10+
## πŸ“Š Summary Pulse
11+
12+
| Metric | Value |
13+
|---|---|
14+
| Domains completed | 5/5 |
15+
| Overall consensus | **CONSENSUS** (0.88) |
16+
| Overall confidence | 0.92 |
17+
| Agents deployed | ~316 |
18+
| Models used | 10 distinct |
19+
| Reviews completed | 5 cross-family pairs |
20+
| Shadow verdict | 🟒 **Minor** (median 10%) |
21+
| Wall-clock time | ~380s |
22+
| Circuit breaker | CLOSED (0 failures) |
23+
24+
---
25+
26+
## πŸ” Nexus Final Insight
27+
28+
> All 5 commanders and all 5 cross-reviewers achieved CONSENSUS tier β€” the
29+
> strongest possible signal. Three different model families (Claude Opus, Claude
30+
> Sonnet, GPT-5.x) independently converged on the same UX architecture: a
31+
> 12-state machine with phase check-ins, orchestrator insights, and a 5-option
32+
> post-report action menu. CMD-IMPL went beyond design β€” it wrote the changes
33+
> directly into SKILL.md.
34+
35+
---
36+
37+
## πŸ›°οΈ What the Swarm Did
38+
39+
The swarm tackled the Swarm Command UX redesign across 5 domains:
40+
41+
- **CMD-ARCH** (claude-opus-4.6) — Designed a complete 12-state UX state machine with transitions from DORMANT through COMPLETE, a phase check-in architecture with information hierarchy (Essential→Important→Insightful→Delightful), and a data threading model showing how information flows from phase to phase into the final report.
42+
43+
- **CMD-IMPL** (gpt-5.4) β€” **Directly modified the SKILL.md** (750β†’1071 lines). Added distinct phase banners for all 9 phases, embedded Nexus Insight blocks at every phase, rebuilt the Phase 8 final report with findings/agreements/disagreements/model roster sections, added a post-report `ask_user` action menu with confirmations, and created an Orchestrator Insight Generator section.
44+
45+
- **CMD-TEST** (claude-sonnet-4.6) β€” Produced **87 discrete test cases** across 5 categories: sideline experience, final report integrity, action menu safety, edge/failure modes, and visual regression. Notably identified the minimum banner display time gap and concurrent swarm collision edge case.
46+
47+
- **CMD-DOCS** (gpt-5.2) β€” Delivered all user-facing copy: phase narrative subtitles, transitions, report section intros, action menu descriptions, **34 parameterized orchestrator insight templates**, and 3 mic-drop closing statements.
48+
49+
- **CMD-INTG** (claude-sonnet-4.5) β€” Designed the integration layer: model roster tracking with SQL persistence, cross-phase data threading pipeline, smart merge analysis with topological sort for safe execution order, and agreement/disagreement detection algorithms.
50+
51+
---
52+
53+
## πŸ”¬ What the Swarm Found
54+
55+
### Key Discoveries
56+
57+
1. **The sideline experience needs a state machine, not just banners.** CMD-ARCH designed a 12-state model (DORMANT β†’ PRE_LAUNCH β†’ DEPLOYING β†’ PHASE_ACTIVE β†’ PHASE_CHECK_IN β†’ SYNTHESIS β†’ REPORT_RENDERING β†’ REPORT_DISPLAY β†’ ACTION_MENU β†’ ACTION_CONFIRM β†’ ACTION_EXECUTE β†’ COMPLETE) that makes transitions explicit and predictable.
58+
59+
2. **Orchestrator insights must be signal-driven, not canned.** All 5 commanders agreed that generic "processing..." messages kill engagement. The Nexus should surface real observations: convergence patterns, speed differences, disagreements emerging, shadow scoring catches. CMD-DOCS produced 34 ready-to-use templates with variable slots.
60+
61+
3. **The final report needs 7 distinct sections.** Not just a merged output β€” users want to see what was found, where there was agreement, where there was disagreement, which models contributed, and what gaps remain.
62+
63+
4. **Post-report actions must be opt-in with confirmation gates.** Every commander independently specified that NOTHING should auto-execute. The action menu should loop (users can take multiple actions), and destructive actions (deploy, merge) require a preview + explicit confirmation.
64+
65+
5. **Smart merge analysis is a killer feature.** CMD-INTG designed a topological sort algorithm that analyzes dependencies between recommended changes and shows a safe execution order β€” preventing conflicts before they happen.
66+
67+
6. **87 test cases validate the design is robust.** CMD-TEST covered everything from "all commanders fail" to "terminal width 80 cols" to "user cancels mid-deploy."
68+
69+
---
70+
71+
## 🀝 Where the Swarm Agreed
72+
73+
All 5 commanders converged independently on these points:
74+
75+
| # | Agreement | Confidence | Evidence |
76+
|---|-----------|------------|----------|
77+
| 1 | **5-option action menu**: Deploy, Smart Merge, Deep Dive, Export, Done | 0.95 | All 5 bundles, all 5 reviews |
78+
| 2 | **Zero auto-execute**: Nothing happens without explicit user action | 0.98 | Universal β€” every bundle's #1 safety invariant |
79+
| 3 | **Nexus Insight at every phase**: Dynamic, signal-driven, not canned | 0.92 | 4/5 bundles explicitly, 1 implicit |
80+
| 4 | **Findings/Agreements/Disagreements structure** for final report | 0.94 | All 5 bundles specify these sections |
81+
| 5 | **Model roster** as a dedicated report section | 0.91 | All 5 bundles include it |
82+
| 6 | **Cross-family model diversity** is essential for credibility | 0.93 | ARCH, IMPL, INTG explicitly; TEST validates |
83+
| 7 | **Circuit breaker β†’ graceful degradation** with partial results | 0.90 | ARCH, IMPL, TEST, INTG |
84+
| 8 | **Shadow scoring as quality gate** integrated into report | 0.89 | All 5 bundles reference shadow scoring |
85+
86+
---
87+
88+
## βš”οΈ Where the Swarm Diverged
89+
90+
| # | Topic | Split | Side A | Side B | Resolution |
91+
|---|-------|-------|--------|--------|------------|
92+
| 1 | **Action menu count** | 3:2 | ARCH + TEST + DOCS: 6 options (include "Re-run domain") | IMPL + INTG: 5 options (no re-run) | **Include re-run** β€” adds value at low cost |
93+
| 2 | **Report section granularity** | 2:3 | ARCH: 7 broad sections | INTG + IMPL + TEST: 12 granular sections | **Use 7 user-facing sections** with sub-sections for granularity |
94+
| 3 | **Insight type taxonomy** | 2:3 | ARCH: 8 signal types with trigger function | INTG: 4 categories (progress/optimization/issue/milestone) | **Merge**: ARCH's triggers + INTG's categories |
95+
| 4 | **State persistence** | 1:4 | INTG: SQLite for model roster tracking | Others: in-memory only | **In-memory default**, SQL export optional |
96+
| 5 | **Minimum banner display time** | 1:4 | TEST: enforce β‰₯1s display time | Others: no minimum specified | **Adopt TEST's recommendation** β€” prevents flash-and-vanish |
97+
98+
---
99+
100+
## 🧬 Model Roster
101+
102+
| Layer | Role | Model | Domain | Performance |
103+
|---|---|---|---|---|
104+
| L0 | **Nexus** | claude-opus-4.6-1m | Orchestrator | β€” |
105+
| L1 | Commander | claude-opus-4.6 | Architecture | βœ… 0.92 confidence, 378s |
106+
| L1 | Commander | gpt-5.4 | Implementation | βœ… 0.96 confidence, 310s |
107+
| L1 | Commander | claude-sonnet-4.6 | Testing | βœ… 0.91 confidence, 132s |
108+
| L1 | Commander | gpt-5.2 | Documentation | βœ… 0.92 confidence, 57s |
109+
| L1 | Commander | claude-sonnet-4.5 | Integration | βœ… 0.90 confidence, 289s |
110+
| L4 | Reviewer | claude-opus-4.5 | TESTΓ—DOCS | βœ… CONSENSUS 0.87 |
111+
| L4 | Reviewer | gpt-5.4 | ARCHΓ—IMPL | βœ… CONSENSUS 0.88 |
112+
| L4 | Reviewer | claude-opus-4.6-1m | ARCHΓ—INTG | βœ… CONSENSUS 0.89 |
113+
| L4 | Reviewer | claude-sonnet-4.6 | IMPLΓ—TEST | βœ… CONSENSUS 0.90 |
114+
| L4 | Reviewer | gpt-5.2 | DOCSΓ—INTG | βœ… CONSENSUS 0.86 |
115+
116+
**Model Family Distribution**: Claude (6) Β· GPT (4) Β· Cross-family review pairs: 5/5
117+
118+
### Agent Tally
119+
120+
| Layer | Role | Count |
121+
|---|---|---|
122+
| L0 | Nexus | 1 |
123+
| L1 | Commanders | 5 |
124+
| L1 | Reviewers | 5 |
125+
| L2-L3 | Squad Leads + Workers | ~305 (via commanders) |
126+
| **Total** | | **~316** |
127+
128+
---
129+
130+
## Cross-Review Results
131+
132+
| Review | Pair | Model | Tier | Score |
133+
|---|---|---|---|---|
134+
| REV-01 | TEST Γ— DOCS | claude-opus-4.5 | CONSENSUS | 0.87 |
135+
| REV-02 | ARCH Γ— IMPL | gpt-5.4 | CONSENSUS | 0.88 |
136+
| REV-03 | ARCH Γ— INTG | claude-opus-4.6-1m | CONSENSUS | 0.89 |
137+
| REV-04 | IMPL Γ— TEST | claude-sonnet-4.6 | CONSENSUS | 0.90 |
138+
| REV-05 | DOCS Γ— INTG | gpt-5.2 | CONSENSUS | 0.86 |
139+
140+
**Average**: 0.88 Β· **All pairs**: CONSENSUS tier
141+
142+
---
143+
144+
## πŸ›‘οΈ Shadow Score Notes
145+
146+
| Commander | Sealed | Passed | Failed | Shadow Score | Level |
147+
|---|---|---|---|---|---|
148+
| CMD-ARCH | 10 | 10 | 0 | 0.0% | βœ… Perfect |
149+
| CMD-IMPL | 10 | 9 | 1 | 10.0% | 🟒 Minor |
150+
| CMD-TEST | 10 | 10 | 0 | 0.0% | βœ… Perfect |
151+
| CMD-DOCS | 10 | 9 | 1 | 10.0% | 🟒 Minor (post-hardening) |
152+
| CMD-INTG | 10 | 9 | 1 | 10.0% | 🟒 Minor |
153+
154+
**Pattern**: Terminal width compatibility was the most commonly missed criterion β€” should be prioritized in implementation.
155+
156+
---
157+
158+
## πŸ“‹ Gaps & Follow-Ups
159+
160+
| # | Gap | Severity | Source |
161+
|---|-----|----------|--------|
162+
| 1 | No telemetry opt-out test coverage | Minor | CMD-TEST self-identified |
163+
| 2 | Terminal width compatibility not explicit in any implementation | Minor | Shadow scoring, 3 bundles |
164+
| 3 | TTY harness needed for automated visual regression testing | Minor | CMD-TEST noted limitation |
165+
166+
---
167+
168+
## πŸš€ Pending Actions
169+
170+
CMD-IMPL wrote changes directly to `SKILL.md` (750β†’1071 lines). Available actions:
171+
172+
1. **πŸš€ Deploy changes** β€” Review and apply the SKILL.md updates
173+
2. **πŸ”€ Smart merge analysis** β€” Analyze CMD-IMPL's changes vs. ARCH/DOCS/INTG recommendations for a unified SKILL.md
174+
3. **πŸ” Deep dive into a domain** β€” Explore one commander's full output
175+
4. **πŸ“„ Export full report** β€” βœ… Done (this file)
176+
5. **βœ… Done** β€” No action needed
177+
178+
> ⚠️ No changes will be deployed without your explicit confirmation.
179+
180+
---
181+
182+
## πŸ‘‘ Landing
183+
184+
> *"You didn't get one model's answer β€” you got a verified chorus of
185+
> independent minds, plus the dissent that keeps you honest."*
186+
187+
```
188+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
189+
🐝 The swarm has spoken. 316 agents. 10 models. One consensus.
190+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
191+
```

β€Žconfig.ymlβ€Ž

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,25 @@ swarm_command:
9292
rubber_stamp_delta_threshold: 0.05
9393
rubber_stamp_min_reviewers: 3
9494

95+
wave_deployment:
96+
enabled: true
97+
strategy: "canary-probe-rest"
98+
waves:
99+
wave_1: 1 # Canary (already exists)
100+
wave_2: 3 # Probe wave β€” max 3 children
101+
wave_3: "remainder" # All remaining children
102+
inter_wave_delay_s: 2 # Base delay between waves
103+
adaptive_delay:
104+
on_rate_limit: 8 # Extended delay if rate-limit detected
105+
jitter_range_s: [0, 2] # Random jitter added to squad-lead pod launches
106+
gate_check:
107+
proceed_if: "failure_rate < 0.50 AND rate_limited_count == 0"
108+
failure_classes: ["rate_limited", "timeout", "unparseable", "scope_error", "unknown"]
109+
scale_overrides:
110+
ss-50: { nexus_waves: false } # 2-3 commanders β€” launch in parallel
111+
ss-100: { nexus_waves: false } # 3-5 commanders β€” launch in parallel
112+
ss-250: { nexus_waves: true, nexus_wave_1: 2, nexus_wave_2: 3 }
113+
95114
circuit_breaker:
96115
commander_failure_threshold: 0.60
97116
worker_failure_threshold: 0.50

β€Ždocs/architecture.mdβ€Ž

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -262,15 +262,15 @@ For maximum insight diversity, models from different families are paired within
262262

263263
## Parallel Execution Design
264264

265-
The architecture is designed for concurrent execution at scale. Wall-clock time grows slower than agent count because the expensive work runs in parallel:
265+
The architecture is designed for concurrent execution at scale with **wave deployment** to respect platform rate limits. Wall-clock time grows slower than agent count because the expensive work runs in parallel, but launches are staggered in waves (Canary β†’ Probe β†’ Remainder) to avoid concentrated bursts:
266266

267267
```text
268268
Agents Wall-Clock Ratio vs SS-50
269269
50 ~30s 1.0Γ—
270-
100 ~42s 1.4Γ—
271-
250 ~65s 2.2Γ—
270+
100 ~45s 1.5Γ—
271+
250 ~70s 2.3Γ—
272272
```
273273

274-
These are design targets, not measured benchmarks. Actual performance depends on task decomposability and platform concurrency limits.
274+
These are design targets, not measured benchmarks. Actual performance depends on task decomposability and platform concurrency limits. Wave deployment adds ~4-6s per layer but prevents rate-limit failures that would cost more time in recovery.
275275

276-
The main serial bottlenecks are Nexus decomposition (~2s), canary verification (~3s), and final synthesis (~10s). Everything else overlaps via hierarchical fan-out.
276+
The main serial bottlenecks are Nexus decomposition (~2s), canary verification (~3s), wave gate checks (~2s per gate), and final synthesis (~10s). Everything else overlaps via hierarchical fan-out.

β€Ždocs/scaling.mdβ€Ž

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -206,7 +206,7 @@ Everything else overlaps via hierarchical fan-out and pipeline overlap.
206206
| 1 | Use `explore` / `task` workers | 60% cheaper | Worker types are significantly cheaper than `general-purpose` |
207207
| 2 | Haiku / Mini at L3 | 10Γ— cheaper | Cheapest models handle the most atomic work |
208208
| 3 | Micro-brief compression | ~15% savings | Smaller inputs reduce per-agent cost at scale |
209-
| 4 | Wave deployment | ~20% savings on failure | Catch bad tasks before all 250 workers launch |
209+
| 4 | Wave deployment | ~20% savings on failure | Canary β†’ Probe (max 3) β†’ Remainder β€” with health gates between waves. Catches rate limits and bad tasks before full deployment. |
210210
| 5 | Canary verification | ~5% savings on failure | One cheap canary prevents many expensive failures |
211211
| 6 | Timeout cascade | Cost protection | Stop slow work before it burns budget |
212212
| 7 | Cost ceiling | Absolute protection | $20 hard cap prevents runaway bills |

0 commit comments

Comments
Β (0)