DUBSOpenHub
diff --git a/‎.github/skills/swarm-command/SKILL.md‎
Lines changed: 38 additions & 5 deletions b/‎.github/skills/swarm-command/SKILL.md‎
Lines changed: 38 additions & 5 deletions
diff --git a/‎SWARM-250-UX-REDESIGN-REPORT.md‎
Lines changed: 191 additions & 0 deletions b/‎SWARM-250-UX-REDESIGN-REPORT.md‎
Lines changed: 191 additions & 0 deletions
diff --git a/‎config.yml‎
Lines changed: 19 additions & 0 deletions b/‎config.yml‎
Lines changed: 19 additions & 0 deletions
diff --git a/‎docs/architecture.md‎
Lines changed: 5 additions & 5 deletions b/‎docs/architecture.md‎
Lines changed: 5 additions & 5 deletions
diff --git a/‎docs/scaling.md‎
Lines changed: 1 addition & 1 deletion b/‎docs/scaling.md‎
Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,25 @@ For each domain, construct a Context Capsule (max 2048 tokens):
 
 > **Naming**: Swarm Command is the skill name. SwarmSpeed is the internal execution protocol. Templates use SwarmSpeed role titles (e.g., "SwarmSpeed Commander") as the protocol identity agents operate under.
 
-Launch Commanders in PARALLEL using the `task` tool:
+Launch Commanders using **wave deployment** to prevent concentrated request bursts:
+
+### Wave Deployment Strategy
+
+**SS-50 / SS-100:** Launch all Commanders in PARALLEL (2-5 commanders — small enough to avoid rate limits).
+
+**SS-250:** Deploy Commanders in two waves:
+- **Wave 1:** Launch 2 Commanders (highest-priority domains: ARCH + IMPL)
+- **Gate check:** Verify both launched successfully with no immediate rate-limit errors
+- **Wave 2:** Launch remaining 3 Commanders (TEST + DOCS + INTG)
+
+### Commander Children — Wave Rules
+
+Each Commander deploys its children (Squad Leads or Workers) in waves:
+1. **Wave 1 (Canary):** 1 agent — validate task feasibility
+2. **Wave 2 (Probe):** min(3, remaining) agents — test for rate limits and bulk feasibility. Wait 2s after canary. Gate: `failure_rate < 0.50 AND rate_limited_count == 0`
+3. **Wave 3 (Remainder):** All remaining agents — full deployment. Wait 2s after probe.
+
+If any child reports `failure_class: rate_limited` in Wave 2, extend delay to 8s and reduce Wave 3 size by 50%.
 
 ### Scale-Specific Deployment
 
@@ -280,19 +298,19 @@ Each Commander prompt MUST include:
    - **SS-50/SS-100**: "Set depth_config.current_depth = 2, max_depth = 2, can_launch = false for Workers."
    - "Include in every worker prompt: DO NOT use the task tool. You are a LEAF NODE."
 
-4. **Canary requirement**: "Deploy 1 canary worker before full pod deployment."
+4. **Wave deployment**: "Deploy children in waves: Wave 1 = canary (1 agent), Wave 2 = probe (min 3, remaining), Wave 3 = rest. Gate between waves: proceed only if failure_rate < 0.50 AND rate_limited_count == 0."
 
 5. **Output format**: Strict JSON Bundle schema with bundle_id, domain, status, summary, atoms_merged, conflicts, content, confidence, wall_clock_s.
 
-6. **Circuit breaker**: "If more than 50% of squad leads fail, STOP and report failure."
+6. **Circuit breaker**: "If more than 50% of squad leads fail, STOP and report failure. Classify failures as: rate_limited, timeout, unparseable, scope_error, or unknown."
 
 ### Squad Lead Instructions (embedded in Commander prompt)
 
 Each Commander must instruct its Squad Leads to:
 
 1. **Decompose** into 5 atomic sub-tasks (one per worker)
 2. **Deploy canary** — 1 explore agent first
-3. **If canary succeeds** — Launch 4 more workers in parallel
+3. **If canary succeeds** — Add random 0-2s jitter (SS-250 only), then launch remaining workers in parallel
 4. **If canary fails** — Retry once with simplified prompt, then report failure
 5. **Collect** 5 Result Atoms
 6. **Merge** — Group by sub-task, classify CONSENSUS/MAJORITY/CONFLICT
@@ -655,6 +673,20 @@ Structure the final output as:
 
 # CIRCUIT BREAKER RULES (applies to ALL phases)
 
+### Failure Classification
+
+Classify every agent failure before applying recovery logic:
+
+| Class | Detection |
+|---|---|
+| `rate_limited` | Output contains "429", "rate limit", "too many requests", "secondary rate limit", or "abuse detection" |
+| `timeout` | Agent did not respond within its allocated timeout |
+| `unparseable` | Agent returned output that is not valid JSON |
+| `scope_error` | Agent returned `status: "failed"` with out-of-scope errors |
+| `unknown` | Any other failure |
+
+**Rate-limit failures get special treatment**: any `rate_limited` failure immediately extends inter-wave delays to 8s and reduces next wave size by 50%.
+
 ### Circuit Breaker States
 - **CLOSED** (normal): All agents launching, monitoring failure rate
 - **OPEN** (broken): No new agent spawns, synthesize partial results, wait for cooldown
@@ -737,11 +769,12 @@ Apply these 7 critical optimizations:
 
 1. **Pipeline overlap** — Start reviewers as soon as first 2 Commanders return (don't wait for all 5)
 2. **Canary pre-flight** — 1 canary worker per pod before full deployment
-3. **Parallel squad launch** — All Squad Leads per Commander launch simultaneously
+3. **Wave deployment** — Deploy children in waves (Canary → Probe → Remainder) with health gates between waves to avoid platform rate limits
 4. **Micro-brief compression** — 128-token worker prompts for fast processing
 5. **Haiku/Mini for workers** — Cheapest/fastest models at leaf level
 6. **Timeout cascade** — Nexus: 90s, Commander: 60s, Squad Lead: 40s, Worker: 30s
 7. **Content-hash dedup** — Identical results merged automatically
+8. **SS-250 jitter** — Squad Leads add random 0-2s delay before pod launch to prevent synchronized secondary bursts
 
 ---
 
 
@@ -0,0 +1,191 @@
+# 🐝 Swarm Command UX Redesign — SS-250 Report
+
+**Generated**: 2026-04-12
+**Scale**: SS-250 (~316 agents, 10 models)
+**Consensus**: CONSENSUS (0.88)
+**Shadow Verdict**: 🟢 Minor (median 10%)
+
+---
+
+## 📊 Summary Pulse
+
+| Metric | Value |
+|---|---|
+| Domains completed | 5/5 |
+| Overall consensus | **CONSENSUS** (0.88) |
+| Overall confidence | 0.92 |
+| Agents deployed | ~316 |
+| Models used | 10 distinct |
+| Reviews completed | 5 cross-family pairs |
+| Shadow verdict | 🟢 **Minor** (median 10%) |
+| Wall-clock time | ~380s |
+| Circuit breaker | CLOSED (0 failures) |
+
+---
+
+## 🔍 Nexus Final Insight
+
+> All 5 commanders and all 5 cross-reviewers achieved CONSENSUS tier — the
+> strongest possible signal. Three different model families (Claude Opus, Claude
+> Sonnet, GPT-5.x) independently converged on the same UX architecture: a
+> 12-state machine with phase check-ins, orchestrator insights, and a 5-option
+> post-report action menu. CMD-IMPL went beyond design — it wrote the changes
+> directly into SKILL.md.
+
+---
+
+## 🛰️ What the Swarm Did
+
+The swarm tackled the Swarm Command UX redesign across 5 domains:
+
+- **CMD-ARCH** (claude-opus-4.6) — Designed a complete 12-state UX state machine with transitions from DORMANT through COMPLETE, a phase check-in architecture with information hierarchy (Essential→Important→Insightful→Delightful), and a data threading model showing how information flows from phase to phase into the final report.
+
+- **CMD-IMPL** (gpt-5.4) — **Directly modified the SKILL.md** (750→1071 lines). Added distinct phase banners for all 9 phases, embedded Nexus Insight blocks at every phase, rebuilt the Phase 8 final report with findings/agreements/disagreements/model roster sections, added a post-report `ask_user` action menu with confirmations, and created an Orchestrator Insight Generator section.
+
+- **CMD-TEST** (claude-sonnet-4.6) — Produced **87 discrete test cases** across 5 categories: sideline experience, final report integrity, action menu safety, edge/failure modes, and visual regression. Notably identified the minimum banner display time gap and concurrent swarm collision edge case.
+
+- **CMD-DOCS** (gpt-5.2) — Delivered all user-facing copy: phase narrative subtitles, transitions, report section intros, action menu descriptions, **34 parameterized orchestrator insight templates**, and 3 mic-drop closing statements.
+
+- **CMD-INTG** (claude-sonnet-4.5) — Designed the integration layer: model roster tracking with SQL persistence, cross-phase data threading pipeline, smart merge analysis with topological sort for safe execution order, and agreement/disagreement detection algorithms.
+
+---
+
+## 🔬 What the Swarm Found
+
+### Key Discoveries
+
+1. **The sideline experience needs a state machine, not just banners.** CMD-ARCH designed a 12-state model (DORMANT → PRE_LAUNCH → DEPLOYING → PHASE_ACTIVE → PHASE_CHECK_IN → SYNTHESIS → REPORT_RENDERING → REPORT_DISPLAY → ACTION_MENU → ACTION_CONFIRM → ACTION_EXECUTE → COMPLETE) that makes transitions explicit and predictable.
+
+2. **Orchestrator insights must be signal-driven, not canned.** All 5 commanders agreed that generic "processing..." messages kill engagement. The Nexus should surface real observations: convergence patterns, speed differences, disagreements emerging, shadow scoring catches. CMD-DOCS produced 34 ready-to-use templates with variable slots.
+
+3. **The final report needs 7 distinct sections.** Not just a merged output — users want to see what was found, where there was agreement, where there was disagreement, which models contributed, and what gaps remain.
+
+4. **Post-report actions must be opt-in with confirmation gates.** Every commander independently specified that NOTHING should auto-execute. The action menu should loop (users can take multiple actions), and destructive actions (deploy, merge) require a preview + explicit confirmation.
+
+5. **Smart merge analysis is a killer feature.** CMD-INTG designed a topological sort algorithm that analyzes dependencies between recommended changes and shows a safe execution order — preventing conflicts before they happen.
+
+6. **87 test cases validate the design is robust.** CMD-TEST covered everything from "all commanders fail" to "terminal width 80 cols" to "user cancels mid-deploy."
+
+---
+
+## 🤝 Where the Swarm Agreed
+
+All 5 commanders converged independently on these points:
+
+| # | Agreement | Confidence | Evidence |
+|---|-----------|------------|----------|
+| 1 | **5-option action menu**: Deploy, Smart Merge, Deep Dive, Export, Done | 0.95 | All 5 bundles, all 5 reviews |
+| 2 | **Zero auto-execute**: Nothing happens without explicit user action | 0.98 | Universal — every bundle's #1 safety invariant |
+| 3 | **Nexus Insight at every phase**: Dynamic, signal-driven, not canned | 0.92 | 4/5 bundles explicitly, 1 implicit |
+| 4 | **Findings/Agreements/Disagreements structure** for final report | 0.94 | All 5 bundles specify these sections |
+| 5 | **Model roster** as a dedicated report section | 0.91 | All 5 bundles include it |
+| 6 | **Cross-family model diversity** is essential for credibility | 0.93 | ARCH, IMPL, INTG explicitly; TEST validates |
+| 7 | **Circuit breaker → graceful degradation** with partial results | 0.90 | ARCH, IMPL, TEST, INTG |
+| 8 | **Shadow scoring as quality gate** integrated into report | 0.89 | All 5 bundles reference shadow scoring |
+
+---
+
+## ⚔️ Where the Swarm Diverged
+
+| # | Topic | Split | Side A | Side B | Resolution |
+|---|-------|-------|--------|--------|------------|
+| 1 | **Action menu count** | 3:2 | ARCH + TEST + DOCS: 6 options (include "Re-run domain") | IMPL + INTG: 5 options (no re-run) | **Include re-run** — adds value at low cost |
+| 2 | **Report section granularity** | 2:3 | ARCH: 7 broad sections | INTG + IMPL + TEST: 12 granular sections | **Use 7 user-facing sections** with sub-sections for granularity |
+| 3 | **Insight type taxonomy** | 2:3 | ARCH: 8 signal types with trigger function | INTG: 4 categories (progress/optimization/issue/milestone) | **Merge**: ARCH's triggers + INTG's categories |
+| 4 | **State persistence** | 1:4 | INTG: SQLite for model roster tracking | Others: in-memory only | **In-memory default**, SQL export optional |
+| 5 | **Minimum banner display time** | 1:4 | TEST: enforce ≥1s display time | Others: no minimum specified | **Adopt TEST's recommendation** — prevents flash-and-vanish |
+
+---
+
+## 🧬 Model Roster
+
+| Layer | Role | Model | Domain | Performance |
+|---|---|---|---|---|
+| L0 | **Nexus** | claude-opus-4.6-1m | Orchestrator | — |
+| L1 | Commander | claude-opus-4.6 | Architecture | ✅ 0.92 confidence, 378s |
+| L1 | Commander | gpt-5.4 | Implementation | ✅ 0.96 confidence, 310s |
+| L1 | Commander | claude-sonnet-4.6 | Testing | ✅ 0.91 confidence, 132s |
+| L1 | Commander | gpt-5.2 | Documentation | ✅ 0.92 confidence, 57s |
+| L1 | Commander | claude-sonnet-4.5 | Integration | ✅ 0.90 confidence, 289s |
+| L4 | Reviewer | claude-opus-4.5 | TEST×DOCS | ✅ CONSENSUS 0.87 |
+| L4 | Reviewer | gpt-5.4 | ARCH×IMPL | ✅ CONSENSUS 0.88 |
+| L4 | Reviewer | claude-opus-4.6-1m | ARCH×INTG | ✅ CONSENSUS 0.89 |
+| L4 | Reviewer | claude-sonnet-4.6 | IMPL×TEST | ✅ CONSENSUS 0.90 |
+| L4 | Reviewer | gpt-5.2 | DOCS×INTG | ✅ CONSENSUS 0.86 |
+
+**Model Family Distribution**: Claude (6) · GPT (4) · Cross-family review pairs: 5/5
+
+### Agent Tally
+
+| Layer | Role | Count |
+|---|---|---|
+| L0 | Nexus | 1 |
+| L1 | Commanders | 5 |
+| L1 | Reviewers | 5 |
+| L2-L3 | Squad Leads + Workers | ~305 (via commanders) |
+| **Total** | | **~316** |
+
+---
+
+## Cross-Review Results
+
+| Review | Pair | Model | Tier | Score |
+|---|---|---|---|---|
+| REV-01 | TEST × DOCS | claude-opus-4.5 | CONSENSUS | 0.87 |
+| REV-02 | ARCH × IMPL | gpt-5.4 | CONSENSUS | 0.88 |
+| REV-03 | ARCH × INTG | claude-opus-4.6-1m | CONSENSUS | 0.89 |
+| REV-04 | IMPL × TEST | claude-sonnet-4.6 | CONSENSUS | 0.90 |
+| REV-05 | DOCS × INTG | gpt-5.2 | CONSENSUS | 0.86 |
+
+**Average**: 0.88 · **All pairs**: CONSENSUS tier
+
+---
+
+## 🛡️ Shadow Score Notes
+
+| Commander | Sealed | Passed | Failed | Shadow Score | Level |
+|---|---|---|---|---|---|
+| CMD-ARCH | 10 | 10 | 0 | 0.0% | ✅ Perfect |
+| CMD-IMPL | 10 | 9 | 1 | 10.0% | 🟢 Minor |
+| CMD-TEST | 10 | 10 | 0 | 0.0% | ✅ Perfect |
+| CMD-DOCS | 10 | 9 | 1 | 10.0% | 🟢 Minor (post-hardening) |
+| CMD-INTG | 10 | 9 | 1 | 10.0% | 🟢 Minor |
+
+**Pattern**: Terminal width compatibility was the most commonly missed criterion — should be prioritized in implementation.
+
+---
+
+## 📋 Gaps & Follow-Ups
+
+| # | Gap | Severity | Source |
+|---|-----|----------|--------|
+| 1 | No telemetry opt-out test coverage | Minor | CMD-TEST self-identified |
+| 2 | Terminal width compatibility not explicit in any implementation | Minor | Shadow scoring, 3 bundles |
+| 3 | TTY harness needed for automated visual regression testing | Minor | CMD-TEST noted limitation |
+
+---
+
+## 🚀 Pending Actions
+
+CMD-IMPL wrote changes directly to `SKILL.md` (750→1071 lines). Available actions:
+
+1. **🚀 Deploy changes** — Review and apply the SKILL.md updates
+2. **🔀 Smart merge analysis** — Analyze CMD-IMPL's changes vs. ARCH/DOCS/INTG recommendations for a unified SKILL.md
+3. **🔍 Deep dive into a domain** — Explore one commander's full output
+4. **📄 Export full report** — ✅ Done (this file)
+5. **✅ Done** — No action needed
+
+> ⚠️ No changes will be deployed without your explicit confirmation.
+
+---
+
+## 👑 Landing
+
+> *"You didn't get one model's answer — you got a verified chorus of
+> independent minds, plus the dissent that keeps you honest."*
+
+```
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+🐝 The swarm has spoken. 316 agents. 10 models. One consensus.
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
@@ -92,6 +92,25 @@ swarm_command:
     rubber_stamp_delta_threshold: 0.05
     rubber_stamp_min_reviewers: 3
 
+  wave_deployment:
+    enabled: true
+    strategy: "canary-probe-rest"
+    waves:
+      wave_1: 1               # Canary (already exists)
+      wave_2: 3               # Probe wave — max 3 children
+      wave_3: "remainder"     # All remaining children
+    inter_wave_delay_s: 2     # Base delay between waves
+    adaptive_delay:
+      on_rate_limit: 8        # Extended delay if rate-limit detected
+      jitter_range_s: [0, 2]  # Random jitter added to squad-lead pod launches
+    gate_check:
+      proceed_if: "failure_rate < 0.50 AND rate_limited_count == 0"
+      failure_classes: ["rate_limited", "timeout", "unparseable", "scope_error", "unknown"]
+    scale_overrides:
+      ss-50:  { nexus_waves: false }   # 2-3 commanders — launch in parallel
+      ss-100: { nexus_waves: false }   # 3-5 commanders — launch in parallel
+      ss-250: { nexus_waves: true, nexus_wave_1: 2, nexus_wave_2: 3 }
+
   circuit_breaker:
     commander_failure_threshold: 0.60
     worker_failure_threshold: 0.50
 
@@ -262,15 +262,15 @@ For maximum insight diversity, models from different families are paired within
 
 ## Parallel Execution Design
 
-The architecture is designed for concurrent execution at scale. Wall-clock time grows slower than agent count because the expensive work runs in parallel:
+The architecture is designed for concurrent execution at scale with **wave deployment** to respect platform rate limits. Wall-clock time grows slower than agent count because the expensive work runs in parallel, but launches are staggered in waves (Canary → Probe → Remainder) to avoid concentrated bursts:
 
 ```text
 Agents     Wall-Clock     Ratio vs SS-50
   50         ~30s           1.0×
- 100         ~42s           1.4×
- 250         ~65s           2.2×
+ 100         ~45s           1.5×
+ 250         ~70s           2.3×
 ```
 
-These are design targets, not measured benchmarks. Actual performance depends on task decomposability and platform concurrency limits.
+These are design targets, not measured benchmarks. Actual performance depends on task decomposability and platform concurrency limits. Wave deployment adds ~4-6s per layer but prevents rate-limit failures that would cost more time in recovery.
 
-The main serial bottlenecks are Nexus decomposition (~2s), canary verification (~3s), and final synthesis (~10s). Everything else overlaps via hierarchical fan-out.
+The main serial bottlenecks are Nexus decomposition (~2s), canary verification (~3s), wave gate checks (~2s per gate), and final synthesis (~10s). Everything else overlaps via hierarchical fan-out.
@@ -206,7 +206,7 @@ Everything else overlaps via hierarchical fan-out and pipeline overlap.
 | 1 | Use `explore` / `task` workers | 60% cheaper | Worker types are significantly cheaper than `general-purpose` |
 | 2 | Haiku / Mini at L3 | 10× cheaper | Cheapest models handle the most atomic work |
 | 3 | Micro-brief compression | ~15% savings | Smaller inputs reduce per-agent cost at scale |
-| 4 | Wave deployment | ~20% savings on failure | Catch bad tasks before all 250 workers launch |
+| 4 | Wave deployment | ~20% savings on failure | Canary → Probe (max 3) → Remainder — with health gates between waves. Catches rate limits and bad tasks before full deployment. |
 | 5 | Canary verification | ~5% savings on failure | One cheap canary prevents many expensive failures |
 | 6 | Timeout cascade | Cost protection | Stop slow work before it burns budget |
 | 7 | Cost ceiling | Absolute protection | $20 hard cap prevents runaway bills |