Maintainer note (2026-06-14): This assessment was generated against `codex/v0.8.61` BEFORE the in-flight Train 3 (worker/fleet) and Train 4 (goal mode) work was merged, so the "dead-code / prompt-only `/swarm` / orphaned IR" findings describe the pre-merge baseline. Coverage of the recommendations by in-flight work:
- Rec 2 (`/swarm` readiness gate): DONE by Train 4 (reviewed, queued). `/swarm` no longer dispatches prompt-only fanout; it returns a gated redirect, and the `swarm_is_gated_*` test passes.
- Rec 4 (`goal_loop` drives cross-turn): the substrate, the `decide_continuation` wiring, and durable progress accrual landed with Train 4; the deep worker re-dispatch is the Train 3 seam.
- Rec 1 (model route per role at `agent_open`): targeted by Train 1 (route service) and Train 3 (`WorkerRuntimeProfile` at the spawn path); verify on merge.
- Rec 3 (structured `NeedsInput` worker contract): targeted by Train 3 Bug 3.
- Rec 5 (coordination substrate / un-orphan the `whaleflow` IR) and Rec 6 (swarm-UX roster) are the genuinely NET-NEW gaps, now filed as #3229 (coordination substrate) and #3230 (synthesis/reduce pass).
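The gated redirect in Rec 2 can be illustrated as a check that refuses to dispatch until every substrate piece the swarm rides on reports ready. When something is missing, it returns the list of missing pieces instead. The following is a minimal sketch only. `SubstrateStatus`, `SwarmDispatch`, and `gate_swarm` are hypothetical names. The four checks are taken from the substrate this assessment lists (durable runtime, model routes, structured returns, coordination). This document does not describe the shape of the gate that Train 4 actually landed.

```rust
/// Hypothetical readiness flags for the pieces `/swarm` depends on.
#[derive(Debug, Clone, Copy, Default)]
pub struct SubstrateStatus {
    pub durable_runtime: bool,
    pub model_routes: bool,
    pub structured_returns: bool,
    pub coordination: bool,
}

/// Either run the swarm, or redirect and name what is not ready yet.
#[derive(Debug, PartialEq, Eq)]
pub enum SwarmDispatch {
    Run,
    Redirect { missing: Vec<&'static str> },
}

/// Refuse prompt-only fanout: every substrate check must pass before dispatch.
pub fn gate_swarm(status: &SubstrateStatus) -> SwarmDispatch {
    let checks = [
        ("durable runtime", status.durable_runtime),
        ("model routes per role", status.model_routes),
        ("structured worker returns", status.structured_returns),
        ("coordination substrate", status.coordination),
    ];
    // Keep the names of the checks that failed, in declaration order.
    let missing: Vec<&'static str> = checks.iter().filter(|c| !c.1).map(|c| c.0).collect();
    if missing.is_empty() {
        SwarmDispatch::Run
    } else {
        SwarmDispatch::Redirect { missing }
    }
}
```

Returning the missing pieces rather than a bare "disabled" gives the redirect message something honest to tell the user.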
Maintainer framing: "WhaleFlow is more like ultracode is what we want … kind of like that + swarm from kimi-code."
Bottom line: The vision honors both targets and is arguably more ambitious than kimi (heterogeneous-model workers vs. kimi's single trained orchestrator; typed-IR safety vs. free-form). The v0.8.61 plan (the `EXECUTION.md` spine, Trains 3→4) is the correct sequence and correctly gates `/swarm` behind the durable substrate. The implementation, however, is almost entirely aspirational: the load-bearing seams are tested-but-unwired dead code or prompt-only strings. Verdict on both targets: partial.
| ultracode trait | CodeWhale today | Verdict |
|---|---|---|
| Orchestrator + worker fan-out, orchestrator stays free | The only fan-out is the parent model voluntarily emitting `agent_open` (`subagent/mod.rs`); the dispatcher runs those workers in parallel up to `MAX_SUBAGENTS`. No Rust orchestrator decomposes or steers. | Partial (substrate only) |
| Isolated context per worker, built for its task | `AgentWorkerSpec` carries objective/role/workspace/tool_profile (`mod.rs:660-683`); each worker gets a fresh context. | Yes |
| Heterogeneous models (Opus conductor → GPT/Kimi/DeepSeek workers) | `worker_profile.rs` defines `ModelRoute::{Inherit,Auto,Fixed}` + provider override + non-escalation intersection, but it is `#![allow(dead_code)]`. The live spawn path never builds or enforces it and still uses the legacy `AgentWorkerToolProfile` + `allow_shell` bool. | No (the key gap) |
| Structured handoffs, not freeform chat | Returns are JSON-ish (`transcript_payload` `json!` at `mod.rs:2769`, completion sentinel in `turn_loop.rs`), but the parent largely polls free text via `agent_eval`. No typed `NeedsInput`/result envelope yet (planned, Train 3 Bug 3). | Partial |
| Control-flow primitives (pipeline / parallel-barrier / loop-until-dry / loop-until-budget) | All modeled in the orphaned `whaleflow` crate IR (`WorkflowNode`: `Sequence`/`BranchSet`/`Reduce`/`LoopUntil`/`Cond`/`Expand`; phase DAG; result-dependency ordering at `lib.rs:302-321`), but the crate has zero workspace consumers and mock execution only. `goal_loop.rs` is the only live loop, and its token/time budgets never fire. | Partial (IR exists, dead) |
| Quality: adversarial verify / judge / completeness critic | Named only: `goal_loop.rs` `GoalGate` ("verifier confirmed"); Train 4 wants a verifier-as-judge gate (#2058). Completing via `update_goal` today takes free-text evidence, with no independent verifier turn. | No (planned) |
| Methodical merge (review + verify each output before integrating) | Used as the triage methodology (`EXECUTION.md` §3) but not as a runtime feature. No synthesis/reduce pass in code. | Partial |
| Isolation via git worktrees for parallel mutation | `IsolationMode::Worktree` exists in the IR (dead); `validate_parallel_write_scope` exists (dead). Live sub-agents share the filesystem. | Partial (typed, unwired) |
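To make the control-flow row concrete, here is a minimal sketch of how a typed workflow IR could be interpreted. The node variant names come from the `whaleflow` row above. The payloads, the `exec`/`done` callbacks, and the `run` interpreter are hypothetical and are not the crate's API. `BranchSet` is collapsed to sequential execution here; a real runtime would dispatch branches concurrently and barrier on all of them. `LoopUntil` carries an explicit iteration budget, so a loop-until-dry node always terminates even if its stop condition never holds.

```rust
/// Hypothetical, simplified workflow IR; the real `whaleflow` `WorkflowNode`
/// also has `Reduce`, `Cond`, and `Expand`, with richer payloads.
#[derive(Debug, Clone)]
pub enum WorkflowNode {
    /// A single named unit of work.
    Step(String),
    /// Run children one after another (pipeline).
    Sequence(Vec<WorkflowNode>),
    /// Independent branches; this sketch runs them in order but still
    /// collects every branch's output before returning (the barrier).
    BranchSet(Vec<WorkflowNode>),
    /// Repeat `body` until `done` says stop, or `max_iters` is reached.
    LoopUntil { body: Box<WorkflowNode>, max_iters: u32 },
}

/// Interpret a node. `exec` performs one named step and returns its output;
/// `done` inspects a loop's outputs so far and decides whether it may stop.
pub fn run(
    node: &WorkflowNode,
    exec: &mut dyn FnMut(&str) -> String,
    done: &dyn Fn(&[String]) -> bool,
) -> Vec<String> {
    match node {
        WorkflowNode::Step(name) => vec![exec(name.as_str())],
        WorkflowNode::Sequence(children) | WorkflowNode::BranchSet(children) => {
            let mut out = Vec::new();
            for child in children {
                out.extend(run(child, exec, done));
            }
            out
        }
        WorkflowNode::LoopUntil { body, max_iters } => {
            let mut out = Vec::new();
            // The iteration cap is the loop-until-budget guard.
            for _ in 0..*max_iters {
                out.extend(run(body, exec, done));
                if done(out.as_slice()) {
                    break;
                }
            }
            out
        }
    }
}
```

For example, a `Sequence` of a `plan` step followed by a `LoopUntil` over a `scan` step (budget 5, stop after three outputs) yields one plan output and three scan outputs.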
The single most important ultracode gap: no Rust orchestrator actually decomposes, routes by role, coordinates, or synthesizes. On top of that, the one trait the maintainer most wants (workers of different model types) is fully designed in `worker_profile.rs` but never reaches the `agent_open` spawn path.
Kimi's swarm = a central trainable orchestrator (not peer-to-peer) running a four-stage loop: decompose → spawn (≤300) → coordinate (dependency order, conflict resolution, reassign idle/failed) → synthesize, with live progress UX and PARL-trained critical-path parallelism. The credible-swarm bar (cross-source): many genuinely-parallel workers, a shared coordination substrate, claim/lock semantics, per-worker isolation + merge, roles, failure detection + reassignment, synthesis to one result, observability, and cost honesty.
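The coordinate stage is essentially dependency-ordered scheduling with retry. The sketch below is minimal and assumes a hypothetical `Task` type with explicit dependencies and a shared retry budget. Each iteration dispatches one "wave": every task whose dependencies are complete (Kahn-style topological layering). Those tasks are the ones that could be spawned in parallel. A failed task is re-queued into a later wave until its retries run out. This is not Kimi's implementation or CodeWhale's; it only illustrates the decompose → spawn → coordinate → synthesize shape.

```rust
use std::collections::{HashMap, HashSet};

/// A decomposed sub-task (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Task {
    pub id: String,
    pub deps: Vec<String>,
}

/// Run tasks in dependency order. `attempt(id)` returns `true` on success.
/// Ok(ids in completion order), or Err(ids that failed or stayed blocked).
pub fn coordinate(
    tasks: &[Task],
    max_retries: u32,
    attempt: &mut dyn FnMut(&str) -> bool,
) -> Result<Vec<String>, Vec<String>> {
    let mut done: HashSet<String> = HashSet::new();
    let mut order = Vec::new();
    let mut failed = Vec::new();
    let mut retries: HashMap<String, u32> = HashMap::new();
    let mut pending: Vec<&Task> = tasks.iter().collect();

    while !pending.is_empty() {
        // Split into this wave's ready tasks and those still waiting on deps.
        let (ready, mut next): (Vec<&Task>, Vec<&Task>) = pending
            .into_iter()
            .partition(|t| t.deps.iter().all(|d| done.contains(d)));
        let mut progressed = false;
        for task in ready {
            if attempt(task.id.as_str()) {
                done.insert(task.id.clone());
                order.push(task.id.clone());
                progressed = true;
            } else {
                let used = retries.entry(task.id.clone()).or_insert(0);
                if *used < max_retries {
                    // Reassign the failed task to a later wave.
                    *used += 1;
                    next.push(task);
                    progressed = true;
                } else {
                    failed.push(task.id.clone());
                }
            }
        }
        pending = next;
        if !progressed {
            // Cycle or permanently failed dependency: stop instead of spinning.
            failed.extend(pending.iter().map(|t| t.id.clone()));
            return Err(failed);
        }
    }
    if failed.is_empty() { Ok(order) } else { Err(failed) }
}
```

Returning `Err` with the stuck ids, rather than looping forever, is the "failure detection" half of the credible-swarm bar. The synthesize stage would then consume `order`.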
| kimi swarm dimension | CodeWhale today | Verdict |
|---|---|---|
| `/swarm` entry point | Exists (`commands/groups/core/mod.rs`, aliases `fanout`/`qun`), but it is a prompt-only string (`mod.rs:379-402`) and ungated (no `cfg(feature)`; registered at `:39`, dispatched at `:295`). | Stub only |
| Stage 1: decompose | None in code; the model is told to "decompose" in English. | No |
| Stage 2: spawn many in parallel | The parent emits `agent_open`, bounded by `MAX_SUBAGENTS` and a depth ceiling of 3. Real, but small and voluntary. | Partial |
| Stage 3: coordinate (shared state, dependency order, reassign idle/failed) | None. No shared task list, no claim-by-status, no dependency scheduler, and no idle/failed reassignment in the agent loop. (Fleet has a scheduler/ledger, but it is CLI-only and not wired to `/swarm`.) | No (largest swarm gap) |
| Stage 4: synthesize to one deliverable | None; there is no reduction pass, so the parent improvises. | No |
| Live per-agent progress UX | Intended in `docs/FLEET.md:48-52` (progress card + sidebar rows with worker counts, receipts, nested children), but status is hardcoded to "running" (Train 3 Bug 2). The real 10-state `AgentWorkerStatus` is not plumbed through. | No (planned) |
| Failure detection + recovery | `AgentWorkerStatus` has `Failed`/`Cancelled`, and Fleet has retries, but neither is surfaced to or driven by `/swarm`/`/goal`. | Partial |
| Cost/scale honesty | Docs say "keep fanout proportional"; there is no token/cost meter on `/swarm`. | Partial |
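The last three rows (per-agent progress, failure surfacing, cost honesty) could be driven from one row type per worker: the roster renders the rows, and a one-line summary doubles as the token meter. This is a minimal sketch only. `WorkerRow`, its fields, and the four-state `State` enum are hypothetical. The real `AgentWorkerStatus` has ten states, which this document does not enumerate.

```rust
use std::fmt;

/// Reduced, hypothetical stand-in for the 10-state `AgentWorkerStatus`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Queued,
    Running,
    Done,
    Failed,
}

/// One roster row: which model/provider a worker runs on, what it cost, where it is.
#[derive(Debug, Clone)]
pub struct WorkerRow {
    pub id: String,
    pub provider: String,
    pub model: String,
    pub tokens: u64,
    pub state: State,
}

impl fmt::Display for WorkerRow {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:<8} {:<10} {:<20} {:>8} {:?}",
            self.id, self.provider, self.model, self.tokens, self.state
        )
    }
}

/// Summary line for the swarm card: total tokens plus per-state counts.
pub fn summarize(rows: &[WorkerRow]) -> String {
    let total: u64 = rows.iter().map(|r| r.tokens).sum();
    let count = |s: State| rows.iter().filter(|r| r.state == s).count();
    format!(
        "{} workers, {} tokens | running {} | done {} | failed {}",
        rows.len(),
        total,
        count(State::Running),
        count(State::Done),
        count(State::Failed)
    )
}
```

Putting `provider` and `model` on every row is what lets the UX show a DeepSeek flash scout next to a worker on another provider.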
Where CodeWhale's plan is better than kimi: the docs correctly position Swarm as "high-fanout inside WhaleFlow that compiles onto the durable Fleet substrate" (`FLEET.md:44-46`) rather than reviving a model-visible `agent_swarm` tool. The heterogeneous-model story (DeepSeek flash scout + pro synthesizer per role) is also a genuine differentiator over kimi's uniform-model swarm, provided it gets wired. Finally, CodeWhale is honest that it is orchestrator + workers, not a true (stigmergic) swarm, which matches kimi's own "central orchestrator, not peer-to-peer."
Implemented & real: the sub-agent fan-out primitives (`agent_open`/`eval`/`close`/`list`), parallel dispatch, recursion/width bounds, completion handoff into the parent turn, the Fleet control plane (ledger/scheduler/executor, `codewhale exec` subprocess workers, CLI surface), and the pure tested foundations (the `goal_loop.rs` decision core, the `worker_profile.rs` capability + route contract, the `whaleflow` typed IR).
Aspirational (named in docs/comments, not driving anything):
- `/swarm` "WhaleFlow-backed multi-agent swarm" → a prompt string.
- WhaleFlow typed-IR runner → orphaned crate, mock execution.
- Heterogeneous model route per role → `#![allow(dead_code)]`, not enforced at spawn.
- Persistent goal loop → the continuation cap resets every turn; budgets are hardcoded to `None`; `record_thread_goal_usage` has zero callers.
- One-runtime convergence → two non-unified worker models (in-process tokio vs. `codewhale exec`); Fleet's in-process registration is cosmetic.
The docs are commendably honest about this (every relevant header says "foundation only" or "follow-up that makes X real"). The risk is that the next release ships an ungated, prompt-only `/swarm`, which is exactly the pattern the design doc itself flags as unsafe.
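Of the aspirational items, the persistent goal loop is the most mechanical to make real. Usage has to accumulate across turns instead of resetting, and a `Some(n)` budget has to actually stop the loop. The sketch below is minimal and uses hypothetical `GoalBudget`, `GoalUsage`, and `Decision` types. `record_turn` and `decide` stand in for what `record_thread_goal_usage` and `decide_continuation` would need to do; their real signatures in `goal_loop.rs` are not shown in this document and are not assumed here.

```rust
use std::time::Duration;

/// Hypothetical per-goal budget; `None` means "no limit" on that axis.
#[derive(Debug, Clone, Copy, Default)]
pub struct GoalBudget {
    pub max_tokens: Option<u64>,
    pub max_wall: Option<Duration>,
    pub max_continuations: Option<u32>,
}

/// Usage stored with the goal (durable), not with the turn, so it never resets.
#[derive(Debug, Clone, Copy, Default)]
pub struct GoalUsage {
    pub tokens: u64,
    pub wall: Duration,
    pub continuations: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Continue,
    StopComplete,
    StopBudget(&'static str),
}

impl GoalUsage {
    /// Accumulate one turn's usage (the role `record_thread_goal_usage` would play).
    pub fn record_turn(&mut self, tokens: u64, wall: Duration) {
        self.tokens += tokens;
        self.wall += wall;
        self.continuations += 1;
    }
}

/// Verified completion wins; otherwise any exhausted budget axis stops the loop.
pub fn decide(usage: &GoalUsage, budget: &GoalBudget, verified_complete: bool) -> Decision {
    if verified_complete {
        return Decision::StopComplete;
    }
    if budget.max_tokens.map_or(false, |max| usage.tokens >= max) {
        return Decision::StopBudget("tokens");
    }
    if budget.max_wall.map_or(false, |max| usage.wall >= max) {
        return Decision::StopBudget("wall time");
    }
    if budget.max_continuations.map_or(false, |max| usage.continuations >= max) {
        return Decision::StopBudget("continuations");
    }
    Decision::Continue
}
```

Checking `verified_complete` first means a verifier-confirmed goal stops cleanly even if it finished on the last affordable turn.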
The maintainer's distinguishing requirement is workers of different model types (DeepSeek/GLM/MiniMax/Moonshot-Kimi/OpenAI), with first-class DeepSeek preserved. The seam for this is real and correct in the type layer: `ModelRoute` + per-worker provider override + parent-non-escalation intersection in `worker_profile.rs`. The `EXECUTION.md` spine also explicitly lists "Per-role profiles + MODEL ROUTES #3217 #2027 #1768 #3205" as the substrate that goal mode and `/swarm` ride on. But it is dead code. The highest-leverage single change is wiring `agent_open` to build a `WorkerRuntimeProfile` and route by role:
- flash scout = `deepseek-v4-flash`, the existing Fin lane;
- pro synthesizer = the inherited session model;
- DeepSeek as the default inherit route, to preserve first-class support.

Until that lands, "ultracode where the workers are different model types" is a comment, not a capability. The same per-worker model/provider is also what the swarm UX must display to show the heterogeneity that differentiates CodeWhale from kimi.
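Route-by-role can be sketched as two steps: a role table that maps each worker role to a `ModelRoute`, and a resolver that turns that route into a concrete provider/model and enforces non-escalation on tools. Only a few pieces come from this document: the `ModelRoute::{Inherit, Auto, Fixed}` variants, the scout-on-`deepseek-v4-flash` / synthesizer-inherits split, and the idea of a parent-non-escalation intersection. The rest are illustrative assumptions, not `worker_profile.rs`: the `Role` and `Resolved` types, the `"deepseek"` provider string, treating `Auto` as inherit, and modeling capabilities as a set of tool names.

```rust
use std::collections::BTreeSet;

/// Variant names from `worker_profile.rs`; payload shape is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelRoute {
    Inherit,
    Auto,
    Fixed { provider: String, model: String },
}

/// Hypothetical worker roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Scout,
    Synthesizer,
}

/// What the spawn path would actually launch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Resolved {
    pub provider: String,
    pub model: String,
    pub tools: BTreeSet<String>,
}

/// Role table: scouts use the cheap DeepSeek flash lane; synthesizers inherit.
pub fn route_for(role: Role) -> ModelRoute {
    match role {
        Role::Scout => ModelRoute::Fixed {
            provider: "deepseek".into(),
            model: "deepseek-v4-flash".into(),
        },
        Role::Synthesizer => ModelRoute::Inherit,
    }
}

/// Resolve a route against the parent session, intersecting tool sets so a
/// worker can never hold a tool the parent does not hold (non-escalation).
pub fn resolve(
    route: &ModelRoute,
    parent_provider: &str,
    parent_model: &str,
    parent_tools: &BTreeSet<String>,
    requested_tools: &BTreeSet<String>,
) -> Resolved {
    let (provider, model) = match route {
        // Assumption: `Auto` falls back to the parent route in this sketch.
        ModelRoute::Inherit | ModelRoute::Auto => {
            (parent_provider.to_string(), parent_model.to_string())
        }
        ModelRoute::Fixed { provider, model } => (provider.clone(), model.clone()),
    };
    let tools = requested_tools.intersection(parent_tools).cloned().collect();
    Resolved { provider, model, tools }
}
```

Doing the intersection at resolve time, rather than trusting the requested tool list, is what keeps a cheaper or differently-trained worker model from gaining capabilities the parent was never given.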
- Train 3 (heart of 0.8.61), worker/fleet/sub-agent convergence: carries the durable runtime, the structured `NeedsInput` handoff (Bug 3), the real 10-state status plumbing (Bug 2), the one-runtime cutover (#3096), and the `WorkerRuntimeProfile`/model-route wiring (#3217). This train is where ultracode becomes real.
- Train 4, goal mode + `/swarm` gating: makes `goal_loop` cross-turn and durable (#3215, folding in #891/#1976/#2058/#2029), adds the verifier gate, and keeps `/swarm` gated (#3218). This train is where the orchestrator loop and swarm honesty become real.
- Deferred epics (correctly): #3154 Fleet EPIC, #3166/#3167 dogfood/org-chart, #2058 full goal-system port (→ v0.9.0 WhaleFlow `LoopUntil`), #3086 context-budget service.
- NEW issue to file (gap not currently owned): a WhaleFlow coordination substrate. Use the Fleet ledger as the shared task list (claim-by-status, dependency order, idle/failed reassignment) and consume the orphaned `whaleflow` IR for decomposition/dependency semantics. This is the kimi stage-3 / Anthropic shared-state primitive that neither Train 3 nor Train 4 fully owns today.
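On a shared ledger, claim-by-status reduces to an atomic compare-and-set on each entry's status. A worker may move a task from `Open` to `Claimed` only if nobody else already has, and only the holder may mark it `Done`. A claim whose holder went idle or failed past a deadline is returned to `Open` so another worker can pick it up. The sketch below is minimal and in-memory, built on a hypothetical `Ledger` guarded by a `std::sync::Mutex`. The real Fleet ledger is persistent, and this document does not show its API.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical task status on the shared ledger.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Open,
    Claimed { worker: String, at: Instant },
    Done,
}

pub struct Ledger {
    entries: Mutex<HashMap<String, Status>>,
}

impl Ledger {
    pub fn new(task_ids: &[&str]) -> Self {
        let map: HashMap<String, Status> =
            task_ids.iter().map(|id| (id.to_string(), Status::Open)).collect();
        Ledger { entries: Mutex::new(map) }
    }

    /// Compare-and-set Open -> Claimed; false if taken, done, or unknown.
    pub fn claim(&self, task: &str, worker: &str) -> bool {
        let mut entries = self.entries.lock().unwrap();
        match entries.get_mut(task) {
            Some(status) if *status == Status::Open => {
                *status = Status::Claimed { worker: worker.to_string(), at: Instant::now() };
                true
            }
            _ => false,
        }
    }

    /// Only the current holder may complete a claimed task.
    pub fn complete(&self, task: &str, worker: &str) -> bool {
        let mut entries = self.entries.lock().unwrap();
        let held = matches!(entries.get(task), Some(Status::Claimed { worker: w, .. }) if w == worker);
        if held {
            entries.insert(task.to_string(), Status::Done);
        }
        held
    }

    /// Idle/failed reassignment: release claims older than `timeout`.
    /// Returns how many tasks went back to Open.
    pub fn release_stale(&self, timeout: Duration) -> usize {
        let mut entries = self.entries.lock().unwrap();
        let mut released = 0;
        for status in entries.values_mut() {
            let stale = matches!(&*status, Status::Claimed { at, .. } if at.elapsed() >= timeout);
            if stale {
                *status = Status::Open;
                released += 1;
            }
        }
        released
    }
}
```

Because every transition happens under one lock, two workers racing for the same task cannot both win. A persistent ledger would need the equivalent guarantee from its storage layer, for example a conditional update keyed on the expected status.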
1. Wire heterogeneous model-route-per-role at `agent_open` (Train 3 / #3217 + #3205). This is the #1 differentiator: turn `worker_profile.rs` on, and keep DeepSeek as the default inherit route.
2. Gate `/swarm` behind a readiness check NOW (Train 4 / #3218; split out a quick-fix slice). Don't ship an ungated prompt-only swarm.
3. Define the structured worker-return envelope (`NeedsInput`/result) (Train 3 / #3226 + #3216 Bug 3). Structured handoffs, not chat.
4. Make `goal_loop` drive the engine cross-turn and durably (Train 4 / #3215). Turn the orchestrator on, call `record_thread_goal_usage`, and bridge the three goal models.
5. Add a swarm coordination substrate (shared task list + claim/lock + consume the `whaleflow` IR): NEW issue. This is the largest single gap vs. both targets.
6. Add the swarm UX: a per-agent roster with model/provider/cost/state plus one synthesized output (Train 3 Bug 2 + Train 5 #3028/#3078/#2666). Show the model heterogeneity.
7. Add an adversarial-verify/judge gate before completion (Train 4 / #2058 slice in #3215).
8. Converge the two worker execution models behind the one-runtime contract (Train 3 / #3096, per `docs/AGENT_RUNTIME.md`).
9. Be honest about cost and the coding-specific caveat (Train 4 #3218 + Train 5 telemetry). Refuse trivially serial tasks, show a token meter, and lean on the DeepSeek-flash scout lane as the honest cheap-parallelism story.
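The structured worker-return envelope in recommendation 3 can be sketched as a closed enum that the parent must match on exhaustively, instead of polling free text through `agent_eval`. In this minimal sketch only the `NeedsInput` name comes from this document. The `Completed` and `Failed` variants, all fields, and the `ParentAction` mapping are hypothetical. The sketch is std-only and leaves out how the envelope would be serialized across the `codewhale exec` process boundary.

```rust
/// Hypothetical typed envelope a worker returns to its parent.
#[derive(Debug, Clone, PartialEq)]
pub enum WorkerReturn {
    Completed { summary: String, artifacts: Vec<String> },
    NeedsInput { question: String, options: Vec<String> },
    Failed { reason: String, retryable: bool },
}

/// What the parent does next; every envelope maps to exactly one action.
#[derive(Debug, PartialEq, Eq)]
pub enum ParentAction {
    /// Methodical merge: review the output before integrating it.
    Review(String),
    AskUser(String),
    Retry,
    Abort(String),
}

pub fn handle(ret: &WorkerReturn) -> ParentAction {
    match ret {
        WorkerReturn::Completed { summary, .. } => ParentAction::Review(summary.clone()),
        WorkerReturn::NeedsInput { question, options } if options.is_empty() => {
            ParentAction::AskUser(question.clone())
        }
        WorkerReturn::NeedsInput { question, options } => {
            ParentAction::AskUser(format!("{question} [{}]", options.join(" / ")))
        }
        WorkerReturn::Failed { retryable: true, .. } => ParentAction::Retry,
        WorkerReturn::Failed { reason, .. } => ParentAction::Abort(reason.clone()),
    }
}
```

Because `match` must be exhaustive, adding a new envelope variant later forces every parent-side handler to decide what to do with it, which free-text polling cannot guarantee.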
Branding/stewardship note: all of the above preserves CodeWhale branding and first-class DeepSeek support. Indeed, the flash-scout/pro-synthesizer split depends on the DeepSeek Fin lane, and the swarm UX should surface DeepSeek-on-a-worker as a visible feature.