Maintainer note (2026-06-14): This assessment was generated against codex/v0.8.61 BEFORE the in-flight Train 3 (worker/fleet) and Train 4 (goal mode) work was merged, so the "dead-code / prompt-only /swarm / orphaned IR" findings describe the pre-merge baseline. Coverage of the recommendations by in-flight work:

Rec 2 (/swarm readiness gate) — DONE by Train 4 (reviewed, queued): /swarm no longer dispatches prompt-only fanout; returns a gated redirect; swarm_is_gated_* test passes.

Rec 4 (goal_loop drives cross-turn) — substrate + decide_continuation wiring + durable progress accrual landed by Train 4; the deep worker re-dispatch is the Train-3 seam.

Rec 1 (model-route per role at agent_open) — targeted by Train 1 (route service) + Train 3 (WorkerRuntimeProfile at the spawn path); verify on merge.

Rec 3 (structured NeedsInput worker contract) — targeted by Train 3 Bug 3.

Rec 5 (coordination substrate / un-orphan the whaleflow IR) and Rec 6 (swarm-UX roster) are the genuinely NET-NEW gaps — now filed as #3229 (coordination substrate) and #3230 (synthesis/reduce pass).

WhaleFlow Alignment Assessment — ultracode + kimi-code swarm

Maintainer framing: "WhaleFlow is more like ultracode is what we want … kind of like that + swarm from kimi-code."

Bottom line: The vision honors both targets and is arguably more ambitious than kimi (heterogeneous-model workers vs. kimi's single trained orchestrator; typed-IR safety vs. free-form). The v0.8.61 plan (the EXECUTION.md spine, Trains 3→4) is the correct sequence and correctly gates /swarm behind the durable substrate. The implementation, however, is almost entirely aspirational — the load-bearing seams are tested-but-unwired dead code or prompt-only strings. Verdict on both targets: partial.

1. Scorecard against the ultracode pattern (target A)

| ultracode trait | CodeWhale today | Verdict |
| --- | --- | --- |
| Orchestrator + worker fan-out, orchestrator stays free | Only fan-out is the parent model voluntarily emitting `agent_open` (`subagent/mod.rs`); dispatcher runs them parallel up to `MAX_SUBAGENTS`. No Rust orchestrator decomposes/steers. | Partial (substrate only) |
| Isolated context per worker, built for its task | `AgentWorkerSpec` carries objective/role/workspace/tool_profile (`mod.rs:660-683`); fresh context per worker exists. | Yes |
| Heterogeneous models (Opus conductor → GPT/Kimi/DeepSeek workers) | `worker_profile.rs` defines `ModelRoute::{Inherit,Auto,Fixed}` + `provider` override + non-escalation intersection, but the file is marked `#![allow(dead_code)]`; the live spawn path never builds/enforces it and still uses the legacy `AgentWorkerToolProfile` + `allow_shell` bool. | No (the key gap) |
| Structured handoffs, not freeform chat | Returns are JSON-ish (`transcript_payload` `json!` at `mod.rs:2769`, completion sentinel in `turn_loop.rs`), but the parent largely polls free text via `agent_eval`. No typed `NeedsInput`/result envelope yet (planned, Train 3 Bug 3). | Partial |
| Control-flow primitives (pipeline / parallel-barrier / loop-until-dry / loop-until-budget) | All modeled in the orphaned `whaleflow` crate IR (`WorkflowNode`: Sequence/BranchSet/Reduce/LoopUntil/Cond/Expand; phase DAG; result-dependency ordering `lib.rs:302-321`), with zero workspace consumers and mock execution only. `goal_loop.rs` is the only live loop and its token/time budgets never fire. | Partial (IR exists, dead) |
| Quality: adversarial verify / judge / completeness critic | Named only: `goal_loop.rs` `GoalGate` ("verifier confirmed"); Train 4 wants a verifier-as-judge gate (#2058). `update_goal complete` today takes free-text evidence, no independent verifier turn. | No (planned) |
| Methodical merge (review+verify each output before integrating) | Used as the triage methodology (EXECUTION.md §3) but not as a runtime feature. No synthesis/reduce pass in code. | Partial |
| Isolation via git worktrees for parallel mutation | `IsolationMode::Worktree` in the IR (dead); `validate_parallel_write_scope` exists (dead). Live sub-agents share FS. | Partial (typed, unwired) |

The single most important ultracode gap: no Rust orchestrator actually decomposes, routes-by-role, coordinates, or synthesizes, and the one trait the maintainer most wants — workers of different model types — is fully designed in worker_profile.rs but never reaches the agent_open spawn path.
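
One piece of that designed-but-unwired contract, the non-escalation intersection listed in the scorecard, is small enough to sketch. This is an illustration only: the `Capability` enum and `effective_capabilities` function are hypothetical names, not the types in `worker_profile.rs`. The idea is that a child worker's effective permissions are the set intersection of what the parent holds and what the child requests, so a spawn can narrow but never widen.

```rust
use std::collections::BTreeSet;

/// Hypothetical capability flags; the real set in `worker_profile.rs` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    ReadFiles,
    WriteFiles,
    Shell,
    Network,
}

/// A child never receives a capability its parent lacks: the effective set is
/// the intersection of the parent's capabilities and the child's request.
pub fn effective_capabilities(
    parent: &BTreeSet<Capability>,
    requested: &BTreeSet<Capability>,
) -> BTreeSet<Capability> {
    parent.intersection(requested).copied().collect()
}
```

Under this sketch, a parent holding `ReadFiles` and `WriteFiles` that spawns a worker requesting `ReadFiles` and `Shell` yields a worker with `ReadFiles` only; the `Shell` request is dropped rather than granted.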

2. Scorecard against kimi-code's swarm (target B)

Kimi's swarm = a central trainable orchestrator (not peer-to-peer) running a four-stage loop: decompose → spawn (≤300) → coordinate (dependency order, conflict resolution, reassign idle/failed) → synthesize, with live progress UX and PARL-trained critical-path parallelism. The credible-swarm bar (cross-source): many genuinely-parallel workers, a shared coordination substrate, claim/lock semantics, per-worker isolation + merge, roles, failure detection + reassignment, synthesis to one result, observability, and cost honesty.

| kimi swarm dimension | CodeWhale today | Verdict |
| --- | --- | --- |
| `/swarm` entry point | Exists (`commands/groups/core/mod.rs`, aliases `fanout`/`qun`), but it is a prompt-only string (`mod.rs:379-402`) and ungated (no `cfg(feature)`, registered at `:39`, dispatched at `:295`). | Stub only |
| Stage 1 decompose | None in code; the model is told to "decompose" in English. | No |
| Stage 2 spawn many parallel | Parent emits `agent_open`; bounded by `MAX_SUBAGENTS` + depth ceiling 3. Real but small + voluntary. | Partial |
| Stage 3 coordinate (shared state, dependency order, reassign idle/failed) | None. No shared task list, no claim-by-status, no dependency scheduler, no idle/failed reassignment in the agent loop. (Fleet has a scheduler/ledger but is CLI-only and not wired to `/swarm`.) | No (largest swarm gap) |
| Stage 4 synthesize to one deliverable | None: no reduction pass; the parent improvises. | No |
| Live per-agent progress UX | Intended in `docs/FLEET.md:48-52` (progress card + sidebar rows w/ worker counts, receipts, nested children) but status is hardcoded "running" (Train 3 Bug 2). Real 10-state `AgentWorkerStatus` not plumbed. | No (planned) |
| Failure detection + recovery | `AgentWorkerStatus` has Failed/Cancelled; Fleet has retries, but neither is surfaced to or driven by `/swarm`/goal. | Partial |
| Cost/scale honesty | Docs say "keep fanout proportional"; no token/cost meter on `/swarm`. | Partial |

Where CodeWhale's plan is better than kimi: the docs correctly position Swarm as "high-fanout inside WhaleFlow that compiles onto the durable Fleet substrate" (FLEET.md:44-46) rather than reviving a model-visible agent_swarm tool, and the heterogeneous-model story (DeepSeek flash scout + pro synthesizer per role) is a genuine differentiator over kimi's uniform-model swarm — if it gets wired. CodeWhale is also honest that it is orchestrator+workers, not a true (stigmergic) swarm — which matches kimi's own "central orchestrator, not peer-to-peer."

3. Aspirational vs. implemented (no overstating)

Implemented & real: the sub-agent fan-out primitives (agent_open/eval/close/list), parallel dispatch, recursion/width bounds, completion handoff into the parent turn, the Fleet control plane (ledger/scheduler/executor, codewhale exec subprocess workers, CLI surface), and the pure tested foundations (goal_loop.rs decision core, worker_profile.rs capability+route contract, the whaleflow typed IR).

Aspirational (named in docs/comments, not driving anything):

- /swarm "WhaleFlow-backed multi-agent swarm" → a prompt string.
- WhaleFlow typed-IR runner → orphaned crate, mock execution.
- Heterogeneous model-route-per-role → #![allow(dead_code)], not enforced at spawn.
- Persistent goal loop → continuation cap resets every turn; budgets hardcoded None; record_thread_goal_usage has zero callers.
- One-runtime convergence → two non-unified worker models (in-process tokio vs. codewhale exec); Fleet's in-process registration is cosmetic.

The docs are commendably honest about this (every relevant header says "foundation only" / "follow-up that makes X real"). The risk is purely that the next release ships an ungated prompt-only /swarm that the design doc itself flags as the unsafe pattern.
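
To make the "persistent goal loop" item concrete, here is a minimal sketch of a continuation decision whose counters accrue per goal rather than per turn. Every name in it (`GoalBudget`, `GoalUsage`, `Continuation`, `decide`) is hypothetical; it does not reproduce the signatures in `goal_loop.rs`, and it assumes the usage struct would be persisted between turns by whatever durable store the runtime provides.

```rust
use std::time::Duration;

/// Hypothetical budget for one goal; `None` means "no limit on this axis".
#[derive(Debug, Clone, Copy)]
pub struct GoalBudget {
    pub max_continuations: u32,
    pub max_tokens: Option<u64>,
    pub max_elapsed: Option<Duration>,
}

/// Usage accrued across turns. Keeping this per goal (not per turn) is what
/// stops the continuation cap from resetting on every turn.
#[derive(Debug, Clone, Copy, Default)]
pub struct GoalUsage {
    pub continuations: u32,
    pub tokens: u64,
    pub elapsed: Duration,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Continuation {
    Continue,
    Stop(&'static str),
}

impl GoalUsage {
    /// Record one finished turn against the goal's running totals.
    pub fn record_turn(&mut self, tokens: u64, elapsed: Duration) {
        self.continuations += 1;
        self.tokens = self.tokens.saturating_add(tokens);
        self.elapsed = self.elapsed.saturating_add(elapsed);
    }
}

/// Pure decision: stop as soon as any configured budget is exhausted.
pub fn decide(budget: &GoalBudget, usage: &GoalUsage) -> Continuation {
    if usage.continuations >= budget.max_continuations {
        return Continuation::Stop("continuation cap reached");
    }
    if budget.max_tokens.is_some_and(|cap| usage.tokens >= cap) {
        return Continuation::Stop("token budget exhausted");
    }
    if budget.max_elapsed.is_some_and(|cap| usage.elapsed >= cap) {
        return Continuation::Stop("time budget exhausted");
    }
    Continuation::Continue
}
```

The point of the sketch is the failure mode named above: if the budgets are always `None` and nothing ever calls the equivalent of `record_turn`, only the continuation cap can fire, and it fires per turn instead of per goal.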

4. Heterogeneous-model reality (the CodeWhale-specific crux)

The maintainer's distinguishing requirement is workers of different model types (DeepSeek/GLM/MiniMax/Moonshot-Kimi/OpenAI), with first-class DeepSeek preserved. The seam for this is real and correct in the type layer: ModelRoute, a per-worker provider override, and the parent-non-escalation intersection all live in worker_profile.rs, and the EXECUTION.md spine explicitly lists "Per-role profiles + MODEL ROUTES #3217 #2027 #1768 #3205" as the substrate that goal mode and /swarm ride on. But it is dead code. The highest-leverage single change is wiring agent_open to build a WorkerRuntimeProfile and route by role (flash scout = deepseek-v4-flash, the existing Fin lane; pro synthesizer = inherited session model; DeepSeek as the default inherit route to preserve first-class support). Until that lands, "ultracode where the workers are different model types" is a comment, not a capability. The same per-worker model/provider is also what the swarm UX must display to show the heterogeneity that differentiates CodeWhale from kimi.
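
A rough sketch of what route-by-role resolution could look like at spawn time follows. It mirrors only the variant names `Inherit`, `Auto`, and `Fixed` mentioned above; the payload types, the `WorkerRole` enum, and the `default_route` and `resolve_model` functions are assumptions made for illustration, and the meaning given to `Auto` here (defer to a per-role default table) is a guess rather than the documented semantics.

```rust
/// Hypothetical mirror of the `ModelRoute::{Inherit, Auto, Fixed}` variants;
/// the `String` payload on `Fixed` is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelRoute {
    Inherit,
    Auto,
    Fixed(String),
}

/// Hypothetical worker roles used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerRole {
    Scout,
    Synthesizer,
    Other,
}

/// Per-role defaults as described in the text: the flash scout is pinned to
/// `deepseek-v4-flash`; everything else inherits the session model.
pub fn default_route(role: WorkerRole) -> ModelRoute {
    match role {
        WorkerRole::Scout => ModelRoute::Fixed("deepseek-v4-flash".to_string()),
        WorkerRole::Synthesizer | WorkerRole::Other => ModelRoute::Inherit,
    }
}

/// Resolve the concrete model name for a worker.
pub fn resolve_model(route: &ModelRoute, role: WorkerRole, session_model: &str) -> String {
    match route {
        ModelRoute::Inherit => session_model.to_string(),
        ModelRoute::Fixed(model) => model.clone(),
        // Assumption: `Auto` falls back to the per-role default table.
        ModelRoute::Auto => match default_route(role) {
            ModelRoute::Fixed(model) => model,
            _ => session_model.to_string(),
        },
    }
}
```

Because `Inherit` always resolves to the session model, a session started on a DeepSeek model keeps DeepSeek on every worker unless a role or an explicit `Fixed` route says otherwise, which is the "default inherit" behavior the paragraph asks to preserve. The resolved string is also the value a roster UI would show per worker.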

5. Train mapping (where the work already lives)

- Train 3 (heart of 0.8.61): worker/fleet/sub-agent convergence. Carries the durable runtime, the structured NeedsInput handoff (Bug 3), the real 10-state status plumbing (Bug 2), the one-runtime cutover (#3096), and the WorkerRuntimeProfile/model-route wiring (#3217). This train is where ultracode becomes real.
- Train 4: goal mode + /swarm gating. Makes goal_loop cross-turn + durable (#3215, folding #891/#1976/#2058/#2029), adds the verifier gate, and keeps /swarm gated (#3218). This train is where the orchestrator loop + swarm honesty become real.
- Deferred epics (correctly): #3154 Fleet EPIC, #3166/#3167 dogfood/org-chart, #2058 full goal-system port (→ v0.9.0 WhaleFlow LoopUntil), #3086 context-budget service.
- NEW issue to file (gap not currently owned): a WhaleFlow coordination substrate. Use the Fleet ledger as the shared task list (claim-by-status, dependency order, idle/failed reassignment) and consume the orphaned whaleflow IR for decomposition/dependency semantics. This is the kimi-stage-3 / Anthropic-shared-state primitive that neither Train 3 nor Train 4 fully owns today.
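
The coordination substrate described in the last item can be sketched as an in-memory task list with claim-by-status, dependency ordering, and reassignment of failed work. This is a toy under stated assumptions: `TaskList`, `Task`, and `TaskStatus` are hypothetical names, nothing here reflects the Fleet ledger's actual schema, and idle-worker detection (which would need timestamps or heartbeats) is left out.

```rust
use std::collections::HashMap;

/// Hypothetical task states for the shared list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Claimed { worker: String },
    Done,
    Failed,
}

#[derive(Debug, Clone)]
pub struct Task {
    pub deps: Vec<String>,
    pub status: TaskStatus,
}

/// Hypothetical shared task list, keyed by task id.
#[derive(Debug, Default)]
pub struct TaskList {
    tasks: HashMap<String, Task>,
}

impl TaskList {
    pub fn add(&mut self, id: &str, deps: &[&str]) {
        let deps = deps.iter().map(|d| d.to_string()).collect();
        self.tasks.insert(id.to_string(), Task { deps, status: TaskStatus::Pending });
    }

    /// Dependency order: a task is ready only when every dependency is Done.
    fn deps_done(&self, task: &Task) -> bool {
        task.deps
            .iter()
            .all(|d| matches!(self.tasks.get(d).map(|t| &t.status), Some(TaskStatus::Done)))
    }

    /// Claim-by-status: hand out the first ready task (lowest id, for
    /// determinism). Failed tasks are claimable again, which is the
    /// reassignment path.
    pub fn claim_next(&mut self, worker: &str) -> Option<String> {
        let mut ready: Vec<String> = self
            .tasks
            .iter()
            .filter(|(_, t)| {
                matches!(t.status, TaskStatus::Pending | TaskStatus::Failed) && self.deps_done(t)
            })
            .map(|(id, _)| id.clone())
            .collect();
        ready.sort();
        let id = ready.into_iter().next()?;
        self.tasks.get_mut(&id)?.status = TaskStatus::Claimed { worker: worker.to_string() };
        Some(id)
    }

    /// Record the outcome of a claimed task.
    pub fn finish(&mut self, id: &str, ok: bool) {
        if let Some(task) = self.tasks.get_mut(id) {
            task.status = if ok { TaskStatus::Done } else { TaskStatus::Failed };
        }
    }
}
```

In a real substrate the claim would have to be atomic across processes (a ledger transaction or a lock), since two workers must not both observe the same task as claimable; the single-threaded `&mut self` here sidesteps that on purpose.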

6. Recommendations (prioritized)

1. Wire heterogeneous model-route-per-role at agent_open (Train 3 / #3217 + #3205). The #1 differentiator; turn worker_profile.rs on. Keep DeepSeek as default inherit.
2. Gate /swarm behind a readiness check NOW (Train 4 / #3218; split a quick-fix slice). Don't ship an ungated prompt-only swarm.
3. Define the structured worker-return envelope (NeedsInput/result) (Train 3 / #3226 + #3216 Bug 3). Structured handoffs, not chat.
4. Make goal_loop drive the engine cross-turn + durable (Train 4 / #3215). Turn the orchestrator on; call record_thread_goal_usage; bridge the three goal models.
5. Add a swarm coordination substrate (shared task list + claim/lock + consume the whaleflow IR) (NEW issue). The largest single gap vs. both targets.
6. Add the swarm UX: per-agent roster with model/provider/cost/state + one synthesized output (Train 3 Bug 2 + Train 5 #3028/#3078/#2666). Show the model heterogeneity.
7. Add an adversarial-verify/judge gate before completion (Train 4 / #2058 slice in #3215).
8. Converge the two worker execution models behind the one-runtime contract (Train 3 / #3096, per docs/AGENT_RUNTIME.md).
9. Be honest about cost + the coding-specific caveat (Train 4 #3218 + Train 5 telemetry). Refuse trivially-serial tasks; show a token meter; lean on the DeepSeek-flash scout lane as the honest cheap-parallelism story.
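
For the structured worker-return envelope recommended above, a typed result that the parent can match on (instead of polling free text) might look like the following. The shape is an assumption for illustration: `WorkerReturn`, `ParentAction`, `next_action`, and all field names are hypothetical and are not taken from the Train 3 design.

```rust
/// Hypothetical typed envelope a worker hands back to its parent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WorkerReturn {
    Completed { summary: String, artifacts: Vec<String> },
    NeedsInput { question: String, options: Vec<String> },
    Failed { reason: String, retryable: bool },
}

/// Hypothetical actions the parent can take in response.
#[derive(Debug, PartialEq, Eq)]
pub enum ParentAction {
    Integrate,
    AskUser(String),
    Retry,
    Abandon,
}

/// With a typed envelope the parent's next step is an exhaustive match,
/// so a new worker outcome cannot be silently ignored.
pub fn next_action(ret: &WorkerReturn) -> ParentAction {
    match ret {
        WorkerReturn::Completed { .. } => ParentAction::Integrate,
        WorkerReturn::NeedsInput { question, .. } => ParentAction::AskUser(question.clone()),
        WorkerReturn::Failed { retryable: true, .. } => ParentAction::Retry,
        WorkerReturn::Failed { retryable: false, .. } => ParentAction::Abandon,
    }
}
```

The design choice being illustrated is the exhaustive `match`: adding a variant to the envelope forces every consumer to handle it at compile time, which free-text returns cannot guarantee.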

Branding/stewardship note: all of the above preserves CodeWhale branding and first-class DeepSeek support — indeed the flash-scout/pro-synthesizer split depends on the DeepSeek Fin lane, and the swarm UX should surface DeepSeek-on-a-worker as a visible feature.
