Maintainer note (2026-06-14): This assessment was generated against codex/v0.8.61 BEFORE the in-flight Train 3 (worker/fleet) and Train 4 (goal mode) work was merged, so the "dead-code / prompt-only /swarm / orphaned IR" findings describe the pre-merge baseline. Coverage of the recommendations by in-flight work:

  • Rec 2 (/swarm readiness gate) — DONE by Train 4 (reviewed, queued): /swarm no longer dispatches prompt-only fanout; returns a gated redirect; swarm_is_gated_* test passes.
  • Rec 4 (goal_loop drives cross-turn) — substrate + decide_continuation wiring + durable progress accrual landed by Train 4; the deep worker re-dispatch is the Train-3 seam.
  • Rec 1 (model-route per role at agent_open) — targeted by Train 1 (route service) + Train 3 (WorkerRuntimeProfile at the spawn path); verify on merge.
  • Rec 3 (structured NeedsInput worker contract) — targeted by Train 3 Bug 3.
  • Rec 5 (coordination substrate / un-orphan the whaleflow IR) and Rec 6 (swarm-UX roster) are the genuinely NET-NEW gaps — now filed as #3229 (coordination substrate) and #3230 (synthesis/reduce pass).

WhaleFlow Alignment Assessment — ultracode + kimi-code swarm

Maintainer framing: "WhaleFlow is more like ultracode is what we want … kind of like that + swarm from kimi-code."

Bottom line: The vision honors both targets and is arguably more ambitious than kimi (heterogeneous-model workers vs. kimi's single trained orchestrator; typed-IR safety vs. free-form). The v0.8.61 plan (the EXECUTION.md spine, Trains 3→4) is the correct sequence and correctly gates /swarm behind the durable substrate. The implementation, however, is almost entirely aspirational — the load-bearing seams are tested-but-unwired dead code or prompt-only strings. Verdict on both targets: partial.


1. Scorecard against the ultracode pattern (target A)

| ultracode trait | CodeWhale today | Verdict |
| --- | --- | --- |
| Orchestrator + worker fan-out, orchestrator stays free | Only fan-out is the parent model voluntarily emitting agent_open (subagent/mod.rs); dispatcher runs them parallel up to MAX_SUBAGENTS. No Rust orchestrator decomposes/steers. | Partial (substrate only) |
| Isolated context per worker, built for its task | AgentWorkerSpec carries objective/role/workspace/tool_profile (mod.rs:660-683); fresh context per worker exists. | Yes |
| Heterogeneous models (Opus conductor → GPT/Kimi/DeepSeek workers) | worker_profile.rs defines ModelRoute::{Inherit,Auto,Fixed} + provider override + non-escalation intersection, but under #![allow(dead_code)]; the live spawn path never builds/enforces it and still uses the legacy AgentWorkerToolProfile + allow_shell bool. | No (the key gap) |
| Structured handoffs, not freeform chat | Returns are JSON-ish (transcript_payload json! at mod.rs:2769, completion sentinel in turn_loop.rs), but the parent largely polls free text via agent_eval. No typed NeedsInput/result envelope yet (planned, Train 3 Bug 3). | Partial |
| Control-flow primitives (pipeline / parallel-barrier / loop-until-dry / loop-until-budget) | All modeled in the orphaned whaleflow crate IR (WorkflowNode: Sequence/BranchSet/Reduce/LoopUntil/Cond/Expand; phase DAG; result-dependency ordering lib.rs:302-321): zero workspace consumers, mock execution only. goal_loop.rs is the only live loop and its token/time budgets never fire. | Partial (IR exists, dead) |
| Quality: adversarial verify / judge / completeness critic | Named only: goal_loop.rs GoalGate ("verifier confirmed"); Train 4 wants a verifier-as-judge gate (#2058). update_goal complete today takes free-text evidence, no independent verifier turn. | No (planned) |
| Methodical merge (review+verify each output before integrating) | Used as the triage methodology (EXECUTION.md §3) but not as a runtime feature. No synthesis/reduce pass in code. | Partial |
| Isolation via git worktrees for parallel mutation | IsolationMode::Worktree in the IR (dead); validate_parallel_write_scope exists (dead). Live sub-agents share FS. | Partial (typed, unwired) |

The single most important ultracode gap: no Rust orchestrator actually decomposes, routes-by-role, coordinates, or synthesizes, and the one trait the maintainer most wants — workers of different model types — is fully designed in worker_profile.rs but never reaches the agent_open spawn path.
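To make the "non-escalation intersection" concrete, here is a minimal sketch of what enforcing it at spawn could look like. The `Capability` enum and `effective_capabilities` function are hypothetical names invented for illustration; the real contract in worker_profile.rs may be shaped quite differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical capability set; the real tool-profile fields are not shown in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    ReadFs,
    WriteFs,
    Shell,
    Network,
}

/// Non-escalation: a worker receives only what it requested AND the parent already holds.
/// A request for a capability the parent lacks is silently dropped in this sketch.
pub fn effective_capabilities(
    parent: &BTreeSet<Capability>,
    requested: &BTreeSet<Capability>,
) -> BTreeSet<Capability> {
    parent.intersection(requested).copied().collect()
}
```

Under this assumption a parent holding read/write access cannot hand a worker shell access, whatever the worker spec asks for. A set intersection also replaces a single allow_shell-style boolean with one rule that covers every capability.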


2. Scorecard against kimi-code's swarm (target B)

Kimi's swarm = a central trainable orchestrator (not peer-to-peer) running a four-stage loop: decompose → spawn (≤300) → coordinate (dependency order, conflict resolution, reassign idle/failed) → synthesize, with live progress UX and PARL-trained critical-path parallelism. The credible-swarm bar (cross-source): many genuinely-parallel workers, a shared coordination substrate, claim/lock semantics, per-worker isolation + merge, roles, failure detection + reassignment, synthesis to one result, observability, and cost honesty.

| kimi swarm dimension | CodeWhale today | Verdict |
| --- | --- | --- |
| /swarm entry point | Exists (commands/groups/core/mod.rs, aliases fanout/qun), but as a prompt-only string (mod.rs:379-402) and ungated (no cfg(feature), registered at :39, dispatched at :295). | Stub only |
| Stage 1 decompose | None in code; the model is told to "decompose" in English. | No |
| Stage 2 spawn many parallel | Parent emits agent_open; bounded by MAX_SUBAGENTS + depth ceiling 3. Real but small + voluntary. | Partial |
| Stage 3 coordinate (shared state, dependency order, reassign idle/failed) | None. No shared task list, no claim-by-status, no dependency scheduler, no idle/failed reassignment in the agent loop. (Fleet has a scheduler/ledger but is CLI-only and not wired to /swarm.) | No (largest swarm gap) |
| Stage 4 synthesize to one deliverable | None: no reduction pass; parent improvises. | No |
| Live per-agent progress UX | Intended in docs/FLEET.md:48-52 (progress card + sidebar rows w/ worker counts, receipts, nested children) but status is hardcoded "running" (Train 3 Bug 2). Real 10-state AgentWorkerStatus not plumbed. | No (planned) |
| Failure detection + recovery | AgentWorkerStatus has Failed/Cancelled; Fleet has retries, but neither is surfaced to or driven by /swarm/goal. | Partial |
| Cost/scale honesty | Docs say "keep fanout proportional"; no token/cost meter on /swarm. | Partial |

Where CodeWhale's plan is better than kimi: the docs correctly position Swarm as "high-fanout inside WhaleFlow that compiles onto the durable Fleet substrate" (FLEET.md:44-46) rather than reviving a model-visible agent_swarm tool, and the heterogeneous-model story (DeepSeek flash scout + pro synthesizer per role) is a genuine differentiator over kimi's uniform-model swarm — if it gets wired. CodeWhale is also honest that it is orchestrator+workers, not a true (stigmergic) swarm — which matches kimi's own "central orchestrator, not peer-to-peer."


3. Aspirational vs. implemented (no overstating)

Implemented & real: the sub-agent fan-out primitives (agent_open/eval/close/list), parallel dispatch, recursion/width bounds, completion handoff into the parent turn, the Fleet control plane (ledger/scheduler/executor, codewhale exec subprocess workers, CLI surface), and the pure tested foundations (goal_loop.rs decision core, worker_profile.rs capability+route contract, the whaleflow typed IR).

Aspirational (named in docs/comments, not driving anything):

  • /swarm "WhaleFlow-backed multi-agent swarm" → a prompt string.
  • WhaleFlow typed-IR runner → orphaned crate, mock execution.
  • Heterogeneous model-route-per-role → #![allow(dead_code)], not enforced at spawn.
  • Persistent goal loop → continuation cap resets every turn; budgets hardcoded None; record_thread_goal_usage has zero callers.
  • One-runtime convergence → two non-unified worker models (in-process tokio vs. codewhale exec); Fleet's in-process registration is cosmetic.
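The persistent-goal-loop item is the easiest of these to picture in code. The sketch below shows progress that accrues across turns rather than resetting, and a budget that can actually fire. All names here (`GoalProgress`, `GoalBudget`, `Continuation`, `next_step`) are hypothetical and do not reflect the real goal_loop.rs API.

```rust
/// Outcome of one continuation decision (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Continuation {
    Continue,
    StopGoalComplete,
    StopContinuationCap,
    StopTokenBudget,
}

/// Durable progress: in a real system this would be persisted, not rebuilt each turn.
#[derive(Debug, Clone, Default)]
pub struct GoalProgress {
    pub continuations_used: u32,
    pub tokens_used: u64,
}

#[derive(Debug, Clone, Copy)]
pub struct GoalBudget {
    pub max_continuations: u32,
    /// `None` means unlimited; a hardcoded `None` is the failure mode described above.
    pub max_tokens: Option<u64>,
}

impl GoalProgress {
    /// Accrue usage from one finished turn so the caps span the whole goal.
    pub fn record_turn(&mut self, tokens: u64) {
        self.continuations_used += 1;
        self.tokens_used = self.tokens_used.saturating_add(tokens);
    }
}

/// Decide whether the loop may run another turn.
pub fn next_step(progress: &GoalProgress, budget: &GoalBudget, goal_complete: bool) -> Continuation {
    if goal_complete {
        return Continuation::StopGoalComplete;
    }
    if progress.continuations_used >= budget.max_continuations {
        return Continuation::StopContinuationCap;
    }
    if let Some(max) = budget.max_tokens {
        if progress.tokens_used >= max {
            return Continuation::StopTokenBudget;
        }
    }
    Continuation::Continue
}
```

The point of the sketch is only that `record_turn` needs a caller and the progress value has to outlive a single turn; otherwise neither cap can trigger.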

The docs are commendably honest about this (every relevant header says "foundation only" / "follow-up that makes X real"). The risk is purely that the next release ships an ungated prompt-only /swarm that the design doc itself flags as the unsafe pattern.
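One possible shape for a gated /swarm is sketched below: the handler reports which prerequisites are missing instead of dispatching a prompt-only fan-out. The `SwarmReadiness` fields, `SwarmOutcome`, and `handle_swarm` are assumptions made for illustration, and the three readiness conditions are guesses at what "ready" might mean rather than the shipped gate.

```rust
/// Hypothetical readiness flags; the real gate may check different conditions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SwarmReadiness {
    pub durable_substrate: bool,
    pub model_routes_wired: bool,
    pub structured_handoff: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SwarmOutcome {
    /// All prerequisites are met: hand the objective to the real runner.
    Dispatch { objective: String },
    /// Gated redirect: tell the user what is missing instead of improvising.
    Gated { missing: Vec<&'static str> },
}

pub fn handle_swarm(objective: &str, ready: SwarmReadiness) -> SwarmOutcome {
    let checks = [
        ("durable substrate", ready.durable_substrate),
        ("model routes", ready.model_routes_wired),
        ("structured handoff", ready.structured_handoff),
    ];
    // Collect the names of every failed check, in declaration order.
    let missing: Vec<&'static str> = checks
        .iter()
        .filter(|(_, ok)| !*ok)
        .map(|(name, _)| *name)
        .collect();
    if missing.is_empty() {
        SwarmOutcome::Dispatch { objective: objective.to_string() }
    } else {
        SwarmOutcome::Gated { missing }
    }
}
```

Returning the list of missing prerequisites keeps the redirect honest: the user sees why the command is gated rather than a generic refusal.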


4. Heterogeneous-model reality (the CodeWhale-specific crux)

The maintainer's distinguishing requirement is workers of different model types (DeepSeek/GLM/MiniMax/Moonshot-Kimi/OpenAI), with first-class DeepSeek preserved. The seam for this is real and correct in the type layer: ModelRoute + per-worker provider override + parent-non-escalation intersection in worker_profile.rs, and the EXECUTION.md spine explicitly lists "Per-role profiles + MODEL ROUTES #3217 #2027 #1768 #3205" as the substrate that goal mode and /swarm ride on. But it is dead code. The highest-leverage single change is wiring agent_open to build a WorkerRuntimeProfile and route by role (flash scout = deepseek-v4-flash, the existing Fin lane; pro synthesizer = inherited session model; DeepSeek as the default inherit route to preserve first-class support). Until that lands, "ultracode where the workers are different model types" is a comment, not a capability. The same wiring also supplies the per-worker model/provider that the swarm UX must display to show the heterogeneity that differentiates CodeWhale from kimi.
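The role-based routing described here can be sketched as follows. The variant names `Inherit`, `Auto`, and `Fixed` and the model id `deepseek-v4-flash` come from the text above; the variant payloads, the `WorkerRole` enum, and `resolve_model` are hypothetical and only illustrate the idea.

```rust
/// Variant names follow the document; the payloads are assumed for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelRoute {
    /// Use whatever model the parent session is running.
    Inherit,
    /// Let a role-based default pick the model.
    Auto,
    /// Pin the worker to one specific model id.
    Fixed(String),
}

/// Hypothetical roles; the real role set is not specified in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerRole {
    Scout,
    Synthesizer,
}

/// Resolve the concrete model id for a worker at spawn time.
pub fn resolve_model(route: &ModelRoute, role: WorkerRole, session_model: &str) -> String {
    match route {
        ModelRoute::Inherit => session_model.to_string(),
        ModelRoute::Fixed(id) => id.clone(),
        ModelRoute::Auto => match role {
            // Flash scout lane, as named in the text above.
            WorkerRole::Scout => "deepseek-v4-flash".to_string(),
            // The pro synthesizer inherits the session model.
            WorkerRole::Synthesizer => session_model.to_string(),
        },
    }
}
```

Because the resolved id is a plain string computed at spawn, the same value could be stored on the worker record and shown in a roster, which is what would make the heterogeneity visible.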


5. Train mapping (where the work already lives)

  • Train 3 (heart of 0.8.61) — worker/fleet/sub-agent convergence: carries the durable runtime, the structured NeedsInput handoff (Bug 3), the real 10-state status plumbing (Bug 2), the one-runtime cutover (#3096), and the WorkerRuntimeProfile/model-route wiring (#3217). This train is where ultracode becomes real.
  • Train 4 — goal mode + /swarm gating: makes goal_loop cross-turn + durable (#3215, folding #891/#1976/#2058/#2029), adds the verifier gate, and keeps /swarm gated (#3218). This train is where the orchestrator loop + swarm honesty become real.
  • Deferred epics (correctly): #3154 Fleet EPIC, #3166/#3167 dogfood/org-chart, #2058 full goal-system port (→ v0.9.0 WhaleFlow LoopUntil), #3086 context-budget service.
  • NEW issue to file (gap not currently owned): a WhaleFlow coordination substrate — use the Fleet ledger as the shared task list (claim-by-status, dependency order, idle/failed reassignment) and consume the orphaned whaleflow IR for decomposition/dependency semantics. This is the kimi-stage-3 / Anthropic-shared-state primitive that neither Train 3 nor Train 4 fully owns today.
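The coordination substrate in the last bullet (shared task list, claim-by-status, dependency order, failed-task reassignment) can be sketched as a small in-memory ledger. Everything here (`TaskLedger`, `TaskStatus`, the method names, `u32` task ids) is hypothetical; the real Fleet ledger and whaleflow IR are not shown in this document and may look nothing like this.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Claimed { worker: String },
    Done,
    Failed,
}

#[derive(Debug, Clone)]
pub struct Task {
    pub deps: Vec<u32>,
    pub status: TaskStatus,
}

/// Hypothetical shared task list; a real one would be durable and concurrency-safe.
#[derive(Debug, Default)]
pub struct TaskLedger {
    tasks: BTreeMap<u32, Task>,
}

impl TaskLedger {
    pub fn add(&mut self, id: u32, deps: &[u32]) {
        self.tasks.insert(id, Task { deps: deps.to_vec(), status: TaskStatus::Pending });
    }

    /// Ready means Pending with every dependency Done (dependency order).
    fn is_ready(&self, task: &Task) -> bool {
        task.status == TaskStatus::Pending
            && task.deps.iter().all(|d| {
                self.tasks.get(d).map_or(false, |t| t.status == TaskStatus::Done)
            })
    }

    /// Claim-by-status: hand the lowest-id ready task to `worker`, if one exists.
    pub fn claim_next(&mut self, worker: &str) -> Option<u32> {
        let id = self.tasks.iter().find(|(_, t)| self.is_ready(t)).map(|(id, _)| *id)?;
        self.tasks.get_mut(&id)?.status = TaskStatus::Claimed { worker: worker.to_string() };
        Some(id)
    }

    /// Move a claimed task to `to`; returns false if the task is not currently claimed.
    fn finish(&mut self, id: u32, to: TaskStatus) -> bool {
        if let Some(t) = self.tasks.get_mut(&id) {
            if matches!(t.status, TaskStatus::Claimed { .. }) {
                t.status = to;
                return true;
            }
        }
        false
    }

    pub fn complete(&mut self, id: u32) -> bool {
        self.finish(id, TaskStatus::Done)
    }

    pub fn fail(&mut self, id: u32) -> bool {
        self.finish(id, TaskStatus::Failed)
    }

    /// Reassignment: return failed tasks to Pending so another worker can claim them.
    pub fn requeue_failed(&mut self) -> usize {
        let mut requeued = 0;
        for t in self.tasks.values_mut() {
            if t.status == TaskStatus::Failed {
                t.status = TaskStatus::Pending;
                requeued += 1;
            }
        }
        requeued
    }
}
```

In this sketch a task that depends on a failed task simply stays unclaimable until the failed one is requeued and completed. Idle-worker detection and lock timeouts are deliberately left out.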

6. Recommendations (prioritized)

  1. Wire heterogeneous model-route-per-role at agent_open (Train 3 / #3217 + #3205). The #1 differentiator; turn worker_profile.rs on. Keep DeepSeek as default inherit.
  2. Gate /swarm behind a readiness check NOW (Train 4 / #3218 — split a quick-fix slice). Don't ship an ungated prompt-only swarm.
  3. Define the structured worker-return envelope (NeedsInput/result) (Train 3 / #3226 + #3216 Bug 3). Structured handoffs, not chat.
  4. Make goal_loop drive the engine cross-turn + durable (Train 4 / #3215). Turn the orchestrator on; call record_thread_goal_usage; bridge the three goal models.
  5. Add a swarm coordination substrate (shared task list + claim/lock + consume the whaleflow IR) (NEW issue). The largest single gap vs. both targets.
  6. Add the swarm UX: per-agent roster with model/provider/cost/state + one synthesized output (Train 3 Bug 2 + Train 5 #3028/#3078/#2666). Show the model heterogeneity.
  7. Add an adversarial-verify/judge gate before completion (Train 4 / #2058 slice in #3215).
  8. Converge the two worker execution models behind the one-runtime contract (Train 3 / #3096, per docs/AGENT_RUNTIME.md).
  9. Be honest about cost + the coding-specific caveat (Train 4 #3218 + Train 5 telemetry). Refuse trivially-serial tasks; show a token meter; lean on the DeepSeek-flash scout lane as the honest cheap-parallelism story.
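For recommendation 3, a typed worker-return envelope might look like the sketch below, with the parent branching on a variant instead of polling free text. The `NeedsInput` name comes from the text above; the other variants, all fields, `ParentAction`, and `next_action` are hypothetical.

```rust
/// Hypothetical envelope a worker hands back to its parent.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WorkerReturn {
    Completed { summary: String, artifacts: Vec<String> },
    NeedsInput { question: String, options: Vec<String> },
    Failed { reason: String, retryable: bool },
}

/// What the parent does next (also hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParentAction {
    Integrate(String),
    AskUser(String),
    Retry,
    Abandon(String),
}

/// The parent decides from the envelope's variant, never by parsing prose.
pub fn next_action(ret: &WorkerReturn) -> ParentAction {
    match ret {
        WorkerReturn::Completed { summary, .. } => ParentAction::Integrate(summary.clone()),
        WorkerReturn::NeedsInput { question, .. } => ParentAction::AskUser(question.clone()),
        WorkerReturn::Failed { retryable: true, .. } => ParentAction::Retry,
        WorkerReturn::Failed { reason, .. } => ParentAction::Abandon(reason.clone()),
    }
}
```

An exhaustive `match` means that adding a new envelope variant forces every consumer to handle it at compile time, which is the practical difference between a structured handoff and a chat transcript.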

Branding/stewardship note: all of the above preserves CodeWhale branding and first-class DeepSeek support — indeed the flash-scout/pro-synthesizer split depends on the DeepSeek Fin lane, and the swarm UX should surface DeepSeek-on-a-worker as a visible feature.