Skip to content

Tracking: Claude-Code-native uplift for Nous (UX, quality, speed, token budget) #120

@sriumcp

Description

@sriumcp

What this is

A focused initiative to elevate Nous along four axes — user experience, campaign success and quality, speed, and token budget — by leaning hard into Claude Code primitives that Nous currently re-implements (or doesn't use) in plain Python. This issue tracks 15 child issues; each is independently shippable but they compose into a coherent rewrite plan.

Why now

Real-world friction (mined from ~3 days of recent Claude sessions across the inference-sim, well-baked, and saturation projects):

  • Visibility: the user typed "report progress" / "where is the campaign" / "how is this proceeding" dozens of times in a single afternoon (5/18). The agent answered every one by re-running the same five-line bash pipeline — sometimes mis-reading the live state because results files appeared between two ls calls.
  • Resume: timeouts on long EXECUTE_ANALYZE sessions led to manual state.json hand-editing and repeated full re-designs (now partly fixed by fix: resume mid-flight campaign at correct iteration after timeout #91, but the parallel-worktree race remained).
  • Connection drops: long Sonnet sessions drop against the LiteLLM proxy after ~10 min, and the previous --max-cli-retries 10 flag caused a second worktree to spawn while the first was still alive — two executors writing to the same iter-N/results/ directory. Solved partly by feat: retry transient claude -p subprocess errors with exponential backoff #71 + feat: pre-flight check + retry everything with failure persistence #111, but the architectural fragility (one giant session) remains.
  • Token bloat: handoff.md files in .nous/ range 8–18 KB and grow monotonically; principles.json reaches 26 entries on mech-design-enforcement. The 266-line design.md and 199-line execute_analyze.md are re-sent every call uncached.
  • Cross-campaign work: 33 campaigns on inference-sim alone. Asking "all campaigns about saturation detection, with results and patches" requires find … -name findings.json plumbing.

Recently shipped (this initiative builds on, does not duplicate)

PR Effect
#91 Resume mid-flight at correct iteration after timeout
#111 Pre-flight check + retry-everything with failure persistence
#71 Transient retry + exponential backoff
#52 Compact handoff designer→executor
#41 Token/cost tracking in dispatchers
#114 Unified nous CLI
#54 nous validate CLI; executor writes artifacts directly
#119 nous replay runs deterministic plan, no LLM

The 15 sub-issues below are explicitly complementary to the work above — none re-litigate it.

Strategic shape

Nous today shells out to claude -p and rebuilds, in Python, capabilities that Claude Code already provides natively: parallel subagents, prompt caching, deterministic Stop hooks, MCP-mediated context, asynchronous human-in-the-loop, scheduled routines. The shape of this initiative is therefore delete code while gaining capabilities: most of cli_dispatch.py and parts of engine.py / worktree.py go away once the orchestrator is rebuilt on the Claude Agent SDK.

The single highest-leverage change is #1 (Agent SDK port) because it makes #2, #3, #4, #6, #7 from "lift" to "configure."

Suggested ship order

Wave 1 — foundation Wave 2 — capabilities Wave 3 — ecosystem
#1 SDK port #3 Parallel-arm subagents #5 Plugin packaging
#2 Prompt caching #4 /goal-driven loop #6 MCP server
#7 Stream-json + status --watch #11 CLAUDE.md / auto-memory #14 Routines
#9 Stop hook for completion #12 Explore-subagent design #10 Channels gates
#13 Worktree-isolated subagents #15 Permission policies
#8 PreToolUse plan enforcer

Sub-issues

Sub-issues

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions