Tracking: Claude-Code-native uplift for Nous (UX, quality, speed, token budget)

## What this is

A focused initiative to elevate Nous along four axes — **user experience**, **campaign success and quality**, **speed**, and **token budget** — by leaning hard into Claude Code primitives that Nous currently re-implements (or doesn't use) in plain Python. This issue tracks 15 child issues; each is independently shippable but they compose into a coherent rewrite plan.

## Why now

Real-world friction (mined from ~3 days of recent Claude sessions across the `inference-sim`, `well-baked`, and `saturation` projects):

- **Visibility**: the user typed "report progress" / "where is the campaign" / "how is this proceeding" dozens of times in a single afternoon (5/18). The agent answered every one by re-running the same five-line bash pipeline — sometimes mis-reading the live state because results files appeared between two `ls` calls.
- **Resume**: timeouts on long EXECUTE_ANALYZE sessions led to manual `state.json` hand-editing and repeated full re-designs (now partly fixed by #91, but the parallel-worktree race remained).
- **Connection drops**: long Sonnet sessions drop against the LiteLLM proxy after ~10 min, and the previous `--max-cli-retries 10` flag caused a *second* worktree to spawn while the first was still alive — two executors writing to the same `iter-N/results/` directory. Solved partly by #71 + #111, but the architectural fragility (one giant session) remains.
- **Token bloat**: handoff.md files in `.nous/` range 8–18 KB and grow monotonically; principles.json reaches 26 entries on `mech-design-enforcement`. The 266-line `design.md` and 199-line `execute_analyze.md` are re-sent every call uncached.
- **Cross-campaign work**: 33 campaigns on `inference-sim` alone. Asking "all campaigns about saturation detection, with results and patches" requires `find … -name findings.json` plumbing.

## Recently shipped (this initiative builds on, does not duplicate)

| PR | Effect |
|----|--------|
| #91 | Resume mid-flight at correct iteration after timeout |
| #111 | Pre-flight check + retry-everything with failure persistence |
| #71 | Transient retry + exponential backoff |
| #52 | Compact handoff designer→executor |
| #41 | Token/cost tracking in dispatchers |
| #114 | Unified `nous` CLI |
| #54 | `nous validate` CLI; executor writes artifacts directly |
| #119 | `nous replay` runs deterministic plan, no LLM |

The 15 sub-issues below are explicitly **complementary** to the work above — none re-litigate it.

## Strategic shape

Nous today shells out to `claude -p` and rebuilds, in Python, capabilities that Claude Code already provides natively: parallel subagents, prompt caching, deterministic Stop hooks, MCP-mediated context, asynchronous human-in-the-loop, scheduled routines. The shape of this initiative is therefore **delete code while gaining capabilities**: most of `cli_dispatch.py` and parts of `engine.py` / `worktree.py` go away once the orchestrator is rebuilt on the Claude Agent SDK.

The single highest-leverage change is **#1 (Agent SDK port)** because it makes #2, #3, #4, #6, #7 from "lift" to "configure."

## Suggested ship order

| Wave 1 — foundation | Wave 2 — capabilities | Wave 3 — ecosystem |
|---|---|---|
| #1 SDK port              | #3 Parallel-arm subagents | #5 Plugin packaging |
| #2 Prompt caching        | #4 `/goal`-driven loop    | #6 MCP server |
| #7 Stream-json + status --watch | #11 CLAUDE.md / auto-memory | #14 Routines |
| #9 Stop hook for completion | #12 Explore-subagent design | #10 Channels gates |
|                          | #13 Worktree-isolated subagents | #15 Permission policies |
|                          | #8 PreToolUse plan enforcer |  |

## Sub-issues



## Sub-issues

- [ ] #121 — Port to Claude Agent SDK (foundation)
- [ ] #122 — Cache static methodology prompts (`cache_control: ephemeral`)
- [ ] #123 — Parallelize arm execution via SDK subagents
- [ ] #124 — `/goal`-driven campaign loop
- [ ] #125 — Package as Claude Code plugin (skills + slash commands)
- [ ] #126 — MCP server `nous-mcp` for campaigns
- [ ] #127 — Stream-json + `nous status --watch` TUI
- [ ] #128 — PreToolUse hook to enforce `experiment_plan.yaml`
- [ ] #129 — Deterministic Stop hook for iteration completion
- [ ] #130 — Channels integration for async gate approval
- [ ] #131 — Per-campaign CLAUDE.md + auto-memory
- [ ] #132 — Explore-subagent design phase
- [ ] #133 — Harness-managed worktree isolation
- [ ] #134 — Routines for scheduled overnight campaigns
- [ ] #135 — Per-campaign permission policy (kill `--dangerously-skip-permissions`)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking: Claude-Code-native uplift for Nous (UX, quality, speed, token budget) #120

What this is

Why now

Recently shipped (this initiative builds on, does not duplicate)

Strategic shape

Suggested ship order

Sub-issues

Sub-issues

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

PR	Effect
#91	Resume mid-flight at correct iteration after timeout
#111	Pre-flight check + retry-everything with failure persistence
#71	Transient retry + exponential backoff
#52	Compact handoff designer→executor
#41	Token/cost tracking in dispatchers
#114	Unified `nous` CLI
#54	`nous validate` CLI; executor writes artifacts directly
#119	`nous replay` runs deterministic plan, no LLM

Wave 1 — foundation	Wave 2 — capabilities	Wave 3 — ecosystem
#1 SDK port	#3 Parallel-arm subagents	#5 Plugin packaging
#2 Prompt caching	#4 `/goal`-driven loop	#6 MCP server
#7 Stream-json + status --watch	#11 CLAUDE.md / auto-memory	#14 Routines
#9 Stop hook for completion	#12 Explore-subagent design	#10 Channels gates
	#13 Worktree-isolated subagents	#15 Permission policies
	#8 PreToolUse plan enforcer

Tracking: Claude-Code-native uplift for Nous (UX, quality, speed, token budget) #120

Description

What this is

Why now

Recently shipped (this initiative builds on, does not duplicate)

Strategic shape

Suggested ship order

Sub-issues

Sub-issues

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions