Tracking: paper-burst friction-test findings (post-#189) #203

## What this is

Consolidated tracking issue for the friction surfaced by a clean-room friction-test of the `paper-burst` campaign, run against post-#189 nous (PR #192 → `reflective` @ `c6a6ee6`).

The friction-test goal: launch `paper-burst` (the smallest of six SIGMETRICS paper-reproduction campaigns), use nous as a real campaign author would, and record every operational paper-cut without filtering. The campaign reached **iter-1 DONE with `ground_truth_held: false`**, but only after 2h+ of wall-clock time, a manual sandbox bypass, and an external `nous stop` plus kill once a hung BLIS subprocess wedged iter-2's EXECUTE_ANALYZE. The findings cluster into 10 nous-side improvements, listed below.

## Friction surface (paper-burst, 2026-05-26)

```
Launch:                         nous run … --max-iterations 1 --auto-approve --agent sdk
Iter-1 DESIGN:                  ✓ bundle.yaml + problem.md + handoff_snapshot.md (high-quality)
Iter-1 DESIGN gate:             ✓ auto-approved
Iter-1 EXECUTE_ANALYZE (try 1): ✗ SDK sandbox blocked BLIS writes → /tmp workarounds → killed
Workaround applied:             permission_mode="bypassPermissions" in sdk_dispatch.py
Iter-1 EXECUTE_ANALYZE (try 2): ✓ 49/50 BLIS results, analysis_summary.json valid, ground_truth_held=false
Iter-2 DESIGN:                  ✓ started (campaign continued past intended cap because resume
                                  doesn't preserve --max-iterations)
Stop:                           nous stop + kill (sentinel honored at iter boundary, but iter-2
                                  was already mid-flight — phase boundaries needed)
```

## Subissues

**Critical (block any nontrivial campaign):**

- [ ] #193 — SDK sandbox blocks campaign-write paths; needs configurable bypass / auto-allowlist [`bug`, `foundation`]
- [ ] #194 — `state.json.iteration` off-by-one during iter-1 (reads 0 throughout) [`bug`]

**High-priority operator UX:**

- [ ] #195 — `nous status` 'last tool' field permanently empty (extracts wrong attribute) [`bug`]
- [ ] #197 — `--max-iterations` doesn't survive `nous resume`; silently defaults to 10 [`bug`, `ux`]
- [ ] #198 — `nous stop` should honor phase boundaries, not just iteration boundaries [`enhancement`, `ux`]
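
One way #197 and #198 could fit together, as a rough sketch and not nous's actual implementation: record the iteration cap in `state.json` at launch so a resume reads it back instead of falling back to a default, and check the stop sentinel before every phase, not once per iteration. The `max_iterations` and `completed` keys, the `STOP` sentinel filename, and the function names are all hypothetical; only `state.json` and the `DESIGN` / `EXECUTE_ANALYZE` phase names come from the run above.

```python
import json
from pathlib import Path

PHASES = ("DESIGN", "EXECUTE_ANALYZE")  # phase names as seen in the run above


def save_state(run_dir: Path, state: dict) -> None:
    (run_dir / "state.json").write_text(json.dumps(state))


def load_state(run_dir: Path) -> dict:
    return json.loads((run_dir / "state.json").read_text())


def start(run_dir: Path, max_iterations: int) -> dict:
    """Fresh launch: persist the cap so a later resume can recover it."""
    state = {"iteration": 1, "max_iterations": max_iterations, "completed": []}
    save_state(run_dir, state)
    return state


def run(run_dir: Path, run_phase) -> str:
    """Run or resume from state.json; returns the reason the loop ended."""
    state = load_state(run_dir)
    while state["iteration"] <= state["max_iterations"]:
        for phase in PHASES:
            key = f"iter-{state['iteration']}/{phase}"
            if key in state["completed"]:
                continue  # already finished before an earlier interruption
            if (run_dir / "STOP").exists():  # hypothetical sentinel file
                return f"stopped before {key}"
            run_phase(state["iteration"], phase)
            state["completed"].append(key)
            save_state(run_dir, state)
        state["iteration"] += 1
        save_state(run_dir, state)
    return "reached max_iterations"
```

Because the cap and the completed phases live in the same file, calling `run` again after a stop picks up at the next unfinished phase with the original cap and needs no extra flags.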

**Medium structural / extensibility:**

- [ ] #199 — Validator's iter-root whitelist needs a per-campaign extension mechanism [`enhancement`]
- [ ] #200 — ExecuteAnalyzeIncompleteError analog of #187 for the EXECUTE phase [`enhancement`, `ux`]
- [ ] #201 — Detect SDK turn silence (no streaming events for N min) and surface to retry_log [`enhancement`]
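
For #201, a minimal sketch of silence detection over a streaming turn, assuming the dispatcher exposes events as an async iterator (an assumption; the real SDK interface is not described here). `watch_for_silence`, `on_silence`, and the timeout parameter are hypothetical names. The watchdog only reports silence, for example by appending to a retry log, and leaves the decision to abort to the caller.

```python
import asyncio

_DONE = object()  # marks the end of the event stream


async def _next_or_done(it):
    """Fetch the next event, or _DONE once the stream is exhausted."""
    try:
        return await it.__anext__()
    except StopAsyncIteration:
        return _DONE


async def watch_for_silence(events, silence_timeout_s, on_silence) -> list:
    """Drain `events`; call `on_silence(seconds)` each time a full
    timeout window passes without a new event."""
    seen = []
    it = events.__aiter__()
    pending = asyncio.create_task(_next_or_done(it))
    silent_for = 0.0
    while True:
        # asyncio.wait does not cancel the pending fetch on timeout,
        # so the underlying stream is left undisturbed.
        done, _ = await asyncio.wait({pending}, timeout=silence_timeout_s)
        if not done:
            silent_for += silence_timeout_s
            on_silence(silent_for)
            continue
        event = pending.result()
        if event is _DONE:
            return seen
        seen.append(event)
        silent_for = 0.0
        pending = asyncio.create_task(_next_or_done(it))
```

`asyncio.wait` is used here in place of `asyncio.wait_for` because the latter cancels the awaited fetch on timeout, which would tear down an async generator mid-stream.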

**Cosmetic / polish:**

- [ ] #196 — Dispatch log misattributes SDKDispatcher runs as 'CLIDispatcher' [`bug`, `ux`]
- [ ] #202 — Resume's 'iteration=0; starting fresh' warning misleads operators [`ux`]

## Out of scope (campaign-spec friction; belongs in inference-sim)

The friction-test also surfaced campaign-spec issues that live in the **target repo** (`inference-sim/.nous/paper-burst-brief.md` + `paper-conventions.md`), not in nous. Captured here for completeness:

- **Probe brief's CLI command can't activate enforcement policies.** BLIS's `sim/simulator.go:811` requires `req.TenantID != ""`; the brief's flat `--rate 1 --num-requests 5 …` produces anonymous requests, so all 5 policies degenerate to FCFS and produce byte-identical outputs. Probe assertion (a) "ordering ≠ FCFS" cannot pass.
- **`paper-conventions.md` says probe should be 2-tenant A:light + B:heavy; `paper-burst-brief.md`'s CLI is single-class homogeneous.** Brief vs. conventions disagree.
- **Brief's `parallel -j` fan-out pattern is broken at runtime.** GNU parallel single-quotes `{}` substitution, so BLIS receives one quoted string and `--metrics-path` is mangled. Agent had to switch to Python `concurrent.futures`. (Aside: nous's [#199] would let the campaign's documented commands be validated mechanically before launch.)
- **Brief's time budget is wildly off.** "iter-2: 5–10 min wall." Reality: 2h+ for one iter, including a 40+ min single-arm hang.
- **Brief lacks a kill-after-T policy for stuck runs.** Pairs with nous-side [#201].
- **Probe gate effectively bypassed.** Agent never wrote `probe_report.md`; advanced directly to `bundle.yaml`. Pairs with nous-side [#199] (whitelist extension) and [#200] (incomplete diagnostic).
- **`analysis_summary.json` and `manifest.json` end up in `runs/iter-1/results/`** instead of iter-root. The agent placed them there in response to the iter-root whitelist. Pairs with [#199].

These need their own tracking issue in `inference-sim`, cross-linked back to this one.
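
The `concurrent.futures` fan-out and the missing kill-after-T policy can be sketched together. This is an illustration under assumptions, not the agent's actual script (which is not reproduced in this issue): `run_arm`, `fan_out`, and the result keys are hypothetical names. Each command is passed as an argument list, so no shell is involved and a flag such as `--metrics-path` reaches the process as its own argument.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_arm(argv, timeout_s):
    """Run one command with a hard wall-clock cap; never raises on timeout."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return {"argv": argv, "status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"argv": argv, "status": status,
            "returncode": proc.returncode, "stdout": proc.stdout}


def fan_out(commands, max_workers=4, timeout_s=600.0):
    """Run every argv list in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda argv: run_arm(argv, timeout_s), commands))
```

A stuck arm then shows up as a `"timeout"` record and does not stall the whole phase. Note that the timeout only kills the direct child; a command that spawns its own long-lived children would need process-group handling on top of this.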

## Recommended landing order

1. **#193** first — without a sandbox fix, every SDK campaign is broken or burns turns adapting.
2. **#194** + **#202** — paired off-by-one fix + warning rephrase. Small, high-value.
3. **#199** — unblocks paper-* (and any future custom-artifact campaigns). Foundation for #200.
4. **#200** + **#201** — together they make EXECUTE_ANALYZE failures self-diagnosing.
5. **#197** + **#198** — operator quality-of-life, lower urgency.
6. **#195** + **#196** — log/status cosmetics; polish-tier.

## Forward references

- **PR #192** — the cleanup PR this friction-test was run against (merged to `reflective` as `c6a6ee6`).
- **Issue #189** (closed) — the previous tracking epic.
- **`paper-burst` artifacts on disk** — preserved at `inference-sim/.nous/paper-burst/runs/iter-1/` for reference (49/50 BLIS results + valid `analysis_summary.json`).

