Tracking: paper-burst friction-test findings (post-#189) #203

@sriumcp

What this is

A consolidated tracking issue for friction surfaced by a clean-room friction-test of the paper-burst campaign on top of the post-#189 nous (PR #192reflective @ c6a6ee6).

The friction-test goal: launch paper-burst (the smallest of six SIGMETRICS paper-reproduction campaigns), use nous as a real campaign author would, and surface every operational paper-cut without filtering. The campaign reached iter-1 DONE with ground_truth_held: false, but only after 2h+ of wall-clock time, a manual sandbox bypass, and an external nous stop + kill once a hung BLIS subprocess wedged iter-2's EXECUTE_ANALYZE. The findings cluster into 10 nous-side improvements, captured below.

Friction surface (paper-burst, 2026-05-26)

Launch:                         nous run … --max-iterations 1 --auto-approve --agent sdk
Iter-1 DESIGN:                  ✓ bundle.yaml + problem.md + handoff_snapshot.md (high-quality)
Iter-1 DESIGN gate:             ✓ auto-approved
Iter-1 EXECUTE_ANALYZE (try 1): ✗ SDK sandbox blocked BLIS writes → /tmp workarounds → killed
Workaround applied:             permission_mode="bypassPermissions" in sdk_dispatch.py
Iter-1 EXECUTE_ANALYZE (try 2): ✓ 49/50 BLIS results, analysis_summary.json valid, ground_truth_held=false
Iter-2 DESIGN:                  ✓ started (campaign continued past intended cap because resume
                                  doesn't preserve --max-iterations)
Stop:                           nous stop + kill (sentinel honored at iter boundary, but iter-2
                                  was already mid-flight — phase boundaries needed)
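
The last two rows (cap lost on resume, stop honored only at iteration boundaries) can be illustrated with a minimal sketch. It is not nous code: the `max_iterations` field in state.json, the `STOP` sentinel filename, and the helper names `save_state`, `load_max_iterations`, and `run_campaign` are all hypothetical, chosen only to show the shape of a possible fix.

```python
import json
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# Phase names taken from the labels in the log above; the ordering is assumed.
PHASES = ("DESIGN", "EXECUTE_ANALYZE")


def save_state(run_dir: Path, iteration: int, max_iterations: int) -> None:
    """Persist the cap next to the iteration counter (hypothetical field name)."""
    state = {"iteration": iteration, "max_iterations": max_iterations}
    (run_dir / "state.json").write_text(json.dumps(state))


def load_max_iterations(run_dir: Path, cli_value: Optional[int] = None) -> int:
    """Prefer an explicit CLI value, else the persisted cap; never a silent default."""
    if cli_value is not None:
        return cli_value
    state = json.loads((run_dir / "state.json").read_text())
    return state["max_iterations"]  # raises KeyError if the cap was never saved


def run_campaign(
    run_dir: Path,
    max_iterations: int,
    run_phase: Callable[[int, str], None],
) -> List[Tuple[int, str]]:
    """Run phases in order, checking the stop sentinel before every phase."""
    completed: List[Tuple[int, str]] = []
    for iteration in range(1, max_iterations + 1):
        for phase in PHASES:
            # Checking here, not only at the top of the outer loop, is what
            # turns an iteration-boundary stop into a phase-boundary stop.
            if (run_dir / "STOP").exists():
                return completed
            run_phase(iteration, phase)
            completed.append((iteration, phase))
        save_state(run_dir, iteration, max_iterations)
    return completed
```

With this shape, a resume that omits the flag would read the cap back from state.json instead of falling through to a default, and a stop requested during one phase would take effect before the next phase starts rather than after the whole iteration. A hung phase would still need a separate timeout or kill path.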

Subissues

Critical (block any nontrivial campaign):

High-priority operator UX:

Medium structural / extensibility:

Cosmetic / polish:

Out of scope (campaign-spec friction; belongs in inference-sim)

The friction-test also surfaced campaign-spec issues that live in the target repo (inference-sim/.nous/paper-burst-brief.md + paper-conventions.md), not in nous. Captured here for completeness:

These will need their own tracker against inference-sim. Cross-linking from there to this issue is appropriate.

Recommended landing order

  1. "SDK sandbox blocks campaign-write paths; needs configurable bypass / auto-allowlist" (#193) first. Without a sandbox fix, every SDK campaign is broken or burns turns adapting.
  2. "state.json.iteration off-by-one during iter-1 (reads 0 throughout)" (#194) + "Resume's 'iteration=0; starting fresh' warning misleads operators" (#202): paired off-by-one fix + warning rephrase. Small, high-value.
  3. "Validator's iter-root whitelist needs a per-campaign extension mechanism" (#199): unblocks paper-* (and any future custom-artifact campaigns). Foundation for #200.
  4. "ExecuteAnalyzeIncompleteError analog of #187 for the EXECUTE phase" (#200) + "Detect SDK turn silence (no streaming events for N min) and surface to retry_log" (#201): together they make EXECUTE_ANALYZE failures self-diagnosing.
  5. "--max-iterations doesn't survive nous resume; silently defaults to 10" (#197) + "nous stop should honor phase boundaries, not just iteration boundaries" (#198): operator quality-of-life, lower urgency.
  6. "nous status 'last tool' field permanently empty (extracts wrong attribute)" (#195) + "Dispatch log misattributes SDKDispatcher runs as 'CLIDispatcher'" (#196): log/status cosmetics; polish-tier.
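
To make the silence detection in item 4 (#201) concrete, here is one possible shape for it, as a sketch only. The `SilenceWatchdog` class, its method names, the `sdk_turn_silence` reason string, and the retry_log entry layout are assumptions for illustration; the real dispatcher and retry_log formats in nous may differ.

```python
import time
from typing import Callable


class SilenceWatchdog:
    """Tracks time since the last streaming event and records long gaps."""

    def __init__(self, max_silence_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock            # injectable so tests need not sleep
        self._last_event = clock()
        self.retry_log = []            # hypothetical stand-in for nous's retry_log

    def on_event(self) -> None:
        """Call whenever the SDK emits any streaming event."""
        self._last_event = self._clock()

    def check(self, phase: str) -> bool:
        """Return True, and log an entry, if the silence budget is exceeded."""
        silent_for = self._clock() - self._last_event
        if silent_for <= self.max_silence_s:
            return False
        self.retry_log.append({
            "phase": phase,
            "reason": "sdk_turn_silence",   # assumed label, not a nous constant
            "silent_for_s": silent_for,
        })
        return True
```

A dispatcher loop would call `on_event()` for each streamed message and poll `check("EXECUTE_ANALYZE")` on a timer. Using a monotonic clock keeps the measurement immune to wall-clock adjustments during multi-hour runs.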

Forward references
