A consolidated tracking issue for friction surfaced by a clean-room friction-test of the paper-burst campaign on top of the post-#189 nous (PR #192 → reflective @ c6a6ee6).
The friction-test goal: launch paper-burst (the smallest of six SIGMETRICS paper-reproduction campaigns), use nous as a real campaign author would, and surface every operational paper-cut without filtering. The campaign reached iter-1 DONE with ground_truth_held: false — but only after 2h+ wall-clock, manual sandbox bypass, and an external nous stop + kill once a hung BLIS subprocess wedged iter-2's EXECUTE_ANALYZE. The findings cluster into 10 nous-side improvements, captured below.
Friction surface (paper-burst, 2026-05-26)
Launch: nous run … --max-iterations 1 --auto-approve --agent sdk
Iter-1 DESIGN: ✓ bundle.yaml + problem.md + handoff_snapshot.md (high-quality)
Iter-1 DESIGN gate: ✓ auto-approved
Iter-1 EXECUTE_ANALYZE (try 1): ✗ SDK sandbox blocked BLIS writes → /tmp workarounds → killed
Workaround applied: permission_mode="bypassPermissions" in sdk_dispatch.py
Iter-1 EXECUTE_ANALYZE (try 2): ✓ 49/50 BLIS results, analysis_summary.json valid, ground_truth_held=false
Iter-2 DESIGN: ✓ started (campaign continued past intended cap because resume doesn't preserve --max-iterations)
Stop: nous stop + kill (sentinel honored at iter boundary, but iter-2 was already mid-flight; phase boundaries needed)
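The last two timeline entries describe two separate gaps: the iteration cap was lost on resume, and the stop sentinel was only checked between iterations. The following is a minimal Python sketch of one way to close both, assuming a JSON state file and a sentinel file in the campaign directory. The names `run_campaign`, `load_state`, `STOP`, and the `state.json` keys are hypothetical and are not taken from nous's actual internals; only the phase names come from the timeline above.

```python
import json
from pathlib import Path

# Phase names as they appear in the timeline above.
PHASES = ("DESIGN", "EXECUTE_ANALYZE")


def load_state(state_path: Path) -> dict:
    """Return persisted campaign state, or an empty dict on first launch."""
    if state_path.exists():
        return json.loads(state_path.read_text())
    return {}


def run_campaign(workdir: Path, run_phase, max_iterations=None) -> dict:
    """Run phases until the persisted cap is reached or a stop sentinel appears.

    run_phase(iteration, phase) is a caller-supplied callable (hypothetical).
    """
    state_path = workdir / "state.json"
    stop_sentinel = workdir / "STOP"  # hypothetical sentinel file name
    state = load_state(state_path)

    # An explicit cap wins; otherwise reuse the cap stored at launch
    # instead of silently falling back to a default.
    if max_iterations is not None:
        state["max_iterations"] = max_iterations
    elif "max_iterations" not in state:
        raise ValueError("no iteration cap given and none persisted")

    state.setdefault("iteration", 1)  # 1-based, so iter-1 reads as 1
    state.setdefault("phase_index", 0)
    state["stopped"] = False

    while state["iteration"] <= state["max_iterations"]:
        while state["phase_index"] < len(PHASES):
            # Check the sentinel before every phase, not only per iteration.
            if stop_sentinel.exists():
                state["stopped"] = True
                state_path.write_text(json.dumps(state))
                return state
            run_phase(state["iteration"], PHASES[state["phase_index"]])
            state["phase_index"] += 1
            state_path.write_text(json.dumps(state))
        state["iteration"] += 1
        state["phase_index"] = 0
        state_path.write_text(json.dumps(state))
    return state
```

In this sketch the operator removes the sentinel before resuming, and a resume without an explicit cap picks up the value stored at launch. A phase that is already running is still not interrupted; that would need a subprocess-level timeout or kill.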
Out of scope (campaign-spec friction; belongs in inference-sim)
The friction-test also surfaced campaign-spec issues that live in the target repo (inference-sim/.nous/paper-burst-brief.md + paper-conventions.md), not in nous. Captured here for completeness:
Probe brief's CLI command can't activate enforcement policies. BLIS's sim/simulator.go:811 requires req.TenantID != ""; the brief's flat --rate 1 --num-requests 5 … produces anonymous requests, so all 5 policies degenerate to FCFS and produce byte-identical outputs. Probe assertion (a) "ordering ≠ FCFS" cannot pass.
paper-conventions.md says probe should be 2-tenant A:light + B:heavy; paper-burst-brief.md's CLI is single-class homogeneous. Brief vs. conventions disagree.
Brief's parallel -j fan-out pattern is broken at runtime. GNU parallel single-quotes {} substitution, so BLIS receives one quoted string and --metrics-path is mangled. Agent had to switch to Python concurrent.futures. (Aside: nous's [Validator's iter-root whitelist needs a per-campaign extension mechanism #199] would let the campaign's documented commands be validated mechanically before launch.)
Brief's time budget is wildly off. "iter-2: 5–10 min wall." Reality: 2h+ for one iter, including a 40+ min single-arm hang.
paper-burst artifacts on disk — preserved at inference-sim/.nous/paper-burst/runs/iter-1/ for reference (49/50 BLIS results + valid analysis_summary.json).
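The parallel -j failure described above comes from a whole command line reaching the child as one quoted string. The following is a sketch of a concurrent.futures fan-out of the kind the agent switched to, assuming each arm is passed as an argv list so that no shell quoting is involved. It also adds a per-arm timeout so that one hung subprocess cannot stall the whole batch. `run_arm`, `fan_out`, and the default values are illustrative and are not the campaign's actual code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_arm(argv, timeout_s):
    """Run one arm; return (argv, returncode), where None means it timed out."""
    try:
        # argv is a list, so every argument reaches the child verbatim,
        # including paths that contain spaces or quote characters.
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
        return argv, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising.
        return argv, None


def fan_out(commands, max_workers=4, timeout_s=600.0):
    """Run all arms concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda argv: run_arm(argv, timeout_s), commands))
```

Each entry in `commands` is a list of the form [binary, flag, value, ...]. Arms that return None can be retried or reported as incomplete rather than blocking the iteration.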
Subissues
Critical (block any nontrivial campaign):
… [bug, foundation]
state.json.iteration off-by-one during iter-1 (reads 0 throughout) [bug]
High-priority operator UX:
[nous status 'last tool' field permanently empty (extracts wrong attribute) #195] [bug]
[--max-iterations doesn't survive nous resume; silently defaults to 10 #197] [bug, ux]
[nous stop should honor phase boundaries, not just iteration boundaries #198] [enhancement, ux]
Medium structural / extensibility:
… [enhancement]
… [enhancement, ux]
… [enhancement]
Cosmetic / polish:
… [bug, ux]
… [ux]
Out of scope (campaign-spec friction; belongs in inference-sim)
… probe_report.md; advanced directly to bundle.yaml. Pairs with nous-side [Validator's iter-root whitelist needs a per-campaign extension mechanism #199] (whitelist extension) and [ExecuteAnalyzeIncompleteError analog of #187 for the EXECUTE phase #200] (incomplete diagnostic).
analysis_summary.json and manifest.json end up in runs/iter-1/results/ instead of iter-root. Pairs with [Validator's iter-root whitelist needs a per-campaign extension mechanism #199]; agent's response to the iter-root whitelist forced this.
These will need their own tracker against inference-sim. Cross-linking from there to this issue is appropriate.
Recommended landing order
…
[--max-iterations doesn't survive nous resume; silently defaults to 10 #197] + [nous stop should honor phase boundaries, not just iteration boundaries #198]: operator quality-of-life, lower urgency.
[nous status 'last tool' field permanently empty (extracts wrong attribute) #195] + [Dispatch log misattributes SDKDispatcher runs as 'CLIDispatcher' #196]: log/status cosmetics; polish-tier.
Forward references
… reflective as c6a6ee6).
paper-burst artifacts on disk: preserved at inference-sim/.nous/paper-burst/runs/iter-1/ for reference (49/50 BLIS results + valid analysis_summary.json).