Problem
nous's REPORT phase emits findings.json (synthesized analysis) and per-arm result JSONs in runs/iter-N/results/. It does NOT emit the figures a paper would publish; generating those is left to the campaign author after the campaign completes. For SIGMETRICS-style artifact evaluation, reviewers want a single tarball that reproduces the paper's figures end-to-end, but nous provides only the data, not the figures.
Every nous-driven paper hits this gap and rebuilds the same scaffolding.
Desired behavior
Three additions:
(1) plot_specs block in campaign.yaml — declarative figure declarations:
plot_specs:
  - id: figure-1-mirage-ratio
    consumes: [h-main]  # which arms produce input data
    metrics: [memorytime_share_ratio]
    script: plots/figure_1_mirage.py  # path to the figure-generation script
    outputs: [figures/figure-1.pdf, figures/figure-1.png]
    caption: "Mirage manifests under WFQ; KV-time corrects."
The REPORT phase invokes each script after findings.json is written, passing the per-arm results/*.json paths.
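As a rough illustration of what validating this block might look like, here is a minimal sketch. The field names come from the example entry above; the PlotSpec class, the parse_plot_specs helper, the choice of required keys, and the duplicate-id check are all assumptions for illustration, not part of nous today.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PlotSpec:
    """One plot_specs entry (hypothetical type; fields mirror the example)."""
    id: str
    consumes: list[str]
    metrics: list[str]
    script: str
    outputs: list[str]
    caption: str = ""


# Assumption: everything except the caption is mandatory.
REQUIRED_KEYS = ("id", "consumes", "metrics", "script", "outputs")


def parse_plot_specs(campaign: dict[str, Any]) -> list[PlotSpec]:
    """Validate campaign['plot_specs'] and return typed entries.

    A missing block yields an empty list, so existing campaigns stay valid.
    """
    specs: list[PlotSpec] = []
    seen_ids: set[str] = set()
    for index, raw in enumerate(campaign.get("plot_specs") or []):
        missing = [key for key in REQUIRED_KEYS if key not in raw]
        if missing:
            raise ValueError(f"plot_specs[{index}] is missing keys: {missing}")
        if raw["id"] in seen_ids:
            raise ValueError(f"duplicate plot_specs id: {raw['id']}")
        seen_ids.add(raw["id"])
        specs.append(PlotSpec(
            id=raw["id"],
            consumes=list(raw["consumes"]),
            metrics=list(raw["metrics"]),
            script=raw["script"],
            outputs=list(raw["outputs"]),
            caption=raw.get("caption", ""),
        ))
    return specs
```

The function takes the already-loaded campaign mapping rather than a file path, so it does not depend on any particular YAML loader.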
(2) runs/iter-N/figures/ directory automatically populated alongside results/ — the structural counterpart to results, but for reviewer-facing artifacts.
(3) nous package <work_dir> command producing a tarball containing:
- the work_dir
- a reproduce.sh template invoking the campaign
- a Dockerfile pinning captured deps (uses F17's reproducibility metadata)
- a README describing what's inside
Solves the artifact-evaluation use case in one command rather than each paper author writing their own glue.
Suggested implementation sketch
For (1) and (2):
- Add plot_specs to the campaign.yaml schema.
- In the REPORT phase, after findings.json is written, iterate plot_specs and invoke each script via subprocess. Place outputs under runs/iter-N/figures/.
- Pass the script a stable contract, e.g., the environment variables NOUS_RESULTS_DIR and NOUS_FIGURES_DIR.
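The steps for (1) and (2) could be sketched roughly as follows. This is a sketch under stated assumptions, not the actual REPORT-phase code: the run_plot_specs function and its arguments are hypothetical, scripts are assumed to be Python files resolved relative to the work_dir, and each declared output is assumed to land in the figures directory under its base name.

```python
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path


def run_plot_specs(iter_dir: Path, work_dir: Path, plot_specs: list[dict]) -> list[Path]:
    """Run each plot script and return the figure files found afterwards."""
    results_dir = iter_dir / "results"
    figures_dir = iter_dir / "figures"
    figures_dir.mkdir(parents=True, exist_ok=True)

    # The contract proposed above: scripts learn their input and output
    # locations from two environment variables.
    env = dict(os.environ)
    env["NOUS_RESULTS_DIR"] = str(results_dir.resolve())
    env["NOUS_FIGURES_DIR"] = str(figures_dir.resolve())

    produced: list[Path] = []
    for spec in plot_specs:
        script = (work_dir / spec["script"]).resolve()
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run([sys.executable, str(script)], env=env, cwd=work_dir, check=True)
        for output in spec["outputs"]:
            # Assumption: only the file name of each declared output matters.
            path = figures_dir / Path(output).name
            if not path.is_file():
                raise FileNotFoundError(f"{spec['id']}: expected output {path}")
            produced.append(path)
    return produced
```

A plot script on the other side of this contract would read os.environ["NOUS_RESULTS_DIR"] to find the per-arm JSONs and write its figures into os.environ["NOUS_FIGURES_DIR"]. Checking that every declared output exists turns a silently missing figure into a REPORT-phase failure.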
For (3):
- Add a package subcommand to orchestrator/cli.py.
- Tarball the work_dir with bundled reproduce.sh, Dockerfile, README.md.
- The Dockerfile consumes F17's reproducibility_metadata to pin language versions and target-repo commit.
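The bundling step for (3) might look something like the sketch below, using only the standard-library tarfile module. The package_work_dir function, the top-level archive layout, and the idea of passing the generated files as plain strings are assumptions for illustration; how the Dockerfile text is derived from F17's metadata is deliberately left out, since that format is not specified here.

```python
from __future__ import annotations

import io
import tarfile
from pathlib import Path


def package_work_dir(work_dir: Path, tarball: Path, extra_files: dict[str, str]) -> list[str]:
    """Write work_dir plus generated text files into a gzip tarball.

    extra_files maps archive names (e.g. "reproduce.sh", "Dockerfile",
    "README.md") to their text content. Returns the archive member names.
    The tarball path must live outside work_dir, or it would include itself.
    """
    root = work_dir.resolve().name
    with tarfile.open(tarball, "w:gz") as archive:
        # Recursively add the whole work_dir under its own directory name.
        archive.add(work_dir, arcname=root)
        for name, text in extra_files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Shell scripts should be executable after extraction.
            info.mode = 0o755 if name.endswith(".sh") else 0o644
            archive.addfile(info, io.BytesIO(data))
    with tarfile.open(tarball, "r:gz") as archive:
        return archive.getnames()
```

Writing the generated files from in-memory strings keeps the work_dir itself untouched, so packaging the same campaign twice does not leave stray files behind.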
Acceptance criteria
- plot_specs is documented in the campaign.yaml schema.
- The REPORT phase invokes each declared script when plot_specs is present.
- runs/iter-N/figures/ is created and populated.
- nous package <work_dir> produces a self-contained tarball.
Severity
MEDIUM: paper authors must hand-roll figure generation; reviewers can't directly inspect figures from artifacts.
Source
friction-report.md F18, paper-memorytime-mirage campaign (2026-05). Depends on F17 for reproducibility metadata.
Part of friction-report tracking issue #245.