
[F18] plot_specs declarative figure pipeline + nous package for paper artifact tarball #263

@sriumcp


Problem

nous's REPORT phase emits findings.json (the synthesized analysis) and per-arm result JSONs in runs/iter-N/results/. It does NOT emit the figures a paper would publish; generating those after the campaign completes is left to the campaign author. For SIGMETRICS-style artifact evaluation, reviewers want a single tarball that reproduces the paper's figures end to end, but nous provides only the data, not the figures.

Every nous-driven paper hits this gap, and each author ends up rebuilding the same scaffolding.

Desired behavior

Three additions:

(1) A plot_specs block in campaign.yaml for declarative figure specifications:

plot_specs:
  - id: figure-1-mirage-ratio
    consumes: [h-main]                 # which arms produce input data
    metrics: [memorytime_share_ratio]
    script: plots/figure_1_mirage.py   # path to the figure-generation script
    outputs: [figures/figure-1.pdf, figures/figure-1.png]
    caption: "Mirage manifests under WFQ; KV-time corrects."

The REPORT phase invokes each script after findings.json is written, passing it the paths of the per-arm results/*.json files.
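A minimal sketch of how the REPORT phase might validate plot_specs before running anything, assuming campaign.yaml has already been parsed into a dict (for example with PyYAML). The field names come from the example above. Treating id, consumes, script, and outputs as required and metrics and caption as optional is an assumption, and PlotSpec and parse_plot_specs are hypothetical names, not existing nous code.

```python
from dataclasses import dataclass

# ASSUMPTION: which keys are required vs optional is not specified in the issue.
REQUIRED_KEYS = {"id", "consumes", "script", "outputs"}
OPTIONAL_KEYS = {"metrics", "caption"}


@dataclass(frozen=True)
class PlotSpec:
    """One entry of the proposed plot_specs block (hypothetical type)."""
    id: str
    consumes: tuple[str, ...]
    script: str
    outputs: tuple[str, ...]
    metrics: tuple[str, ...] = ()
    caption: str = ""


def parse_plot_specs(campaign: dict) -> list[PlotSpec]:
    """Validate campaign["plot_specs"] (already parsed from YAML) into PlotSpec objects."""
    specs = []
    seen_ids = set()
    for i, raw in enumerate(campaign.get("plot_specs", [])):
        missing = REQUIRED_KEYS - raw.keys()
        if missing:
            raise ValueError(f"plot_specs[{i}] missing keys: {sorted(missing)}")
        # Unknown keys are usually typos (e.g. "output" instead of "outputs").
        unknown = raw.keys() - REQUIRED_KEYS - OPTIONAL_KEYS
        if unknown:
            raise ValueError(f"plot_specs[{i}] has unknown keys: {sorted(unknown)}")
        if raw["id"] in seen_ids:
            raise ValueError(f"duplicate plot_specs id: {raw['id']}")
        seen_ids.add(raw["id"])
        specs.append(PlotSpec(
            id=raw["id"],
            consumes=tuple(raw["consumes"]),
            script=raw["script"],
            outputs=tuple(raw["outputs"]),
            metrics=tuple(raw.get("metrics", [])),
            caption=raw.get("caption", ""),
        ))
    return specs
```

Failing fast here means a malformed spec is reported before any arm's results are touched or any plot script is launched.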

(2) A runs/iter-N/figures/ directory, populated automatically alongside results/. It is the structural counterpart to results/, but holds reviewer-facing artifacts.
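One way to guarantee that figures land in runs/iter-N/figures/ is to resolve each declared output path and reject anything that escapes that directory. This sketch assumes outputs are written relative to the iteration directory, as figures/figure-1.pdf in the plot_specs example suggests; resolve_figure_outputs is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable


def resolve_figure_outputs(work_dir: Path, iteration: int, outputs: Iterable[str]) -> list[Path]:
    """Map declared outputs to absolute paths under runs/iter-N/figures/ (sketch)."""
    iter_dir = (work_dir / "runs" / f"iter-{iteration}").resolve()
    figures_dir = iter_dir / "figures"
    figures_dir.mkdir(parents=True, exist_ok=True)  # created even if a script writes nothing
    figures_dir = figures_dir.resolve()
    resolved = []
    for rel in outputs:
        # ASSUMPTION: outputs are relative to the iteration dir, e.g. "figures/figure-1.pdf".
        target = (iter_dir / rel).resolve()
        # Reject paths outside figures/ (e.g. "../x.pdf", "results/x.json", "/tmp/x.pdf").
        if not target.is_relative_to(figures_dir):
            raise ValueError(f"output {rel!r} is outside {figures_dir}")
        resolved.append(target)
    return resolved
```

Keeping figures/ write-only for plot scripts preserves the separation from results/, so a buggy script cannot overwrite the per-arm data it consumes.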

(3) A nous package <work_dir> command that produces a tarball containing:

  • the work_dir
  • a reproduce.sh template invoking the campaign
  • a Dockerfile pinning captured deps (uses F17's reproducibility metadata)
  • a README describing what's inside

This covers the artifact-evaluation use case with one command, instead of each paper author writing their own glue code.

Suggested implementation sketch

For (1) and (2):

  1. Add plot_specs to the campaign.yaml schema.
  2. In the REPORT phase, after writing findings.json, iterate over plot_specs and invoke each script via subprocess. Place outputs under runs/iter-N/figures/.
  3. Pass the script a stable contract, e.g., the environment variables NOUS_RESULTS_DIR and NOUS_FIGURES_DIR.
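Steps 2 and 3 could look roughly like the sketch below. It assumes each plot script is a Python file run with the same interpreter as nous, and it passes both the environment-variable contract from step 3 and, as positional arguments, the per-arm results/*.json paths. run_plot_script and its expected_outputs check are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Iterable


def run_plot_script(
    script: Path,
    results_dir: Path,
    figures_dir: Path,
    expected_outputs: Iterable[Path] = (),
    timeout_s: float = 600.0,
) -> subprocess.CompletedProcess:
    """Run one plot_specs script with the NOUS_* directory contract (sketch)."""
    figures_dir.mkdir(parents=True, exist_ok=True)
    # Contract from the issue: the script learns where to read and write via env vars.
    env = dict(os.environ)
    env["NOUS_RESULTS_DIR"] = str(results_dir.resolve())
    env["NOUS_FIGURES_DIR"] = str(figures_dir.resolve())
    # Also pass the per-arm result JSONs as arguments, sorted for determinism.
    result_files = sorted(str(p) for p in results_dir.glob("*.json"))
    proc = subprocess.run(
        [sys.executable, str(script), *result_files],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raises CalledProcessError on a nonzero exit
    )
    # Fail if the script exited cleanly but did not write what plot_specs declared.
    missing = [str(p) for p in expected_outputs if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(f"{script} did not produce: {missing}")
    return proc
```

Raising on a nonzero exit, a timeout, or a missing declared output is one possible policy: it makes the REPORT phase fail loudly rather than leave an incomplete figures/ directory in the artifact.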

For (3):

  1. Add a package subcommand to orchestrator/cli.py.
  2. Tarball the work_dir together with a bundled reproduce.sh, Dockerfile, and README.md.
  3. The Dockerfile consumes F17's reproducibility_metadata to pin language versions and the target-repo commit.
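A minimal packaging sketch using Python's standard tarfile module. The archive layout (a top-level artifact/ directory holding reproduce.sh, Dockerfile, README.md, and a copy of the work_dir) is an assumption. Because this issue does not specify the exact command that reruns a campaign, reproduce.sh takes it as a caller-supplied run_command. package_work_dir and ARCHIVE_ROOT are hypothetical names.

```python
import io
import tarfile
import time
from pathlib import Path

ARCHIVE_ROOT = "artifact"  # HYPOTHETICAL top-level directory name inside the tarball


def _add_text(tar: tarfile.TarFile, arcname: str, text: str, mode: int = 0o644) -> None:
    """Add an in-memory text file to the archive."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(arcname)
    info.size = len(data)
    info.mode = mode
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))


def package_work_dir(work_dir: Path, out_path: Path, dockerfile: str,
                     readme: str, run_command: str) -> Path:
    """Write a gzipped artifact tarball (sketch of `nous package <work_dir>`)."""
    work_dir = work_dir.resolve()
    out_path = out_path.resolve()
    if out_path.is_relative_to(work_dir):
        # Otherwise the archive would try to include itself.
        raise ValueError("out_path must not be inside work_dir")
    reproduce_sh = (
        "#!/usr/bin/env bash\n"
        "set -euo pipefail\n"
        'cd "$(dirname "$0")"\n'
        f"{run_command}\n"  # caller-supplied; the real nous invocation is an assumption
    )
    with tarfile.open(out_path, "w:gz") as tar:
        # The work_dir itself, recursively, under artifact/<work_dir name>/.
        tar.add(work_dir, arcname=f"{ARCHIVE_ROOT}/{work_dir.name}")
        _add_text(tar, f"{ARCHIVE_ROOT}/reproduce.sh", reproduce_sh, mode=0o755)
        _add_text(tar, f"{ARCHIVE_ROOT}/Dockerfile", dockerfile)
        _add_text(tar, f"{ARCHIVE_ROOT}/README.md", readme)
    return out_path
```

Writing the generated files beside the work_dir copy, rather than inside it, avoids clobbering a README.md the campaign author may already keep in the work_dir.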

Acceptance criteria

  • plot_specs is documented in the campaign.yaml schema.
  • REPORT phase auto-invokes plot scripts when plot_specs is present.
  • runs/iter-N/figures/ is created and populated.
  • nous package <work_dir> produces a self-contained tarball.
  • The F18 row in the friction-report tracking issue is checked off.

Severity

MEDIUM: paper authors must hand-roll figure generation, and reviewers cannot inspect figures directly from artifacts.

Source

friction-report.md F18, paper-memorytime-mirage campaign (2026-05). Depends on F17 for reproducibility metadata.


Part of friction-report tracking issue #245.

Labels: enhancement (New feature or request), friction-report (From external campaign-author friction reports)
