chore(perf-changelog): re-trigger sweep for per-worker power aggregation

arygupt · claude · arygupt · commit f5b5c774a460 · 2026-05-28T11:08:07.000-07:00
The workflow's `paths:` filter fires only on changes to perf-changelog.yaml. This commit adds a description line to the dsv4-fp4-gb300-dynamo-sglang entry so that the sweep re-runs and picks up the new per-worker power and per-stage J/token aggregation from 24f46ff.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/perf-changelog.yaml b/perf-changelog.yaml
@@ -3177,4 +3177,5 @@
   description:
     - "Smoke run validating multinode measured-power aggregation (PR #1574). No config change; entry exists to trigger a sweep that produces the first multinode agg JSON with avg_power_w + joules_per_*_token populated from per-node srt-slurm perfmon CSVs. Validates per-source GPU-id namespacing in aggregate_power.py (without it, 14 nodes × 4 GPUs would report num_gpus=4 instead of 56) and the GPU_METRICS_CSV_GLOB env var bridge in process_result.py. Only the gb300-cw runner has the perfmon launcher changes; any gb300-nv runs in the sweep will succeed normally without power fields, which the dashboard handles gracefully (chart gates on field presence)."
     - "Re-run after launcher recurse-glob fix (6da2f1b6) — prior sweep (#26548110246) completed green at the workflow level but produced 0 measured-power rows because the flat *.yaml glob in the monitoring-injection loop matched zero recipes (recipes live in 8k1k/ subdir). Fix uses `find -type f -name '*.yaml'`. Also re-pointed SemiAnalysisAI/srt-slurm@feat/inferencex-perfmon onto current NVIDIA/srt-slurm main so the launcher's `default_bash_preamble:` srtslurm.yaml field is accepted by srtctl schema."
+    - "Re-run after per-worker aggregation (24f46ffe) — validates new agg JSON fields: power_by_worker[] with role labels (prefill/decode/agg/frontend) parsed from srt-slurm perfmon CSV filenames, and joules_per_input_token using per-stage energy attribution (prefill_energy / input_tokens). joules_per_output_token and joules_per_total_token now use per-stage math for disagg runs. Backward compatible: single-node and non-disagg multinode keep cluster-wide ratios."
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1574
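
Reviewer note: the per-stage attribution described in the new changelog line can be sketched roughly as below. This is an illustrative sketch, not the code in aggregate_power.py. The `WorkerPower` type, the `joules_per_token` function, and the `duration_s` parameter are hypothetical names. The changelog confirms only `prefill_energy / input_tokens` for `joules_per_input_token`; using decode energy for output tokens, prefill plus decode energy for total tokens, and total energy for every cluster-wide fallback ratio are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkerPower:
    """Hypothetical stand-in for one power_by_worker[] element."""
    role: str           # "prefill", "decode", "agg", or "frontend"
    avg_power_w: float  # average measured power for this worker, in watts


def joules_per_token(
    workers: List[WorkerPower],
    duration_s: float,
    input_tokens: int,
    output_tokens: int,
) -> Dict[str, float]:
    """Return J/token ratios, per-stage for disagg runs, cluster-wide otherwise."""
    # Energy (J) = average power (W) x wall-clock duration (s), summed per role.
    energy_by_role: Dict[str, float] = {}
    for w in workers:
        energy_by_role[w.role] = (
            energy_by_role.get(w.role, 0.0) + w.avg_power_w * duration_s
        )

    total_tokens = input_tokens + output_tokens
    is_disagg = "prefill" in energy_by_role and "decode" in energy_by_role

    if is_disagg:
        prefill_j = energy_by_role["prefill"]
        decode_j = energy_by_role["decode"]
        return {
            # Confirmed by the changelog: prefill_energy / input_tokens.
            "joules_per_input_token": prefill_j / input_tokens,
            # Assumption: decode energy is attributed to output tokens.
            "joules_per_output_token": decode_j / output_tokens,
            # Assumption: only the two serving stages count toward the total.
            "joules_per_total_token": (prefill_j + decode_j) / total_tokens,
        }

    # Assumed fallback for single-node and non-disagg runs: cluster-wide ratios.
    total_j = sum(energy_by_role.values())
    return {
        "joules_per_input_token": total_j / input_tokens,
        "joules_per_output_token": total_j / output_tokens,
        "joules_per_total_token": total_j / total_tokens,
    }
```

For example, a disagg run with a 1000 W prefill worker and a 2000 W decode worker over 10 s, serving 1000 input and 500 output tokens, would attribute 10000 J to prefill and 20000 J to decode under these assumptions. A run with only an `agg` worker falls through to the cluster-wide branch, which is how the backward-compatibility claim is modeled here.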