Commit c08f658

arygupt and claude committed
feat(inference): measured-power Y-axis metrics on scatter chart
Adds two new options under a new "Measured Energy" dropdown group on both the "vs. Interactivity" and "vs. E2E Latency" charts:

- Measured Avg Power per GPU (W): no roofline (no universal "better" direction)
- Measured J per Output Token (J/tok): roofline lower_right (interactivity) / lower_left (e2e)

Distinct from the existing y_jTotal/y_jOutput/y_jInput, which derive joules from each GPU's spec-sheet TDP. The new metrics are sourced from runner GPU telemetry averaged over the exact bench load window (see aggregate_power.py in semianalysisai/InferenceX).

Wiring:

- packages/constants/src/metric-keys.ts: register avg_power_w, joules_per_output_token in the canonical metric key set so the ETL auto-capture warning doesn't fire.
- packages/app/src/lib/benchmark-transform.ts: pass the two raw fields through rowToAggDataEntry. Left undefined when absent so downstream code can distinguish "no measurement" from "0 W".
- packages/app/src/components/inference/types.ts: extend AggDataEntry, InferenceData, YAxisMetricKey, and ChartDefinition.
- packages/app/src/lib/chart-utils.ts: extend Y_AXIS_METRICS, createChartDataPoint (gated on typeof===number), calculateRoofline and computeAllRooflines yKey union, markRooflinePoints init+mark blocks.
- packages/app/src/components/inference/inference-chart-config.json: add y_measured* entries to both chartTypes.
- packages/app/src/components/inference/ui/ChartControls.tsx: add "Measured Energy" group to METRIC_GROUPS.

The overlay (unofficial run) path is automatic: transformBenchmarkRows is shared between official and overlay rendering, so the new metrics flow to ?unofficialrun= URLs once the runner-side PR is merged and benchmarks ingest the new fields.

For rows without measured-power data (historical runs, runs predating aggregate_power.py, runs where the SMI poller didn't start), points are simply omitted from the new charts; the existing TDP-derived y_jTotal/y_jOutput/y_jInput stay visible as a comparable fallback.

Verification:

- pnpm typecheck: clean
- pnpm lint: 0 warnings, 0 errors
- pnpm test:unit: 1921/1921 passing (+7 new tests covering rowToAggDataEntry pass-through, createChartDataPoint field gating, zero-value preservation, missing-field handling)
- Dev-server smoke: confirmed "Measured Energy" group label and both metric labels are present in the served JS bundle at /_next/static/chunks/

Follow-up: Cypress E2E covering both the official path and ?unofficialrun= overlay path for the two new metrics, to be added once the runner PR ships real data to the DB.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
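The roofline directions named in the message can be read as Pareto-frontier orientations: for J per output token, lower is better on the Y axis, while "better" on the X axis is higher for interactivity (lower_right) and lower for E2E latency (lower_left). The following is a minimal sketch of that idea only. It is not the repository's `calculateRoofline`, whose body is not part of this diff, and every name in it (`SketchPoint`, `sketchFrontier`, the sample data) is hypothetical.

```typescript
// Hypothetical names throughout; not taken from the repository.
type SketchPoint = { x: number; y: number };
type SketchDirection = 'lower_left' | 'lower_right';

// Keep every point that no other point dominates.
// lower_right: larger x and smaller y are better (e.g. interactivity vs. J/tok).
// lower_left: smaller x and smaller y are better (e.g. E2E latency vs. J/tok).
function sketchFrontier(points: SketchPoint[], direction: SketchDirection): SketchPoint[] {
  const xAtLeastAsGood = (a: number, b: number): boolean =>
    direction === 'lower_right' ? a >= b : a <= b;

  return points.filter(
    (p) =>
      !points.some(
        (q) =>
          xAtLeastAsGood(q.x, p.x) &&
          q.y <= p.y &&
          (q.x !== p.x || q.y !== p.y), // strictly better on at least one axis
      ),
  );
}

// Made-up sample data, for illustration only.
const sketchRuns: SketchPoint[] = [
  { x: 10, y: 5 },
  { x: 20, y: 8 },
  { x: 15, y: 9 },
  { x: 5, y: 4 },
];
const sketchLowerRight = sketchFrontier(sketchRuns, 'lower_right');
const sketchLowerLeft = sketchFrontier(sketchRuns, 'lower_left');
```

With this sample, the lower_right frontier drops only the (15, 9) point, which (20, 8) beats on both axes, while the lower_left frontier keeps only (5, 4). Average power has no such ordering on the Y axis, which is consistent with the commit leaving it without a roofline.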
1 parent e7d022d commit c08f658

8 files changed

Lines changed: 149 additions & 3 deletions

packages/app/src/components/inference/inference-chart-config.json

Lines changed: 14 additions & 0 deletions
@@ -88,6 +88,13 @@
     "y_jInput_label": "All-in Provisioned J per Input Token (J/tok)",
     "y_jInput_title": "All-in Provisioned Joules per Input Token",
     "y_jInput_roofline": "lower_right",
+    "y_measuredAvgPower": "measuredAvgPower.y",
+    "y_measuredAvgPower_label": "Measured Avg Power per GPU (W)",
+    "y_measuredAvgPower_title": "Measured Average Power per GPU",
+    "y_measuredJPerOutputToken": "measuredJPerOutputToken.y",
+    "y_measuredJPerOutputToken_label": "Measured J per Output Token (J/tok)",
+    "y_measuredJPerOutputToken_title": "Measured Joules per Output Token",
+    "y_measuredJPerOutputToken_roofline": "lower_right",
     "y_cost_limit": 5,
     "y_latency_limit": 60
   },
@@ -179,6 +186,13 @@
     "y_jInput_label": "All-in Provisioned J per Input Token (J/tok)",
     "y_jInput_title": "All-in Provisioned Joules per Input Token",
     "y_jInput_roofline": "lower_left",
+    "y_measuredAvgPower": "measuredAvgPower.y",
+    "y_measuredAvgPower_label": "Measured Avg Power per GPU (W)",
+    "y_measuredAvgPower_title": "Measured Average Power per GPU",
+    "y_measuredJPerOutputToken": "measuredJPerOutputToken.y",
+    "y_measuredJPerOutputToken_label": "Measured J per Output Token (J/tok)",
+    "y_measuredJPerOutputToken_title": "Measured Joules per Output Token",
+    "y_measuredJPerOutputToken_roofline": "lower_left",
     "y_cost_limit": 5,
     "y_latency_limit": 60
   }
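The config values such as `"measuredAvgPower.y"` look like dotted paths into a chart point, matching the `{ y, roof }` shape added to `InferenceData`. How the app actually resolves them is not shown in this diff, so the sketch below is only an assumption about the general idea: resolve the path, and treat a missing field as "do not plot". `SketchMetric`, `SketchChartPoint`, and `sketchResolveY` are hypothetical names.

```typescript
// Hypothetical names; the app's real accessor is not part of this commit.
type SketchMetric = { y: number; roof: boolean };
type SketchChartPoint = {
  measuredAvgPower?: SketchMetric;
  measuredJPerOutputToken?: SketchMetric;
};

// Resolve a "<field>.y" path against a point; undefined means no measurement.
function sketchResolveY(point: SketchChartPoint, path: string): number | undefined {
  const field = path.split('.')[0] ?? path;
  const metric = (point as Record<string, SketchMetric | undefined>)[field];
  return metric?.y;
}

const sketchPoints: SketchChartPoint[] = [
  { measuredAvgPower: { y: 685.5, roof: false } },
  {}, // e.g. a historical run without telemetry
];

// Points lacking the selected metric are simply left out of the chart.
const sketchPlottable = sketchPoints.filter(
  (p) => sketchResolveY(p, 'measuredAvgPower.y') !== undefined,
);
```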

packages/app/src/components/inference/types.ts

Lines changed: 25 additions & 1 deletion
@@ -67,6 +67,10 @@ export interface AggDataEntry {
   median_e2el: number;
   std_e2el: number;
   p99_e2el: number;
+  // Measured GPU telemetry (emitted by runner's aggregate_power.py).
+  // Optional because historical runs predate the field.
+  avg_power_w?: number;
+  joules_per_output_token?: number;
   disagg: boolean;
   num_prefill_gpu: number;
   num_decode_gpu: number;
@@ -152,6 +156,12 @@ export interface InferenceData extends Partial<Omit<AggDataEntry, AggDataConflic
   jTotal?: { y: number; roof: boolean };
   jOutput?: { y: number; roof: boolean };
   jInput?: { y: number; roof: boolean };
+
+  // Measured power / energy from runner GPU telemetry. Optional because
+  // pre-aggregate_power.py runs (and runs with monitoring disabled) won't
+  // emit these fields.
+  measuredAvgPower?: { y: number; roof: boolean };
+  measuredJPerOutputToken?: { y: number; roof: boolean };
 }

 /**
@@ -177,7 +187,9 @@ export type YAxisMetricKey =
   | 'powerUser'
   | 'jTotal'
   | 'jOutput'
-  | 'jInput';
+  | 'jInput'
+  | 'measuredAvgPower'
+  | 'measuredJPerOutputToken';

 /**
  * Defines the configuration and labels for a specific chart.
@@ -277,6 +289,18 @@ export interface ChartDefinition {
   y_jInput_label?: string;
   y_jInput_title?: string;
   y_jInput_roofline?: 'upper_right' | 'upper_left' | 'lower_left' | 'lower_right';
+  // Measured power / energy from runner GPU telemetry
+  y_measuredAvgPower?: string;
+  y_measuredAvgPower_label?: string;
+  y_measuredAvgPower_title?: string;
+  // Intentionally no roofline for avg power: there's no universal "better"
+  // direction for absolute draw. Omitting roofline causes computeAllRooflines
+  // to skip the metric (it requires a direction); points render plain.
+  y_measuredAvgPower_roofline?: 'upper_right' | 'upper_left' | 'lower_left' | 'lower_right';
+  y_measuredJPerOutputToken?: string;
+  y_measuredJPerOutputToken_label?: string;
+  y_measuredJPerOutputToken_title?: string;
+  y_measuredJPerOutputToken_roofline?: 'upper_right' | 'upper_left' | 'lower_left' | 'lower_right';
   y_cost_limit?: number;
   y_latency_limit?: number;
 }
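The comment on `y_measuredAvgPower_roofline` says that omitting the direction causes `computeAllRooflines` to skip the metric. That function's body is not shown here, so the following is only a sketch of how such a skip could work: look up `<metric>_roofline` in the chart definition and keep the metric only when a valid direction is present. `SKETCH_DIRS`, `isSketchDir`, and `sketchMetricsWithRoofline` are hypothetical; the sample definition copies two values from the "vs. Interactivity" config entries in this commit.

```typescript
// Hypothetical helper names; only the config keys and values come from the diff.
const SKETCH_DIRS = ['upper_right', 'upper_left', 'lower_left', 'lower_right'] as const;
type SketchDir = (typeof SKETCH_DIRS)[number];

function isSketchDir(value: unknown): value is SketchDir {
  return SKETCH_DIRS.some((d) => d === value);
}

// Return [metric, direction] pairs only for metrics that declare a direction.
function sketchMetricsWithRoofline(
  chartDef: Record<string, string | number | undefined>,
  metrics: string[],
): Array<[string, SketchDir]> {
  const out: Array<[string, SketchDir]> = [];
  for (const metric of metrics) {
    const dir = chartDef[`${metric}_roofline`];
    if (isSketchDir(dir)) out.push([metric, dir]); // no direction: metric is skipped
  }
  return out;
}

const sketchInteractivityDef = {
  y_measuredAvgPower: 'measuredAvgPower.y',
  y_measuredJPerOutputToken: 'measuredJPerOutputToken.y',
  y_measuredJPerOutputToken_roofline: 'lower_right',
};

const sketchRooflined = sketchMetricsWithRoofline(sketchInteractivityDef, [
  'y_measuredAvgPower',
  'y_measuredJPerOutputToken',
]);
```

Under these assumptions only `y_measuredJPerOutputToken` is returned, so average-power points would keep `roof: false` and render plain.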

packages/app/src/components/inference/ui/ChartControls.tsx

Lines changed: 4 additions & 0 deletions
@@ -46,6 +46,10 @@ const METRIC_GROUPS = [
   },
   { label: 'Cost per Million Input Tokens', metrics: ['y_costhi', 'y_costni', 'y_costri'] },
   { label: 'All-in Provisioned Energy per Token', metrics: ['y_jTotal', 'y_jOutput', 'y_jInput'] },
+  {
+    label: 'Measured Energy',
+    metrics: ['y_measuredAvgPower', 'y_measuredJPerOutputToken'],
+  },
   { label: 'Custom User Values', metrics: ['y_costUser', 'y_powerUser'] },
 ];

packages/app/src/lib/benchmark-transform.test.ts

Lines changed: 18 additions & 0 deletions
@@ -115,6 +115,24 @@ describe('rowToAggDataEntry', () => {
     const entryNull = rowToAggDataEntry(makeRow({ image: null }));
     expect(entryNull.image).toBeUndefined();
   });
+
+  it('passes through measured power telemetry fields when present', () => {
+    const entry = rowToAggDataEntry(
+      makeRow({
+        metrics: { tput_per_gpu: 100, avg_power_w: 685.5, joules_per_output_token: 8.4 },
+      }),
+    );
+    expect(entry.avg_power_w).toBe(685.5);
+    expect(entry.joules_per_output_token).toBe(8.4);
+  });
+
+  it('leaves measured power fields undefined for rows that predate the metric', () => {
+    // Distinguishing "no measurement" from "0 W" matters: createChartDataPoint
+    // uses typeof===number to decide whether to emit the measuredAvgPower field.
+    const entry = rowToAggDataEntry(makeRow({ metrics: {} }));
+    expect(entry.avg_power_w).toBeUndefined();
+    expect(entry.joules_per_output_token).toBeUndefined();
+  });
 });

 describe('transformBenchmarkRows', () => {

packages/app/src/lib/benchmark-transform.ts

Lines changed: 5 additions & 0 deletions
@@ -49,6 +49,11 @@ export function rowToAggDataEntry(row: BenchmarkRow): AggDataEntry {
     median_e2el: m.median_e2el ?? 0,
     std_e2el: m.std_e2el ?? 0,
     p99_e2el: m.p99_e2el ?? 0,
+    // Measured GPU telemetry (runner's aggregate_power.py). Left undefined for
+    // rows predating the field so downstream chart code can distinguish
+    // "no measurement" from "0 W" via createChartDataPoint's typeof guard.
+    avg_power_w: m.avg_power_w,
+    joules_per_output_token: m.joules_per_output_token,
     disagg: row.disagg,
     num_prefill_gpu: row.num_prefill_gpu,
     num_decode_gpu: row.num_decode_gpu,
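The contrast in this hunk is between the neighbouring latency fields, which default to 0 with `??`, and the new telemetry fields, which are passed through untouched. A minimal sketch of why that matters, using simplified stand-in types (`SketchRowMetrics`, `SketchAggEntry`, and `sketchRowToEntry` are hypothetical and far smaller than the real `BenchmarkRow` / `AggDataEntry`):

```typescript
// Simplified, hypothetical stand-ins for the real row and entry types.
type SketchRowMetrics = {
  median_e2el?: number;
  avg_power_w?: number;
  joules_per_output_token?: number;
};
type SketchAggEntry = {
  median_e2el: number;
  avg_power_w?: number;
  joules_per_output_token?: number;
};

function sketchRowToEntry(m: SketchRowMetrics): SketchAggEntry {
  return {
    median_e2el: m.median_e2el ?? 0, // defaulted: a missing value becomes 0
    avg_power_w: m.avg_power_w, // passed through: a missing value stays undefined
    joules_per_output_token: m.joules_per_output_token,
  };
}

const sketchLegacyEntry = sketchRowToEntry({}); // run without telemetry
const sketchZeroEntry = sketchRowToEntry({ avg_power_w: 0 }); // genuine 0 W reading
```

Had `avg_power_w` been written as `m.avg_power_w ?? 0`, the two entries above would be indistinguishable downstream, and runs without telemetry would plot as 0 W.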

packages/app/src/lib/chart-utils.test.ts

Lines changed: 49 additions & 0 deletions
@@ -1218,6 +1218,55 @@ describe('createChartDataPoint energy fields', () => {
   });
 });

+// ===========================================================================
+// createChartDataPoint — measured power / energy fields (from runner telemetry)
+// ===========================================================================
+describe('createChartDataPoint measured power fields', () => {
+  it('emits measuredAvgPower when avg_power_w is present on the entry', () => {
+    const e = entry({ avg_power_w: 685.5 });
+    const point = createChartDataPoint('2025-01-01', e, 'median_e2el', 'tput_per_gpu', 'h100');
+    expect(point.measuredAvgPower).toBeDefined();
+    expect(point.measuredAvgPower!.y).toBe(685.5);
+    expect(point.measuredAvgPower!.roof).toBe(false);
+  });
+
+  it('emits measuredJPerOutputToken when joules_per_output_token is present', () => {
+    const e = entry({ joules_per_output_token: 8.4 });
+    const point = createChartDataPoint('2025-01-01', e, 'median_e2el', 'tput_per_gpu', 'h100');
+    expect(point.measuredJPerOutputToken).toBeDefined();
+    expect(point.measuredJPerOutputToken!.y).toBe(8.4);
+  });
+
+  it('omits both fields when neither is on the entry', () => {
+    // Legacy runs predating aggregate_power.py.
+    const point = createChartDataPoint(
+      '2025-01-01',
+      entry(),
+      'median_e2el',
+      'tput_per_gpu',
+      'h100',
+    );
+    expect(point.measuredAvgPower).toBeUndefined();
+    expect(point.measuredJPerOutputToken).toBeUndefined();
+  });
+
+  it('emits one and omits the other when only one is present', () => {
+    // Defensive: aggregator can patch only avg_power_w if total_output_tokens=0.
+    const e = entry({ avg_power_w: 500 });
+    const point = createChartDataPoint('2025-01-01', e, 'median_e2el', 'tput_per_gpu', 'h100');
+    expect(point.measuredAvgPower).toBeDefined();
+    expect(point.measuredJPerOutputToken).toBeUndefined();
+  });
+
+  it('preserves a zero measured power value (not falsy-coerced away)', () => {
+    // Guards against a refactor switching the gate from typeof===number to truthiness.
+    const e = entry({ avg_power_w: 0 });
+    const point = createChartDataPoint('2025-01-01', e, 'median_e2el', 'tput_per_gpu', 'h100');
+    expect(point.measuredAvgPower).toBeDefined();
+    expect(point.measuredAvgPower!.y).toBe(0);
+  });
+});
+
 // ===========================================================================
 // createChartDataPoint — boolean narrowing for prefill/decode dp_attention, is_multinode
 // ===========================================================================

packages/app/src/lib/chart-utils.ts

Lines changed: 29 additions & 2 deletions
@@ -148,6 +148,10 @@ export const Y_AXIS_METRICS = [
   'y_jTotal',
   'y_jOutput',
   'y_jInput',
+  // Measured power / energy (sourced from runner's aggregate_power.py output;
+  // distinct from the spec-sheet TDP-derived jTotal/jOutput/jInput above).
+  'y_measuredAvgPower',
+  'y_measuredJPerOutputToken',
 ] as const;

 export type YAxisMetric = (typeof Y_AXIS_METRICS)[number];
@@ -389,6 +393,16 @@ export function createChartDataPoint(
           },
         }
       : {}),
+
+    // Measured power / energy from runner's aggregate_power.py. Gated on the
+    // raw fields existing so points from runs predating the measurement land
+    // without these keys and the chart correctly filters them out.
+    ...(typeof entry.avg_power_w === 'number'
+      ? { measuredAvgPower: { y: entry.avg_power_w, roof: false } }
+      : {}),
+    ...(typeof entry.joules_per_output_token === 'number'
+      ? { measuredJPerOutputToken: { y: entry.joules_per_output_token, roof: false } }
+      : {}),
   };
 }

@@ -549,7 +563,9 @@ export const calculateRoofline = (
     | `costri.y`
     | `jTotal.y`
     | `jOutput.y`
-    | `jInput.y`,
+    | `jInput.y`
+    | `measuredAvgPower.y`
+    | `measuredJPerOutputToken.y`,
   rooflineDirection: 'upper_right' | 'upper_left' | 'lower_left' | 'lower_right',
 ): InferenceData[] => {
   const pointsForRoofline = points.map((p) => {
@@ -619,7 +635,9 @@ export function computeAllRooflines(
           | `costri.y`
           | `jTotal.y`
           | `jOutput.y`
-          | `jInput.y`,
+          | `jInput.y`
+          | `measuredAvgPower.y`
+          | `measuredJPerOutputToken.y`,
         rooflineDirection,
       );
     }
@@ -663,6 +681,8 @@ export function markRooflinePoints(
       if (newPoint.jTotal) newPoint.jTotal.roof = false;
       if (newPoint.jOutput) newPoint.jOutput.roof = false;
       if (newPoint.jInput) newPoint.jInput.roof = false;
+      if (newPoint.measuredAvgPower) newPoint.measuredAvgPower.roof = false;
+      if (newPoint.measuredJPerOutputToken) newPoint.measuredJPerOutputToken.roof = false;

       for (const chartDefYKey of Y_AXIS_METRICS) {
         const rooflinePoints = computedRooflines[hwKey]?.[chartDefYKey];
@@ -722,6 +742,13 @@ export function markRooflinePoints(
           newPoint.jOutput.roof = onCurrentRoofline;
         } else if (chartDefYKey === 'y_jInput' && newPoint.jInput) {
           newPoint.jInput.roof = onCurrentRoofline;
+        } else if (chartDefYKey === 'y_measuredAvgPower' && newPoint.measuredAvgPower) {
+          newPoint.measuredAvgPower.roof = onCurrentRoofline;
+        } else if (
+          chartDefYKey === 'y_measuredJPerOutputToken' &&
+          newPoint.measuredJPerOutputToken
+        ) {
+          newPoint.measuredJPerOutputToken.roof = onCurrentRoofline;
         }
       }
       finalProcessedData.push(newPoint);
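The `typeof === 'number'` gate in `createChartDataPoint` is what keeps a genuine 0 W reading while dropping absent values. The sketch below isolates that pattern next to the truthiness gate the tests guard against. The types and function names are hypothetical, and the `null` case is an added assumption for illustration (the diff types the field as `number | undefined`).

```typescript
// Hypothetical, reduced shapes; `null` is included only to illustrate the guard.
type SketchEntry = { avg_power_w?: number | null };
type SketchOut = { measuredAvgPower?: { y: number; roof: boolean } };

// Pattern used in the commit: conditional spread gated on typeof.
function sketchGateTypeof(e: SketchEntry): SketchOut {
  return {
    ...(typeof e.avg_power_w === 'number'
      ? { measuredAvgPower: { y: e.avg_power_w, roof: false } }
      : {}),
  };
}

// Counter-example: a truthiness gate silently drops a 0 W measurement.
function sketchGateTruthy(e: SketchEntry): SketchOut {
  return {
    ...(e.avg_power_w ? { measuredAvgPower: { y: e.avg_power_w, roof: false } } : {}),
  };
}

const sketchZeroTypeof = sketchGateTypeof({ avg_power_w: 0 }); // key present, y = 0
const sketchZeroTruthy = sketchGateTruthy({ avg_power_w: 0 }); // key missing
const sketchNullTypeof = sketchGateTypeof({ avg_power_w: null }); // key missing
const sketchAbsentTypeof = sketchGateTypeof({}); // key missing
```

Because `typeof null` is `'object'`, the same guard would also exclude a `null` that slipped through from storage, without any extra check.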

packages/constants/src/metric-keys.ts

Lines changed: 5 additions & 0 deletions
@@ -43,4 +43,9 @@ export const METRIC_KEYS = new Set([
   'p99_intvty',
   'p99.9_intvty',
   'std_intvty',
+  // measured power / energy (emitted by runner's aggregate_power.py)
+  //   avg_power_w: mean per-GPU draw (W) during the load window
+  //   joules_per_output_token: avg_power_w * num_gpus * duration / total_output_tokens
+  'avg_power_w',
+  'joules_per_output_token',
 ]);
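The comment above gives the formula for `joules_per_output_token`. As a worked example, here is a small sketch of that arithmetic. It assumes `duration` is in seconds (so that W × s yields joules) and that the metric is left undefined when no output tokens were produced, as the test comment about `total_output_tokens=0` suggests. The real computation lives in the runner's `aggregate_power.py`, which is not part of this commit; `SketchPowerWindow`, `sketchJoulesPerOutputToken`, and the input numbers are hypothetical.

```typescript
// Hypothetical shape and values; field names mirror the formula in the comment.
type SketchPowerWindow = {
  avgPowerW: number; // mean per-GPU draw over the load window, in watts
  numGpus: number;
  durationS: number; // assumed to be seconds
  totalOutputTokens: number;
};

function sketchJoulesPerOutputToken(w: SketchPowerWindow): number | undefined {
  if (w.totalOutputTokens <= 0) return undefined; // no tokens: metric is not defined
  const totalJoules = w.avgPowerW * w.numGpus * w.durationS; // W * s = J
  return totalJoules / w.totalOutputTokens;
}

// 685.5 W * 8 GPUs * 60 s = 329,040 J; 329,040 J / 40,000 tokens = 8.226 J/tok
const sketchJPerTok = sketchJoulesPerOutputToken({
  avgPowerW: 685.5,
  numGpus: 8,
  durationS: 60,
  totalOutputTokens: 40000,
});

const sketchNoTokens = sketchJoulesPerOutputToken({
  avgPowerW: 500,
  numGpus: 8,
  durationS: 60,
  totalOutputTokens: 0,
});
```

Note that `avg_power_w` is per GPU while the joules figure covers the whole deployment, which is why `num_gpus` appears in the product.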
