Commit fc5a792

cquil11 and claude committed
benchmarks(agentic): disable DCGM gpu_telemetry in aiperf invocation
aiperf's GpuMetricTimeSeries.append_snapshot freezes the metric schema on the first DCGM scrape; any optional field that's None on the first scrape (xid_errors most commonly, also power_violation, encoder_utilization) then raises KeyError when it first appears mid-run. The exception is caught at records_manager.py:609 so the run completes, but every late telemetry sample is dropped silently and the error count grows.

We don't consume the gpu_telemetry_export.jsonl artifact in downstream processing (process_agentic_result.py only reads aiperf's server-metrics output and the per-request profile export). Server-side /metrics from vLLM/sglang flows through a separate path and is unaffected: KV cache usage, prefix cache hit rate, throughput etc. still populate.

Until the aiperf upstream patch lands (dynamic schema extension in telemetry_models.py), --no-gpu-telemetry sidesteps the bug entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d7841d8 commit fc5a792

1 file changed

Lines changed: 8 additions & 0 deletions

benchmarks/benchmark_lib.sh

@@ -1048,6 +1048,14 @@ build_replay_cmd() {
  # CPU on minimax-m2.5 at high concurrency. Lossless for vLLM (server
  # usage is authoritative).
  REPLAY_CMD+=" --use-server-token-count"
+ # Disable DCGM GPU telemetry collection. aiperf's GpuMetricTimeSeries
+ # freezes its metric schema on the first DCGM scrape, then KeyErrors when
+ # an optional field (xid_errors, power_violation, encoder_utilization)
+ # first appears mid-run. We don't consume the gpu_telemetry artifact in
+ # downstream processing, and the server-metrics path (Prometheus /metrics
+ # from vLLM) is unaffected by this flag and still gives us KV usage,
+ # prefix cache hit rate, etc.
+ REPLAY_CMD+=" --no-gpu-telemetry"
  # aiperf's dataset manager (separate from the inference parser) loads
  # the model's tokenizer for trace-prompt tokenization regardless of
  # --use-server-token-count. Models like kimi (amd/Kimi-K2.5-MXFP4,
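
The failure mode described in the commit message can be illustrated with a minimal sketch. This is not aiperf's code: `FrozenSchemaSeries`, `ExtensibleSeries`, and `ingest` are hypothetical stand-ins written for this example, and only the field names (`xid_errors`, `gpu_utilization`-style metrics) follow the commit message. The first class fixes its columns on the first snapshot and raises `KeyError` when a field that was `None` at first later appears; the second shows one possible form of "dynamic schema extension", backfilling earlier samples with `None`.

```python
from typing import Optional


class FrozenSchemaSeries:
    """Hypothetical stand-in: columns are fixed by the first snapshot."""

    def __init__(self) -> None:
        self.columns: Optional[dict[str, list[float]]] = None

    def append_snapshot(self, snapshot: dict[str, Optional[float]]) -> None:
        present = {k: v for k, v in snapshot.items() if v is not None}
        if self.columns is None:
            # Schema is frozen here: only fields present on the first scrape.
            self.columns = {name: [] for name in present}
        # Resolve every column first; raises KeyError for a late field,
        # so the whole sample is rejected before anything is appended.
        targets = [(self.columns[name], value) for name, value in present.items()]
        for column, value in targets:
            column.append(value)


class ExtensibleSeries:
    """Hypothetical fix: add columns on demand and backfill with None."""

    def __init__(self) -> None:
        self.columns: dict[str, list[Optional[float]]] = {}
        self.length = 0

    def append_snapshot(self, snapshot: dict[str, Optional[float]]) -> None:
        for name, value in snapshot.items():
            if value is not None and name not in self.columns:
                # New field: pad the samples recorded before it appeared.
                self.columns[name] = [None] * self.length
        for name, column in self.columns.items():
            column.append(snapshot.get(name))
        self.length += 1


def ingest(series, snapshots) -> int:
    """Mimic a caller that catches the error so the run still completes."""
    errors = 0
    for snapshot in snapshots:
        try:
            series.append_snapshot(snapshot)
        except KeyError:
            errors += 1  # sample is dropped; only the counter records it
    return errors


# Illustrative scrapes: xid_errors is absent at first, then appears mid-run.
SCRAPES = [
    {"gpu_utilization": 91.0, "xid_errors": None},
    {"gpu_utilization": 93.0, "xid_errors": 1.0},
    {"gpu_utilization": 95.0, "xid_errors": 1.0},
]

frozen = FrozenSchemaSeries()
frozen_errors = ingest(frozen, SCRAPES)

extensible = ExtensibleSeries()
extensible_errors = ingest(extensible, SCRAPES)
```

In this sketch the frozen series keeps only the first sample and counts two errors, while the extensible series keeps all three samples. Passing `--no-gpu-telemetry` avoids the question altogether by never collecting the samples, which is acceptable here because the artifact is not consumed downstream.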
