agentic: bump aiperf Configure Profiling timeout 900s -> 1800s

cquil11 · claude · cquil11 · commit 405a86ab27ff · 2026-05-14T11:11:11.000-05:00
R3 H200 marathon hit aiperf's 900s Configure Profiling timeout on every
job (14/14). Root cause: H200 /tmp is ~3x slower than B300's, and 14
parallel jobs contending for the same shared /tmp pushed dataset config
(load + reconstruct + 62 GB mmap + 4 min inputs.json) past the 900s
budget. vLLM started cleanly in every case; aiperf itself timed out
during its own dataset materialization step.

Bump the timeout to 1800s (30 min) so worst-case slow-/tmp + contention
scenarios still fit. Post-setup measurement window unaffected.
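
Because aiperf validates at startup that the service-level configure
timeout is at least the dataset configuration timeout, both exports
have to move together. A minimal pre-flight sketch of that invariant
follows; check_timeouts_lockstep is a hypothetical helper, not part of
benchmark_lib.sh or aiperf, and it only mirrors the >= relation
described in the code comment:

```shell
# Hypothetical helper (illustration only, not in benchmark_lib.sh).
# Usage: check_timeouts_lockstep DATASET_TIMEOUT SERVICE_TIMEOUT
# Returns 0 when SERVICE_TIMEOUT >= DATASET_TIMEOUT, 1 otherwise.
check_timeouts_lockstep() {
    dataset_timeout="$1"
    service_timeout="$2"
    if [ "$service_timeout" -lt "$dataset_timeout" ]; then
        echo "service timeout ${service_timeout}s is below dataset timeout ${dataset_timeout}s" >&2
        return 1
    fi
    return 0
}
```

It could be called right after the two exports, for example:
check_timeouts_lockstep "$AIPERF_DATASET_CONFIGURATION_TIMEOUT" "$AIPERF_SERVICE_PROFILE_CONFIGURE_TIMEOUT"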

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/benchmarks/benchmark_lib.sh b/benchmarks/benchmark_lib.sh
@@ -956,14 +956,15 @@ build_replay_cmd() {
 
     export AIPERF_DATASET_WEKA_LIVE_ASSISTANT_RESPONSES=1
     # Dataset configuration (load + reconstruct + inputs.json + mmap)
-    # routinely takes 4-5 min for the 949-trace weka corpus. The default
-    # 300s timeout flips parallel jobs into TimeoutError mid-setup when
-    # many launchers contend for the shared HF cache + tmpfs. Bump to
-    # 900s — the post-setup measurement window is unaffected.
-    export AIPERF_DATASET_CONFIGURATION_TIMEOUT=900
+    # routinely takes 4-5 min for the 949-trace weka corpus on fast /tmp
+    # (B300) but can stretch to 14 min on slower /tmp + parallel contention
+    # (observed on H200 where all 14 R3 jobs hit aiperf's 900s Configure
+    # Profiling timeout simultaneously). Bump to 1800s to absorb 3x
+    # worst-case slowdown — the post-setup measurement window is unaffected.
+    export AIPERF_DATASET_CONFIGURATION_TIMEOUT=1800
     # aiperf validates that SERVICE_PROFILE_CONFIGURE_TIMEOUT >=
     # DATASET_CONFIGURATION_TIMEOUT at startup. Bump it in lockstep.
-    export AIPERF_SERVICE_PROFILE_CONFIGURE_TIMEOUT=900
+    export AIPERF_SERVICE_PROFILE_CONFIGURE_TIMEOUT=1800
     REPLAY_CMD="aiperf profile --scenario inferencex-agentx-mvp"
     REPLAY_CMD+=" --url http://localhost:$PORT"
     REPLAY_CMD+=" --endpoint /v1/chat/completions"