Commit 9ea7370

cquil11 and claude committed
benchmarks(agentic): switch to with-subagents corpus + idle-gap cap
Roll the aiperf submodule to dde0cc53, which:

- Adds the semianalysis_cc_traces_weka_with_subagents public-dataset entry
  pointing at semianalysisai/cc-traces-weka-with-subagents-051926
- Switches the inferencex-agentx-mvp scenario to that corpus and to the new
  --trace-idle-gap-cap-seconds=60.0 lock (drops the legacy
  --use-think-time-only + --inter-turn-delay-cap-seconds pair)

Update benchmark_lib.sh's resolve_trace_source() to download the new dataset
and pass --public-dataset semianalysis_cc_traces_weka_with_subagents, and
refresh the build_replay_cmd() comment to reflect the new lock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Cam Quilici <cjquilici@gmail.com>
1 parent bd290a0 commit 9ea7370

2 files changed

Lines changed: 15 additions & 11 deletions

benchmarks/benchmark_lib.sh

Lines changed: 14 additions & 10 deletions
@@ -902,16 +902,17 @@ ensure_hf_cli() {
 }
 
 resolve_trace_source() {
-    local dataset="semianalysisai/cc-traces-weka-no-subagents-051226"
+    local dataset="semianalysisai/cc-traces-weka-with-subagents-051926"
     # aiperf reads the corpus via its public-dataset registry. The
     # inferencex-agentx-mvp scenario hard-requires loader=one of
-    # ['semianalysis_cc_traces_weka_no_subagents', 'weka_trace'] (see
+    # ['semianalysis_cc_traces_weka_with_subagents', 'weka_trace'] (see
     # aiperf src/aiperf/common/scenario/inferencex_agentx_mvp.py's
-    # `require_loader`). The bare `semianalysis_cc_traces_weka` loader
-    # points at the older 042026 corpus with subagent fan-out and is no
-    # longer accepted as of upstream PR #875.
-    TRACE_SOURCE_FLAG="--public-dataset semianalysis_cc_traces_weka_no_subagents"
-    echo "Loading traces via aiperf public-dataset: semianalysis_cc_traces_weka_no_subagents ($dataset)"
+    # `require_loader`). The with-subagents corpus captures the parent +
+    # Task-tool sub-agent fan-out structure of real Claude Code sessions
+    # (219 traces, v5-only, CC >= 2.1.139, classifier-call OSL spike
+    # filtered).
+    TRACE_SOURCE_FLAG="--public-dataset semianalysis_cc_traces_weka_with_subagents"
+    echo "Loading traces via aiperf public-dataset: semianalysis_cc_traces_weka_with_subagents ($dataset)"
     # Pre-download the dataset into the shared HF_HUB_CACHE (same mount used
     # for model weights) so subsequent runs read from cache instead of
     # re-downloading every job.
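
The loader restriction described in the comment in this hunk can be illustrated with a small guard. This is only a sketch: `loader_is_accepted` and `build_trace_source_flag` are hypothetical helpers that do not exist in benchmark_lib.sh, and the accepted-loader list is copied from the comment rather than read from aiperf's `require_loader`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of benchmark_lib.sh: checks a loader name
# against the two loaders the comment says the scenario accepts.
loader_is_accepted() {
    local loader="$1"
    local accepted
    for accepted in semianalysis_cc_traces_weka_with_subagents weka_trace; do
        [ "$loader" = "$accepted" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: prints the flag string only for an accepted loader,
# otherwise reports the problem on stderr and returns non-zero.
build_trace_source_flag() {
    local loader="$1"
    if ! loader_is_accepted "$loader"; then
        echo "loader '$loader' is not accepted by inferencex-agentx-mvp" >&2
        return 1
    fi
    printf -- '--public-dataset %s\n' "$loader"
}

build_trace_source_flag semianalysis_cc_traces_weka_with_subagents
```

A guard of this kind would fail fast in the benchmark script instead of waiting for the scenario plugin to reject the loader; whether that is worth the duplication of aiperf's own check is a judgment call.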
@@ -955,9 +956,12 @@ build_replay_cmd() {
     # the just-generated KV blocks at the cost of hash-id fidelity past
     # turn 0 — which is exactly what we want for benchmark numbers.
     #
-    # The scenario plugin locks: --cache-bust first_turn_prefix,
-    # --inter-turn-delay-cap-seconds 60, etc., and auto-injects them — so
-    # we do not pass them. See utils/aiperf/docs/tutorials/agentx-mvp.md.
+    # The scenario plugin locks: --cache-bust first_turn_prefix and
+    # --trace-idle-gap-cap-seconds 60 (per-trace idle-gap compression
+    # against parent + subagent request-start timestamps; supersedes the
+    # legacy --use-think-time-only / --inter-turn-delay-cap-seconds path),
+    # and auto-injects them — so we do not pass them. See
+    # utils/aiperf/docs/tutorials/agentx-mvp.md.
     local result_dir="$1"
     local duration="${DURATION:-1800}"
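
As a rough illustration of what a 60-second idle-gap cap could mean, the sketch below compresses gaps between consecutive request-start timestamps within one trace. It is one plausible reading of the comment, not aiperf's implementation: `compress_idle_gaps` is a hypothetical function, the timestamps are invented integer seconds, and parent and sub-agent starts are assumed to be already merged into one ascending list.

```shell
#!/usr/bin/env bash
# Illustrative only: an assumed interpretation of "idle-gap compression".
# Usage: compress_idle_gaps CAP TS1 TS2 ...   (ascending integer seconds)
# Any gap between consecutive request starts that exceeds CAP is shortened
# to CAP, and every later timestamp is shifted earlier by the excess.
compress_idle_gaps() {
    local cap="$1"; shift
    local prev="" shifted=0 ts gap
    local out=()
    for ts in "$@"; do
        if [ -n "$prev" ]; then
            gap=$(( ts - prev ))
            if (( gap > cap )); then
                shifted=$(( shifted + gap - cap ))
            fi
        fi
        out+=( $(( ts - shifted )) )
        prev="$ts"
    done
    echo "${out[*]}"
}

# Gaps of 10, 190, 30, 270 seconds become 10, 60, 30, 60.
compress_idle_gaps 60 0 10 200 230 500
# prints: 0 10 70 100 160
```

Under this reading, short gaps are replayed unchanged and only long idle stretches are trimmed, which bounds replay time without reordering requests.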
