Commit dbd970c

cquil11 and claude committed
agentic: switch to no-subagents loader + sudo git install for non-root containers
R17 surfaced two distinct failures, one per cluster:

1) gb300-cw (all 3 shards): aiperf rejected --public-dataset semianalysis_cc_traces_weka with "Scenario invariants violated ... required loader=any of ['semianalysis_cc_traces_weka_no_subagents', 'weka_trace']". Yesterday's aiperf merge (PR #875 commit fef78a96) switched the inferencex-agentx-mvp scenario's default corpus to the 051226 no-subagents 949-trace variant and tightened the loader contract. The old name is no longer accepted.

   Fix: resolve_trace_source emits --public-dataset semianalysis_cc_traces_weka_no_subagents.

2) gb300-nv (all 3 shards): "dpkg: error: requested operation requires superuser privilege" from yesterday's install_agentic_deps git install path. The gb300-nv pyxis/enroot setup maps the calling user (sa-shared) into the container as non-root, while gb300-cw runs as root. The git install needs sudo on nv; cw is fine without.

   Fix: branch on `id -u`: apt-get directly when root, sudo apt-get otherwise. The vllm-base layer installs `sudo` so the binary is available, and the typical enroot config grants the calling user passwordless sudo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
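The loader contract quoted in the error message could also be checked locally before aiperf is launched, so a stale loader name fails fast instead of failing on every shard. The following is a minimal sketch, not part of this commit: `ACCEPTED_LOADERS` and `loader_is_accepted` are hypothetical names, and the accepted values are copied from the error text above rather than read from aiperf itself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not in benchmark_lib.sh).
# The two names below are copied from the aiperf error message quoted in the
# commit message; they are assumed, not queried from aiperf at runtime.
ACCEPTED_LOADERS="semianalysis_cc_traces_weka_no_subagents weka_trace"

# Returns 0 if the given loader name is in the accepted list, 1 otherwise.
loader_is_accepted() {
    local candidate="$1" name
    for name in $ACCEPTED_LOADERS; do
        if [ "$name" = "$candidate" ]; then
            return 0
        fi
    done
    return 1
}

# Example: guard the flag before handing it to the benchmark.
if loader_is_accepted "semianalysis_cc_traces_weka_no_subagents"; then
    echo "loader accepted"
else
    echo "loader rejected" >&2
fi
```

A hard-coded list like this would drift if the upstream scenario changes its contract again, so it is only useful as an early warning, not as a replacement for aiperf's own check.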
1 parent 9d91647 commit dbd970c

1 file changed

Lines changed: 19 additions & 6 deletions

benchmarks/benchmark_lib.sh

@@ -906,11 +906,15 @@ ensure_hf_cli() {
 
 resolve_trace_source() {
     local dataset="semianalysisai/cc-traces-weka-no-subagents-051226"
-    # aiperf reads the corpus via its public-dataset registry; the loader
-    # under the hood pulls from semianalysisai/cc-traces-weka-no-subagents-051226
-    # (949 traces, no-subagents variant — see plugins.yaml).
-    TRACE_SOURCE_FLAG="--public-dataset semianalysis_cc_traces_weka"
-    echo "Loading traces via aiperf public-dataset: semianalysis_cc_traces_weka ($dataset)"
+    # aiperf reads the corpus via its public-dataset registry. The
+    # inferencex-agentx-mvp scenario hard-requires loader=one of
+    # ['semianalysis_cc_traces_weka_no_subagents', 'weka_trace'] (see
+    # aiperf src/aiperf/common/scenario/inferencex_agentx_mvp.py's
+    # `require_loader`). The bare `semianalysis_cc_traces_weka` loader
+    # points at the older 042026 corpus with subagent fan-out and is no
+    # longer accepted as of upstream PR #875.
+    TRACE_SOURCE_FLAG="--public-dataset semianalysis_cc_traces_weka_no_subagents"
+    echo "Loading traces via aiperf public-dataset: semianalysis_cc_traces_weka_no_subagents ($dataset)"
     # Pre-download the dataset into the shared HF_HUB_CACHE (same mount used
     # for model weights) so subsequent runs read from cache instead of
     # re-downloading every job.
@@ -926,8 +930,17 @@ install_agentic_deps() {
     # and in your PATH?
     # Install on demand; cheap no-op when git is already present
     # (e.g. on AMD images that ship it).
+    #
+    # Some pyxis/enroot setups map the calling user into the container
+    # as non-root (gb300-nv does this; gb300-cw runs as root). Use sudo
+    # when not root — the vllm-base layer installs `sudo` and the typical
+    # enroot config grants the calling user passwordless sudo.
     if ! command -v git >/dev/null 2>&1; then
-        apt-get update -qq && apt-get install -y -qq git
+        if [ "$(id -u)" -eq 0 ]; then
+            apt-get update -qq && apt-get install -y -qq git
+        else
+            sudo apt-get update -qq && sudo apt-get install -y -qq git
+        fi
     fi
     agentic_pip_install --quiet urllib3 requests 2>/dev/null || true
     agentic_pip_install -q -r "$AGENTIC_DIR/requirements.txt"
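
The root/non-root branch in this hunk repeats the apt-get command line in both arms. One way to avoid that duplication, and to make the choice testable without being root, is to compute the privilege prefix separately. This is a minimal sketch under stated assumptions: `apt_prefix_for_uid` and `install_git_if_missing` are hypothetical helpers that do not exist in benchmark_lib.sh, and the extra check for a missing `sudo` binary goes beyond what the commit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in benchmark_lib.sh): prints the command prefix
# needed to run apt-get for the given numeric uid. Empty for root, "sudo"
# for everyone else.
apt_prefix_for_uid() {
    local uid="$1"
    if [ "$uid" -eq 0 ]; then
        printf ''
    else
        printf 'sudo'
    fi
}

# Hypothetical wrapper: installs git only when it is missing, using the
# prefix chosen above. Returns non-zero if sudo is needed but not installed.
install_git_if_missing() {
    if command -v git >/dev/null 2>&1; then
        return 0
    fi
    local prefix
    prefix="$(apt_prefix_for_uid "$(id -u)")"
    if [ -n "$prefix" ] && ! command -v sudo >/dev/null 2>&1; then
        echo "git is missing, the user is not root, and sudo is unavailable" >&2
        return 1
    fi
    # $prefix is deliberately unquoted: when empty it expands to nothing,
    # so the command is plain apt-get.
    $prefix apt-get update -qq && $prefix apt-get install -y -qq git
}
```

Calling `apt_prefix_for_uid 0` prints nothing and `apt_prefix_for_uid 1000` prints `sudo`, so the branch can be exercised on any host. The behavior on gb300-nv still depends on the enroot config granting passwordless sudo, as the commit message notes.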
