Commit f89cdfe

cquil11 and claude committed
Merge origin/main into chore/agentx-v0.3 with fixed_seq_len/ reorg fixups
Resolutions:
- perf-changelog.yaml: took main verbatim.
- runners/launch_b300-nv.sh: took main (drops --nodelist pin entirely; supersedes our narrower 017-019 fix).
- benchmarks/single_node/fixed_seq_len/dsv4_fp8_mi355x{,_vllm}.sh: accepted main's deletes (orphan recipes removed in #1374, #1501).
- .github/configs/amd-master.yaml: took main as the base, then re-applied our agentic-only additions on top:
  * qwen3.5-fp8-mi355x-sglang-agentic-hicache (new entry)
  * dsv4-fp4-mi355x-vllm-agentic (new entry)
  * dsv4-fp4-mi355x-sglang-agentic (new entry)
  * kimik2.5-fp4-mi355x-vllm-agentic (cpu -> lmcache)
  Dropped our comment-path edit for dsv4_fp8_mi355x_vllm.sh since main deleted that entry.

Fixed_seq_len reorg fixups for files added on main during our branch's lifetime:
- git mv 14 stranded scripts from benchmarks/single_node/*.sh into benchmarks/single_node/fixed_seq_len/ (dsr1_fp4_b200_mtp, dsr1_fp4_mi355x_mtp, dsr1_fp8_h200_mtp, dsr1_fp8_mi325x_mtp, dsr1_fp8_mi355x_mtp, dsv4_fp4_mi355x_vllm, glm5_fp8_h200_mtp, glm5_fp8_mi325x, glm5_fp8_mi325x_mtp, qwen3.5_bf16_mi325x_mtp, qwen3.5_fp4_mi355x_mtp, qwen3.5_fp8_h100, qwen3.5_fp8_h100_mtp, qwen3.5_fp8_mi325x_mtp). Patched their source paths from ../benchmark_lib.sh to ../../benchmark_lib.sh.
- runners/launch_mi355x-amds.sh: multinode-non-disagg BENCHMARK_SUBDIR bumped from `single_node` to `single_node/fixed_seq_len`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 parents 8eec0d4 + c5ff8da commit f89cdfe
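
The source-path patch mentioned in the commit message (`../benchmark_lib.sh` to `../../benchmark_lib.sh`) can be sketched as follows. This is only an illustration, not the command actually used for this commit: the function name `patch_source_path` and the example `source` line are assumptions. The negative lookbehind makes the rewrite idempotent, so running it over an already-moved script does not add a third `../`.

```python
import re

# Match "../benchmark_lib.sh" only when it is NOT already preceded by "../".
# Without the lookbehind, a second run would turn "../../" into "../../../".
_OLD_PATH = re.compile(r"(?<!\.\./)\.\./benchmark_lib\.sh")


def patch_source_path(script_text: str) -> str:
    """Hypothetical helper: point a moved script one directory higher."""
    return _OLD_PATH.sub("../../benchmark_lib.sh", script_text)


# Assumed shape of a source line; the real scripts may differ.
before = 'source "$(dirname "$0")/../benchmark_lib.sh"'
after = patch_source_path(before)
```

Applying `patch_source_path` to `after` again returns it unchanged, which is convenient when only some of the scripts in a directory have already been fixed up.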

91 files changed

Lines changed: 8059 additions & 1407 deletions

.claude/commands/klaud-pr-status-html.md

Lines changed: 296 additions & 0 deletions
Large diffs are not rendered by default.

.github/configs/amd-master.yaml

Lines changed: 265 additions & 74 deletions
Large diffs are not rendered by default.

.github/configs/nvidia-master.yaml

Lines changed: 656 additions & 153 deletions
Large diffs are not rendered by default.

.github/workflows/docker-tag-monitor.yml

Lines changed: 0 additions & 289 deletions
This file was deleted.

.github/workflows/e2e-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -319,7 +319,7 @@ jobs:
         with:
           submodules: true

-      - uses: actions/setup-python@v5
+      - uses: actions/setup-python@v6
         with:
           python-version: '3.11'

AGENTS.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,8 @@

 Guidance for AI agents working with InferenceX.

+> **Before debugging a failing Klaud-Cold / claude/* image-bump PR, read [`KLAUD_DEBUG.md`](KLAUD_DEBUG.md).** It captures recurring failure modes (vLLM CUDA-graph OOM, B300 sglang regressions, cluster docker/perms/disk issues), the exact workarounds, and gh-CLI gotchas — most cron-PR failures are already cataloged there.
+
 ## Project Overview

 InferenceX is an open-source automated benchmarking system that tracks LLM inference performance across hardware (NVIDIA B200/H100/H200/GB200, AMD MI300X/MI325X/MI355X) and software stacks (vLLM, SGLang, TensorRT-LLM, ATOM). Results published to https://inferencex.com/.
@@ -62,6 +64,8 @@ PRs do not run the sweep automatically - `run-sweep.yml` is gated on a label. Pi
 - `sweep-enabled` - runs the sweep with `--trim-conc` (each parallelism config reduced to its single highest concurrency). Default for most PRs.
 - `full-sweep-enabled` - runs the full intermediate concurrency sweep, identical to push-to-main. Use when intermediate points matter (e.g. a recipe change shifts the throughput/latency curve, not just its endpoints).

+**The sweep does not trigger while the PR has merge conflicts.** Even with `sweep-enabled` / `full-sweep-enabled` applied, the `run-sweep.yml` workflow will not start until the PR cleanly merges into main — a stale claude/* or update-* branch with a `perf-changelog.yaml` conflict (the common case) will sit in NO_SWEEP / NO_SUCCESS until rebased. Resolution recipe is documented in `KLAUD_DEBUG.md §1.1`: `git merge origin/main`, then `git checkout origin/main -- perf-changelog.yaml`, then re-append the PR's own changelog entry at the tail. Don't 3-way merge `perf-changelog.yaml`; whitespace edits silently re-trigger the deletion check.
+
 Push-to-main always runs the full untrimmed sweep unless `[skip-sweep]` is in the commit message. Trim logic lives in `trim_conc()` in `utils/process_changelog.py`: single-node entries are grouped by every non-`conc` field and only the highest-`conc` entry per group is kept; multi-node entries have their `conc` list collapsed to `[max(conc)]`.

 ## Common Tasks
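
The trimming rule described in the AGENTS.md context above (keep only the highest-`conc` entry per group for single-node entries, collapse the `conc` list to `[max(conc)]` for multi-node entries) can be sketched in Python. This is a minimal illustration, not the actual `trim_conc()` from `utils/process_changelog.py`. The entry shape is assumed: plain dicts with a `conc` key, where a list-valued `conc` marks a multi-node entry. The name `trim_conc_sketch` and the fields `model` and `tp` are hypothetical.

```python
def trim_conc_sketch(entries):
    """Illustrative version of the trimming rule; assumes dict entries."""
    result = []
    group_index = {}  # group key -> position of that group's entry in result
    for entry in entries:
        conc = entry["conc"]
        if isinstance(conc, list):
            # Assumed multi-node shape: collapse the list to its maximum.
            result.append({**entry, "conc": [max(conc)]})
            continue
        # Single-node: group by every field except "conc".
        key = tuple(sorted((k, repr(v)) for k, v in entry.items() if k != "conc"))
        if key not in group_index:
            group_index[key] = len(result)
            result.append(dict(entry))
        elif conc > result[group_index[key]]["conc"]:
            # Keep only the highest-concurrency entry of the group.
            result[group_index[key]] = dict(entry)
    return result


entries = [
    {"model": "a", "tp": 8, "conc": 4},
    {"model": "a", "tp": 8, "conc": 64},
    {"model": "a", "tp": 4, "conc": 16},
    {"model": "b", "tp": 8, "conc": [1, 8, 32]},
]
trimmed = trim_conc_sketch(entries)
```

In this sketch the two `tp: 8` entries for model `a` collapse to the `conc: 64` one, the `tp: 4` entry survives because it forms its own group, and the multi-node entry keeps `conc: [32]`.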
