
Commit 7d40518

haic0 and cursoragent committed
Merge main into fixed-AR MTP benchmark branch
Resolve the MI355X launcher conflict by preserving upstream fixed_seq_len script layout while keeping Eagle3 and fixed-AR MTP script resolution for this PR.

Co-authored-by: Cursor <cursoragent@cursor.com>
2 parents e68ee0a + c138338, commit 7d40518

378 files changed

Lines changed: 27742 additions & 3235 deletions


.claude/commands/klaud-pr-status-html.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ State buckets:
 - **RUNNING** — no failed checks; at least one is `QUEUED` / `IN_PROGRESS` / `PENDING`.
 - **READY** — no failed, no pending, and at least one `Run Sweep` check is `SUCCESS`.
 - **NO_SUCCESS** — sweep ran but never produced a `SUCCESS` (e.g. all matrix jobs got SKIPPED).
-- **NO_SWEEP** — no `Run Sweep` check exists for this head SHA at all (sweep never triggered — usually missing `full-sweep-enabled` label).
+- **NO_SWEEP** — no `Run Sweep` check exists for this head SHA at all (sweep never triggered — usually missing a sweep label such as `full-sweep-enabled` or `non-canary-full-sweep-enabled`).
 
 ```bash
 : > /tmp/klaud_pr_status.tsv

.claude/commands/nuke.md

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
---
description: Bump single-node inference-engine image tags (vLLM or SGLang) across recipes, one [Klaud Cold] PR per model+precision+SKU
argument-hint: <vllm|sglang> <target-tag> [model/sku filter]
---

Bump the container image tag for single-node benchmark recipes that use a given
inference engine, opening **one PR per recipe family** with the grouping rules below.

Arguments (`$ARGUMENTS`): `<engine> <target-tag> [filter]`
- `engine` — `vllm` or `sglang`
- `target-tag` — e.g. `v0.22.0` (NVIDIA/CUDA); for SGLang the NVIDIA and AMD tag
  strings usually differ (CUDA `…-cu130` vs ROCm `…-rocm720-mi35x-…`), so confirm
  the exact tag per image repo with the user before editing.
- `filter` (optional) — restrict to a model and/or SKU substring (e.g. `kimik2.5`,
  `b300`, `minimaxm2.5 mi355x`). If omitted, all matching recipes are in scope.

## Image repos by engine + vendor

| engine | NVIDIA image | AMD/ROCm image | master config |
|--------|--------------|----------------|---------------|
| vllm | `vllm/vllm-openai` | `vllm/vllm-openai-rocm` | `.github/configs/nvidia-master.yaml` / `amd-master.yaml` |
| sglang | `lmsysorg/sglang` | `lmsysorg/sglang` (rocm-suffixed tag) | same two files |

## Grouping rules (NON-NEGOTIABLE)

1. **One PR per `model + precision + SKU` recipe family.** The config-key shape is
   `<model>-<precision>-<sku>-<engine>` (e.g. `kimik2.5-int4-b300-vllm`).
2. **Fold the `-mtp` (and non-mtp) sibling into the SAME PR** as its base recipe.
   This is the *only* thing you may combine.
3. **Never** put two different models, two different precisions, or two different
   SKUs in the same PR. (fp4 vs fp8 vs int4 are different precisions → separate PRs.)
4. Skip `*-agentic` recipes unless the user explicitly opts in — they are
   deliberately diverged/pinned.
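
The grouping rules can be sketched as a small helper. This is an illustration only, not part of the committed command: `family_of` and `group_into_prs` are hypothetical names, the sketch assumes `-mtp` and `-agentic` always appear as the final suffix of a config key, and the `fp4` key in the sample is made up to show a second family.

```python
from collections import defaultdict


def family_of(key: str) -> str:
    """Return the base family key, folding an `-mtp` sibling into its base."""
    return key[: -len("-mtp")] if key.endswith("-mtp") else key


def group_into_prs(keys, include_agentic=False):
    """Map each base family to the config keys that share its single PR."""
    families = defaultdict(list)
    for key in keys:
        if key.endswith("-agentic") and not include_agentic:
            continue  # rule 4: agentic recipes are opt-in only
        families[family_of(key)].append(key)
    return {base: sorted(members) for base, members in families.items()}


keys = [
    "kimik2.5-int4-b300-vllm",
    "kimik2.5-int4-b300-vllm-mtp",
    "kimik2.5-fp4-b300-vllm",           # hypothetical key: different precision
    "kimik2.5-int4-b300-vllm-agentic",  # hypothetical key: skipped by default
]
prs = group_into_prs(keys)
for base, members in prs.items():
    print(base, "->", members)
```

Because the family key keeps model, precision, and SKU intact, two keys can only land in the same PR when they differ by the `-mtp` suffix alone, which is the one combination rule 2 allows.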

## Step 1 — discover candidate recipes

Parse both master YAMLs for top-level keys whose `framework:` matches `engine`, and
record each key's current `image:`. Keep only single-node keys (they carry a SKU like
`b200/b300/h100/h200/mi300x/mi325x/mi355x` and map to `benchmarks/single_node/*`); drop
multi-node/disagg keys. Apply the `filter` if given. Then collapse `-mtp` siblings into
their base family.
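
A minimal sketch of this discovery pass, using only the standard library. It assumes each recipe is a top-level key with indented `framework:` and `image:` lines beneath it, treats "has a SKU token in the key" as a stand-in for "single-node" (the real multi-node check may differ), and reads `filter` as space-separated substrings that must all match. `parse_blocks`, `discover`, and the sample tags are hypothetical.

```python
import re

SKUS = ("b200", "b300", "h100", "h200", "mi300x", "mi325x", "mi355x")


def parse_blocks(yaml_text):
    """Return {top_level_key: {"framework": ..., "image": ...}} by line scan."""
    blocks, current = {}, None
    for line in yaml_text.splitlines():
        top = re.match(r"^([A-Za-z0-9._-]+):\s*$", line)
        if top:
            current = top.group(1)
            blocks[current] = {}
            continue
        field = re.match(r"^\s+(framework|image):\s*(\S+)\s*$", line)
        if field and current is not None:
            # keep the first occurrence under each key
            blocks[current].setdefault(field.group(1), field.group(2))
    return blocks


def discover(yaml_text, engine, filt=""):
    """Return {key: current_image} for keys that use `engine` and pass `filt`."""
    candidates = {}
    for key, fields in parse_blocks(yaml_text).items():
        if fields.get("framework") != engine:
            continue
        if not any(sku in key.split("-") for sku in SKUS):
            continue  # assumed proxy for dropping multi-node/disagg keys
        if not all(token in key for token in filt.split()):
            continue
        candidates[key] = fields.get("image")
    return candidates


# Sample text with made-up image tags, for illustration only.
sample = """\
kimik2.5-int4-b300-vllm:
  image: vllm/vllm-openai:v0.21.0
  framework: vllm
kimik2.5-int4-b300-vllm-mtp:
  image: vllm/vllm-openai:v0.21.0
  framework: vllm
minimaxm2.5-fp8-mi355x-sglang:
  image: lmsysorg/sglang:example-rocm
  framework: sglang
"""
print(discover(sample, "vllm"))
print(discover(sample, "sglang", "minimaxm2.5 mi355x"))
```

A line scan is used here to stay dependency-free; a real YAML parser would be the safer choice if the master files use anchors, inline mappings, or nested `image:` fields.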

## Step 2 — verify the target tag(s) EXIST before bumping

Per standing guidance, never invent a tag. Check each image repo you'll touch:

```bash
for repo in vllm/vllm-openai vllm/vllm-openai-rocm; do   # or lmsysorg/sglang
  code=$(curl -s -o /dev/null -w "%{http_code}" "https://hub.docker.com/v2/repositories/${repo}/tags/<TAG>")
  echo "$repo:<TAG> -> $code"   # want 200
done
```

If a vendor-specific variant 404s (e.g. `…-cu130` for a version that only ships
plain), confirm the correct tag string with the user before proceeding.
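
The same check can be sketched in Python when the result needs to gate later steps. This assumes the Docker Hub endpoint from the loop above; `http_status` and `missing_tags` are hypothetical helper names, and the status lookup is injectable so the example below runs without touching the network.

```python
import urllib.error
import urllib.request

HUB = "https://hub.docker.com/v2/repositories/{repo}/tags/{tag}"


def http_status(url):
    """Return the HTTP status code for a GET on `url` (performs a network call)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the tag does not exist


def missing_tags(repos, tag, status_of=http_status):
    """Return the repos for which `tag` does not resolve with HTTP 200."""
    return [r for r in repos if status_of(HUB.format(repo=r, tag=tag)) != 200]


# Offline illustration: a stub that pretends the ROCm repo lacks the tag.
def fake_status(url):
    return 404 if "rocm" in url else 200


repos = ["vllm/vllm-openai", "vllm/vllm-openai-rocm"]
print(missing_tags(repos, "v0.22.0", status_of=fake_status))
```

A non-empty result is the signal to stop and confirm the tag string with the user rather than proceed with the bump.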

## Step 3 — confirm scope with the user (AskUserQuestion)

Before creating anything, present the full recipe list (count + current→target per
family) and confirm:
- **Vendor scope**: NVIDIA, AMD, or both.
- **Agentic**: include `*-agentic` siblings? (default: exclude)
- **Special pins**: call out any recipe currently on a nightly/non-stable/special tag
  (e.g. `nightly-…`, `…-cu129`, a one-off build) and ask whether to override it.

Each PR triggers a full GPU sweep, so surface the total PR count explicitly.
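
One way to assemble that confirmation list is sketched below. `scope_summary` is a hypothetical helper, the input shape (`{base_family: {config_key: current_image}}`) is an assumption, and the special-pin check only catches tags that start with `nightly`; other one-off pins would still need a manual look.

```python
def scope_summary(families, target_tag):
    """Build printable lines: PR count first, then current -> target per key."""
    lines = [f"{len(families)} PR(s), each triggering a full GPU sweep:"]
    for base, members in sorted(families.items()):
        for key, image in sorted(members.items()):
            current = image.rsplit(":", 1)[-1]  # tag is the part after the last colon
            line = f"  {key}: {current} -> {target_tag}"
            if current.startswith("nightly"):
                line += "  (special pin: confirm override)"
            lines.append(line)
    return lines


# Made-up family and tags, for illustration only.
families = {
    "a-fp8-b200-vllm": {
        "a-fp8-b200-vllm": "vllm/vllm-openai:v0.21.0",
        "a-fp8-b200-vllm-mtp": "vllm/vllm-openai:nightly-abc",
    }
}
summary = scope_summary(families, "v0.22.0")
print("\n".join(summary))
```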

## Step 4 — create one PR per family

Use these helpers (write them to /tmp) for precise, per-config-key edits — a blind
`sed` is unsafe because the same old tag appears under many keys.

`/tmp/edit_image.py`:
```python
#!/usr/bin/env python3
# Usage: edit_image.py <yaml_file> <new_image> <key1> [key2 ...]
import re, sys
f, new_image, keys = sys.argv[1], sys.argv[2], sys.argv[3:]
lines = open(f).read().split('\n')
for key in keys:
    kre = re.compile(r'^' + re.escape(key) + r':\s*$')
    start = next((i for i,l in enumerate(lines) if kre.match(l)), None)
    if start is None: sys.exit(f"ERROR: key not found: {key}")
    img_i = None
    for j in range(start+1, len(lines)):
        if re.match(r'^[A-Za-z0-9._-]+:\s*$', lines[j]): break  # next top-level key
        m = re.match(r'^(\s+)image:\s*(.+?)\s*$', lines[j])
        if m: img_i, indent, old = j, m.group(1), m.group(2); break
    if img_i is None: sys.exit(f"ERROR: no image: line for key {key}")
    if old != new_image: lines[img_i] = f"{indent}image: {new_image}"; print(f"{key}: {old} -> {new_image}")
    else: print(f"{key}: already {new_image} (no change)")
open(f,'w').write('\n'.join(lines))
```

`/tmp/append_changelog.py`:
```python
#!/usr/bin/env python3
# Usage: append_changelog.py <changelog> <description> <key1> [key2 ...]
import sys
f, desc, keys = sys.argv[1], sys.argv[2], sys.argv[3:]
content = open(f).read().rstrip('\n')
# Nested entries are indented so they sit under the "- config-keys:" list item.
block = ["", "- config-keys:"] + [f"    - {k}" for k in keys]
block += ["  description:", f'    - "{desc}"', "  pr-link: PRLINK_PLACEHOLDER"]
open(f,'w').write(content + '\n' + '\n'.join(block) + '\n')
```

For each family (run strictly sequentially — git checkouts can't be parallel):

```bash
git checkout main -q && git reset --hard origin/main -q
branch="klaud-cold/<basekey>-<TAG>"
git checkout -b "$branch" -q
python3 /tmp/edit_image.py <master.yaml> <NEW_IMAGE> <key> [<key>-mtp]
python3 /tmp/append_changelog.py perf-changelog.yaml "<DESC>" <key> [<key>-mtp]
git add -A
git commit -q -m "[Klaud Cold] Update <basekey>[ (+mtp)] <PHRASE> to <TAG>" \
  -m "Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
git push -u origin "$branch" -q --force-with-lease
url=$(gh pr create --repo SemiAnalysisAI/InferenceX --base main --head "$branch" \
  --title "[Klaud Cold] Update <basekey>[ (+mtp)] <PHRASE> to <TAG>" \
  --body "<BODY>" --label full-sweep-enabled | grep -o 'https://github.com/[^ ]*')
# patch the changelog pr-link with the real URL, then amend + force-push
python3 - perf-changelog.yaml "$url" <<'PY'
import sys; f,u=sys.argv[1],sys.argv[2]
# Read first: open(f,'w') truncates the file as soon as it is called.
s=open(f).read().replace("PRLINK_PLACEHOLDER",u,1)
open(f,'w').write(s)
PY
git add perf-changelog.yaml && git commit -q --amend --no-edit && git push -q --force-with-lease
```

Conventions:
- `<PHRASE>` = `vLLM image` / `vLLM ROCm image` / `SGLang image` / `SGLang ROCm image`.
- Title gets `(+mtp)` only when the family has an mtp sibling.
- Every PR carries the **`full-sweep-enabled`** label so CI kicks off.
- `<DESC>` = `Update <PHRASE> from <old-tag> to <TAG>` (note both tags when the
  base/mtp differ, e.g. base already on target).
- PR body:
  ```
  ## Summary
  <DESC>

  Recipes touched: `key1`, `key2`

  ## Test plan
  - [ ] full-sweep-enabled sweep passes.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  ```
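
The conventions above can be collected into one helper that produces the title, `<DESC>`, and body together. This is a sketch under assumptions: `pr_text` and the `vendor` values (`nvidia` / `amd`) are hypothetical, the old tag in the example is made up, and the case where base and mtp start from different tags is not handled.

```python
PHRASES = {
    ("vllm", "nvidia"): "vLLM image",
    ("vllm", "amd"): "vLLM ROCm image",
    ("sglang", "nvidia"): "SGLang image",
    ("sglang", "amd"): "SGLang ROCm image",
}


def pr_text(base_key, keys, engine, vendor, old_tag, new_tag):
    """Return (title, desc, body) following the conventions listed above."""
    phrase = PHRASES[(engine, vendor)]
    mtp = " (+mtp)" if any(k.endswith("-mtp") for k in keys) else ""
    title = f"[Klaud Cold] Update {base_key}{mtp} {phrase} to {new_tag}"
    desc = f"Update {phrase} from {old_tag} to {new_tag}"
    touched = ", ".join(f"`{k}`" for k in keys)
    body = "\n".join([
        "## Summary",
        desc,
        "",
        f"Recipes touched: {touched}",
        "",
        "## Test plan",
        "- [ ] full-sweep-enabled sweep passes.",
        "",
        "🤖 Generated with [Claude Code](https://claude.com/claude-code)",
    ])
    return title, desc, body


title, desc, body = pr_text(
    "kimik2.5-int4-b300-vllm",
    ["kimik2.5-int4-b300-vllm", "kimik2.5-int4-b300-vllm-mtp"],
    "vllm", "nvidia", "v0.21.0", "v0.22.0",
)
print(title)
```

Generating the title and changelog description from the same function keeps the two strings from drifting apart across the many per-family PRs.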

## Step 5 — finish

Return to a clean `main` (`git checkout main && git reset --hard origin/main`).
Report a table of every PR created (number + URL + recipe), flag any special-pin
overrides, and note that each PR's sweep will run via the `full-sweep-enabled` label.
