When making changes to benchmark scripts or master config files that affect image tags, environment variables, or configuration parameters, you MUST add an entry to `perf-changelog.yaml`.
-
-**When to update perf-changelog.yaml:**
-- Updating image tags in `.github/configs/*-master.yaml` or `benchmarks/*.sh` scripts
-- Adding or modifying environment variables in benchmark configurations
-- Changing configuration parameters that affect performance
-
-**Entry format:**
-```yaml
-- config-keys:
-    - dsr1-fp8-*-vllm # Use wildcards to match multiple configs
-- Use wildcards (`*`) in config-keys to match multiple related configurations
-- Each description item should be a concise change summary
-- The pr-link should reference the PR number (use XXX as placeholder until PR is created)
+See `AGENTS.md` → "Updating Docker Images" for entry format and rules. Required whenever you change image tags, env vars, or perf-affecting params in `.github/configs/*-master.yaml` or `benchmarks/*.sh`. Use `XXX` as the PR-link placeholder until the PR exists.
 
 ## Spawning Additional Workers:
 You CAN spawn additional Claude workers by commenting "@claude" with a specific task.
@@ -271,8 +252,7 @@ jobs:
 
 ### Additional Knowledge
 - MI355 is gfx950 not gfx1201
-- **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
-- **MTP (Multi-Token Prediction)**: Predicts multiple tokens per forward pass using speculative decoding (e.g., EAGLE, NEXTN).
+- STP/MTP terminology: see `AGENTS.md` → "Terminology"
 
 ### Expert Parallelism in Benchmark Scripts
 vLLM and SGLang handle expert parallelism differently. When writing or reviewing benchmark scripts for MoE models:
echo "_Source: https://inferencex-ci-tracker.vercel.app/api/ci at $(date -u +%Y-%m-%dT%H:%M:%SZ)_"
+echo ""
+cat "$RUNNERS_TABLE"
+echo ""
+if [[ -n "$ALLOWED_SKUS" ]]; then
+echo "**Allowed SKUs for this run:** \`$ALLOWED_SKUS\`"
+else
+echo "**Allowed SKUs for this run:** _none — skip PR creation and post a comment explaining the runner shortage._"
+fi
+echo ""
 echo "---"
 echo ""
 echo "@claude Please update the configurations:"
 echo ""
 echo "1. Update image tags in \`.github/configs/nvidia-master.yaml\` and/or \`.github/configs/amd-master.yaml\`"
 echo "2. Add entries to \`perf-changelog.yaml\` documenting the version changes"
-echo "3. Create separate PRs grouped by framework and image family, and link each PR."
+echo "3. For each eligible config-key, push a branch and actually open a PR — do not stop at the \"Create a pull request for ...\" remote hint that \`git push\` prints. Run \`gh pr create\` (or the equivalent MCP tool) and verify the returned PR URL. Link every PR back to this issue in a comment."
+echo ""
+echo "**Required PR label:** Every PR you open from this issue MUST carry the \`full-sweep-enabled\` label. Apply it at creation time via \`gh pr create --label full-sweep-enabled\` (or add it immediately after with \`gh pr edit <num> --add-label full-sweep-enabled\`). Do not skip this — downstream automation keys off the label."
+echo ""
+echo "**PR title / commit message formatting:** Multi-line titles and bodies MUST use a heredoc, not \`\\n\` escapes and not \`\$'...'\` ANSI-C quoting. A prior run produced commits literally starting with \`\$\` and containing \`\\n\\n\` as text because of mis-quoted ANSI-C strings. Use this pattern instead:"
 echo ""
-echo "Group by framework plus CUDA/ROCm image family (for example sglang-cuda, sglang-rocm, vllm-cuda, vllm-rocm, atom, trt), not by individual GPU. Split by GPU only when an update is genuinely hardware-specific."
+echo "\`\`\`bash"
+echo "git commit -m \"\$(cat <<'EOF'"
+echo "Update qwen3.5-bf16-b300-sglang-mtp SGLang image to v0.5.11-cu130"
 echo ""
-echo "Focus on updating single-node configurations. For each framework/image family, check if there are multiple CUDA/ROCm versions available and choose appropriately based on current usage patterns in the configs."
+echo "Updates the SGLang image tag for \`qwen3.5-bf16-b300-sglang-mtp\` to v0.5.11-cu130."
+echo ""
+echo "Ref #<this issue's number>"
+echo "EOF"
+echo ")\""
+echo "\`\`\`"
+echo ""
+echo "PR titles must be a single line (no newlines). Bodies should contain real newlines (use a heredoc), not the literal characters \`\\n\`. Never put \`\$\` in front of a quoted message string."
+echo ""
+echo "**Runner gating:** Only open PRs for config-keys whose runner SKU is in the allowed list (\`$ALLOWED_SKUS\`). The runner SKU is the hardware segment in the config-key (e.g. \`dsr1-fp4-b200-sglang\` → \`b200\`). For any config-key whose SKU is not in the allowed list, skip it and list the skipped keys plus the reason (not clear / all-offline / no idle capacity) in a single comment on this issue."
+echo ""
+echo "**Single-node only:** Skip any config-key whose master-config entry has \`multinode: true\` or otherwise targets a multinode runner. Only update single-node configurations."
+echo ""
+echo "**Per-SKU cap (max 5):** For each allowed SKU, work on at most 5 config-keys. If a SKU has more than 5 eligible config-keys, pick 5 in alphabetical order and list the deferred remainder in your wrap-up comment on this issue so a future run can pick them up."
+echo ""
+echo "**Sequential execution per SKU:** Within a single SKU, process config-keys one at a time. For each config-key: open the PR, dispatch its e2e test (\`mcp__github__run_workflow\` against \`e2e-tests.yml\` on the PR branch), poll with exponential backoff until the run reaches a terminal state (success/failure/cancelled), then move on to the next config-key for that SKU. Do not dispatch the next e2e run for the same SKU until the previous one has finished. Different SKUs may be processed in parallel since they target disjoint hardware, but each SKU's queue must stay serial."
+echo ""
+echo "One PR per config-key — do not bundle multiple config-keys into one PR even when they share a framework or image family."
+echo ""
+echo "**Exception — MTP pairs:** When a config-key and its \`-mtp\` sibling exist for the same model/precision/runner/framework (e.g. \`qwen3.5-fp4-b300-sglang\` and \`qwen3.5-fp4-b300-sglang-mtp\`), bundle both into one PR. Treat the pair as a single unit for the per-SKU cap (counts as 1, not 2) and the sequential e2e queue. If only one side of the pair is present in the updates, open a PR for just that one."
+echo ""
+echo "If Docker Hub lists multiple variants for the same base version (e.g. \`cu128\` vs \`cu130\`, \`rocm70\` vs \`rocm72\`), pick the variant whose suffix matches what the config-key's current image entry already uses — don't switch CUDA/ROCm minor versions in this update."
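
The runner-gating and per-SKU-cap rules in the prompt above could be applied mechanically along the following lines. This is a minimal sketch, not code from the PR: the helper names `sku_of`, `is_allowed`, and `select_keys` are hypothetical, and it assumes that config-keys follow `<model>-<precision>-<sku>-<framework>[-mtp]` and that the allowed list is comma-separated. The diff does not confirm either format. It does not handle MTP-pair bundling or the multinode exclusion.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: these helpers are hypothetical and are not part
# of the workflow shown in the diff. Requires bash 4+ (associative arrays).

# Assumption: the SKU is the third hyphen-separated field of a config-key,
# e.g. dsr1-fp4-b200-sglang -> b200.
sku_of() {
  local sku
  IFS=- read -r _ _ sku _ <<< "$1"
  printf '%s\n' "$sku"
}

# Assumption: the allowed list is comma-separated, e.g. "b200,h100".
# Succeeds only when the SKU is non-empty and appears in the list.
is_allowed() {
  [[ -n "$1" && ",$2," == *",$1,"* ]]
}

# Reads config-keys from stdin (one per line) and prints the keys to work on:
# allowed SKUs only, alphabetical order, at most $2 keys per SKU (default 5).
select_keys() {
  local allowed="$1" max="${2:-5}" key sku
  local -A count=()
  LC_ALL=C sort | while IFS= read -r key; do
    [[ -n "$key" ]] || continue
    sku="$(sku_of "$key")"
    is_allowed "$sku" "$allowed" || continue
    if (( ${count[$sku]:-0} < max )); then
      printf '%s\n' "$key"
      count[$sku]=$(( ${count[$sku]:-0} + 1 ))
    fi
  done
}
```

A caller might pipe candidate keys through it, for example `printf '%s\n' "${keys[@]}" | select_keys "$ALLOWED_SKUS"`. Keys that are filtered out or left over would still need to be reported in the wrap-up comment, as the prompt requires.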