Commit 40fa217

Misc touchups (#1380)
* Misc touchups
* Refine AGENTS.md content and fix minor typos in the documentation.
1 parent 29ac412 commit 40fa217

5 files changed

Lines changed: 391 additions & 405 deletions

.github/workflows/claude.yml

Lines changed: 2 additions & 23 deletions
@@ -221,27 +221,7 @@ jobs:
 
 ## Updating perf-changelog.yaml
 
-When making changes to benchmark scripts or master config files that affect image tags, environment variables, or configuration parameters, you MUST add an entry to `perf-changelog.yaml`.
-
-**When to update perf-changelog.yaml:**
-- Updating image tags in `.github/configs/*-master.yaml` or `benchmarks/*.sh` scripts
-- Adding or modifying environment variables in benchmark configurations
-- Changing configuration parameters that affect performance
-
-**Entry format:**
-```yaml
-- config-keys:
-- dsr1-fp8-*-vllm # Use wildcards to match multiple configs
-description:
-- "Update vLLM image from v0.11.2 to v0.13.0"
-- "Add VLLM_MXFP4_USE_MARLIN=1 environment variable"
-pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
-```
-
-**Guidelines:**
-- Use wildcards (`*`) in config-keys to match multiple related configurations
-- Each description item should be a concise change summary
-- The pr-link should reference the PR number (use XXX as placeholder until PR is created)
+See `AGENTS.md` → "Updating Docker Images" for entry format and rules. Required whenever you change image tags, env vars, or perf-affecting params in `.github/configs/*-master.yaml` or `benchmarks/*.sh`. Use `XXX` as the PR-link placeholder until the PR exists.
 
 ## Spawning Additional Workers:
 You CAN spawn additional Claude workers by commenting "@claude" with a specific task.
@@ -272,8 +252,7 @@ jobs:
 
 ### Additional Knowledge
 - MI355 is gfx950 not gfx1201
-- **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
-- **MTP (Multi-Token Prediction)**: Predicts multiple tokens per forward pass using speculative decoding (e.g., EAGLE, NEXTN).
+- STP/MTP terminology: see `AGENTS.md` → "Terminology"
 
 ### Expert Parallelism in Benchmark Scripts
 vLLM and SGLang handle expert parallelism differently. When writing or reviewing benchmark scripts for MoE models:

.github/workflows/docker-tag-monitor.yml

Lines changed: 17 additions & 20 deletions
@@ -1,6 +1,14 @@
 name: Docker Tag Monitor
 
-# trigger: re-run after token/allowed_bots fix
+# Downstream merge note (human-only — intentionally NOT in the @claude prompt):
+# Once the per-config-key PRs this workflow asks Claude to open have a
+# green run-sweep.yml run and the `full-sweep-enabled` label, merge them
+# with `utils/merge_with_reuse.sh <pr-number>` instead of the GitHub UI.
+# That script posts /reuse-sweep-run, auto-resolves perf-changelog.yaml
+# conflicts, cancels the merge-triggered sweep, and squash-merges with
+# --admin so the post-merge run-sweep run reuses the PR's prior sweep.
+# Claude doesn't have admin merge rights and shouldn't be told about this
+# path — it's a maintainer-only finalization step.
 
 on:
 schedule:
@@ -207,29 +215,29 @@ jobs:
 echo "2. Add entries to \`perf-changelog.yaml\` documenting the version changes"
 echo "3. For each eligible config-key, push a branch and actually open a PR — do not stop at the \"Create a pull request for ...\" remote hint that \`git push\` prints. Run \`gh pr create\` (or the equivalent MCP tool) and verify the returned PR URL. Link every PR back to this issue in a comment."
 echo ""
-echo "**PR title / commit message formatting:** Multi-line titles and bodies MUST use a heredoc, not \`\\n\` escapes and not \`\$'...'\` ANSI-C quoting. Last run produced commits literally starting with \`\$\` and containing \`\\n\\n\` as text because of mis-quoted ANSI-C strings. Use this pattern instead:"
+echo "**Required PR label:** Every PR you open from this issue MUST carry the \`full-sweep-enabled\` label. Apply it at creation time via \`gh pr create --label full-sweep-enabled\` (or add it immediately after with \`gh pr edit <num> --add-label full-sweep-enabled\`). Do not skip this — downstream automation keys off the label."
+echo ""
+echo "**PR title / commit message formatting:** Multi-line titles and bodies MUST use a heredoc, not \`\\n\` escapes and not \`\$'...'\` ANSI-C quoting. A prior run produced commits literally starting with \`\$\` and containing \`\\n\\n\` as text because of mis-quoted ANSI-C strings. Use this pattern instead:"
 echo ""
 echo "\`\`\`bash"
 echo "git commit -m \"\$(cat <<'EOF'"
 echo "Update qwen3.5-bf16-b300-sglang-mtp SGLang image to v0.5.11-cu130"
 echo ""
-echo "Ref #${ISSUE_NUMBER}"
+echo "Ref #<this issue's number>"
 echo "EOF"
 echo ")\""
 echo ""
 echo "gh pr create --title \"Update qwen3.5-bf16-b300-sglang-mtp SGLang image to v0.5.11-cu130\" \\"
 echo " --label full-sweep-enabled \\"
 echo " --body \"\$(cat <<'EOF'"
-echo "Updates the SGLang image tag for \\\`qwen3.5-bf16-b300-sglang-mtp\\\` to v0.5.11-cu130."
+echo "Updates the SGLang image tag for \`qwen3.5-bf16-b300-sglang-mtp\` to v0.5.11-cu130."
 echo ""
-echo "Ref #${ISSUE_NUMBER}"
+echo "Ref #<this issue's number>"
 echo "EOF"
 echo ")\""
 echo "\`\`\`"
 echo ""
-echo "PR titles must be a single line (no newlines). Bodies are multi-line and must contain real \\\\n, not the literal characters \`\\\\n\`. Never put \`\$\` in front of a quoted message string."
-echo ""
-echo "**Required PR label:** Every PR you open from this issue MUST carry the \`full-sweep-enabled\` label. Apply it at creation time via \`gh pr create --label full-sweep-enabled\` (or add it immediately after with \`gh pr edit <num> --add-label full-sweep-enabled\`). Do not skip this — downstream automation keys off the label."
+echo "PR titles must be a single line (no newlines). Bodies should contain real newlines (use a heredoc), not the literal characters \`\\n\`. Never put \`\$\` in front of a quoted message string."
 echo ""
 echo "**Runner gating:** Only open PRs for config-keys whose runner SKU is in the allowed list (\`$ALLOWED_SKUS\`). The runner SKU is the hardware segment in the config-key (e.g. \`dsr1-fp4-b200-sglang\` → \`b200\`). For any config-key whose SKU is not in the allowed list, skip it and list the skipped keys plus the reason (not clear / all-offline / no idle capacity) in a single comment on this issue."
 echo ""
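Review note: the quoting failure this hunk's prompt text warns about can be reproduced in isolation. The sketch below is not part of the commit. It builds the same message three ways and counts the resulting lines; the message text, the `<issue-number>` placeholder, and the `count_lines` helper are illustrative assumptions.

```shell
#!/bin/sh
# Illustrative only: not taken from the workflow.

# 1. Backslash escapes inside double quotes stay as two literal characters.
escaped="Update image tag\n\nRef #<issue-number>"

# 2. ANSI-C quoting wrapped in double quotes is not expanded at all,
#    so the value starts with a literal dollar sign.
misquoted="$'Update image tag\n\nRef #<issue-number>'"

# 3. A quoted heredoc yields real newlines and performs no expansion.
heredoc="$(cat <<'EOF'
Update image tag

Ref #<issue-number>
EOF
)"

# Hypothetical helper: number of lines in a string.
count_lines() { printf '%s\n' "$1" | wc -l | tr -d ' '; }

echo "escaped:   $(count_lines "$escaped") line(s)"
echo "misquoted: $(count_lines "$misquoted") line(s)"
echo "heredoc:   $(count_lines "$heredoc") line(s)"
```

Only the heredoc variant produces a three-line message. The other two collapse to one line with `\n` embedded as text, which matches the symptom described in the prompt.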
@@ -243,18 +251,7 @@ jobs:
 echo ""
 echo "**Exception — MTP pairs:** When a config-key and its \`-mtp\` sibling exist for the same model/precision/runner/framework (e.g. \`qwen3.5-fp4-b300-sglang\` and \`qwen3.5-fp4-b300-sglang-mtp\`), bundle both into one PR. Treat the pair as a single unit for the per-SKU cap (counts as 1, not 2) and the sequential e2e queue. If only one side of the pair is present in the updates, open a PR for just that one."
 echo ""
-echo "For each eligible config-key, check if there are multiple CUDA/ROCm versions available and choose appropriately based on current usage patterns in the configs."
-echo ""
-echo "**Slack ping when done:** After all PRs have been opened, all e2e runs have reached a terminal state, and the wrap-up comment with any deferred config-keys has been posted, send a single summary message to Slack channel \`C09PULGMVNG\` via the Bash tool. The \`SLACK_BOT_TOKEN\` env var is available. Use:"
-echo ""
-echo "\`\`\`bash"
-echo "curl -sS -X POST https://slack.com/api/chat.postMessage \\"
-echo " -H \"Authorization: Bearer \$SLACK_BOT_TOKEN\" \\"
-echo " -H \"Content-Type: application/json; charset=utf-8\" \\"
-echo " --data \"\$(jq -n --arg ch C09PULGMVNG --arg t \"<text>\" '{channel: \$ch, text: \$t}')\""
-echo "\`\`\`"
-echo ""
-echo "Message text should include: this issue link, count of PRs opened (per SKU), count of e2e runs passed/failed, list of skipped or deferred config-keys, and link to this workflow run. Send exactly one Slack message — do not post per-SKU or per-PR."
+echo "If Docker Hub lists multiple variants for the same base version (e.g. \`cu128\` vs \`cu130\`, \`rocm70\` vs \`rocm72\`), pick the variant whose suffix matches what the config-key's current image entry already uses — don't switch CUDA/ROCm minor versions in this update."
 echo ""
 } > "$BODY_FILE"
 
 rm -f "$RUNNERS_TABLE" "$ALLOWED_SKUS_FILE" /tmp/ci.json
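Review note: the new variant rule in this hunk ("pick the variant whose suffix matches") could be expressed as a small selection function. This is a sketch under assumptions, not code from the repository. It assumes tags have the shape `<version>-<variant>` with the variant after the last hyphen; `pick_variant` and the `v0.5.10-cu130` tag are hypothetical.

```shell
#!/bin/sh
# Illustrative only: pick_variant is a hypothetical helper.
# Usage: pick_variant <current-tag> <candidate-tag>...
# Prints the first candidate whose variant suffix (text after the last '-')
# equals the suffix of the current tag; returns 1 if none matches.
pick_variant() {
  current_tag="$1"
  shift
  suffix="${current_tag##*-}"
  for candidate in "$@"; do
    if [ "${candidate##*-}" = "$suffix" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# The current entry uses cu130, so the cu130 candidate is selected.
pick_variant v0.5.10-cu130 v0.5.11-cu128 v0.5.11-cu130
```

Returning a non-zero status when no candidate matches leaves the decision to the caller, which fits the rule's intent of never switching CUDA/ROCm variants silently.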
