
Commit 70bf3a2

docker-tag-monitor: ask claude to read upstream release notes (#1538)
1 parent f4b05bc

1 file changed

Lines changed: 56 additions & 1 deletion


.github/workflows/docker-tag-monitor.yml

@@ -179,6 +179,42 @@ jobs:
 fi
 ALLOWED_SKUS=$(cat "$ALLOWED_SKUS_FILE")
 
+# Snapshot per-config submission staleness from the production frontend.
+# Helps Claude prioritize bumps for configs that haven't been refreshed in a long time.
+STALENESS_TABLE=$(mktemp)
+if curl -sf --max-time 15 --compressed "https://inferencex.semianalysis.com/api/v1/submissions" -o /tmp/sub.json; then
+jq -r --arg today "$(date -u +%Y-%m-%d)" '
+def days_between($a; $b):
+(($a | strptime("%Y-%m-%d") | mktime) as $ta
+| ($b | strptime("%Y-%m-%d") | mktime) as $tb
+| (($ta - $tb) / 86400) | floor);
+[
+"| Config | Days since |",
+"|--------|------------|"
+]
++ (
+(.summary // [])
+| map(select(.is_multinode == false and (.model | startswith("llama") | not)))
+| group_by([.model, .hardware, .framework, .precision, .spec_method, .disagg])
+| map({
+key: (
+.[0].model + "-" + .[0].precision + "-" + .[0].hardware + "-" + .[0].framework
++ (if .[0].spec_method != "none" then "-mtp" else "" end)
++ (if .[0].disagg then "-disagg" else "" end)
+),
+last_date: ([.[].date] | max)
+})
+| map(. + {days_since: days_between($today; .last_date)})
+| sort_by(-.days_since)
+| .[0:20]
+| map("| `\(.key)` | \(.days_since) |")
+)
+| .[]
+' /tmp/sub.json > "$STALENESS_TABLE"
+else
+echo "_Could not reach https://inferencex.semianalysis.com/api/v1/submissions — submission staleness data not available._" > "$STALENESS_TABLE"
+fi
+
 # Build issue body and write to file
 BODY_FILE=$(mktemp)
 {
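To see what the new jq program produces without calling the production API, the filter can be run locally against a small fixture. The sketch below is a trimmed copy of the committed filter that prints plain `key days` rows instead of Markdown table rows and skips the top-20 cut. The helper name `staleness_rows` and every value in the fixture are hypothetical; only the field names (`summary`, `model`, `hardware`, `framework`, `precision`, `spec_method`, `disagg`, `is_multinode`, `date`) come from the workflow. It assumes `date` values are plain `YYYY-MM-DD` strings, which is what the committed `strptime("%Y-%m-%d")` call expects.

```shell
#!/usr/bin/env bash
# Hedged sketch: a trimmed copy of the workflow's staleness jq filter that
# prints "key days" rows instead of Markdown table rows. staleness_rows and
# the fixture values are hypothetical; field names match the workflow.
set -euo pipefail

# Usage: staleness_rows TODAY < submissions.json   (TODAY as YYYY-MM-DD)
staleness_rows() {
  jq -r --arg today "$1" '
    def days_between($a; $b):
      (($a | strptime("%Y-%m-%d") | mktime) as $ta
      | ($b | strptime("%Y-%m-%d") | mktime) as $tb
      | (($ta - $tb) / 86400) | floor);
    (.summary // [])
    # Same filter as the workflow: single-node, non-llama submissions only.
    | map(select(.is_multinode == false and (.model | startswith("llama") | not)))
    # One group per (model, hardware, framework, precision, spec_method, disagg).
    | group_by([.model, .hardware, .framework, .precision, .spec_method, .disagg])
    | map({
        key: (.[0].model + "-" + .[0].precision + "-" + .[0].hardware + "-" + .[0].framework
              + (if .[0].spec_method != "none" then "-mtp" else "" end)
              + (if .[0].disagg then "-disagg" else "" end)),
        # YYYY-MM-DD strings sort chronologically, so max is the newest date.
        last_date: ([.[].date] | max)
      })
    | map(. + {days_since: days_between($today; .last_date)})
    | sort_by(-.days_since)
    | .[]
    | "\(.key) \(.days_since)"'
}

fixture='{"summary": [
  {"model": "dsr1", "hardware": "h200", "framework": "sglang", "precision": "fp8",
   "spec_method": "none", "disagg": false, "is_multinode": false, "date": "2025-01-01"},
  {"model": "dsr1", "hardware": "h200", "framework": "sglang", "precision": "fp8",
   "spec_method": "none", "disagg": false, "is_multinode": false, "date": "2025-01-05"},
  {"model": "dsr1", "hardware": "b200", "framework": "vllm", "precision": "fp4",
   "spec_method": "mtp", "disagg": true, "is_multinode": false, "date": "2025-01-08"},
  {"model": "llama70b", "hardware": "h100", "framework": "vllm", "precision": "fp8",
   "spec_method": "none", "disagg": false, "is_multinode": false, "date": "2024-06-01"}
]}'

printf '%s' "$fixture" | staleness_rows 2025-01-10
# prints:
# dsr1-fp8-h200-sglang 5
# dsr1-fp4-b200-vllm-mtp-disagg 2
```

The two h200 submissions collapse into one row because they share all six grouping fields, and the newer of their two dates determines the age. The llama entry is dropped before grouping, so it never appears even though it is the oldest.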
@@ -207,6 +243,17 @@ jobs:
 echo "**Allowed SKUs for this run:** _none — skip PR creation and post a comment explaining the runner shortage._"
 fi
 echo ""
+echo "### Submission Staleness (single-node configs)"
+echo ""
+echo "_Source: https://inferencex.semianalysis.com/api/v1/submissions at $(date -u +%Y-%m-%dT%H:%M:%SZ). Sorted oldest-first — older = higher priority for refresh._"
+echo ""
+echo "<details>"
+echo "<summary>Show staleness table</summary>"
+echo ""
+cat "$STALENESS_TABLE"
+echo ""
+echo "</details>"
+echo ""
 echo "---"
 echo ""
 echo "@claude Please update the configurations:"
@@ -215,6 +262,14 @@ jobs:
 echo "2. Add entries to \`perf-changelog.yaml\` documenting the version changes"
 echo "3. For each eligible config-key, push a branch and actually open a PR — do not stop at the \"Create a pull request for ...\" remote hint that \`git push\` prints. Run \`gh pr create\` (or the equivalent MCP tool) and verify the returned PR URL. Link every PR back to this issue in a comment."
 echo ""
+echo "**Pre-flight research (required before opening any PR):** For each image being bumped, read the upstream release notes for every version between the current tag and the new one — vLLM at \`https://github.com/vllm-project/vllm/releases\` (or \`gh api repos/vllm-project/vllm/releases\`), SGLang at \`https://github.com/sgl-project/sglang/releases\`. You are looking for two specific failure modes that have bitten prior runs:"
+echo ""
+echo "1. **Suffix convention changes.** The default CUDA/ROCm build can shift between versions, which changes which Docker tag to pick. Concrete example: vLLM v0.21.0 promoted CUDA 13 to the default build, so the bare \`v0.21.0\` tag *is* the cu13 image and no \`v0.21.0-cu13\` tag exists — mechanically reusing the old suffix would point at a 404. Before settling on a tag, confirm it actually exists on Docker Hub (\`curl -sf https://hub.docker.com/v2/repositories/<repo>/tags/<tag>/\`) and that its build matches the runner's accelerator. If the convention shifted, use the new correct tag and call it out explicitly in the PR body."
+echo ""
+echo "2. **CLI flag deprecations or removals.** Server flags get removed between minors and the container exits with an error on startup, so this only surfaces during the e2e run. Concrete example: vLLM removed \`--disable-log-requests\` (silent-by-default; passing it now errors). Before opening the PR, \`grep -rn\` the repo for the launch flags actually passed to this image (check launch scripts, sbatch recipes, and any per-config templates — not just the master config). For every flag in use, check the release notes between current and new version for deprecations/removals/renames. If a flag was removed or renamed, fix it in the same PR and add a perf-changelog entry noting the flag change. If a flag has no replacement and is load-bearing, skip the config-key and explain in your wrap-up comment."
+echo ""
+echo "List the release-notes URLs you consulted in the PR body so reviewers can audit the research. PRs that bump a tag without evidence of this check will be rejected."
+echo ""
 echo "**Required PR label:** Every PR you open from this issue MUST carry the \`full-sweep-enabled\` label. Apply it at creation time via \`gh pr create --label full-sweep-enabled\` (or add it immediately after with \`gh pr edit <num> --add-label full-sweep-enabled\`). Do not skip this — downstream automation keys off the label."
 echo ""
 echo "**PR title / commit message formatting:** Multi-line titles and bodies MUST use a heredoc, not \`\\n\` escapes and not \`\$'...'\` ANSI-C quoting. A prior run produced commits literally starting with \`\$\` and containing \`\\n\\n\` as text because of mis-quoted ANSI-C strings. Use this pattern instead:"
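The two pre-flight checks requested in the issue text (does the tag exist on Docker Hub, and which launch flags does the repo actually pass) can be scripted roughly as follows. This is a hedged sketch, not part of the commit: `hub_tag_url`, `tag_exists`, and `find_flag_uses` are hypothetical helper names, the Docker Hub endpoint is the one quoted in the issue body, and `vllm/vllm-openai` in the example calls is an assumed repository name.

```shell
#!/usr/bin/env bash
# Hedged sketch of the two pre-flight checks; helper names are hypothetical.
set -euo pipefail

# Build the Docker Hub tag URL quoted in the issue body.
hub_tag_url() {
  printf 'https://hub.docker.com/v2/repositories/%s/tags/%s/\n' "$1" "$2"
}

# Succeeds only if Docker Hub answers 2xx for REPO:TAG. curl -f turns an
# HTTP error such as 404 into a non-zero exit status.
tag_exists() {
  curl -sf --max-time 15 -o /dev/null "$(hub_tag_url "$1" "$2")"
}

# Print file:line:text for every literal occurrence of each FLAG under DIR.
# Returns 0 if at least one flag is found, 1 otherwise.
find_flag_uses() {
  local dir="$1"
  shift
  local flag found=1
  for flag in "$@"; do
    # -F matches the flag literally; "--" keeps grep from parsing it as an option.
    if grep -rnF -- "$flag" "$dir"; then
      found=0
    fi
  done
  return "$found"
}

# Example calls (assumed repository name and path; not run here):
#   tag_exists vllm/vllm-openai v0.21.0 || echo "v0.21.0 not on Docker Hub"
#   find_flag_uses . --disable-log-requests && echo "flag still in use"
```

The flag scan only finds literal spellings, so a flag built up from variables in a template would still need a manual read of the launch scripts.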
@@ -254,7 +309,7 @@ jobs:
 echo "If Docker Hub lists multiple variants for the same base version (e.g. \`cu128\` vs \`cu130\`, \`rocm70\` vs \`rocm72\`), pick the variant whose suffix matches what the config-key's current image entry already uses — don't switch CUDA/ROCm minor versions in this update."
 } > "$BODY_FILE"
 
-rm -f "$RUNNERS_TABLE" "$ALLOWED_SKUS_FILE" /tmp/ci.json
+rm -f "$RUNNERS_TABLE" "$ALLOWED_SKUS_FILE" "$STALENESS_TABLE" /tmp/ci.json /tmp/sub.json
 
 echo "body_file=$BODY_FILE" >> "$GITHUB_OUTPUT"
 
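The committed fallback only covers a failed `curl`. If `jq` itself errors (for example, on a `date` value that does not match `%Y-%m-%d`) while the step runs with errexit enabled, the step aborts before writing the placeholder note or reaching the `rm -f` cleanup. One possible hardening, sketched below under that assumption, puts fetch and render in a single `if` condition and scopes the download to an `mktemp` file removed by an `EXIT` trap instead of the fixed `/tmp/sub.json`. `snapshot_staleness` and `STALENESS_FILTER` are hypothetical names; the filter is passed in as an argument so the committed jq program could be reused unchanged.

```shell
#!/usr/bin/env bash
# Hedged sketch, not the committed code: fall back to a placeholder note when
# either the download or the jq render fails, and clean up the download via
# an EXIT trap. snapshot_staleness is a hypothetical helper name.
set -euo pipefail

# Usage: snapshot_staleness URL OUT_FILE JQ_FILTER
# The function body is a subshell, so the EXIT trap fires when it returns.
snapshot_staleness() (
  url="$1" out="$2" filter="$3"
  sub_json=$(mktemp)
  trap 'rm -f "$sub_json"' EXIT
  # Both commands sit in the if condition, so a failure in either one
  # reaches the fallback instead of tripping errexit.
  if curl -sf --max-time 15 "$url" -o "$sub_json" \
     && jq -r "$filter" "$sub_json" > "$out"; then
    exit 0
  fi
  echo "_Could not fetch or parse $url; submission staleness data not available._" > "$out"
)

# Example call in the workflow (STALENESS_FILTER is a hypothetical variable
# holding the committed jq program):
#   snapshot_staleness "https://inferencex.semianalysis.com/api/v1/submissions" \
#     "$STALENESS_TABLE" "$STALENESS_FILTER"
```

Defining the function body with `( ... )` rather than `{ ... }` keeps `sub_json` and the trap local to each call, so the trap cannot leak into the rest of the step.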
