Skip to content

fix(launcher): propagate dflash training failures and fast-fail vllm smoke test on missing draft #35

fix(launcher): propagate dflash training failures and fast-fail vllm smoke test on missing draft

fix(launcher): propagate dflash training failures and fast-fail vllm smoke test on missing draft #35

Workflow file for this run

name: Claude Code Review
on:
issue_comment:
types: [created]
jobs:
# ──────────────────────────────────────────────────────────────────
# Deep ModelOpt-focused review covering numerical correctness,
# mode/state composition, export compatibility, and backward
# compatibility for saved checkpoints / recipes.
#
# CodeRabbit (.coderabbit.yaml) auto-reviews every PR for routine
# bugs, typos, style, and security anti-patterns — this Claude job
# is on-demand and complements that with deeper architectural
# analysis. Trigger: /claude review
# ──────────────────────────────────────────────────────────────────
review:
name: Claude Review
if: |
github.event_name == 'issue_comment' &&
github.event.issue.pull_request &&
contains(github.event.comment.body, '/claude review') &&
contains(fromJson('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: read
pull-requests: write
issues: write
id-token: write
env:
GH_TOKEN: ${{ github.token }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
steps:
- name: Get PR info
id: pr-info
run: |
PR_DATA=$(gh pr view $PR_NUMBER --repo $REPO --json headRefOid,baseRefName)
echo "sha=$(echo $PR_DATA | jq -r .headRefOid)" >> $GITHUB_OUTPUT
echo "base_ref=$(echo $PR_DATA | jq -r .baseRefName)" >> $GITHUB_OUTPUT
- name: Checkout repository
uses: actions/checkout@v6
with:
fetch-depth: 1
ref: ${{ steps.pr-info.outputs.sha }}
- name: Fetch base branch for diff analysis
run: git fetch origin ${{ steps.pr-info.outputs.base_ref }}
- name: React to trigger comment
run: |
gh api repos/$REPO/issues/comments/${{ github.event.comment.id }}/reactions \
--method POST \
-f content='eyes'
- name: Run Claude Review
uses: anthropics/claude-code-action@v1
env:
ANTHROPIC_BASE_URL: ${{ secrets.ANTHROPIC_BASE_URL }}
# NVIDIA inference proxy (LiteLLM-based) rejects two fields
# the Claude Code SDK sends by default. Set per NVIDIA/OSMO's
# workflow which has hit and solved both issues:
# - `context_management` → disable experimental betas
# - `cache_control.ephemeral.scope` → disable prompt caching
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
DISABLE_PROMPT_CACHING: "1"
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
trigger_phrase: "/claude review"
show_full_output: true
claude_args: |
--allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr review:*),Bash(git diff:*),Bash(git show:*),Bash(git log:*),Read,Grep,Glob"
--model "${{ vars.CLAUDE_MODEL }}"
prompt: |
REPO: ${{ env.REPO }}
PR NUMBER: ${{ env.PR_NUMBER }}
BASE REF: origin/${{ steps.pr-info.outputs.base_ref }}
Mandatory workflow — never skip or reorder:
1. Read the PR diff first (gh pr diff).
2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture.
3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
`__init__.py` plus any `mode.py` / `config.py` to understand mode registration
and config schema.
4. Only then perform the review using that context.
You are performing a deep code review on a **NVIDIA Model Optimizer (ModelOpt)** PR.
ModelOpt is NVIDIA's open-source library for model optimization (quantization, pruning,
distillation, sparsity, speculative decoding, NAS, PEFT) targeting PyTorch / ONNX /
HuggingFace / Megatron, with deployment into TRT-LLM / vLLM / SGLang.
Apply general correctness reasoning to whichever ModelOpt sub-package the diff touches —
you already know the algorithms (quantization formulas, distillation losses, N:M sparsity
mask selection, speculative draft-token alignment, etc.). The prompt below covers the
**ModelOpt-specific structural concerns** that you can't infer from the diff alone.
## Division of labor with CodeRabbit
CodeRabbit (`.coderabbit.yaml`) already auto-reviews every PR with the `chill` profile
and runs a hard pre-merge gate on security anti-patterns. **Do NOT duplicate its work.**
Specifically, do NOT comment on:
- Style, formatting, or naming nits (handled by ruff + CodeRabbit)
- Simple typos in code/comments/strings (CodeRabbit catches these)
- The security anti-patterns enumerated in `.coderabbit.yaml`:
`torch.load(weights_only=False)` without justification, `numpy.load(allow_pickle=True)`
without justification, hardcoded `trust_remote_code=True`, `eval`/`exec` on external
input, `# nosec` bypasses, non-permissive new PIP dependencies — these are already
gated. Skip them entirely.
- Generic "consider adding a test" suggestions for trivial changes.
Your value is in things CodeRabbit's pattern-matching cannot do well: tracing dataflow
across multiple files, reasoning about mode/state composition, judging export
compatibility, and catching algorithm-level correctness bugs.
## Review Procedure
1. Get PR metadata: `gh pr view $PR_NUMBER --repo $REPO --json title,body,baseRefName,headRefName,files,additions,deletions,changedFiles,author`
2. Get the full diff: `gh pr diff $PR_NUMBER --repo $REPO`
- For large PRs (>50 files), prioritize source code over config/lock/auto-generated files.
3. For each significant changed file, read the full file for surrounding context.
4. Trace the algorithm end-to-end through the diff. Verify the math/logic matches the
intended technique (whatever sub-package it belongs to).
5. For each newly introduced variable/argument/field, verify it has a meaningful runtime
use path — not just declaration/docstring or discard assignment (`_ = new_arg`).
Use Grep to search for usage beyond declaration sites.
6. Post findings as inline comments with severity and category tags.
## Critical Issues (Must Fix)
### Algorithm Correctness
- Verify the implementation matches the intended technique. Apply your knowledge of the
relevant algorithm family (quantization scales/rounding/saturation, distillation loss
composition, sparsity mask selection, pruning importance scoring, speculative draft
acceptance, NAS supernet weight sharing, etc.).
- Watch for silent numerical bugs: missing fp32 upcast in reductions, wrong reduction
dimension, division-by-zero guards, casts that wrap instead of saturate, gradient
flow through stop_gradient boundaries.
- Watch for state corruption across calibration / search / training passes — leftover
statistics from a previous run are a common foot-gun.
### Mode & State Composability (ModelOpt-specific)
- **Mode registration**: New modes must register correctly with `apply_mode()` /
`restore()`, declare their dependencies, and produce a `modelopt_state` entry that
round-trips through save/restore.
- **State dict schema**: Modified `modelopt_state` schema must include a migration path
or version bump — silently changing keys breaks restore for existing checkpoints.
- **Restore fidelity**: After `restore(model, state)`, the model must be functionally
identical to the saved one. Verify module replacements, hooks, and parameters are
re-applied.
- **Plugin laziness**: Optional integrations (HF, Megatron, TRT-LLM, ONNX) must not
hard-import at module load — gate behind `import_plugin()` so users without those
extras don't break.
### Export Compatibility (ModelOpt-specific)
- HF export (`unified_export_hf.py`) must produce a checkpoint that loads cleanly in
transformers and matches the on-device dtype.
- TRT-LLM export (`model_config_export.py`) must emit a valid `config.json` with
correct `quant_algo`, `kv_cache_quant_algo`, scale tensor names, and weight layout.
- ONNX export must use opsets and operator versions supported by the target consumer
(TRT, ORT).
## Important Issues (Should Fix)
### Backward Compatibility
- Renamed or removed arguments / config fields without deprecation path — breaks
existing user scripts.
- `modelopt_recipes/*.yaml` schema changes without a version bump — old recipes
silently misparse.
- Changed defaults silently alter behavior for users relying on them.
- Changed function signatures, return types, or side effects in
`modelopt/torch/*/__init__.py` (public API) without a backward-compat shim.
- Modified `modelopt_state` keys/structure without migration — makes existing
optimized checkpoints unloadable.
### Performance
- Unnecessary CPU-GPU synchronization in hot paths: `.item()`, `.cpu()`,
`torch.cuda.synchronize()`, Python-side tensor value checks.
- Memory regressions: double-allocating weights, holding tensors past their lifetime.
## Suggestions (Nice to Have)
- Stale, imprecise, or misleading comments/docstrings — a wrong docstring is worse
than none.
- Missing shape/dtype assertions at module/parallelism boundaries where they would
catch real bugs.
- Functions mixing many unrelated responsibilities that would benefit from splitting.
## Comment Format
Prefix each comment with severity and category tag:
- `**[CRITICAL Algorithm]**`, `**[CRITICAL ModeState]**`, `**[CRITICAL Export]**`
- `**[IMPORTANT Compatibility]**`, `**[IMPORTANT Performance]**`
- `**[SUGGESTION]**`
For each finding, explain: (1) what the issue is, (2) why it matters (impact/risk), (3) specific suggestion for fix.
Only use inline ```suggestion blocks for simple, self-contained line replacements (typos,
renames, single-line fixes). For structural changes that add, remove, or reorganize blocks
of code, use a top-level PR comment with a code block showing the proposed change instead.
## Completion
After posting all inline comments, post a summary PR comment:
- List total findings by severity (CRITICAL: N, IMPORTANT: N, SUGGESTION: N)
- Highlight the most impactful findings
- Overall assessment of the PR's risk level
If no significant issues are found, approve the PR:
gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no significant issues found. LGTM"