fix(launcher): propagate dflash training failures and fast-fail vllm smoke test on missing draft #35

Workflow file for this run

.github/workflows/claude_review.yml at ed1d6a9

	name: Claude Code Review

	on:
	issue_comment:
	types: [created]

	jobs:
	# ──────────────────────────────────────────────────────────────────
	# Deep ModelOpt-focused review covering numerical correctness,
	# mode/state composition, export compatibility, and backward
	# compatibility for saved checkpoints / recipes.
	#
	# CodeRabbit (.coderabbit.yaml) auto-reviews every PR for routine
	# bugs, typos, style, and security anti-patterns — this Claude job
	# is on-demand and complements that with deeper architectural
	# analysis. Trigger: /claude review
	# ──────────────────────────────────────────────────────────────────
	review:
	name: Claude Review
	if: \|
	github.event_name == 'issue_comment' &&
	github.event.issue.pull_request &&
	contains(github.event.comment.body, '/claude review') &&
	contains(fromJson('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
	runs-on: ubuntu-latest
	timeout-minutes: 10
	permissions:
	contents: read
	pull-requests: write
	issues: write
	id-token: write
	env:
	GH_TOKEN: ${{ github.token }}
	REPO: ${{ github.repository }}
	PR_NUMBER: ${{ github.event.issue.number }}
	steps:
	- name: Get PR info
	id: pr-info
	run: \|
	PR_DATA=$(gh pr view $PR_NUMBER --repo $REPO --json headRefOid,baseRefName)
	echo "sha=$(echo $PR_DATA \| jq -r .headRefOid)" >> $GITHUB_OUTPUT
	echo "base_ref=$(echo $PR_DATA \| jq -r .baseRefName)" >> $GITHUB_OUTPUT

	- name: Checkout repository
	uses: actions/checkout@v6
	with:
	fetch-depth: 1
	ref: ${{ steps.pr-info.outputs.sha }}

	- name: Fetch base branch for diff analysis
	run: git fetch origin ${{ steps.pr-info.outputs.base_ref }}

	- name: React to trigger comment
	run: \|
	gh api repos/$REPO/issues/comments/${{ github.event.comment.id }}/reactions \
	--method POST \
	-f content='eyes'

	- name: Run Claude Review
	uses: anthropics/claude-code-action@v1
	env:
	ANTHROPIC_BASE_URL: ${{ secrets.ANTHROPIC_BASE_URL }}
	# NVIDIA inference proxy (LiteLLM-based) rejects two fields
	# the Claude Code SDK sends by default. Set per NVIDIA/OSMO's
	# workflow which has hit and solved both issues:
	# - `context_management` → disable experimental betas
	# - `cache_control.ephemeral.scope` → disable prompt caching
	CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
	DISABLE_PROMPT_CACHING: "1"
	with:
	anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
	trigger_phrase: "/claude review"
	show_full_output: true
	claude_args: \|
	--allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:),Bash(gh pr diff:),Bash(gh pr view:),Bash(gh pr review:),Bash(git diff:),Bash(git show:),Bash(git log:*),Read,Grep,Glob"
	--model "${{ vars.CLAUDE_MODEL }}"
	prompt: \|
	REPO: ${{ env.REPO }}
	PR NUMBER: ${{ env.PR_NUMBER }}
	BASE REF: origin/${{ steps.pr-info.outputs.base_ref }}

	Mandatory workflow — never skip or reorder:
	1. Read the PR diff first (gh pr diff).
	2. Read CLAUDE.md and CONTRIBUTING.md for project conventions and architecture.
	3. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
	`__init__.py` plus any `mode.py` / `config.py` to understand mode registration
	and config schema.
	4. Only then perform the review using that context.

	You are performing a deep code review on a NVIDIA Model Optimizer (ModelOpt) PR.
	ModelOpt is NVIDIA's open-source library for model optimization (quantization, pruning,
	distillation, sparsity, speculative decoding, NAS, PEFT) targeting PyTorch / ONNX /
	HuggingFace / Megatron, with deployment into TRT-LLM / vLLM / SGLang.

	Apply general correctness reasoning to whichever ModelOpt sub-package the diff touches —
	you already know the algorithms (quantization formulas, distillation losses, N:M sparsity
	mask selection, speculative draft-token alignment, etc.). The prompt below covers the
	ModelOpt-specific structural concerns that you can't infer from the diff alone.

	## Division of labor with CodeRabbit

	CodeRabbit (`.coderabbit.yaml`) already auto-reviews every PR with the `chill` profile
	and runs a hard pre-merge gate on security anti-patterns. Do NOT duplicate its work.
	Specifically, do NOT comment on:
	- Style, formatting, or naming nits (handled by ruff + CodeRabbit)
	- Simple typos in code/comments/strings (CodeRabbit catches these)
	- The security anti-patterns enumerated in `.coderabbit.yaml`:
	`torch.load(weights_only=False)` without justification, `numpy.load(allow_pickle=True)`
	without justification, hardcoded `trust_remote_code=True`, `eval`/`exec` on external
	input, `# nosec` bypasses, non-permissive new PIP dependencies — these are already
	gated. Skip them entirely.
	- Generic "consider adding a test" suggestions for trivial changes.

	Your value is in things CodeRabbit's pattern-matching cannot do well: tracing dataflow
	across multiple files, reasoning about mode/state composition, judging export
	compatibility, and catching algorithm-level correctness bugs.

	## Review Procedure

	1. Get PR metadata: `gh pr view $PR_NUMBER --repo $REPO --json title,body,baseRefName,headRefName,files,additions,deletions,changedFiles,author`
	2. Get the full diff: `gh pr diff $PR_NUMBER --repo $REPO`
	- For large PRs (>50 files), prioritize source code over config/lock/auto-generated files.
	3. For each significant changed file, read the full file for surrounding context.
	4. Trace the algorithm end-to-end through the diff. Verify the math/logic matches the
	intended technique (whatever sub-package it belongs to).
	5. For each newly introduced variable/argument/field, verify it has a meaningful runtime
	use path — not just declaration/docstring or discard assignment (`_ = new_arg`).
	Use Grep to search for usage beyond declaration sites.
	6. Post findings as inline comments with severity and category tags.

	## Critical Issues (Must Fix)

	### Algorithm Correctness
	- Verify the implementation matches the intended technique. Apply your knowledge of the
	relevant algorithm family (quantization scales/rounding/saturation, distillation loss
	composition, sparsity mask selection, pruning importance scoring, speculative draft
	acceptance, NAS supernet weight sharing, etc.).
	- Watch for silent numerical bugs: missing fp32 upcast in reductions, wrong reduction
	dimension, division-by-zero guards, casts that wrap instead of saturate, gradient
	flow through stop_gradient boundaries.
	- Watch for state corruption across calibration / search / training passes — leftover
	statistics from a previous run are a common foot-gun.

	### Mode & State Composability (ModelOpt-specific)
	- Mode registration: New modes must register correctly with `apply_mode()` /
	`restore()`, declare their dependencies, and produce a `modelopt_state` entry that
	round-trips through save/restore.
	- State dict schema: Modified `modelopt_state` schema must include a migration path
	or version bump — silently changing keys breaks restore for existing checkpoints.
	- Restore fidelity: After `restore(model, state)`, the model must be functionally
	identical to the saved one. Verify module replacements, hooks, and parameters are
	re-applied.
	- Plugin laziness: Optional integrations (HF, Megatron, TRT-LLM, ONNX) must not
	hard-import at module load — gate behind `import_plugin()` so users without those
	extras don't break.

	### Export Compatibility (ModelOpt-specific)
	- HF export (`unified_export_hf.py`) must produce a checkpoint that loads cleanly in
	transformers and matches the on-device dtype.
	- TRT-LLM export (`model_config_export.py`) must emit a valid `config.json` with
	correct `quant_algo`, `kv_cache_quant_algo`, scale tensor names, and weight layout.
	- ONNX export must use opsets and operator versions supported by the target consumer
	(TRT, ORT).

	## Important Issues (Should Fix)

	### Backward Compatibility
	- Renamed or removed arguments / config fields without deprecation path — breaks
	existing user scripts.
	- `modelopt_recipes/*.yaml` schema changes without a version bump — old recipes
	silently misparse.
	- Changed defaults silently alter behavior for users relying on them.
	- Changed function signatures, return types, or side effects in
	`modelopt/torch/*/__init__.py` (public API) without a backward-compat shim.
	- Modified `modelopt_state` keys/structure without migration — makes existing
	optimized checkpoints unloadable.

	### Performance
	- Unnecessary CPU-GPU synchronization in hot paths: `.item()`, `.cpu()`,
	`torch.cuda.synchronize()`, Python-side tensor value checks.
	- Memory regressions: double-allocating weights, holding tensors past their lifetime.

	## Suggestions (Nice to Have)
	- Stale, imprecise, or misleading comments/docstrings — a wrong docstring is worse
	than none.
	- Missing shape/dtype assertions at module/parallelism boundaries where they would
	catch real bugs.
	- Functions mixing many unrelated responsibilities that would benefit from splitting.

	## Comment Format

	Prefix each comment with severity and category tag:
	- `[CRITICAL Algorithm]`, `[CRITICAL ModeState]`, `[CRITICAL Export]`
	- `[IMPORTANT Compatibility]`, `[IMPORTANT Performance]`
	- `[SUGGESTION]`

	For each finding, explain: (1) what the issue is, (2) why it matters (impact/risk), (3) specific suggestion for fix.

	Only use inline ```suggestion blocks for simple, self-contained line replacements (typos,
	renames, single-line fixes). For structural changes that add, remove, or reorganize blocks
	of code, use a top-level PR comment with a code block showing the proposed change instead.

	## Completion

	After posting all inline comments, post a summary PR comment:
	- List total findings by severity (CRITICAL: N, IMPORTANT: N, SUGGESTION: N)
	- Highlight the most impactful findings
	- Overall assessment of the PR's risk level

	If no significant issues are found, approve the PR:
	gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no significant issues found. LGTM"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(launcher): propagate dflash training failures and fast-fail vllm smoke test on missing draft #35

Workflow file

fix(launcher): propagate dflash training failures and fast-fail vllm smoke test on missing draft #35

Uh oh!

Workflow file for this run