[OMNIML-4740] synth_support by ChenhanYu · Pull Request #1496 · NVIDIA/Model-Optimizer

ChenhanYu · 2026-05-14T20:28:44Z

Summary

Adds the EAGLE3 offline pipeline YAML for moonshotai/Kimi-K2.5-DFlash, adapted from Qwen/Qwen3-8B's offline YAML. task_0 was cluster-tested green on cw-dfw (Slurm 11782946, experiment cicd_1778864959, elapsed 1:02:11).

Driven by /babysit-jira on OMNIML-4740. Replaces the original pensieve-intern synth_support agent draft (1b02102) which had three structural issues:

Set global_vars.hf_model to the output path (Kimi-K2.5 DFlash) instead of the input checkpoint.
Used a TRT-LLM container (release:1.2.0) that doesn't register KimiK25ForConditionalGeneration as of 2026-05-14.
Committed VERIFICATION_COMMENT.txt (a runner sidecar, not artifact code) and uv.lock regen (+1037/-81, the source of the prior mergeable_state: dirty).

This PR is the cleaned + cluster-validated replacement.

Changes

Directory Kimi-K2.5 DFlash → Kimi-K2.5-DFlash (Slurm tar packaging breaks on spaces in job_name/path).
global_vars.hf_model: /hf-local/moonshotai/Kimi-K2.6 — the canonical Kimi-K2.5 input stand-in staged by the operator on cw-dfw.
task_0 migrated from TRT-LLM to vLLM:
- script: common/vllm/query.sh
- container: vllm/vllm-openai:latest
- ntasks_per_node: 1 (vLLM is single-process)
- --tensor-parallel-size 8, --trust-remote-code
- --enforce-eager (vllm-openai:latest is missing torch/bin/ptxas for inductor autotuning)
- --gpu-memory-utilization 0.95 + --max-model-len 4096 (Kimi weights are 595 GB bf16 on 8×80 GB = 93% weight occupancy; default 0.9 leaves -1.1 GiB for KV cache)
- VLLM_STARTUP_TIMEOUT=1800 env (Kimi load is ~7.7 min, default 600s in query.sh is not enough)
--data switched to the in-repo synthetic_conversations_1k.jsonl — the canonical Speculative-Decoding-Prompt-Samples isn't on cw-dfw; the in-repo dataset is the portable, packager-shipped input for smoke testing.
Sidecars + uv.lock removed from the diff.
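
The memory figures in the task_0 bullets can be sanity-checked with a few lines of shell arithmetic. This is a rough sketch that assumes nominal decimal figures (595 GB of weights split evenly over 8 GPUs of 80 GB each) and ignores activation and runtime overhead, so it will not reproduce vLLM's own profiler numbers such as the -1.1 GiB quoted above. The variable and function names are illustrative only.

```shell
#!/usr/bin/env bash
# Rough per-GPU memory budget under nominal (assumed) figures.
weights_gb=595      # total bf16 weights, from the PR description
num_gpus=8
gpu_gb=80           # nominal capacity per GPU (assumption: decimal GB)

# Percentage of total GPU memory taken by weights alone.
occupancy=$(awk -v w="$weights_gb" -v n="$num_gpus" -v g="$gpu_gb" \
  'BEGIN { printf "%.1f", 100 * w / (n * g) }')
echo "weight occupancy: ${occupancy}%"

# Headroom left per GPU after weights, for a given utilization cap.
headroom() {
  awk -v u="$1" -v w="$weights_gb" -v n="$num_gpus" -v g="$gpu_gb" \
    'BEGIN { printf "%.3f", u * g - w / n }'
}
echo "headroom at 0.90: $(headroom 0.90) GB"
echo "headroom at 0.95: $(headroom 0.95) GB"
```

Under these assumptions the default cap of 0.90 leaves negative headroom while 0.95 leaves a small positive margin, which matches the direction of the change even though the exact values differ from the measured ones.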

Cluster-test evidence (mandatory per the refined synth_support spec)

$ SLURM_CLUSTER=cw_dfw uv run slurm.py \
    --yaml '.../moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml' \
    pipeline.task_1.skip=true pipeline.task_2.skip=true pipeline.task_3.skip=true \
    --yes --detach

Slurm 11782946: COMPLETED, elapsed 1:02:11
Loading weights took 461.45 seconds
Model loading took 71.44 GiB memory and 465.10 seconds
Map (num_proc=32): 100%|██████████| 100/100 [06:42<00:00, 4.02s/example]
Saved 10 shards: /scratchspace/data/train-{1..10}-00010.jsonl

Real assistant response verified end-to-end — Kimi correctly answered the "bat and ball" CRT problem:

The ball costs $0.05 (5 cents). Here's why: If the ball costs $0.05, then the bat costs $1.05 (which is $1.00 more). Together they cost $1.10.

Test plan

Dry-run validation: slurm.py --yaml ... --dry-run exits 0
Cluster test on cw-dfw: task_0 only (task_1/2/3.skip=true) — green, 1000 prompts processed, 10 synth-data shards written
Downstream run_pipeline stage of OMNIML-4735 will exercise task_1..3 end-to-end with the produced synth data

🤖 Generated with Claude Code

Summary by CodeRabbit

Improvements
- Enhanced rank initialization with extended fallback support for various resource scheduling systems, improving compatibility across distributed environments.
- Improved cross-rank coordination mechanism for dependency installation, ensuring reliable and efficient setup in multi-node deployments.
New Features
- Added configuration example for Kimi-K2.5 offline speculative decoding pipeline with vLLM integration.

copy-pr-bot · 2026-05-14T20:28:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


coderabbitai · 2026-05-14T20:28:50Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

This PR adds a new vLLM offline speculative decoding pipeline configuration for Kimi-K2.5 and enhances launcher service utilities with Slurm-aware rank detection and cross-rank coordinated dependency installation to prevent race conditions in distributed execution environments.

Changes

Launcher examples and service utilities

| Layer / File(s) | Summary |
|---|---|
| vLLM offline pipeline YAML configuration `tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml` | New `Kimi-K2.5_DFlash_offline` pipeline metadata and `task_0` definition invoke `common/vllm/query.sh` with vLLM runtime parameters (model, tensor parallelism, sequence lengths), environment variables (`HF_LOCAL`, `VLLM_STARTUP_TIMEOUT`), dataset I/O paths, and Slurm/container resource settings (1 node, 1 task, 8 GPUs). |
| Service utilities: MPI rank fallbacks and coordinated installs `tools/launcher/common/service_utils.sh` | `mpi_rank` and `mpi_local_rank` now resolve through nested fallbacks: `PMIX_RANK`/`PMIX_LOCAL_RANK` → native MPI → `SLURM_PROCID`/`SLURM_LOCALID` → default `0`. `util_install_extra_dep` coordinates one-time installation across local ranks via `/tmp/.nmm_extra_dep_installed` marker file: rank 0 installs `diskcache` and clones/installs `NVIDIA/nvidia-resiliency-ext`; other ranks poll for marker up to 600 seconds. |
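
The rank fallback chain and the marker-file handshake summarized for `service_utils.sh` could look roughly like the sketch below. It is not the code from that file: the function names `resolve_rank` and `wait_for_marker` are hypothetical, `OMPI_COMM_WORLD_RANK` is assumed as a stand-in for the "native MPI" variable, and the one-second poll interval is an assumption.

```shell
#!/usr/bin/env bash
# Resolve the global rank: PMIx first, then MPI, then Slurm, then 0.
# OMPI_COMM_WORLD_RANK is an assumed example of a "native MPI" variable.
resolve_rank() {
  echo "${PMIX_RANK:-${OMPI_COMM_WORLD_RANK:-${SLURM_PROCID:-0}}}"
}

# Poll for a marker file; return 0 once it exists, 1 on timeout.
wait_for_marker() {
  local marker="$1" timeout="${2:-600}" waited=0
  while [[ ! -f "$marker" ]]; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for ${marker}" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}
```

In this sketch a non-zero local rank would call `wait_for_marker /tmp/.nmm_extra_dep_installed 600` and choose how to react to a non-zero return, while rank 0 would create the marker only after its installs finish.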

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title '[OMNIML-4740] synth_support' uses a ticket reference and generic term that don't clearly convey the main change: adding a vLLM-based offline speculative decoding pipeline for Kimi-K2.5-DFlash. | Consider a more descriptive title such as 'Add EAGLE3 offline vLLM pipeline for Kimi-K2.5-DFlash' to clarify the primary purpose without requiring ticket context. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | PR changes are a bash script and YAML config file, not Python. The security check covers Python coding practices only. No Python code present, so check is not applicable. |


ChenhanYu · 2026-05-14T20:33:46Z

Closing — OMNIML-4735 Epic wrap-up. Used as the first real-release dispatch on the fully-consolidated nmm-sandbox stack (Phase 3 production validation); not pursuing the YAML merge today.

github-actions · 2026-05-14T20:34:14Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-14 20:33 UTC

ChenhanYu · 2026-05-14T20:43:12Z

Re-opening — closing this PR earlier was premature cleanup. OMNIML-4735 is now being driven to an actual release; this PR is the synth_support deliverable awaiting real review + merge.

copy-pr-bot · 2026-05-14T22:48:52Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tools/launcher/examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml (1)
104-104: ⚡ Quick win

Pin the VLLM container to an immutable version/digest.

Using vllm/vllm-openai:latest makes benchmark results non-reproducible across runs; pin to a tested tag or digest.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml`
at line 104, The container image is pinned to the floating tag
"vllm/vllm-openai:latest" under the container key; replace this with an
immutable tag or digest (for example a specific semver tag or a sha256 digest)
that you've tested for benchmarks so results are reproducible, e.g. change the
container value from "vllm/vllm-openai:latest" to a concrete string like
"vllm/vllm-openai:<tested-tag>" or "vllm/vllm-openai@sha256:<digest>".
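
A lightweight guard can catch floating tags before a job is submitted. The helper below is a hypothetical sketch: the name `is_pinned_image` and the rule that only `latest` or a missing tag counts as floating are assumptions. It inspects the image string only and does not contact a registry, and a version tag can still be re-pushed, so a digest remains the stricter option.

```shell
#!/usr/bin/env bash
# Return 0 if the image reference is pinned by digest or by a non-"latest" tag.
is_pinned_image() {
  local ref="$1"
  # A digest reference is always immutable.
  [[ "$ref" == *@sha256:* ]] && return 0
  # Look only at the last path component so a registry port is not
  # mistaken for a tag (e.g. localhost:5000/image).
  local last="${ref##*/}"
  [[ "$last" != *:* ]] && return 1          # no tag means implicit "latest"
  [[ "${last##*:}" == "latest" ]] && return 1
  return 0
}
```

For example, `is_pinned_image "vllm/vllm-openai:latest" || echo "floating tag"` would print the warning, while a reference of the form `vllm/vllm-openai@sha256:<digest>` passes.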

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/examples/moonshotai/Kimi-K2.5` DFlash/hf_offline_eagle3.yaml:
- Line 73: The training output path (training.output_dir) is set to
/scratchspace/eagle3 but later steps (task_3 / the draft weights reader) expect
artifacts at /scratchspace/export; change training.output_dir to match the
benchmark input path (/scratchspace/export) or update the task_3 draft weights
path to /scratchspace/eagle3 so both produce and consume the same directory
(refer to the training.output_dir key and the task_3 draft weights
reference/train_eagle.sh usage to locate where to update).
- Around line 39-40: The YAML is missing required environment variables for the
new model config: add MLM_MODEL_CFG (set to the HuggingFace repo ID for the
model) and QUANT_CFG (e.g., NVFP4_DEFAULT_CFG or INT8_DEFAULT_CFG) to every
environment block that defines task envs (the existing blocks that contain
HF_LOCAL) and create an environment block for task_2 that includes HF_LOCAL plus
MLM_MODEL_CFG and QUANT_CFG so the launcher YAML conforms to the model-config
contract; ensure the variable names are spelled exactly as MLM_MODEL_CFG and
QUANT_CFG in each task's environment list.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 66ea5a13-65b0-4b34-81d6-f8fa01cc5581

📥 Commits

Reviewing files that changed from the base of the PR and between e27f76f and 006e8b2.

📒 Files selected for processing (1)

tools/launcher/examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml

coderabbitai · 2026-05-14T22:54:51Z

+    environment:
+      - HF_LOCAL: /hf-local


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required MLM_MODEL_CFG and QUANT_CFG env vars for this new model config.

Line 39, Line 57, and Line 97 define task envs, but MLM_MODEL_CFG and QUANT_CFG are missing, and task_2 (Line 67-83) has no environment block at all. This violates the model-config contract for launcher YAMLs.

Proposed patch

   global_vars:
     hf_model: /hf-local/moonshotai/Kimi-K2.5 DFlash
+    hf_repo_id: moonshotai/Kimi-K2.5-DFlash

@@
   task_0:
@@
     environment:
       - HF_LOCAL: /hf-local
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG

@@
   task_1:
@@
     environment:
       - HF_MODEL_CKPT: <<global_vars.hf_model>>
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG

@@
   task_2:
@@
+    environment:
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG

@@
   task_3:
@@
     environment:
       - HF_MODEL_CKPT: <<global_vars.hf_model>>
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG

As per coding guidelines, "Set MLM_MODEL_CFG environment variable to the HuggingFace repo ID when adding a new model config" and "Set QUANT_CFG environment variable (e.g., NVFP4_DEFAULT_CFG, INT8_DEFAULT_CFG) when adding a new model config".

Also applies to: 57-58, 67-83, 97-99

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml` around lines 39 - 40, The YAML is missing required environment variables for the new model config: add MLM_MODEL_CFG (set to the HuggingFace repo ID for the model) and QUANT_CFG (e.g., NVFP4_DEFAULT_CFG or INT8_DEFAULT_CFG) to every environment block that defines task envs (the existing blocks that contain HF_LOCAL) and create an environment block for task_2 that includes HF_LOCAL plus MLM_MODEL_CFG and QUANT_CFG so the launcher YAML conforms to the model-config contract; ensure the variable names are spelled exactly as MLM_MODEL_CFG and QUANT_CFG in each task's environment list.

coderabbitai · 2026-05-14T22:54:51Z

+      - --config modules/Model-Optimizer/modelopt_recipes/general/speculative_decoding/eagle3.yaml
+      - model.model_name_or_path=<<global_vars.hf_model>>
+      - data.offline_data_path=/scratchspace/offline_hidden_states
+      - training.output_dir=/scratchspace/eagle3


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align the training output path with benchmark input path.

Line 73 writes artifacts to /scratchspace/eagle3, but Line 88 reads draft weights from /scratchspace/export. Unless train_eagle.sh exports to /scratchspace/export implicitly, task_3 will not find the draft model.

Proposed patch

-      - --draft_model_dir /scratchspace/export
+      - --draft_model_dir /scratchspace/eagle3

Also applies to: 88-88

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml` at line 73, The training output path (training.output_dir) is set to /scratchspace/eagle3 but later steps (task_3 / the draft weights reader) expect artifacts at /scratchspace/export; change training.output_dir to match the benchmark input path (/scratchspace/export) or update the task_3 draft weights path to /scratchspace/eagle3 so both produce and consume the same directory (refer to the training.output_dir key and the task_3 draft weights reference/train_eagle.sh usage to locate where to update).

codecov · 2026-05-14T23:01:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.93%. Comparing base (e27f76f) to head (b661cef).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1496      +/-   ##
==========================================
- Coverage   77.44%   76.93%   -0.51%     
==========================================
  Files         473      473              
  Lines       51418    52473    +1055     
==========================================
+ Hits        39819    40370     +551     
- Misses      11599    12103     +504

| Flag | Coverage Δ | |
|---|---|---|
| regression | `14.98% <ø> (+0.07%)` | ⬆️ |
| unit | `52.67% <ø> (+0.12%)` | ⬆️ |


ChenhanYu · 2026-05-14T23:10:13Z

🚧 Blocked on input checkpoint — not yet ready to merge.

Status

Diff cleaned: removed VERIFICATION_COMMENT.txt (sidecar) and uv.lock (incidental regen). Now just the 104-line YAML.
slurm.py --yaml ... --dry-run exits 0 — schema valid, all 4 task blocks parse, all <<global_vars.X>> interpolations resolve.
DCO sign-off added.
Converted back to draft + auto-merge disabled while cluster-test evidence is pending.

What's blocking

This PR's YAML references /hf-local/moonshotai/Kimi-K2.5 DFlash as the input target model in global_vars.hf_model. That's wrong on two counts:

DFlash is the output of this pipeline, not the input. The input is plain Kimi-K2.5.
The Kimi-K2.5 input checkpoint isn't on any reachable cluster yet:
- cw-dfw: moonshotai/Kimi-K2-Instruct, moonshotai/Kimi-K2-Thinking only — no K2.5.
- oci-hsg: has Kimi-K2.5-nolayer (test variant), Kimi-K2.6, Kimi-K2-Thinking — but no plain K2.5.
- NAS (team-swdl-nas:hf-local/moonshotai/): empty.

Unblock path

A human needs to:

Stage Kimi-K2.5 (the input checkpoint) to cw-dfw's /hf-local/moonshotai/Kimi-K2.5/ (or confirm the agreed-upon input name).
Either edit this PR's global_vars.hf_model to the corrected path, OR re-fire the synth_support agent against the now-present checkpoint via @Pensieve intern_retry OMNIML-4740.
The agent (or human) re-runs slurm.py --yaml ... --dry-run AND launches task_0 on cw-dfw with task_1/2/3.skip=true to produce real cluster-test evidence before this PR is marked ready and merged.

The pre-merge mandate of "real cluster test evidence" is the policy this babysit run is enforcing — and is being codified in pensieve-intern MR !4 so the next synth_support agent run can't ship a YAML without it.

(Driven by /babysit-jira on OMNIML-4740.)

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

tools/launcher/examples/moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml (2)

85-85: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix task_2 → task_3 artifact path mismatch.

Line 85 writes to /scratchspace/eagle3, but Line 100 reads from /scratchspace/export. This can break stage 4 input resolution.

Proposed patch

-      - --draft_model_dir /scratchspace/export
+      - --draft_model_dir /scratchspace/eagle3

Also applies to: 100-100

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml`
at line 85, The artifact path for the training output is inconsistent between
tasks: training.output_dir is set to /scratchspace/eagle3 while later steps
expect /scratchspace/export, causing task_2 → task_3 input resolution to fail;
update the value of training.output_dir (the training.output_dir key in the
YAML) to match the downstream artifact path (/scratchspace/export) or change the
downstream read path to /scratchspace/eagle3 so both producer
(training.output_dir) and consumer use the same artifact directory.

47-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required MLM_MODEL_CFG and QUANT_CFG to every task env.

These required model-config env vars are missing in task_0, task_1, and task_3, and task_2 has no environment block at all.

Proposed patch

   global_vars:
     hf_model: /hf-local/moonshotai/Kimi-K2.6
+    hf_repo_id: moonshotai/Kimi-K2.6

@@
   task_0:
@@
     environment:
       - HF_LOCAL: /hf-local
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG
       - VLLM_STARTUP_TIMEOUT: "1800"  # Kimi-K2.6 weight load alone is ~7.7 min

@@
   task_1:
@@
     environment:
       - HF_MODEL_CKPT: <<global_vars.hf_model>>
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG

@@
   task_2:
@@
+    environment:
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG
     slurm_config:

@@
   task_3:
@@
     environment:
       - HF_MODEL_CKPT: <<global_vars.hf_model>>
+      - MLM_MODEL_CFG: <<global_vars.hf_repo_id>>
+      - QUANT_CFG: NVFP4_DEFAULT_CFG

As per coding guidelines, "Set MLM_MODEL_CFG environment variable to the HuggingFace repo ID when adding a new model config" and "Set QUANT_CFG environment variable (e.g., NVFP4_DEFAULT_CFG, INT8_DEFAULT_CFG) when adding a new model config".

Also applies to: 69-70, 79-95, 109-110

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml`
around lines 47 - 49, The tasks in this YAML are missing required environment
vars: add MLM_MODEL_CFG and QUANT_CFG to the environment lists for task_0,
task_1, and task_3, and create an environment block for task_2 with those same
vars; specifically set MLM_MODEL_CFG to the HuggingFace repo ID used by this
model and QUANT_CFG to the chosen quantization profile (e.g., NVFP4_DEFAULT_CFG
or INT8_DEFAULT_CFG), ensuring the variable names match exactly (MLM_MODEL_CFG,
QUANT_CFG) in each task's environment array so downstream code that reads these
vars (task_0, task_1, task_2, task_3) will find them.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/examples/moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml`:
- Line 22: The comment that mentions vLLM loading "Kimi-K2.5" is stale relative
to the configured checkpoint key hf_model (currently set to Kimi-K2.6); update
that comment text to reference "Kimi-K2.6" so it matches the hf_model value, and
make the same change for the other occurrences referenced (the comments around
the hf_model key and the comments at the other occurrences noted). Ensure all
comments in the file that mention Kimi-K2.5 are changed to Kimi-K2.6 to keep
configuration and documentation consistent.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 39023861-07f0-4fbf-a6bb-da4e88f3160a

📥 Commits

Reviewing files that changed from the base of the PR and between 006e8b2 and c3eef71.

📒 Files selected for processing (1)

tools/launcher/examples/moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml

coderabbitai · 2026-05-15T18:36:39Z

+  note:
+
+  global_vars:
+    hf_model: /hf-local/moonshotai/Kimi-K2.6


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale model reference in comments.

The comment says vLLM loads Kimi-K2.5, but current configured checkpoint is Kimi-K2.6 (Line 22). Please align the comment to avoid operator confusion.

Also applies to: 27-28

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml` at line 22, The comment that mentions vLLM loading "Kimi-K2.5" is stale relative to the configured checkpoint key hf_model (currently set to Kimi-K2.6); update that comment text to reference "Kimi-K2.6" so it matches the hf_model value, and make the same change for the other occurrences referenced (the comments around the hf_model key and the comments at the other occurrences noted). Ensure all comments in the file that mention Kimi-K2.5 are changed to Kimi-K2.6 to keep configuration and documentation consistent.

ChenhanYu · 2026-05-15T18:48:38Z

Should be Kimi-K2.5/hf_offline_dflash.yaml instead.

ChenhanYu · 2026-05-15T18:49:11Z

+      gpus_per_node: 8
+      container: vllm/vllm-openai:latest
+
+  # Step 2: Dump hidden states from target model


Remove this task; it has not been tested.

ChenhanYu · 2026-05-15T18:49:18Z

+      container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
+
+  # Step 3: Train EAGLE3 draft head (offline, single task)
+  task_2:


Remove this task; it has not been tested.

ChenhanYu · 2026-05-15T18:49:23Z

+      gpus_per_node: 8
+      container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
+
+  # Step 4: Benchmark speculative decoding (VLLM backend)


Remove this task; it has not been tested.

ChenhanYu · 2026-05-15T20:48:19Z

Review addressed in the force-push (b661cef):

✅ Renamed to moonshotai/Kimi-K2.5/hf_offline_dflash.yaml (input-model directory, DFlash-algorithm filename — matches the spec convention).
✅ Removed task_1, task_2, task_3. Only task_0 ships in this PR (the one with cluster-test evidence). Downstream stages will add their own tasks each with their own evidence.
job_name updated to Kimi-K2.5_DFlash_offline.
Dry-run re-validated against the new path: exit 0.
Existing cluster-test evidence still applies — only the file path + scope changed, not task_0 contents.

The remaining CodeRabbit comments are noted but not addressed in this push: MLM_MODEL_CFG/QUANT_CFG (specific to Megatron quantization, so they don't apply to the vLLM task_0), the task_2/task_3 artifact-path mismatch (n/a, both tasks were removed), and pinning vllm/vllm-openai:latest to a digest.

Also rolling the lessons into the synth_support spec template via pensieve-intern MR !5 (https://gitlab-master.nvidia.com/omniml/integration/pensieve-intern/-/merge_requests/5) — that MR will be updated with the directory/filename/per-stage-scope conventions so the next agent run on a new model doesn't ship untested tasks or use the output name as the directory.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml`:
- Around line 46-51: The launcher YAML's environment block is missing required
model keys; add MLM_MODEL_CFG set to the HuggingFace repo ID for this model and
QUANT_CFG set to the chosen quantization profile (e.g., NVFP4_DEFAULT_CFG or
INT8_DEFAULT_CFG) inside the existing environment array so the new launcher
config provides both MLM_MODEL_CFG and QUANT_CFG alongside HF_LOCAL and
VLLM_STARTUP_TIMEOUT.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5475f944-f3fe-4118-96b5-bf9384022088

📥 Commits

Reviewing files that changed from the base of the PR and between c3eef71 and b661cef.

📒 Files selected for processing (1)

tools/launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml

coderabbitai · 2026-05-15T20:50:57Z

+    environment:
+      - HF_LOCAL: /hf-local
+      - VLLM_STARTUP_TIMEOUT: "1800"  # Kimi-K2.6 weight load alone is ~7.7 min
+                                      # at 71 GiB/GPU; default 600s in query.sh
+                                      # is not enough to also cover KV cache
+                                      # profiling + encoder cache init


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required model environment keys for new launcher configs.

environment is missing MLM_MODEL_CFG and QUANT_CFG, which are required for new model configs in this launcher YAML family.

Suggested patch

     environment:
       - HF_LOCAL: /hf-local
+      - MLM_MODEL_CFG: moonshotai/Kimi-K2.5
+      - QUANT_CFG: NVFP4_DEFAULT_CFG
       - VLLM_STARTUP_TIMEOUT: "1800"  # Kimi-K2.6 weight load alone is ~7.7 min
                                       # at 71 GiB/GPU; default 600s in query.sh
                                       # is not enough to also cover KV cache

As per coding guidelines, "tools/launcher/**/*.yaml: Set MLM_MODEL_CFG environment variable to the HuggingFace repo ID when adding a new model config" and "Set QUANT_CFG environment variable (e.g., NVFP4_DEFAULT_CFG, INT8_DEFAULT_CFG) when adding a new model config".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    environment:
-      - HF_LOCAL: /hf-local
-      - VLLM_STARTUP_TIMEOUT: "1800"  # Kimi-K2.6 weight load alone is ~7.7 min
-                                      # at 71 GiB/GPU; default 600s in query.sh
-                                      # is not enough to also cover KV cache
-                                      # profiling + encoder cache init
+    environment:
+      - HF_LOCAL: /hf-local
+      - MLM_MODEL_CFG: moonshotai/Kimi-K2.5
+      - QUANT_CFG: NVFP4_DEFAULT_CFG
+      - VLLM_STARTUP_TIMEOUT: "1800"  # Kimi-K2.6 weight load alone is ~7.7 min
+                                      # at 71 GiB/GPU; default 600s in query.sh
+                                      # is not enough to also cover KV cache
+                                      # profiling + encoder cache init

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml` around lines 46 - 51, The launcher YAML's environment block is missing required model keys; add MLM_MODEL_CFG set to the HuggingFace repo ID for this model and QUANT_CFG set to the chosen quantization profile (e.g., NVFP4_DEFAULT_CFG or INT8_DEFAULT_CFG) inside the existing environment array so the new launcher config provides both MLM_MODEL_CFG and QUANT_CFG alongside HF_LOCAL and VLLM_STARTUP_TIMEOUT.

ChenhanYu · 2026-05-16T04:18:02Z

Re-cleaned via /babysit-jira on OMNIML-4740.

The 2026-05-15 14:26 agent re-fire (pipeline 51447198, running pre-MR-!4 pensieve-intern v0.33.48) force-pushed over the clean cluster-tested commit (b661cef) with a polluted draft: uv.lock churn, a VERIFICATION_COMMENT.txt sidecar, and a regressed YAML pointing at /hf-local/modelopt/Speculative-Decoding-Prompt-Samples, which isn't staged on cw-dfw. The agent reported an SSH auth failure on the cluster test step and fell back to the sidecar-only verify-task path.

Reset to b661cef — the single clean cluster-validated commit (Slurm 11782946 on cw-dfw, COMPLETED 1:02:11, real end-to-end vLLM output verified).

Paired infra fix to prevent re-occurrence: nmm-sandbox!143 — bumps PENSIEVE_INTERN_REF v0.33.48 → v0.33.57 (which ships the auto-strip uv.lock/sidecar pollution logic from pensieve-intern !4) and adds SSH key + known_hosts to the intern-agent CI before_script (so the next intern_retry can actually run the mandated cw-dfw cluster test instead of falling back to a sidecar).

This PR is back to ready-for-review state; awaiting CODEOWNERS approval from @kevalmorabia97 + modelopt-setup-codeowners.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/common/service_utils.sh`:
- Around line 51-54: The current fixed marker (/tmp/.nmm_extra_dep_installed)
stored in the _marker variable can be stale across runs; change _marker in
tools/launcher/common/service_utils.sh to be job/session-scoped (e.g., include
$TMPDIR if set or append a unique token like $$, $CI_JOB_ID, or a timestamp) or
generate a unique path with mktemp and persist that same path for the check and
creation; update any code that references _marker to use the new session-scoped
marker so the if [[ -f "$_marker" ]] check and subsequent creation are
consistent.
- Around line 55-67: The install block currently may create the marker even if
installs fail and non-zero ranks keep running after the 600s wait; update the
code around the mpi_local_rank check to fail fast: in the rank 0 branch (where
pip install diskcache, git clone and pip install "${_nvrx_dir}" run) check each
command's exit status and only touch/emit "$_marker" after all installs succeed,
otherwise log the error and exit non-zero; in the non-zero branch (the while
loop using _waited and checking "$_marker") detect the timeout (when _waited >=
600) and abort with a non-zero exit and error message instead of silently
continuing so downstream steps don't run without dependencies.
- Around line 59-60: The git clone in service_utils.sh clones
nvidia-resiliency-ext without pinning, so add an environment variable (e.g.,
NVRX_REF) and use it when cloning to pin to a tag/commit; modify the clone
invocation that uses _nvrx_dir to include --branch "$NVRX_REF" (or fall back to
a sensible default like main when NVRX_REF is empty) and document/export
NVRX_REF so CI/ops can set an immutable ref before running the script.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1759c510-8bfc-419f-8829-c46dd02aeb5e

📥 Commits

Reviewing files that changed from the base of the PR and between b661cef and f0d20f2.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

BLOCKED_ON_UPSTREAM.txt
tools/launcher/common/service_utils.sh
tools/launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml

✅ Files skipped from review due to trivial changes (1)

BLOCKED_ON_UPSTREAM.txt

🚧 Files skipped from review as they are similar to previous changes (1)

tools/launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml

coderabbitai · 2026-05-16T05:12:13Z

+    local _marker=/tmp/.nmm_extra_dep_installed
+    if [[ -f "$_marker" ]]; then
+        return 0
+    fi


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope the marker file to the job/session.

Line 51 uses a fixed /tmp marker name, so stale files from previous runs can cause this run to skip installation incorrectly.

Suggested fix

-    local _marker=/tmp/.nmm_extra_dep_installed
+    local _marker="/tmp/.nmm_extra_dep_installed.${SLURM_JOB_ID:-$$}.${USER:-unknown}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    local _marker=/tmp/.nmm_extra_dep_installed
-    if [[ -f "$_marker" ]]; then
-        return 0
-    fi
+    local _marker="/tmp/.nmm_extra_dep_installed.${SLURM_JOB_ID:-$$}.${USER:-unknown}"
+    if [[ -f "$_marker" ]]; then
+        return 0
+    fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/common/service_utils.sh` around lines 51 - 54, The current fixed marker (/tmp/.nmm_extra_dep_installed) stored in the _marker variable can be stale across runs; change _marker in tools/launcher/common/service_utils.sh to be job/session-scoped (e.g., include $TMPDIR if set or append a unique token like $$, $CI_JOB_ID, or a timestamp) or generate a unique path with mktemp and persist that same path for the check and creation; update any code that references _marker to use the new session-scoped marker so the if [[ -f "$_marker" ]] check and subsequent creation are consistent.
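The job-scoped marker idea above can be sketched as a small helper. This is a minimal sketch, not the code in service_utils.sh: `nmm_marker_path` is a hypothetical name, and the `TMPDIR` fallback is an assumption taken from the prompt text rather than from the suggested diff. One caveat with the suggested `$$` fallback: `$$` is the PID of the current shell, so ranks that run as separate processes would likely compute different paths when `SLURM_JOB_ID` is unset and would never see each other's marker. The sketch keeps the fallback only to mirror the suggestion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a marker path scoped to the current job and user.
# Assumes SLURM_JOB_ID is exported by the scheduler; falls back to the shell
# PID ($$), which is NOT shared across ranks running as separate processes.
nmm_marker_path() {
    local base="${TMPDIR:-/tmp}"
    printf '%s/.nmm_extra_dep_installed.%s.%s\n' \
        "$base" "${SLURM_JOB_ID:-$$}" "${USER:-unknown}"
}
```

Calling `nmm_marker_path` once and storing the result in `_marker` keeps the existence check and the later `touch` pointed at the same file.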

coderabbitai · 2026-05-16T05:12:13Z

    if [[ "$mpi_local_rank" -eq 0 ]]; then
        pip install diskcache
+        local _nvrx_dir
+        _nvrx_dir="$(mktemp -d)/nvidia-resiliency-ext"
+        git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
+            && pip install "${_nvrx_dir}"
+        touch "$_marker"
+    else
+        local _waited=0
+        while [[ ! -f "$_marker" && $_waited -lt 600 ]]; do
+            sleep 1
+            _waited=$((_waited + 1))
+        done


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on install errors and wait timeout.

Line 61 can publish success even if dependency install failed, and Lines 64-67 let non-zero ranks continue after 600s timeout with no marker. Both paths can silently proceed with missing deps.

Suggested fix

     if [[ "$mpi_local_rank" -eq 0 ]]; then
-        pip install diskcache
+        pip install diskcache || return 1
         local _nvrx_dir
         _nvrx_dir="$(mktemp -d)/nvidia-resiliency-ext"
         git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
-            && pip install "${_nvrx_dir}"
-        touch "$_marker"
+            && pip install "${_nvrx_dir}" \
+            && touch "$_marker" \
+            || return 1
     else
         local _waited=0
         while [[ ! -f "$_marker" && $_waited -lt 600 ]]; do
             sleep 1
             _waited=$((_waited + 1))
         done
+        [[ -f "$_marker" ]] || return 1
     fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    if [[ "$mpi_local_rank" -eq 0 ]]; then
-        pip install diskcache
-        local _nvrx_dir
-        _nvrx_dir="$(mktemp -d)/nvidia-resiliency-ext"
-        git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
-            && pip install "${_nvrx_dir}"
-        touch "$_marker"
-    else
-        local _waited=0
-        while [[ ! -f "$_marker" && $_waited -lt 600 ]]; do
-            sleep 1
-            _waited=$((_waited + 1))
-        done
+    if [[ "$mpi_local_rank" -eq 0 ]]; then
+        pip install diskcache || return 1
+        local _nvrx_dir
+        _nvrx_dir="$(mktemp -d)/nvidia-resiliency-ext"
+        git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
+            && pip install "${_nvrx_dir}" \
+            && touch "$_marker" \
+            || return 1
+    else
+        local _waited=0
+        while [[ ! -f "$_marker" && $_waited -lt 600 ]]; do
+            sleep 1
+            _waited=$((_waited + 1))
+        done
+        [[ -f "$_marker" ]] || return 1
+    fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/common/service_utils.sh` around lines 55 - 67, The install block currently may create the marker even if installs fail and non-zero ranks keep running after the 600s wait; update the code around the mpi_local_rank check to fail fast: in the rank 0 branch (where pip install diskcache, git clone and pip install "${_nvrx_dir}" run) check each command's exit status and only touch/emit "$_marker" after all installs succeed, otherwise log the error and exit non-zero; in the non-zero branch (the while loop using _waited and checking "$_marker") detect the timeout (when _waited >= 600) and abort with a non-zero exit and error message instead of silently continuing so downstream steps don't run without dependencies.
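The timeout half of this finding can be isolated into a helper that returns non-zero when the marker never appears. This is a rough sketch under assumptions: `wait_for_marker` is a hypothetical name and the error message wording is illustrative; only the polling loop and the 600-second budget come from the code under review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for_marker PATH TIMEOUT_SECONDS
# Polls once per second and fails loudly if the marker never shows up,
# so non-zero ranks do not continue without their dependencies.
wait_for_marker() {
    local marker="$1" timeout="$2" waited=0
    while [ ! -f "$marker" ] && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if [ ! -f "$marker" ]; then
        echo "ERROR: marker $marker not found after ${timeout}s" >&2
        return 1
    fi
    return 0
}
```

A non-zero rank could then call `wait_for_marker "$_marker" 600 || return 1`, which has the same effect as the `[[ -f "$_marker" ]] || return 1` line in the suggested fix but also logs why the rank stopped.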

coderabbitai · 2026-05-16T05:12:13Z

+        git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
+            && pip install "${_nvrx_dir}"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cat -n tools/launcher/common/service_utils.sh | sed -n '50,70p'

Repository: NVIDIA/Model-Optimizer

Length of output: 837

🏁 Script executed:

rg "NVRX_REF|nvidia-resiliency-ext" --type sh

Repository: NVIDIA/Model-Optimizer

Length of output: 292

🏁 Script executed:

rg "nvidia-resiliency-ext" -i

Repository: NVIDIA/Model-Optimizer

Length of output: 292

🏁 Script executed:

rg "resiliency|reproducibility|version" tools/launcher/common/service_utils.sh -B2 -A2

Repository: NVIDIA/Model-Optimizer

Length of output: 569

🏁 Script executed:

grep -r "NVRX" tools/launcher/ 2>/dev/null || echo "No NVRX references found"

Repository: NVIDIA/Model-Optimizer

Length of output: 91

Pin nvidia-resiliency-ext to an immutable ref.

Line 59 clones from the default branch at runtime without specifying a Git reference, creating a supply-chain and reproducibility risk for cluster jobs. The branch pointer can move unexpectedly, making builds non-deterministic.

Add an environment variable to allow pinning to a specific tag or commit:

Suggested fix

+        local _nvrx_ref="${NVRX_REF:-<tag-or-commit>}"
-        git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
+        git clone --depth 1 --branch "${_nvrx_ref}" https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
             && pip install "${_nvrx_dir}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/common/service_utils.sh` around lines 59 - 60, The git clone in service_utils.sh clones nvidia-resiliency-ext without pinning, so add an environment variable (e.g., NVRX_REF) and use it when cloning to pin to a tag/commit; modify the clone invocation that uses _nvrx_dir to include --branch "$NVRX_REF" (or fall back to a sensible default like main when NVRX_REF is empty) and document/export NVRX_REF so CI/ops can set an immutable ref before running the script.
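One detail worth keeping in mind when applying this: `git clone --branch` accepts a branch or tag name, not a raw commit SHA. If the pin needs to be a commit, an init-plus-fetch sequence is one option. The following is a sketch under assumptions: `clone_at_ref` is a hypothetical helper, it refuses to run without a ref instead of falling back to a default branch, and fetching a commit SHA only works when the server permits it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone_at_ref URL REF DEST
# Fetches exactly one ref (tag, branch, or commit SHA if the server allows it)
# at depth 1 and checks it out detached. Refuses to run with an empty ref.
clone_at_ref() {
    local url="$1" ref="$2" dest="$3"
    if [ -z "$ref" ]; then
        echo "ERROR: refusing to clone $url without a pinned ref" >&2
        return 1
    fi
    git init --quiet "$dest" \
        && git -C "$dest" fetch --quiet --depth 1 "$url" "$ref" \
        && git -C "$dest" checkout --quiet --detach FETCH_HEAD
}
```

With this shape, the call site would read `clone_at_ref https://github.com/NVIDIA/nvidia-resiliency-ext "${NVRX_REF:-}" "${_nvrx_dir}"`, and an unset `NVRX_REF` becomes a hard failure rather than a silent clone of whatever the default branch points at.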

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml (1)
38-38: ⚡ Quick win

Pin container image to an available versioned tag instead of latest.

Using vllm/vllm-openai:latest breaks reproducibility as the tag can change between runs. However, the version v0.8.5 does not exist in the vllm/vllm-openai repository. Available pinned alternatives include cu129-nightly or model-specific tags. Check the vllm Docker Hub releases for a stable tag appropriate for your use case.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml` at line 38, Replace
the floating image tag "vllm/vllm-openai:latest" with a stable, available
versioned tag to ensure reproducibility; update the container field (the line
containing container: vllm/vllm-openai:latest) to a specific tag such as
vllm/vllm-openai:cu129-nightly or another model-specific release found on the
vllm Docker Hub tags page, verifying the chosen tag exists before committing.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml`:
- Line 12: The hf_model entry incorrectly references
"/hf-local/moonshotai/Kimi-K2.6" while this pipeline targets Kimi-K2.5; update
the hf_model value to the correct K2.5 model path (e.g., replace "Kimi-K2.6"
with "Kimi-K2.5") so the hf_model key points to the matching K2.5 artifact and
remove any leftover test stand-in reference.
- Line 23: Replace the incorrect vLLM flag --trust_remote_code (underscore) with
the hyphenated form --trust-remote-code in the hf_offline_dflash.yaml example so
it matches other vLLM flags (e.g., --enforce-eager, --gpu-memory-utilization,
--max-model-len) and is accepted by the CLI; locate the entry that currently
reads --trust_remote_code and update it to --trust-remote-code.

---

Nitpick comments:
In `@tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml`:
- Line 38: Replace the floating image tag "vllm/vllm-openai:latest" with a
stable, available versioned tag to ensure reproducibility; update the container
field (the line containing container: vllm/vllm-openai:latest) to a specific tag
such as vllm/vllm-openai:cu129-nightly or another model-specific release found
on the vllm Docker Hub tags page, verifying the chosen tag exists before
committing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 623468f8-1c67-4fbd-aa04-fbe320694ab4

📥 Commits

Reviewing files that changed from the base of the PR and between f0d20f2 and 50be962.

📒 Files selected for processing (2)

tools/launcher/common/service_utils.sh
tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml

coderabbitai · 2026-05-18T22:45:02Z

+  note:
+
+  global_vars:
+    hf_model: /hf-local/moonshotai/Kimi-K2.6


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Model path references K2.6 but job targets K2.5.

The hf_model path points to Kimi-K2.6 while the job name and file path indicate this pipeline is for Kimi-K2.5. Per PR objectives, K2.6 was used as a stand-in during cluster testing, but this should be updated to the correct K2.5 path before merge to avoid confusion.

-    hf_model: /hf-local/moonshotai/Kimi-K2.6
+    hf_model: /hf-local/moonshotai/Kimi-K2.5

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    hf_model: /hf-local/moonshotai/Kimi-K2.6
+    hf_model: /hf-local/moonshotai/Kimi-K2.5

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml` at line 12, The hf_model entry incorrectly references "/hf-local/moonshotai/Kimi-K2.6" while this pipeline targets Kimi-K2.5; update the hf_model value to the correct K2.5 model path (e.g., replace "Kimi-K2.6" with "Kimi-K2.5") so the hf_model key points to the matching K2.5 artifact and remove any leftover test stand-in reference.

coderabbitai · 2026-05-18T22:45:02Z

+      - --tensor-parallel-size 8
+      - --port 8000
+      - --host 0.0.0.0
+      - --trust_remote_code


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

vLLM --trust-remote-code flag syntax

💡 Result:

In vLLM, the --trust-remote-code flag permits the execution of custom model code (such as configuration or modeling files) that is hosted on the Hugging Face Model Hub rather than being natively integrated into the Transformers library [1][2][3].

Usage syntax:

1. CLI (OpenAI-compatible server): include the flag as a command-line argument [4][3]: vllm serve <model_name> --trust-remote-code
2. Python API: when initializing the LLM class, pass the parameter as a boolean argument [3]: from vllm import LLM; llm = LLM(model="<model_name>", trust_remote_code=True)

Important considerations:

- Security: this flag executes arbitrary code from the model repository on your local machine [1]. Only use it with models from trusted sources [1].
- Necessity: this flag is generally required for custom models or newer architectures that have not yet been upstreamed into the official Hugging Face Transformers library [1][2].
- Default behavior: while the flag is often required for specific models, some model configurations may hardcode this requirement, and security-related updates may occasionally affect how this flag is handled [5].

Always ensure your transformers library is up to date, as native support for newer models is frequently added, which can eliminate the need for this flag [1][2].

Citations:

1: [Usage]: Running Phi-3-small-128k-instruct with v0.4.3 without --trust-remote-code vllm-project/vllm#5244

2: https://discuss.huggingface.co/t/how-to-avoid-trust-remote-code-true-for-my-models/84134

3: https://docs.vllm.ai/en/latest/models/supported_models/

4: https://docs.vllm.ai/en/v0.5.5/models/engine_args.html

5: https://www.reddit.com/r/LocalLLaMA/comments/1s72zog/vllm_cve202627893_trustremotecodefalse_is/

🏁 Script executed:

fd "hf_offline_dflash.yaml" tools/launcher/examples/

Repository: NVIDIA/Model-Optimizer

Length of output: 123

🏁 Script executed:

cat -n tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml

Repository: NVIDIA/Model-Optimizer

Length of output: 1441

Fix flag syntax: use --trust-remote-code with hyphen instead of underscore.

Line 23 uses --trust_remote_code with an underscore, while other vLLM flags in the same file use hyphens (--enforce-eager, --gpu-memory-utilization, --max-model-len). vLLM CLI expects the hyphenated form.

Diff

-      - --trust_remote_code
+      - --trust-remote-code

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      - --trust_remote_code
+      - --trust-remote-code

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml` at line 23, Replace the incorrect vLLM flag --trust_remote_code (underscore) with the hyphenated form --trust-remote-code in the hf_offline_dflash.yaml example so it matches other vLLM flags (e.g., --enforce-eager, --gpu-memory-utilization, --max-model-len) and is accepted by the CLI; locate the entry that currently reads --trust_remote_code and update it to --trust-remote-code.
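A quick way to catch this class of inconsistency before review is to scan launcher YAML for long flags that contain underscores. This is only a sketch: `find_underscore_flags` is a hypothetical helper, and it assumes the project wants hyphenated long flags throughout; it says nothing about what the vLLM CLI itself accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print every long option that
# contains an underscore (e.g. --trust_remote_code). Exits non-zero when
# nothing matches, following grep's convention.
find_underscore_flags() {
    grep -oE -e '--[a-z0-9-]*_[a-z0-9_-]*'
}
```

Running `find_underscore_flags < tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml` against the reviewed file would list only the underscore-style flags, leaving hyphenated ones such as --enforce-eager out of the output.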

ChenhanYu closed this May 14, 2026

ChenhanYu reopened this May 14, 2026

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch 2 times, most recently from 1b02102 to 006e8b2 Compare May 14, 2026 22:48

ChenhanYu marked this pull request as ready for review May 14, 2026 22:49

ChenhanYu enabled auto-merge (squash) May 14, 2026 22:49

coderabbitai Bot reviewed May 14, 2026

ChenhanYu disabled auto-merge May 14, 2026 23:04

ChenhanYu marked this pull request as draft May 14, 2026 23:04

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from 006e8b2 to c3eef71 Compare May 15, 2026 18:32

ChenhanYu marked this pull request as ready for review May 15, 2026 18:33

coderabbitai Bot reviewed May 15, 2026

ChenhanYu commented May 15, 2026

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from c3eef71 to b661cef Compare May 15, 2026 20:47

coderabbitai Bot reviewed May 15, 2026

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from b661cef to d35357d Compare May 15, 2026 21:26

ChenhanYu requested a review from a team as a code owner May 15, 2026 21:26

ChenhanYu requested a review from kevalmorabia97 May 15, 2026 21:26

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from d35357d to b661cef Compare May 16, 2026 04:16

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from b661cef to f0d20f2 Compare May 16, 2026 05:08

coderabbitai Bot reviewed May 16, 2026

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from f0d20f2 to b661cef Compare May 16, 2026 05:17

kevalmorabia97 removed the request for review from a team May 16, 2026 11:00

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from b661cef to 50be962 Compare May 18, 2026 22:40

coderabbitai Bot reviewed May 18, 2026

[OMNIML-4740] synth_support — pensieve-intern agent draft

abbda41

ChenhanYu force-pushed the pensieve-intern/OMNIML-4735/synth-support branch from 50be962 to abbda41 Compare May 19, 2026 00:37
