19 changes: 17 additions & 2 deletions tools/launcher/common/service_utils.sh
@@ -18,8 +18,8 @@
native_mpi_rank=$OMPI_COMM_WORLD_RANK
native_mpi_local_rank=$OMPI_COMM_WORLD_LOCAL_RANK
# Works with Slurm launching with `--mpi=pmix`
-mpi_rank=${PMIX_RANK:-$native_mpi_rank}
-mpi_local_rank=${PMIX_LOCAL_RANK:-$native_mpi_local_rank}
+mpi_rank=${PMIX_RANK:-${native_mpi_rank:-${SLURM_PROCID:-0}}}
+mpi_local_rank=${PMIX_LOCAL_RANK:-${native_mpi_local_rank:-${SLURM_LOCALID:-0}}}

FAIL=0
FAIL_EXIT=0
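
Reviewer note on the hunk above: the new assignments resolve the rank in the order PMIx, then Open MPI, then Slurm, then `0`. The sketch below isolates that nested `${VAR:-default}` chain so the order is easy to check by hand. `resolve_rank` is a hypothetical helper written for this note and is not part of `service_utils.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in service_utils.sh): mirrors the fallback order
# PMIX_RANK -> OMPI_COMM_WORLD_RANK -> SLURM_PROCID -> 0.
resolve_rank() {
    local native_mpi_rank=${OMPI_COMM_WORLD_RANK:-}
    # ":-" treats an empty value the same as an unset one, so an empty
    # PMIX_RANK still falls through to the next candidate.
    echo "${PMIX_RANK:-${native_mpi_rank:-${SLURM_PROCID:-0}}}"
}

# Each case runs in a subshell so the variables do not leak between cases.
( unset PMIX_RANK OMPI_COMM_WORLD_RANK SLURM_PROCID; resolve_rank )
( unset PMIX_RANK OMPI_COMM_WORLD_RANK; SLURM_PROCID=5; resolve_rank )
( unset PMIX_RANK; OMPI_COMM_WORLD_RANK=3; SLURM_PROCID=5; resolve_rank )
( PMIX_RANK=7; OMPI_COMM_WORLD_RANK=3; SLURM_PROCID=5; resolve_rank )
```

The four cases print `0`, `5`, `3`, and `7` in that order. The final `0` default means a process started without any launcher behaves as rank 0.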
@@ -48,8 +48,23 @@ function report_result {
}

function util_install_extra_dep {
    local _marker=/tmp/.nmm_extra_dep_installed
    if [[ -f "$_marker" ]]; then
        return 0
    fi
    if [[ "$mpi_local_rank" -eq 0 ]]; then
        pip install diskcache
        local _nvrx_dir
        _nvrx_dir="$(mktemp -d)/nvidia-resiliency-ext"
        git clone --depth 1 https://github.com/NVIDIA/nvidia-resiliency-ext "${_nvrx_dir}" \
            && pip install "${_nvrx_dir}"
        touch "$_marker"
    else
        local _waited=0
        while [[ ! -f "$_marker" && $_waited -lt 600 ]]; do
            sleep 1
            _waited=$((_waited + 1))
        done
    fi
}
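
Reviewer note on `util_install_extra_dep`: the function uses a marker file as a per-node barrier, with local rank 0 installing and the other ranks polling. As shown in the diff, `touch "$_marker"` runs whether or not the clone and install succeeded, and the wait loop ends after 600 iterations without reporting a timeout. The sketch below is one possible variant, not the PR's code: `run_once_per_node` and its parameters are hypothetical names, and it assumes the caller wants a non-zero status on failure or timeout.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the marker-file barrier (not the PR's code).
# Usage: run_once_per_node MARKER LOCAL_RANK TIMEOUT_SECONDS COMMAND [ARGS...]
run_once_per_node() {
    local marker=$1 local_rank=$2 timeout=$3
    shift 3
    if [[ -f "$marker" ]]; then
        return 0
    fi
    if [[ "$local_rank" -eq 0 ]]; then
        # Create the marker only when the command succeeded; the function
        # returns the status of this list.
        "$@" && touch "$marker"
        return
    fi
    local waited=0
    while [[ ! -f "$marker" ]]; do
        if (( waited >= timeout )); then
            return 1  # report the timeout instead of continuing silently
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Illustrative use with a throwaway marker and "true" standing in for the install.
demo_dir=$(mktemp -d)
run_once_per_node "$demo_dir/.installed" 0 5 true
run_once_per_node "$demo_dir/.installed" 1 5 true && echo "marker present"
```

Whether a failed install should abort the job or only log a warning is a policy choice for the launcher; the sketch only makes the outcome visible to the caller.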

@@ -0,0 +1,43 @@
# DFlash offline synthetic data generation pipeline for Kimi-K2.5.
#
# 1-step pipeline (task_0 only):
# task_0: Data synthesis — query vLLM server to generate prompt samples
#
# Usage:
# uv run launch.py --yaml examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml --yes
# uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yaml --yes

job_name: Kimi-K2.5_DFlash_offline
pipeline:
  allow_to_fail: false
  skip: false
  note:

  global_vars:
    hf_model: /hf-local/moonshotai/Kimi-K2.6
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Model path references K2.6 but job targets K2.5.

The hf_model path points to Kimi-K2.6 while the job name and file path indicate this pipeline is for Kimi-K2.5. Per PR objectives, K2.6 was used as a stand-in during cluster testing, but this should be updated to the correct K2.5 path before merge to avoid confusion.

-    hf_model: /hf-local/moonshotai/Kimi-K2.6
+    hf_model: /hf-local/moonshotai/Kimi-K2.5
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/Kimi-K2.5/hf_offline_dflash.yaml` at line 12, the
hf_model entry references "/hf-local/moonshotai/Kimi-K2.6", but this pipeline
targets Kimi-K2.5. Update the hf_model value to the matching K2.5 model path
(e.g., replace "Kimi-K2.6" with "Kimi-K2.5") and remove any leftover reference
to the test stand-in.
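
A mismatch like the one flagged above could be caught before merge with a small check. The sketch below is a hypothetical helper, not part of the launcher. It assumes the job name begins with the model directory name, which holds for this file but may not hold for every config.

```shell
#!/usr/bin/env bash
# Hypothetical pre-merge check (not part of the launcher).
# Assumption: job_name starts with the last path component of hf_model.
model_matches_job() {
    local job_name=$1 hf_model=$2
    local model_dir=${hf_model##*/}  # strip everything up to the last "/"
    [[ "$job_name" == "$model_dir"* ]]
}

# Values copied from the YAML under review.
if ! model_matches_job "Kimi-K2.5_DFlash_offline" "/hf-local/moonshotai/Kimi-K2.6"; then
    echo "mismatch: hf_model does not match job_name"
fi
```

With the values currently in the YAML, the check prints the mismatch message. With the suggested K2.5 path it prints nothing.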


  # Step 1: Data synthesis via vLLM server
  # Args before "--" go to vllm-serve; args after "--" go to tools/query.py.
  task_0:
    script: common/vllm/query.sh
    args:
      - --model <<global_vars.hf_model>>
      - --tensor-parallel-size 8
      - --port 8000
      - --host 0.0.0.0
      - --trust-remote-code
      - --enforce-eager
      - --gpu-memory-utilization 0.95
      - --max-model-len 4096
      - --
      - --data /nemo_run/code/modules/Model-Optimizer/examples/dataset/synthetic_conversations_1k.jsonl
      - --save /scratchspace/data
    environment:
      - HF_LOCAL: /hf-local
      - VLLM_STARTUP_TIMEOUT: "1800"
    slurm_config:
      _factory_: "slurm_factory"
      nodes: 1
      ntasks_per_node: 1
      gpus_per_node: 8
      container: vllm/vllm-openai:latest
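
Reviewer note on the `--` convention in `args`: the YAML comment says arguments before `--` go to the vLLM server and arguments after it go to `tools/query.py`. The sketch below shows one way a wrapper script could split its arguments at the first bare `--`. It is an illustration only: `split_args`, `server_args`, and `client_args` are hypothetical names, and the actual logic in `common/vllm/query.sh` is not shown in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "$@" at the first bare "--" into two arrays.
# The real common/vllm/query.sh may do this differently.
split_args() {
    server_args=()
    client_args=()
    local seen_separator=0 arg
    for arg in "$@"; do
        if [[ $seen_separator -eq 0 && "$arg" == "--" ]]; then
            seen_separator=1          # drop the separator itself
        elif [[ $seen_separator -eq 0 ]]; then
            server_args+=("$arg")     # before "--": server arguments
        else
            client_args+=("$arg")     # after "--": query client arguments
        fi
    done
}

# Placeholder values, not the paths from the YAML.
split_args --model /path/to/model --port 8000 -- --data in.jsonl --save out
printf 'server: %s\n' "${server_args[*]}"
printf 'client: %s\n' "${client_args[*]}"
```

This prints `server: --model /path/to/model --port 8000` and `client: --data in.jsonl --save out`. Only the first `--` is treated as the separator, so a later `--` is passed through to the client unchanged.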