Commit 006e8b2

[OMNIML-4740] Add EAGLE3 offline pipeline YAML for moonshotai/Kimi-K2.5 DFlash
4-task pipeline adapted from Qwen/Qwen3-8B as the closest available sibling (per the synth_support stage SPEC: copy + adapt the nearest established-family file as the seed when no sibling exists in the target family).

Dry-run validation (mandatory per the synth_support spec):

    $ uv run slurm.py --yaml '.../moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml' --dry-run
    ... (config table for all 4 tasks rendered) ...
    Exit: 0

The dry-run validates OmegaConf schema, <<global_vars.X>> interpolation, and factory references. Real cluster submission of the 4 tasks is gated on the downstream run_pipeline stage of Epic OMNIML-4735.

This commit is the cleaned version of pensieve-intern agent draft 1b02102: uv.lock churn and VERIFICATION_COMMENT.txt sidecar removed (those belong in neither this PR nor any synth_support PR).

Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
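To make the `<<global_vars.X>>` interpolation mentioned in the commit message concrete, here is a minimal Python sketch of that kind of substitution. It is not the launcher's implementation: the `resolve` helper and its regex are hypothetical, and the real tool may resolve placeholders through OmegaConf in a different way.

```python
import re

# Hypothetical helper, not the launcher's code: substitutes
# <<global_vars.NAME>> placeholders with values from a plain dict.
_PLACEHOLDER = re.compile(r"<<global_vars\.([A-Za-z_][A-Za-z0-9_]*)>>")


def resolve(value, global_vars):
    """Replace each <<global_vars.NAME>> in a string; raise on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in global_vars:
            raise KeyError(f"undefined global_vars entry: {name}")
        return str(global_vars[name])

    return _PLACEHOLDER.sub(substitute, value)


# Value copied from the global_vars block of the YAML in this commit.
global_vars = {"hf_model": "/hf-local/moonshotai/Kimi-K2.5 DFlash"}
print(resolve("--model <<global_vars.hf_model>>", global_vars))
```

The resolved argument contains a space inside the model path, so whether downstream scripts quote it is worth checking once the pipeline moves beyond the dry-run.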
1 parent e27f76f commit 006e8b2

1 file changed

Lines changed: 104 additions & 0 deletions

@@ -0,0 +1,104 @@
+# EAGLE3 offline speculative decoding pipeline for moonshotai/Kimi-K2.5 DFlash.
+#
+# 4-step pipeline:
+#   task_0: Data synthesis - query TRT-LLM server to generate prompt samples
+#   task_1: Dump hidden states - run target model to capture hidden states
+#   task_2: Offline training - train the EAGLE3 draft head
+#   task_3: Benchmark - evaluate speculative decoding speedup via vLLM
+#
+# All tasks share /scratchspace to pass artifacts between steps.
+#
+# Usage (the path contains a space, so it must be quoted):
+#   uv run launch.py --yaml 'examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml' --yes
+#   uv run slurm.py --yaml 'modules/Model-Optimizer/tools/launcher/examples/moonshotai/Kimi-K2.5 DFlash/hf_offline_eagle3.yaml' --yes
+
+job_name: Kimi-K2.5 DFlash_EAGLE3_offline
+pipeline:
+  allow_to_fail: false
+  skip: false
+  note:
+
+  global_vars:
+    hf_model: /hf-local/moonshotai/Kimi-K2.5 DFlash
+
+  # Step 1: Data synthesis via TRT-LLM server
+  # Args before "--" go to trtllm-serve; args after "--" go to tools/query.py.
+  task_0:
+    script: common/tensorrt_llm/query.sh
+    args:
+      - --model <<global_vars.hf_model>>
+      - --tp_size 8
+      - --ep_size 8
+      - --max_num_tokens 32000
+      - --port 8000
+      - --host 0.0.0.0
+      - --trust_remote_code
+      - --
+      - --data /hf-local/modelopt/Speculative-Decoding-Prompt-Samples
+      - --save /scratchspace/data
+    environment:
+      - HF_LOCAL: /hf-local
+    slurm_config:
+      _factory_: "slurm_factory"
+      nodes: 1
+      ntasks_per_node: 8
+      gpus_per_node: 8
+      container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
+
+  # Step 2: Dump hidden states from target model
+  task_1:
+    script: common/eagle3/dump_offline_data.sh
+    args:
+      - --input-data /scratchspace/data
+      - --output-dir /scratchspace/offline_hidden_states
+      - --max-seq-len 8192
+      - --tp 8
+      - --moe-ep 8
+    environment:
+      - HF_MODEL_CKPT: <<global_vars.hf_model>>
+    slurm_config:
+      _factory_: "slurm_factory"
+      nodes: 1
+      ntasks_per_node: 8
+      gpus_per_node: 8
+      container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
+
+  # Step 3: Train EAGLE3 draft head (offline, single task)
+  task_2:
+    script: common/eagle3/train_eagle.sh
+    args:
+      - --config modules/Model-Optimizer/modelopt_recipes/general/speculative_decoding/eagle3.yaml
+      - model.model_name_or_path=<<global_vars.hf_model>>
+      - data.offline_data_path=/scratchspace/offline_hidden_states
+      - training.output_dir=/scratchspace/eagle3
+      - training.training_seq_len=4096
+      - training.disable_tqdm=true
+      - training.ar_validate_steps=500000
+    slurm_config:
+      _factory_: "slurm_factory"
+      nodes: 1
+      ntasks_per_node: 1
+      gpus_per_node: 8
+      container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
+
+  # Step 4: Benchmark speculative decoding (vLLM backend)
+  task_3:
+    script: common/specdec_bench/quick_check.sh
+    args:
+      - --draft_model_dir /scratchspace/export
+      - --draft_length 3
+      - --output_length 4096
+      - --engine VLLM
+      - --tp_size 8
+      - --ep_size 1
+      - --speculative_algorithm EAGLE3
+      - --mtbench /hf-local/HuggingFaceH4/mt_bench_prompts/raw/question.jsonl
+      - --concurrency 1
+    environment:
+      - HF_MODEL_CKPT: <<global_vars.hf_model>>
+    slurm_config:
+      _factory_: "slurm_factory"
+      nodes: 1
+      ntasks_per_node: 1
+      gpus_per_node: 8
+      container: vllm/vllm-openai:latest
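
The file's header says all tasks share /scratchspace to pass artifacts between steps. A small Python sketch can check that handoff by listing each /scratchspace path a task reads that no earlier task names as an output in its args. The `TASKS` table is transcribed by hand from the args in the diff, and `unmatched_inputs` is a hypothetical helper, not part of the launcher.

```python
# Paths each task reads and writes, copied by hand from the args in the diff.
# Illustration only: the launcher does not produce this table.
TASKS = [
    ("task_0", [], ["/scratchspace/data"]),
    ("task_1", ["/scratchspace/data"], ["/scratchspace/offline_hidden_states"]),
    ("task_2", ["/scratchspace/offline_hidden_states"], ["/scratchspace/eagle3"]),
    ("task_3", ["/scratchspace/export"], []),
]


def unmatched_inputs(tasks, shared_root="/scratchspace"):
    """Return (task, path) pairs read under shared_root that no earlier task declares as output."""
    produced = set()
    missing = []
    for name, inputs, outputs in tasks:
        for path in inputs:
            if path.startswith(shared_root) and path not in produced:
                missing.append((name, path))
        # Outputs become visible only to tasks that run after this one.
        produced.update(outputs)
    return missing


print(unmatched_inputs(TASKS))
```

With the table above, the only path reported is task_3's `/scratchspace/export`, which no task names in its args. The diff does not show whether `common/eagle3/train_eagle.sh` writes that directory itself, so this is a point to confirm against the script rather than an established error.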
