Commit c3eef71
[OMNIML-4740] Add EAGLE3 offline pipeline YAML for moonshotai/Kimi-K2.5-DFlash
4-task pipeline adapted from Qwen/Qwen3-8B. The synth_support agent's
original draft (1b02102) used the output model's name
(`Kimi-K2.5 DFlash`) as the input path, ran task_0 on a TRT-LLM
container that doesn't yet register the KimiK25 architecture, and
carried uv.lock churn plus a sidecar file (VERIFICATION_COMMENT.txt)
in the diff. This is the cleaned, cluster-validated version.
Key changes from the agent draft:
- Directory renamed `Kimi-K2.5 DFlash` → `Kimi-K2.5-DFlash` (Slurm tar
packaging breaks on spaces in job_name / path).
- `global_vars.hf_model` points at the *input* model `Kimi-K2.6`
(canonical stand-in for Kimi-K2.5 input per moonshotai), not the
pipeline's *output* `Kimi-K2.5-DFlash`.
- `task_0` switched from TRT-LLM (release:1.2.0 doesn't register
KimiK25ForConditionalGeneration as of 2026-05-14) to vLLM
(vllm-openai:latest, which loads via `--trust-remote-code`).
- vLLM-side knobs documented in-yaml: `--enforce-eager` (skips
inductor; the vLLM container is missing torch/bin/ptxas);
`--gpu-memory-utilization 0.95` + `--max-model-len 4096` (Kimi
weights are 595 GB bf16 on 8×80 GB, i.e. 93% weight occupancy, so
the default 0.9 left -1.1 GiB for the KV cache);
`VLLM_STARTUP_TIMEOUT=1800` env (Kimi weight load takes ~7.7 min,
and the 600s default in query.sh wasn't enough).
- `--data` switched to in-repo `synthetic_conversations_1k.jsonl`
(the canonical Speculative-Decoding-Prompt-Samples isn't on
cw-dfw; the in-repo dataset is the right portable input for
smoke-testing the pipeline).
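The occupancy figure behind the `--gpu-memory-utilization` change can be
re-derived with a short sketch. This is illustrative arithmetic only: the
helper names are hypothetical, and the check ignores activations and
other runtime overhead, so it does not reproduce the exact -1.1 GiB
figure reported by vLLM.

```python
def weight_occupancy(weights_gb: float, num_gpus: int, gpu_gb: float) -> float:
    """Fraction of total GPU memory taken by model weights alone."""
    return weights_gb / (num_gpus * gpu_gb)


def leaves_room_for_kv(occupancy: float, gpu_memory_utilization: float) -> bool:
    """Simplified check (an assumption, not vLLM's real accounting):
    the utilization budget must exceed the weight occupancy for any
    memory to remain for the KV cache."""
    return gpu_memory_utilization > occupancy


# Numbers from the commit message: 595 GB of weights on 8 x 80 GB GPUs.
occ = weight_occupancy(595, 8, 80)
print(f"weight occupancy: {occ:.1%}")             # 93.0%
print("default 0.90:", leaves_room_for_kv(occ, 0.90))
print("raised 0.95:", leaves_room_for_kv(occ, 0.95))
```

With the default budget of 0.9 the weights alone exceed the budget; at
0.95 roughly two percentage points remain, which is why the context
length is also capped with `--max-model-len 4096`.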
Cluster-test evidence (cw-dfw, Slurm 11782946, experiment
cicd_1778864959, elapsed 1:02:11, exit 0):
$ SLURM_CLUSTER=cw_dfw uv run slurm.py \
    --yaml '.../moonshotai/Kimi-K2.5-DFlash/hf_offline_eagle3.yaml' \
    pipeline.task_1.skip=true \
    pipeline.task_2.skip=true \
    pipeline.task_3.skip=true \
    --yes --detach
Loading weights took 461.45 seconds
Model loading took 71.44 GiB memory and 465.10 seconds
Map (num_proc=32): 100%|██████████| 100/100 [06:42<00:00, 4.02s/example]
Saved 10 shards to /scratchspace/data/train-{1..10}-00010.jsonl
Slurm: COMPLETED 01:02:11
Real assistant response verified end-to-end (Kimi correctly
answers the "bat and ball" CRT problem).
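The `pipeline.task_N.skip=true` arguments in the command above restrict
the run to task_0. How slurm.py parses them is not shown in this commit;
the sketch below is a hypothetical illustration of applying dotted
`key=value` overrides to a nested config, with `apply_override` and the
config layout both assumed for the example.

```python
def apply_override(config: dict, assignment: str) -> None:
    """Apply one 'a.b.c=value' override to a nested dict in place.

    Hypothetical helper: the real launcher may parse overrides differently.
    """
    key_path, _, raw = assignment.partition("=")
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Interpret 'true'/'false' as booleans; keep anything else as a string.
    node[keys[-1]] = {"true": True, "false": False}.get(raw.lower(), raw)


# Assumed layout: four tasks, none skipped by default.
config = {"pipeline": {f"task_{i}": {"skip": False} for i in range(4)}}

for arg in (
    "pipeline.task_1.skip=true",
    "pipeline.task_2.skip=true",
    "pipeline.task_3.skip=true",
):
    apply_override(config, arg)

skipped = sorted(name for name, task in config["pipeline"].items() if task["skip"])
print(skipped)  # ['task_1', 'task_2', 'task_3']
```

Under these assumptions only task_0 (the vLLM data-synthesis step)
remains enabled, matching the scope of the cluster test.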
The previous draft (1b02102) is replaced; uv.lock churn +
VERIFICATION_COMMENT.txt removed.
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
1 parent e27f76f commit c3eef71
1 file changed
Lines changed: 116 additions & 0 deletions