Commit 7612abe

committed
squash: refactor
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
1 parent 5e43b2a commit 7612abe

File tree

17 files changed: +581 -581 lines changed


examples/speculative_decoding/README.md

Lines changed: 12 additions & 9 deletions

@@ -242,6 +242,17 @@ To add a system prompt, use the `--system_prompt <system_prompt_text>` argument.
 
 For large scale data generation, please see [SLURM prepare data](SLURM_prepare_data.md) for SLURM support.
 
+### Configuring Draft Model
+
+For EAGLE-1 and EAGLE-3 we provide a [default model architecture config](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/speculative/config.py#L37) in ModelOpt. You can override the default settings by providing an additional JSON dict. For example, to use a 2-layer EAGLE draft model with an MLP intermediate size of 8192, set `eagle_config.json` to:
+
+```json
+{
+    "num_hidden_layers": 2,
+    "intermediate_size": 8192
+}
+```
+
 ### Draft Vocabulary Compression
 
 We can optionally use a smaller vocab size for the draft model for faster training and inference. For example, Llama3.2-1B has a vocab size of 128256. In this example, we construct a draft vocab mapping of size 32k by finding the most frequently occurring tokens in our training set:

@@ -252,15 +263,7 @@ python scripts/calibrate_draft_vocab.py --model meta-llama/Llama-3.2-1B-Instruct
 
 This will produce a `d2t.pt` file in `save_dir`, which is the mapping from draft tokens to target tokens. During inference, draft tokens can be mapped back to target tokens by `target_token = draft_token + d2t[draft_token]`.
 
-### Configuring Draft Model
-
-For EAGLE-1 and EAGLE-3 we provide a [default model architecture config](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/speculative/config.py#L37) in ModelOpt. You can override default settings by providing an additional JSON dict. In this example, we override `draft_vocab_size` in `eagle_config.json`:
-
-```json
-{
-    "draft_vocab_size": 32000
-}
-```
+Then, simply include the `--draft_vocab_cache <path_to_d2t.pt>` argument when starting training with `./launch_train.sh`. The draft model will use the provided vocab table during training and export.
 
 ### Interact with `modelopt.torch.speculative`
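The draft-to-target mapping described in the README hunk above follows `target_token = draft_token + d2t[draft_token]`. A minimal sketch of that lookup; the `d2t` list here is an illustrative stand-in for the tensor stored in `d2t.pt` (the real file would be loaded with `torch.load("d2t.pt")`), and the offset values are made up:

```python
# Hypothetical offsets for a 4-token draft vocab; in practice d2t is a
# tensor of length draft_vocab_size produced by calibrate_draft_vocab.py.
d2t = [0, 5, 12, 100]

def draft_to_target(draft_token: int) -> int:
    # target_token = draft_token + d2t[draft_token]
    return draft_token + d2t[draft_token]

print(draft_to_target(2))  # 2 + d2t[2] = 14
```

Because each entry stores an offset rather than an absolute id, a draft vocab that is a subset of the target vocab compresses to a single small lookup table.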

examples/speculative_decoding/collect_hidden_states/run_hf_compute_hiddens.sh

Lines changed: 1 addition & 1 deletion

@@ -19,5 +19,5 @@
 
 python3 collect_hidden_states/compute_hidden_states_hf.py \
     --model meta-llama/Llama-3.2-1B-Instruct \
-    --input-file synthetic_conversations/daring-anteater.jsonl \
+    --input-data synthetic_conversations/daring-anteater.jsonl \
     --output-dir /mnt/md0/eagle-hidden-states/llama1b/daring_anteater/

examples/speculative_decoding/collect_hidden_states/run_hf_compute_hiddens_dp.sh

Lines changed: 1 addition & 1 deletion

@@ -30,7 +30,7 @@ split -n l/$DP_SIZE --numeric-suffixes=0 -d --additional-suffix=.jsonl $INPUT_FI
 
 for i in $(seq 0 $((DP_SIZE-1)))
 do
-CUDA_VISIBLE_DEVICES=$i python3 collect_hidden_states/compute_hidden_states_hf.py --model meta-llama/Llama-3.2-1B-Instruct --input-file /tmp/part-0${i}.jsonl --output-dir $OUTPUT_DIR &
+CUDA_VISIBLE_DEVICES=$i python3 collect_hidden_states/compute_hidden_states_hf.py --model meta-llama/Llama-3.2-1B-Instruct --input-data /tmp/part-0${i}.jsonl --output-dir $OUTPUT_DIR &
 done
 wait
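The `split -n l/$DP_SIZE` plus background-launch pattern in this script shards the input file into one line-aligned chunk per GPU rank. A rough Python sketch of the equivalent sharding logic (an approximation: `split -n l/N` balances chunks by bytes without breaking lines, while this version balances by line count; the function name is illustrative, not part of the scripts):

```python
def shard_lines(lines: list[str], dp_size: int) -> list[list[str]]:
    """Split lines into dp_size contiguous chunks, sizes differing by at most 1."""
    base, rem = divmod(len(lines), dp_size)
    shards, start = [], 0
    for rank in range(dp_size):
        end = start + base + (1 if rank < rem else 0)  # early ranks absorb the remainder
        shards.append(lines[start:end])
        start = end
    return shards

shards = shard_lines([f"line{i}" for i in range(10)], 4)
print([len(s) for s in shards])  # [3, 3, 2, 2]
```

Each chunk is then processed by an independent process pinned to one GPU via `CUDA_VISIBLE_DEVICES`, and `wait` blocks until all ranks finish.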

examples/speculative_decoding/collect_hidden_states/run_trtllm_compute_hiddens.sh

Lines changed: 1 addition & 1 deletion

@@ -20,6 +20,6 @@
 export TLLM_LOG_LEVEL="error";
 python3 collect_hidden_states/compute_hidden_states_trtllm.py \
     --model meta-llama/Llama-3.2-1B-Instruct \
-    --input-file synthetic_conversations/daring-anteater.jsonl \
+    --input-data synthetic_conversations/daring-anteater.jsonl \
     --output-dir /mnt/md0/eagle-hidden-states/llama1b/daring_anteater/

examples/speculative_decoding/collect_hidden_states/run_trtllm_compute_hiddens_dp.sh

Lines changed: 1 addition & 1 deletion

@@ -33,7 +33,7 @@ split -n l/$DP_SIZE --numeric-suffixes=0 -d --additional-suffix=.jsonl $INPUT_FI
 for i in $(seq 0 $((DP_SIZE-1)))
 do
 
-export CUDA_VISIBLE_DEVICES=$i; python3 collect_hidden_states/compute_hidden_states_trtllm.py --model $MODEL --input-file /tmp/part-0${i}.jsonl --output-dir $OUTPUT_DIR --dp-rank $i &
+export CUDA_VISIBLE_DEVICES=$i; python3 collect_hidden_states/compute_hidden_states_trtllm.py --model $MODEL --input-data /tmp/part-0${i}.jsonl --output-dir $OUTPUT_DIR --dp-rank $i &
 
 done
 wait
