### Configuring Draft Model
For EAGLE-1 and EAGLE-3 we provide a [default model architecture config](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/speculative/config.py#L37) in ModelOpt. You can override the default settings via `eagle.eagle_architecture_config` in the YAML. For example, to use a 2-layer EAGLE head with an 8192 intermediate size for the MLP:
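A minimal sketch of such an override, assuming the `eagle.eagle_architecture_config` key path described above; the field names `num_hidden_layers` and `intermediate_size` follow Hugging Face config conventions and are illustrative:

```yaml
# Hypothetical YAML fragment overriding the default EAGLE architecture config.
eagle:
  eagle_architecture_config:
    num_hidden_layers: 2      # 2-layer EAGLE draft head
    intermediate_size: 8192   # MLP intermediate size
```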
This will produce a `d2t.pt` file in `save_dir`, which is the mapping from draft token to target token. During inference, draft tokens can be mapped back to target tokens by `target_token = draft_token + d2t[draft_token]`.
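The mapping can be sketched as follows. This is an illustrative example with made-up offsets; the real `d2t.pt` is a saved tensor you would load with `torch.load`, but a plain list shows the indexing:

```python
# Illustrative sketch (hypothetical values): d2t stores one offset per
# draft-token id, so adding the offset recovers the target-token id.
d2t = [0, 3, 7]  # hypothetical offset table for a 3-token draft vocab

draft_token = 1
target_token = draft_token + d2t[draft_token]  # 1 + 3 -> 4
```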
Then, set `eagle_architecture_config.draft_vocab_size: 32000` and `data.draft_vocab_cache: <path_to_d2t.pt>` in your YAML. The draft model will use this provided vocab table during training and export.
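Putting both settings together, a sketch of the relevant YAML fragment (key placement follows the paths named above and is an assumption):

```yaml
# Hypothetical YAML fragment wiring in the reduced draft vocab.
eagle:
  eagle_architecture_config:
    draft_vocab_size: 32000
data:
  draft_vocab_cache: <path_to_d2t.pt>   # d2t.pt produced in the step above
```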
### Interact with `modelopt.torch.speculative`
`main.py` provides a complete example of converting a HF base model for speculative decoding and training it. The core steps are loading the base model, converting it with an EAGLE config dict, and training with the HF Trainer: