CHANGELOG.rst
NVIDIA Model Optimizer Changelog (Linux)
========================================
- Users no longer need to manually register MoE modules to ensure expert calibration coverage in the PTQ workflow.
- ``hf_ptq.py`` now saves the quantization summary and the MoE expert token count table to the export directory.
- Add the ``--moe_calib_experts_ratio`` flag to ``hf_ptq.py`` to specify the ratio of experts to calibrate during the forward pass, improving expert coverage during calibration. Defaults to calibrating all experts; see the example invocation below.
- Add sparse attention optimization for transformer models (``modelopt.torch.sparsity.attention_sparsity``). This reduces computational cost by skipping attention computation, and supports calibration for threshold selection on HuggingFace models. See `examples/llm_sparsity/attention_sparsity/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_sparsity/attention_sparsity>`_ for usage.
- Add support for rotating the input before quantization for RHT (Randomized Hadamard Transform).
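A minimal sketch of the new flag in use; only ``--moe_calib_experts_ratio`` is documented above, and the checkpoint path and ``--qformat`` value are illustrative placeholders:

```bash
# Hypothetical PTQ run; only --moe_calib_experts_ratio is documented above,
# the remaining arguments are illustrative placeholders.
python3 hf_ptq.py \
    --pyt_ckpt_path <model_dir> \
    --qformat fp8 \
    --moe_calib_experts_ratio 0.5  # calibrate 50% of experts per forward pass
```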
The [Nemotron-Pretraining-SFT-v1](https://huggingface.co/datasets/nvidia/Nemotron-Pretraining-SFT-v1) dataset is huge, so it will take a while to download and tokenize. You can also split the large `.jsonl` into multiple files (e.g. 10M samples per file using `split -l 10000000 -d --additional-suffix=.jsonl <file>.jsonl <file>_part`) and tokenize them in parallel.
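A rough sketch of that split-and-tokenize workflow; the tokenization script name `tokenize.py` and its `--input` flag are stand-ins, not confirmed names from this repo:

```bash
# Split the large .jsonl into 10M-sample parts ...
split -l 10000000 -d --additional-suffix=.jsonl data.jsonl data_part
# ... then tokenize the parts in parallel, up to 8 jobs at a time.
# tokenize.py is a placeholder for the repo's tokenization script.
ls data_part*.jsonl | xargs -n1 -P8 python3 tokenize.py --input
```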
To quickly test the script, you can try the [nvidia/Nemotron-Pretraining-Dataset-sample](https://huggingface.co/datasets/nvidia/Nemotron-Pretraining-Dataset-sample) dataset.
If you skip `--hf_name`, it will download and tokenize all subsets of the dataset.
If you skip `--hf_split`, it will download and tokenize all splits of the subset.
If you skip `--hf_max_samples_per_split`, it will download and tokenize all samples in the split.
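Putting those flags together, a hypothetical full invocation; the script name `tokenize.py` and the argument values are placeholders, only the three flags themselves are documented here:

```bash
# Download and tokenize one subset/split, capped at 1M samples (values illustrative).
python3 tokenize.py \
    --hf_name <subset> \
    --hf_split train \
    --hf_max_samples_per_split 1000000
```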
### Running [SPEED-Bench](https://huggingface.co/datasets/nvidia/SPEED-Bench) on Llama 3.3 70B + Eagle 3
1. Install the requirements with `pip install -r requirements_speed.txt`.
2. Prepare the data using the provided script:
```bash
python3 prepare_data.py --dataset speed --config all
```
The data will be saved to the `data/` directory, with each config type (qualitative, throughput_1k, ...) in its own subdirectory.
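Assuming the subdirectories simply mirror the config names, the resulting layout would look roughly like this (illustrative, not verified against the script's output):

```bash
ls data/
# qualitative/  throughput_1k/  ...   (one subdirectory per config type)
```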
#### License
GOVERNING TERMS: This dataset is governed by the NVIDIA Evaluation Dataset License Agreement.
ADDITIONAL INFORMATION: MIT for bigcode/humanevalpack, RUCAIBox/MMATH, RUCAIBox/BAMBOO and EQ-Bench. Apache 2.0 for Writing Bench and Spec-Bench. CC BY 4.0 for FBK-MT/MCIF. MIT and Apache 2.0 for tianyang/repobench_python_v1.1, JetBrains-Research/lca-project-level-code-completion and tianyang/repobench_java_v1.1.
NOTICE: For each dataset a user elects to use, the user is responsible for checking if the dataset license is fit for the intended purpose. The `prepare_data.py` script automatically fetches data from all the source datasets.
Additional details are in the [HuggingFace dataset repository](https://huggingface.co/datasets/nvidia/SPEED-Bench).