Commit 73be810
authored
vLLM fakequant: add recipe-based quantization support (#1233)
### What does this PR do?
Type of change: example update
This PR adds recipe-based quantization support to the vLLM fakequant
example.
### Testing
```
docker run --gpus all -it --shm-size=160GB --network host --rm --entrypoint bash -v <modelopt>:/home/modelopt vllm/vllm-openai:v0.15.0 -c "cd /home/modelopt && pip install . && pip install datasets && RECIPE_PATH=/home/modelopt/modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml python3 /home/modelopt/examples/vllm_serve/vllm_serve_fakequant.py Qwen/Qwen3-0.6B -tp 1 --served-model-name Qwen3-0.6B --host 0.0.0.0 --port 8001 --trust-remote-code --disable-custom-all-reduce --gpu-memory-utilization 0.8"
```
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A
### Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added `RECIPE_PATH` environment variable support enabling users to
specify ModelOpt PTQ recipe YAML files for quantization configuration in
vLLM serving.
* **Documentation**
* Updated examples and documentation to support recipe-driven
quantization configuration, aligning export workflow with recipe-based
setup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>1 parent b6c6ec3 commit 73be810
4 files changed
Lines changed: 30 additions & 16 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
| |||
65 | 66 | | |
66 | 67 | | |
67 | 68 | | |
68 | | - | |
| 69 | + | |
69 | 70 | | |
70 | 71 | | |
71 | 72 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| 46 | + | |
46 | 47 | | |
47 | 48 | | |
48 | 49 | | |
| |||
138 | 139 | | |
139 | 140 | | |
140 | 141 | | |
| 142 | + | |
141 | 143 | | |
142 | 144 | | |
143 | 145 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| |||
141 | 142 | | |
142 | 143 | | |
143 | 144 | | |
144 | | - | |
145 | | - | |
146 | | - | |
147 | | - | |
148 | | - | |
149 | | - | |
150 | | - | |
151 | | - | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
152 | 162 | | |
153 | | - | |
154 | | - | |
155 | | - | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
156 | 166 | | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
161 | 171 | | |
162 | 172 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
78 | 78 | | |
79 | 79 | | |
80 | 80 | | |
| 81 | + | |
81 | 82 | | |
82 | 83 | | |
83 | 84 | | |
| |||
0 commit comments