
Commit 5f028d0

Merge branch 'main' into shengliangx/normalize-yaml-ext
2 parents 8e7ddb9 + b6c6ec3 commit 5f028d0

52 files changed

Lines changed: 2155 additions & 825 deletions


.claude/skills/ptq/SKILL.md

Lines changed: 2 additions & 2 deletions
@@ -124,9 +124,9 @@ Report the path and size to the user.
 
 ## Common Pitfalls
 
-- **Transformers version**: Newer models (e.g., Devstral/ministral3) may require a transformers version not yet in the container. Check `config.json` for `transformers_version` and upgrade if needed. Install ModelOpt first, then upgrade transformers **with** deps (not `--no-deps`) to pull compatible `huggingface_hub`
+- **Transformers version**: New models may need a newer version of transformers than what's installed. Check `config.json` for `transformers_version`. In containers, beware of `PIP_CONSTRAINT` blocking upgrades — see `references/slurm-setup-ptq.md` for workarounds
 - **Gated datasets**: Some calibration datasets require HF authentication. Ensure `HF_TOKEN` is set in the job environment, or use `--dataset cnn_dailymail` as a non-gated alternative
-- **NFS root_squash + Docker**: Docker runs as root, but NFS squashes root to `nobody`. Use `docker run --user $(id -u):$(id -g)`, or `chmod -R a+rwX` on needed directories as a fallback. See `skills/common/slurm-setup.md` section 5
+- **NFS root_squash + Docker**: See `skills/common/slurm-setup.md` section 5
 
 ## References
 
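The `transformers_version` check in the first bullet above can be sketched as a small helper (a minimal illustration, not part of the commit; the function name and the crude numeric version parsing are assumptions):

```python
import json

def _vtuple(v: str):
    # Crude numeric comparison key: "4.57.0.dev0" -> (4, 57, 0).
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def needs_transformers_upgrade(config_path: str, installed: str) -> bool:
    """True if the checkpoint's config.json pins a transformers_version
    newer than the installed one (checkpoints without the field pass)."""
    with open(config_path) as f:
        required = json.load(f).get("transformers_version")
    return required is not None and _vtuple(installed) < _vtuple(required)
```

In practice the installed version would come from `importlib.metadata.version("transformers")`.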

.claude/skills/ptq/references/slurm-setup-ptq.md

Lines changed: 36 additions & 18 deletions
@@ -7,29 +7,54 @@ monitoring), see `skills/common/slurm-setup.md`.
 
 ## 1. Container
 
-Get the recommended image version from `examples/llm_ptq/README.md`, then look for a `.sqsh` file in the workspace and common sibling directories:
+Get the recommended image version from `examples/llm_ptq/README.md`, then look for an existing `.sqsh` file:
 
 ```bash
 ls *.sqsh ../*.sqsh ~/containers/*.sqsh 2>/dev/null
 ```
 
-If you find a `.sqsh` but aren't sure of its version, check it:
+**If a `.sqsh` exists**, use it directly with `--container-image=<path>`. Skip import.
+
+**If no `.sqsh` exists**, import with enroot (caches for subsequent smoke tests and reruns):
 
 ```bash
-srun --container-image=<path/to/container.sqsh> --ntasks=1 bash -c \
-  "pip show tensorrt-llm 2>/dev/null | grep Version || cat /VERSION 2>/dev/null || echo unknown"
+export ENROOT_CACHE_PATH=/path/to/writable/enroot-cache
+export ENROOT_DATA_PATH=/path/to/writable/enroot-data
+mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH"
+enroot import --output /path/to/container.sqsh docker://nvcr.io#nvidia/tensorrt-llm/release:<version>
 ```
 
-If no `.sqsh` exists, import it with enroot. Set writable cache paths first — the default `/raid/containers` is often not writable:
+If enroot import fails (e.g., permission errors on lustre), use pyxis inline pull as fallback — pass the NGC URI directly to `--container-image="nvcr.io/nvidia/tensorrt-llm/release:<version>"`. Note this re-pulls on every job.
+
+### Container dependency pitfalls
+
+**New models may need newer transformers** than what's in the container:
 
 ```bash
-export ENROOT_CACHE_PATH=/path/to/writable/enroot-cache
-export ENROOT_DATA_PATH=/path/to/writable/enroot-data
-export TMPDIR=/path/to/writable/tmp
-mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH" "$TMPDIR"
+pip install -U transformers
+```
+
+For unlisted models that need unreleased transformers (e.g., from git), see `references/unsupported-models.md` Step A.
+
+**Prefer `PYTHONPATH`** to use the synced ModelOpt source instead of installing inside the container — this avoids risking dependency conflicts (e.g., `pip install -U nvidia-modelopt[hf]` can upgrade PyTorch and break other packages):
+
+```bash
+export PYTHONPATH=/path/to/Model-Optimizer:$PYTHONPATH
+```
+
+If `PYTHONPATH` doesn't work due to missing compiled extensions, fall back to `pip install -e ".[hf]" --no-build-isolation` (run from the Model-Optimizer repo root).
+
+**Watch for pip dependency conflicts** — NGC containers set `PIP_CONSTRAINT` to pin versions, causing `ResolutionImpossible` errors. Unset it first so pip can resolve freely:
+
+```bash
+unset PIP_CONSTRAINT
+pip install -U transformers  # now upgrades and resolves with new deps included
+```
 
-enroot import --output /path/to/container.sqsh \
-  docker://nvcr.io#nvidia/tensorrt-llm/release:<version>
+If that still conflicts, fall back to `--no-deps` (skips new deps — may need to add missing ones manually):
+
+```bash
+pip install -U transformers --no-deps
 ```
 
 ---
@@ -68,10 +93,3 @@ This catches script errors cheaply before using GPU quota on a real run.
 See `skills/common/slurm-setup.md` section 2 for the smoke test partition pattern.
 
 Only submit the full calibration job after the smoke test exits cleanly.
-
----
-
-## 4. PTQ-Specific Notes
-
-- **Gated datasets**: Some calibration datasets (e.g., `nvidia/Nemotron-Post-Training-Dataset-v2`) require HF authentication. Set `HF_TOKEN` in the job environment, or use `--dataset cnn_dailymail` to use a non-gated alternative.
-- **NFS permissions**: Docker + NFS root_squash causes `PermissionError` on output/cache dirs. See `skills/common/slurm-setup.md` section 5 for fixes.
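The `PIP_CONSTRAINT` pitfall described in this file can be made concrete with a small inspection helper (a sketch for illustration only; the function name is hypothetical, it assumes the variable points at a local file, and real constraint files may use richer requirements syntax):

```python
import os

def constrained_packages(environ=None):
    """List the version pins imposed by PIP_CONSTRAINT, or [] if unset.

    An empty result means pip is free to resolve; a non-empty one explains
    why `pip install -U transformers` may fail with ResolutionImpossible
    until the variable is unset.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("PIP_CONSTRAINT", "")
    if not path or not os.path.exists(path):
        return []
    with open(path) as f:
        # Constraint files use requirements syntax; skip comments and blanks.
        return [ln.strip() for ln in f if ln.strip() and not ln.lstrip().startswith("#")]
```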

.claude/skills/ptq/references/unsupported-models.md

Lines changed: 13 additions & 8 deletions
@@ -15,7 +15,11 @@ After download, inspect the model files on the target machine (use `remote_run`)
 
 Write custom scripts locally (in `./workspaces/<model>/scripts/`), then sync to remote before running.
 
-**Then check `config.json`** (on the target machine):
+**Check transformers compatibility** (on the target machine):
+
+First, if README or `config.json` specifies a required transformers version, check if installed version satisfies it. If not, upgrade: `pip install -U "transformers>=<required_version>"`.
+
+Then try loading:
 
 ```bash
 python -c "
@@ -40,16 +44,14 @@ print(type(cfg).__name__)
 
 Read the modeling file and proceed to Step B.
 
-- **Raises `ValueError` / `OSError` (unknown architecture)** → not in the installed transformers. Determine why:
-
-  1. **Check the transformers `main` branch** (not yet released):
+- **Raises `ValueError` / `OSError` (unknown architecture)** → not in the installed transformers. Try `pip install -U transformers` first. If still not found, check the `main` branch:
 
 ```bash
 git clone --depth 1 https://github.com/huggingface/transformers.git /tmp/transformers-main --quiet
 grep -r "class <ArchName>" /tmp/transformers-main/src/transformers/models/
 ```
 
-- **Found** → install with deps: `pip install /tmp/transformers-main`, then re-run `AutoConfig.from_pretrained()`. **Important**: if ModelOpt is already installed, its `[hf]` extras may have pinned an older transformers. Install ModelOpt first, then upgrade transformers **after** (with deps, not `--no-deps`) so compatible `huggingface_hub` and other transitive deps are pulled in.
+- **Found** → `pip install /tmp/transformers-main`, then re-run `AutoConfig`.
 - **Not found** → ask the user: *"The checkpoint uses `<ArchName>` which isn't in released or main-branch transformers. Do you have a private fork or custom modeling code?"*
 
 - **No `config.json`** → not a standard HF checkpoint. List the directory for README or `.py` files. If nothing useful, ask the user for the modeling code.
@@ -131,13 +133,15 @@ class QuantCustomModule(OriginalModule):
 
 ## Pattern 2: MoE Models
 
-**Standard MoE** (per-expert `nn.Linear` in a `ModuleList` with `gate` + `experts`): Auto-detected by `register_sparse_moe_on_the_fly`. No custom code needed — amax sync and calibration coverage are handled automatically.
+**Most MoE models are auto-detected** — ModelOpt handles two common patterns automatically:
+
+- **transformers >= 5.0**: Unified fused experts (`gate_up_proj` + `down_proj` 3D tensors) → auto-detected by `register_fused_experts_on_the_fly`, handled by `_QuantFusedExperts`. Covers Mixtral, Qwen, DeepSeek, Jamba, OlMoE, etc.
+- **transformers < 5.0**: Sequential per-expert `nn.Linear` with `gate` + `experts` → auto-detected by `register_sparse_moe_on_the_fly`.
 
-**Custom MoE** requires patching. Read the model source to understand how expert weights are stored and computed, then find the closest pattern in the plugin (`modelopt/torch/quantization/plugins/huggingface.py`):
+**Custom MoE** (non-standard layout not matching auto-detection) requires patching. Find the closest pattern in the plugin (`modelopt/torch/quantization/plugins/huggingface.py`):
 
 | MoE design | Strategy | Plugin example |
 | --- | --- | --- |
-| Fused weights + per-expert dispatch loop | Expand to per-expert `nn.Linear` | `_QuantQwen35MoeExperts` |
 | Fused weights + `torch.bmm` | Add `TensorQuantizer` around bmm | `_QuantLlama4TextExperts` |
 | Fused weights + functional interception | Intercept matmul ops | `_QuantGptOssExperts` |
 | Fused 2D weights (experts stacked in rows) | Two-level expansion | `_QuantDbrxExpertGLU` |
@@ -343,3 +347,4 @@ tokenizer.save_pretrained(output_path)
 - **Check quantizer summary**: `mtq.print_quant_summary(model)` shows which quantizers are enabled/disabled
 - **Inspect dtypes**: After loading, iterate `model.named_parameters()` and check for unexpected FP8 tensors
 - **Watch for silent disabling**: A misconfigured wildcard pattern can silently disable quantizers — always verify the summary
+- **Read pip errors carefully**: `ResolutionImpossible` means dependency conflict (try `--no-deps`), NOT network failure. Check for `Connection refused`/`Name resolution failed` before concluding network is down
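The last debugging bullet above amounts to a simple triage rule, sketched below (the function name and marker list are illustrative assumptions, not pip API):

```python
# Rough triage of a failed `pip install` run, per the bullet above:
# ResolutionImpossible is a dependency conflict, not a network problem.
NETWORK_MARKERS = (
    "Connection refused",
    "Name resolution failed",
    "Temporary failure in name resolution",
)

def classify_pip_failure(stderr: str) -> str:
    if "ResolutionImpossible" in stderr:
        return "dependency-conflict"  # try --no-deps or unset PIP_CONSTRAINT
    if any(marker in stderr for marker in NETWORK_MARKERS):
        return "network"
    return "unknown"
```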

.github/CODEOWNERS

Lines changed: 3 additions & 0 deletions
@@ -55,3 +55,6 @@ modelopt_recipes @NVIDIA/modelopt-recipes-codeowners
 /examples/vlm_ptq @NVIDIA/modelopt-examples-vlm-codeowners
 /examples/vllm_serve @NVIDIA/modelopt-examples-llm_ptq-codeowners
 /examples/windows @NVIDIA/modelopt-windows-codeowners
+
+# Requirements files are owned by the setup team regardless of location
+requirements*.txt @NVIDIA/modelopt-setup-codeowners
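Because the new `requirements*.txt` pattern has no leading slash, CODEOWNERS applies it to matching filenames in any directory (gitignore-style matching). A minimal sketch of that rule, using an assumed helper name:

```python
import fnmatch
import posixpath

def setup_team_owns(path: str) -> bool:
    """Does the slash-free CODEOWNERS pattern `requirements*.txt` match?

    Patterns without a '/' are compared against the basename at any depth;
    this sketch ignores the other CODEOWNERS pattern forms.
    """
    return fnmatch.fnmatch(posixpath.basename(path), "requirements*.txt")
```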

CONTRIBUTING.md

Lines changed: 12 additions & 2 deletions
@@ -2,6 +2,9 @@
 
 Thanks for your interest in contributing to Model Optimizer (ModelOpt)!
 
+> [!NOTE]
+> Any contributions to this repository are only accepted under the Apache 2.0 license.
+
 ## 🛠️ Setting up your environment
 
 Ensure that Model Optimizer (ModelOpt) is installed in editable mode and that all `dev` optional requirements are installed:
@@ -64,11 +67,18 @@ If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-c
 1. A reference link (with commit hash) to the source from which the code was copied.
 1. The original repository's Copyright / License.
 1. The NVIDIA Apache 2.0 Copyright / License header.
+- **Update `SPDX-License-Identifier`:** If the third-party code uses a different license than Apache 2.0, update the `SPDX-License-Identifier` in the NVIDIA header to reflect both licenses using SPDX expression syntax. For example, for MIT-licensed source code:
+
+  ```python
+  # SPDX-License-Identifier: Apache-2.0 AND MIT
+  ```
 
-See [`modelopt/torch/speculative/eagle/utils.py`](./modelopt/torch/speculative/eagle/utils.py)
-for an example of the correct license header format.
+  If the third-party code is also Apache 2.0, no change is needed (`SPDX-License-Identifier: Apache-2.0` remains correct).
+- **Update `LICENSE`:** Add the third-party copyright holder to the appropriate license section in the [`LICENSE`](./LICENSE) file under *Third-Party Software Notices*. If the third-party license is not already listed there, add a new section with the full license text.
 - **Exclude from license pre-commit hook:** Exclude copied files from the license pre-commit hook so it doesn't auto-add the NVIDIA Apache 2.0 license on top of the file. Add the file path to the `exclude` list in the `insert-license` hook in [`.pre-commit-config.yaml`](./.pre-commit-config.yaml).
 
+See [`modelopt/torch/quantization/utils/calib_utils.py`](./modelopt/torch/quantization/utils/calib_utils.py) for an example of the correct license header format.
+
 ## 📝 Writing tests
 
 We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
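The SPDX rule added above reduces to a one-line decision, sketched here (the helper name is hypothetical):

```python
def header_spdx_expression(third_party_license: str) -> str:
    """SPDX expression for the NVIDIA header over adapted third-party code:
    Apache-2.0 alone when the source is also Apache-2.0, otherwise an AND
    expression combining both licenses."""
    if third_party_license == "Apache-2.0":
        return "Apache-2.0"
    return f"Apache-2.0 AND {third_party_license}"
```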

LICENSE

Lines changed: 116 additions & 0 deletions
@@ -199,3 +199,119 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
+
+================================================================================
+Third-Party Software Notices
+================================================================================
+
+Portions of this repository contain code adapted from third-party sources.
+Each component is subject to the terms of its respective license as set out
+below.
+
+--------------------------------------------------------------------------------
+Apache License, Version 2.0
+--------------------------------------------------------------------------------
+
+Portions of this repository were adapted from code originally authored by
+the following copyright holders, licensed under the Apache License, Version 2.0
+(see full license text above):
+
+Copyright 2021 The HuggingFace Inc. team
+Copyright 2022 The HuggingFace Team
+Copyright 2022, Lefebvre Dalloz Services
+Copyright 2022 EleutherAI and the HuggingFace Inc. team
+Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
+Copyright (c) 2024 Heming Xia
+Copyright 2025 The Qwen team, Alibaba Group and the HuggingFace Inc. team
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not
+use these files except in compliance with the License. You may obtain a copy
+of the License at http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed
+under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+--------------------------------------------------------------------------------
+MIT License
+--------------------------------------------------------------------------------
+
+Portions of this repository were adapted from code originally authored by
+the following copyright holders, licensed under the MIT License:
+
+Copyright (c) Andrei Panferov
+Copyright (c) Microsoft Corporation
+Copyright (c) 2020 EleutherAI
+Copyright (c) 2020 Dan Hendrycks
+Copyright (c) 2023 Deep Cognition and Language Research (DeCLaRe) Lab
+Copyright (c) 2023 DeepSeek
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+--------------------------------------------------------------------------------
+BSD 3-Clause License
+--------------------------------------------------------------------------------
+
+Portions of this repository were adapted from code originally authored by
+the following copyright holders, licensed under the BSD 3-Clause License:
+
+Copyright (c) 2016- Facebook, Inc (Adam Paszke)
+Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
+Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
+Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
+Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
+Copyright (c) 2011-2013 NYU (Clement Farabet)
+Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
+Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
+Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
+Copyright (c) 2016-present, Facebook Inc.
+Copyright (c) 2016 Facebook Inc.
+Copyright (c) 2015 Google Inc.
+Copyright (c) 2015 Yangqing Jia
+Copyright 2019-2020 Kakao Brain
+Copyright (c) 2022 Cruise LLC.
+Copyright (c) 2024 Tri Dao.
+Copyright (c) 2021, 2023-2024 Arm Limited and/or its affiliates
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice,
+   this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories
+   America and IDIAP Research Institute nor the names of its contributors may
+   be used to endorse or promote products derived from this software without
+   specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.