# CLAUDE.md

NVIDIA Model Optimizer (ModelOpt): an open-source library of model optimization techniques,
including quantization, pruning, distillation, sparsity, and speculative decoding, for accelerating
inference. Primarily a Python codebase with optional C++/CUDA extensions, supporting PyTorch, ONNX,
and Hugging Face/Megatron models.

> If a `CLAUDE.local.md` file exists alongside this file, read and respect it — it contains
> developer-specific overrides that supplement this shared guidance.

## Rules (Read First)

**CRITICAL (YOU MUST):**

- Add the NVIDIA Apache 2.0 license header to ALL new Python/C++/CUDA files (see `LICENSE_HEADER`)
- Commit with `git commit -s -S` (DCO sign-off plus cryptographic signing are required). Never
  attribute AI tools in the sign-off line
- `pre-commit` hooks run on commit — if hooks modify files, re-stage them and commit again
- PRs require CODEOWNERS review (reviewers are auto-assigned from `.github/CODEOWNERS`)
- After rebasing, always re-run tests locally before pushing
- All code must follow the security guidelines in `SECURITY.md` — violations are flagged as pre-merge errors
- For contribution guidelines, commit conventions, and PR requirements, see `CONTRIBUTING.md`

## Common Commands

| Task | Command |
|------|---------|
| Install (editable + dev) | `pip install -e ".[dev]"` |
| CPU unit tests | `python -m pytest tests/unit` |
| GPU unit tests | `python -m pytest tests/gpu` |
| Megatron GPU tests | `python -m pytest tests/gpu_megatron` |
| TRT-LLM GPU tests | `python -m pytest tests/gpu_trtllm` |
| Pattern match | `pytest tests/unit -k "test_quantize"` |
| Lint + format (all files) | `pre-commit run --all-files` |
| Lint (diff only) | `pre-commit run --from-ref origin/main --to-ref HEAD` |
| Run via tox (CPU unit) | `tox -e py312-torch210-tf_latest-unit` |
| Build docs | `tox -e build-docs` |
| Build wheel | `tox -e build-wheel` |

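As a concrete illustration of the test commands above, a minimal CPU unit test could look like the sketch below. The file name and `clamp` helper are hypothetical, not from the repo; real tests live under `tests/unit/`:

```python
# tests/unit/test_clamp.py -- hypothetical example file.
# pytest discovers functions named test_* in files named test_*.py.


def clamp(value: float, low: float, high: float) -> float:
    """Toy stand-in for a function under test."""
    return max(low, min(value, high))


def test_clamp_within_bounds() -> None:
    assert clamp(0.5, 0.0, 1.0) == 0.5


def test_clamp_saturates() -> None:
    assert clamp(-1.0, 0.0, 1.0) == 0.0
    assert clamp(2.0, 0.0, 1.0) == 1.0
```

Such a file would be picked up by `python -m pytest tests/unit`, or selected alone with the pattern-match command, e.g. `pytest tests/unit -k "test_clamp"`.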
## Architecture

ModelOpt is organized into three top-level namespaces:

| Namespace | Path | Role |
|-----------|------|------|
| `modelopt.torch` | `modelopt/torch/` | Core PyTorch optimization library |
| `modelopt.onnx` | `modelopt/onnx/` | ONNX model quantization and export |
| `modelopt.deploy` | `modelopt/deploy/` | Deployment utilities for LLMs |

### `modelopt.torch` Sub-packages

| Sub-package | Path | Role |
|-------------|------|------|
| `opt` | `modelopt/torch/opt/` | Core optimization infrastructure (modes, config, state dicts) |
| `quantization` | `modelopt/torch/quantization/` | Post-training quantization (PTQ), quantization-aware training (QAT), and related algorithms |
| `prune` | `modelopt/torch/prune/` | Structured and unstructured pruning |
| `distill` | `modelopt/torch/distill/` | Knowledge distillation |
| `sparsity` | `modelopt/torch/sparsity/` | Weight and activation sparsity |
| `speculative` | `modelopt/torch/speculative/` | Speculative decoding (Medusa, EAGLE, etc.) |
| `nas` | `modelopt/torch/nas/` | Neural architecture search |
| `export` | `modelopt/torch/export/` | Checkpoint export for TRT-LLM / Megatron |
| `peft` | `modelopt/torch/peft/` | QLoRA and PEFT integration |
| `_deploy` | `modelopt/torch/_deploy/` | Internal deployment utilities |
| `utils` | `modelopt/torch/utils/` | Shared utilities and plugin infrastructure |

### Core Abstraction: Modes

A **mode** is the unit of model optimization in ModelOpt. Each algorithm (quantization, pruning,
etc.) is implemented as one or more modes. Modes are recorded in the model's `modelopt_state` so
that optimization workflows can be composed, saved, and restored.

## Key Files

| File | Role |
|------|------|
| `modelopt/torch/opt/mode.py` | Base class for all optimization modes |
| `modelopt/torch/opt/config.py` | Configuration system for modes |
| `modelopt/torch/opt/conversion.py` | `apply_mode()` / `restore()` entry points |
| `modelopt/torch/quantization/__init__.py` | PTQ/QAT public API |
| `modelopt/torch/export/unified_export_hf.py` | Unified HF checkpoint export |
| `modelopt/torch/export/model_config_export.py` | TRT-LLM model config export |
| `modelopt/deploy/llm/` | LLM deployment utilities |
| `pyproject.toml` | Optional dependency groups (`[onnx]`, `[hf]`, `[all]`, `[dev]`); ruff, mypy, pytest, bandit, and coverage config |
| `.pre-commit-config.yaml` | Pre-commit hooks (ruff, mypy, clang-format, license headers) |
| `tox.ini` | Test environment definitions |

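To make the config side of this concrete, here is a hedged, dataclass-only sketch of a typed mode config. The class and field names are invented for illustration; the real system in `modelopt/torch/opt/config.py` is richer:

```python
# Hypothetical typed config for a quantization mode -- illustrative only.
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class QuantizeConfig:
    """Validated, immutable config that serializes cleanly for checkpointing."""

    num_bits: int = 8
    axis: Optional[int] = None
    enable: bool = True

    def __post_init__(self) -> None:
        # Validate eagerly so a bad config fails at construction, not mid-run.
        if self.num_bits not in (4, 8, 16):
            raise ValueError(f"unsupported num_bits: {self.num_bits}")


cfg = QuantizeConfig(num_bits=4)
state_entry = asdict(cfg)  # plain dict, ready to be recorded in a state dict
```

Freezing the dataclass and validating in `__post_init__` keeps configs hashable and guarantees that whatever lands in the saved state was a valid configuration.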
## Design Patterns

| Pattern | Key Points |
|---------|------------|
| **Mode composition** | Optimization algorithms are composed as sequences of modes, each recorded in `modelopt_state` |
| **Plugin system** | Optional integrations (Hugging Face, Megatron, etc.) loaded lazily via `import_plugin()` |
| **Optional dependencies** | Features gated by install extras (`[onnx]`, `[hf]`, `[all]`); avoid hard imports at module level |
| **Config dataclasses** | Each mode has a typed config; use Pydantic or dataclass conventions |
| **State dict** | Models carry `modelopt_state` for checkpoint save/restore across optimization steps |

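The lazy-plugin and optional-dependency rows above can be sketched with stdlib tools alone. `import_plugin` here is a hypothetical stand-in for ModelOpt's helper, not its actual signature:

```python
# Hypothetical stand-in for a lazy plugin loader, using only the stdlib.
import importlib
from contextlib import contextmanager


@contextmanager
def import_plugin(module_name):
    """Yield the module if it is installed, else None, so callers degrade gracefully."""
    try:
        yield importlib.import_module(module_name)
    except ImportError:
        yield None


with import_plugin("json") as plugin:  # stdlib module: always importable
    HAS_JSON = plugin is not None

with import_plugin("not_a_real_package") as plugin:  # optional dep not installed
    HAS_MISSING = plugin is not None
```

The point of the pattern: importing the optional integration happens only inside the guarded block, so the package itself imports cleanly even when extras like `[hf]` are absent.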
## CI / Testing

| Layer | Location | Notes |
|-------|----------|-------|
| CPU unit tests | `tests/unit/` | Fast, no GPU needed; run in pre-merge CI |
| GPU unit tests | `tests/gpu/` | Requires CUDA GPU |
| Megatron GPU tests | `tests/gpu_megatron/` | Requires Megatron-Core + GPU |
| TRT-LLM GPU tests | `tests/gpu_trtllm/` | Requires TensorRT-LLM + GPU |
| Example/integration tests | `tests/examples/` | Integration tests for examples; see `tests/examples/README.md` |
| Pre-commit / lint | `.pre-commit-config.yaml` | ruff, mypy, clang-format, license headers, bandit |
| Coverage | `pyproject.toml` | 70% minimum on `modelopt/*` |
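The coverage gate in the last row is configured in `pyproject.toml`; a representative fragment might look like the following (illustrative, not copied from the repo — check the file for the real keys and values):

```toml
# Illustrative coverage configuration -- see pyproject.toml for the actual settings.
[tool.coverage.run]
source = ["modelopt"]

[tool.coverage.report]
fail_under = 70
```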