
Commit 82f1d21

Add Security and IP related contributing guide and configure coderabbit to catch such issues (#935)

### What does this PR do?

- Add security-related coding practices in `SECURITY.md` and merge in the content of `2_security.rst`
- Update `CONTRIBUTING.md` with instructions to follow when copying code from other repositories
- Update the PR template
- Clean up dependency files
- Add a new API, `mto.load_modelopt_state`, that performs the insecure `torch.load(f, weights_only=False)` in one place instead of doing it separately everywhere. This also allows us to later improve input validation for `modelopt_state_path` or adopt safer alternatives to `torch.load`

### Testing

N/A

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoid hardcoded `trust_remote_code=True`, use `torch.load(..., weights_only=True)`, avoid `pickle`, etc.).

- Is this change backward compatible?: ✅
- If you copied code from any other source, did you follow the IP policy in [CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?: ✅
- Did you write any new necessary tests?: N/A
- Did you add or update any necessary documentation and update the [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

## Summary by CodeRabbit

- **Documentation**
  - Expanded and reorganized security guidance and contributor procedures; updated the PR template and several READMEs with clearer security, submission, and installation instructions
  - Replaced an older security document with enhanced, centralized security guidance
- **Chores**
  - Adjusted example dependency lists and optional extras (additions, removals, and version constraints)
  - Enabled automated incremental reviews, added pre-merge security checks, and introduced a knowledge base of coding/security guidelines

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
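The centralized-loader design the commit message describes (one audited call site for the unsafe deserialization, so validation can be tightened later) can be sketched with stdlib stand-ins. Everything below is hypothetical: `load_state` and its parameters are not the ModelOpt API, and `pickle` stands in for `torch.load`.

```python
import pickle
from pathlib import Path


def load_state(path: str, *, trusted: bool = False):
    """Hypothetical stand-in for a centralized state loader (not the ModelOpt API).

    Funneling every unpickling call through one function gives a single
    place to add input validation or swap in a safer format later.
    """
    p = Path(path)
    if p.suffix not in {".pth", ".pkl"}:
        raise ValueError(f"unexpected state-file extension: {p.suffix}")
    if not trusted:
        raise PermissionError("refusing to unpickle a file not marked as trusted")
    with p.open("rb") as f:
        return pickle.load(f)  # the single audited unsafe call
```

Callers that know the file is internally generated pass `trusted=True`; everything else is rejected before any unpickling happens.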
1 parent f26b9c3 commit 82f1d21

22 files changed

Lines changed: 285 additions & 156 deletions

File tree

.coderabbit.yaml

Lines changed: 29 additions & 1 deletion
````diff
@@ -4,7 +4,35 @@ reviews:
   profile: chill
   collapse_walkthrough: true
   poem: false
+  path_instructions:
+    - path: "modelopt/**/*.py"
+      instructions: &security_instructions |
+        Review all modelopt package and examples Python changes against the security coding practices in
+        SECURITY.md. Flag any of the following as CRITICAL security issues,
+        request changes, and fail the check if ANY are present:
+        1. torch.load(..., weights_only=False) with no inline comment justifying why it is safe
+           (e.g. confirming the file is internally-generated and not user-supplied).
+        2. numpy.load(..., allow_pickle=True) with no inline comment justifying why it is safe.
+           Should expose allow_pickle as a caller-configurable parameter defaulting to False, not hardcode True.
+        3. trust_remote_code=True hardcoded for transformers model or tokenizer loading.
+           Code should expose it as a caller-configurable parameter defaulting to False, not hardcode True.
+        4. eval() or exec() on any input that could originate from outside the process.
+        5. Any use of "# nosec" comments to bypass Bandit security checks is not allowed.
+        If a security-sensitive pattern is genuinely necessary, the PR must be reviewed and approved
+        by @NVIDIA/modelopt-setup-codeowners with an explicit justification in the PR description.
+    - path: "examples/**/*.py"
+      instructions: *security_instructions
   auto_review:
-    auto_incremental_review: false
+    auto_incremental_review: true
     drafts: false
     base_branches: ["main", "release/.*", "feature/.*"]
+  pre_merge_checks:
+    custom_checks:
+      - name: "Security anti-patterns"
+        mode: "error"
+        instructions: *security_instructions
+knowledge_base:
+  code_guidelines:
+    filePatterns:
+      - "CONTRIBUTING.md"
+      - "SECURITY.md"
````
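The anti-patterns this configuration asks CodeRabbit to flag are largely textual. A toy illustration of the kind of matching involved (illustrative only: the real enforcement is done by the CodeRabbit service and the Bandit pre-commit hook, not by any script in the repository):

```python
import re

# Toy scanner for the textual anti-patterns the review instructions describe.
PATTERNS = {
    "torch.load with weights_only=False": re.compile(r"weights_only\s*=\s*False"),
    "numpy.load with allow_pickle=True": re.compile(r"allow_pickle\s*=\s*True"),
    "hardcoded trust_remote_code=True": re.compile(r"trust_remote_code\s*=\s*True"),
    "Bandit bypass via # nosec": re.compile(r"#\s*nosec"),
}


def scan(source: str) -> list[str]:
    """Return the names of anti-patterns found in a snippet of Python source."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(source)]


print(scan("state = torch.load(path, weights_only=False)"))
# → ['torch.load with weights_only=False']
```

A regex check like this has no notion of the "inline justification comment" exceptions; that judgment is exactly what the LLM-based review layer adds on top.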

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 15 additions & 14 deletions
````diff
@@ -1,27 +1,28 @@
-## What does this PR do?
+### What does this PR do?
 
-**Type of change:** ? <!-- Use one of the following: Bug fix, new feature, new example, new tests, documentation. -->
+Type of change: ? <!-- Use one of the following: Bug fix, new feature, new example, new tests, documentation. -->
 
-**Overview:** ?
+<!-- Details about the change. -->
 
-## Usage
-<!-- You can potentially add a usage example below. -->
+### Usage
 
 ```python
 # Add a code snippet demonstrating how to use this
 ```
 
-## Testing
+### Testing
 <!-- Mention how have you tested your change if applicable. -->
 
-## Before your PR is "*Ready for review*"
-<!-- If you haven't finished some of the above items you can still open `Draft` PR. -->
+### Before your PR is "*Ready for review*"
 
-- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
-- **Is this change backward compatible?**: Yes/No <!--- If No, explain why. -->
-- **Did you write any new necessary tests?**: Yes/No
-- **Did you add or update any necessary documentation?**: Yes/No
-- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->
+Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`).
 
-## Additional Information
+Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, using `torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).
+
+- Is this change backward compatible?: ✅ / ❌ / N/A <!--- If ❌, explain why. -->
+- If you copied code from any other source, did you follow IP policy in [CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?: ✅ / ❌ / N/A <!--- Mandatory -->
+- Did you write any new necessary tests?: ✅ / ❌ / N/A <!--- Mandatory for new features or examples. -->
+- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅ / ❌ / N/A <!--- Only for new features, API changes, critical bug fixes or backward incompatible changes. -->
+
+### Additional Information
 <!-- E.g. related issue. -->
````

CONTRIBUTING.md

Lines changed: 32 additions & 21 deletions
````diff
@@ -39,36 +39,41 @@ To run the pre-commit hooks without committing, use:
 pre-commit run --all-files
 ```
 
-## 📝 Writing tests
+## 🔒 Security coding practices
 
-We use [pytest](https://docs.pytest.org/) for all tests. The tests are organized into the following directories:
+All contributors must follow the security coding practices documented in *Security Coding Practices for
+Contributors* section of [SECURITY.md](./SECURITY.md#security-coding-practices-for-contributors) page.
 
-- `tests/unit`: Fast cpu-based unit tests for the core ModelOpt library. They should not take more than a few seconds to run.
-- `tests/gpu`: Fast GPU-based unit tests for the core ModelOpt library. In most cases, they should not take more than a few seconds to run.
-- `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
+Any security-sensitive exception requires review and approval from `@NVIDIA/modelopt-setup-codeowners`.
 
-Please refer to [tox.ini](./tox.ini) for more details on how to run the tests and their dependencies.
+## 📋 Copying code from other sources
 
-### Code Coverage
+The utilization of third-party code requires authorization via the Open Source Review Board (OSRB) team and needs to follow proper guidance on contributing code.
 
-For any new features / examples, make sure to they are covered by the tests and that the Codecov coverage check in your PR passes.
+If you are an external contributor, seek guidance from `@NVIDIA/modelopt-setup-codeowners` for next steps. For internal contributors, follow the steps below:
 
-## Submitting your code
+- **File NVBug for use of open-source code:**
+  Clone NVBug 2885977 and add your use case. Copying code from permissive licensed repositories (e.g. MIT, Apache 2) is generally self-checkout but for other licenses, it is necessary to get expert guidance before merging your PR.
+- **License header format:** The file which has code copied from another third-party GitHub repository should have the following in order:
+  1. A reference link (with commit hash) to the source from which the code was copied.
+  1. The original repository's Copyright / License.
+  1. The NVIDIA Apache 2.0 Copyright / License header.
 
-- If you are an external contributor, create a fork of the repository.
-- Rebase (not merge) your code to the most recent commit of the `main` branch. We want to ensure a linear history;
-  see [Merge vs Rebase](https://www.atlassian.com/git/tutorials/merging-vs-rebasing). Remember to test again locally after rebasing to catch any new issues before pushing to your PR.
+  See [`modelopt/torch/speculative/eagle/utils.py`](./modelopt/torch/speculative/eagle/utils.py)
+  for an example of the correct license header format.
+- **Exclude from license pre-commit hook:** Exclude copied files from the license pre-commit hook so it doesn't auto-add the NVIDIA Apache 2.0 license on top of the file. Add the file path to the `exclude` list in the `insert-license` hook in [`.pre-commit-config.yaml`](./.pre-commit-config.yaml).
 
-```bash
-git pull
-git rebase origin/main
-git push origin <branch> --force-with-lease
-```
+## 📝 Writing tests
 
-- When pushing the rebased (or any) branch, use `git push --force-with-lease` instead of `git push --force`.
-- Submit a pull request and let auto-assigned reviewers (based on [CODEOWNERS](./.github/CODEOWNERS)) review your PR.
-- If any CI/CD checks fail, fix the issues and push again.
-- Once your PR is approved and all checks pass, one of the reviewers will merge the PR.
+We use [pytest](https://docs.pytest.org/) for all tests. For any new features / examples, make sure to add tests and that the coverage check in your PR passes. The tests are organized into the following directories:
+
+- `tests/unit`: Fast cpu-based unit tests for the core ModelOpt library. They should not take more than a few seconds to run.
+- `tests/gpu`: Fast GPU-based unit tests for the core ModelOpt library. In most cases, they should not take more than a few seconds to run.
+- `tests/gpu_megatron`: Fast GPU-based unit tests for the core ModelOpt library for Megatron-Core features. In most cases, they should not take more than a few seconds to run.
+- `tests/gpu_trtllm`: Fast GPU-based unit tests for the core ModelOpt library for TensorRT-LLM features. In most cases, they should not take more than a few seconds to run.
+- `tests/examples`: Integration tests for ModelOpt examples. They should not take more than a few minutes to run. Please refer to [example test README](./tests/examples/README.md) for more details.
+
+Please refer to [tox.ini](./tox.ini) for more details on how to run the tests and their dependencies.
 
 
 ## ✍️ Signing your work
 
@@ -135,3 +140,9 @@ git push origin <branch> --force-with-lease
 
 (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
 ```
+
+## Submitting your code
+
+- Submit a pull request and let auto-assigned reviewers (based on [CODEOWNERS](./.github/CODEOWNERS)) review your PR.
+- If any CI/CD checks fail, fix the issues and push again.
+- Once your PR is approved and all checks pass, one of the reviewers will merge the PR.
````
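The three-part header ordering required for copied files might look like the sketch below. This is illustrative only: the URL, commit hash, and original license are placeholders, and the referenced `eagle/utils.py` file in the diff is the authoritative example of the format.

```
# Adapted from https://github.com/example-org/example-repo/blob/<commit-hash>/module/file.py
#
# Copyright (c) Example Org.
# Licensed under the MIT License (original repository's license text).
#
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```

Remember to also add such a file to the `insert-license` hook's `exclude` list, as described above, so the tooling does not prepend a second NVIDIA header.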

SECURITY.md

Lines changed: 147 additions & 1 deletion
````diff
@@ -22,4 +22,150 @@ While NVIDIA currently does not have a bug bounty program, we do offer acknowled
 
 ## NVIDIA Product Security
 
-For all security-related concerns, please visit NVIDIA's [Product Security portal](https://www.nvidia.com/en-us/security)
+For all security-related concerns, please visit NVIDIA's [Product Security portal](https://www.nvidia.com/en-us/security).
+
+---
+
+## Security Considerations
+
+### Overview
+
+NVIDIA Model Optimizer (ModelOpt) is a library used to optimize ML models and may load and process user-provided artifacts (models, weights, configs, calibration data) and their dependencies. Secure deployment depends on how you source artifacts, validate inputs, and harden the environment where ModelOpt runs.
+
+### What to Be Aware Of
+
+#### Untrusted model and data inputs
+
+- Models, weights, configs and data may be malicious or corrupted.
+
+#### Deserialization and code-execution risks
+
+- Unsafe deserialization can lead to arbitrary code execution if fed untrusted inputs.
+- Avoid using serialization formats/settings that can deserialize arbitrary objects.
+
+#### Input validation and resource exhaustion
+
+- Large or malformed inputs can trigger crashes or excessive CPU/GPU/memory use.
+- Missing size/type checks can increase DoS risk.
+
+#### Data in transit and at rest
+
+- If fetching models or dependencies over the network, insecure transport can enable tampering.
+- Stored artifacts, logs, and caches may contain sensitive data.
+
+#### Logging and observability
+
+- Logs may inadvertently contain sensitive inputs, paths, tokens, or proprietary model details.
+- Overly verbose logs can leak operational and security-relevant information.
+
+#### Supply chain and third-party components
+
+- Dependencies may include known vulnerabilities or be compromised.
+- Third-party plugins/components loaded at runtime may not have the same security assurances.
+
+### Example Security Approaches
+
+#### Artifact integrity
+
+- Only load artifacts from trusted sources.
+- Prefer signed artifacts; verify signatures before loading.
+
+#### Safe parsing and deserialization
+
+- Prefer safer storage formats (avoid object deserialization for untrusted inputs).
+- Avoid `pickle`, `torch.load()` with untrusted weights, or YAML `unsafe_load`.
+- Treat any unverified artifact as untrusted and block/guard its loading.
+
+#### Hardening and least privilege
+
+- Run with least privilege and isolate workloads.
+
+#### Data protection
+
+- Encrypt sensitive data at rest; use TLS 1.3 for data in transit.
+- Never hardcode or log credentials.
+
+#### Resilience
+
+- Validate inputs and enforce limits (file size, timeouts, quotas, etc.).
+- Keep OS, containers, and dependencies patched; scan for known vulnerabilities.
+
+---
+
+## Security Coding Practices for Contributors
+
+ModelOpt processes model checkpoints and weights from various sources. Contributors must avoid patterns that can introduce security vulnerabilities. These rules apply to all code except tests. These rules cover a few key security considerations as follows:
+
+### Deserializing untrusted data
+
+**Do not use `torch.load(..., weights_only=False)`** unless a documented exception is provided. It uses pickle under the hood and can execute arbitrary code from a malicious checkpoint.
+
+```python
+# Bad — allows arbitrary code execution from the checkpoint file
+state = torch.load(path, weights_only=False)
+
+# Good
+state = torch.load(path, weights_only=True, map_location="cpu")
+
+# Acceptable only with an inline comment explaining why weights_only=False
+# is required and confirming the file is internally-generated / trusted.
+state = torch.load(
+    path,
+    weights_only=False,  # loaded file is generated internally by ModelOpt and not supplied by the user
+    map_location="cpu",
+)
+```
+
+**Do not use `numpy.load(..., allow_pickle=True)`** unless a documented exception is provided. It uses pickle under the hood and can execute arbitrary code from a malicious checkpoint.
+
+```python
+# Bad — allows arbitrary code execution from the checkpoint file
+state = numpy.load(path, allow_pickle=True)
+
+# Good - let the caller decide; default to False
+def load_data(path: str, trust_data: bool = False):
+    return numpy.load(path, allow_pickle=trust_data)
+```
+
+**Do not use `yaml.load()`** — always use `yaml.safe_load()`. The default loader can execute arbitrary Python objects embedded in YAML.
+
+### Loading transformers models with `trust_remote_code`
+
+**Do not hardcode `trust_remote_code=True`.** This flag tells Transformers to execute arbitrary Python shipped with a checkpoint, which is an RCE vector if the model source is untrusted.
+
+```python
+# Bad — silently opts every user into remote code execution
+model = AutoModel.from_pretrained(name, trust_remote_code=True)
+
+# Good — let the caller decide; default to False
+def load_model(name: str, trust_remote_code: bool = False):
+    return AutoModel.from_pretrained(name, trust_remote_code=trust_remote_code)
+```
+
+### Subprocess and shell commands
+
+**Never use `shell=True` with string interpolation or user-supplied input.** This is a command-injection vector.
+
+```python
+# Bad — command injection if model_name contains shell metacharacters
+subprocess.run(f"python convert.py --model {model_name}", shell=True)
+
+# Good — pass arguments as a list
+subprocess.run(["python", "convert.py", "--model", model_name])
+```
+
+### Other patterns to avoid
+
+- **`eval()` / `exec()`** on strings derived from external input. If you must generate and execute code dynamically, validate the input against an allowlist of safe patterns.
+- **Hardcoded secrets or credentials** — never commit tokens, passwords, or API keys. Use environment variables or config files listed in `.gitignore`.
+
+### Bandit security checks
+
+Bandit is used as a pre-commit hook to check for security-sensitive patterns in the code. **`# nosec` comments are not allowed** as a bypass for security checks.
+
+### Creating a security exception
+
+If a security-sensitive pattern (e.g. `pickle`, `subprocess`) is genuinely required, the contributor must:
+
+1. **Add an inline comment** explaining *why* the pattern is necessary and *why* it is safe in this specific context (e.g. "loaded file is generated internally by ModelOpt").
+1. **Request review from [@NVIDIA/modelopt-setup-codeowners](https://github.com/orgs/NVIDIA/teams/modelopt-setup-codeowners)** and include a clear justification in the PR description.
````
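The subprocess guidance in this diff can be demonstrated end to end: with list-form arguments, shell metacharacters in a value arrive at the child process as literal text instead of being interpreted. A minimal sketch (`sys.executable` is used here only so the snippet does not assume a `python` binary on `PATH`):

```python
import subprocess
import sys

# A value an attacker might supply as a "model name".
model_name = "llama; echo INJECTED"

# List form: each element is one argv entry, so the semicolon is never
# seen by a shell — the child process receives it as plain text.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", model_name],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → llama; echo INJECTED
```

With `shell=True` and an f-string, the same value would instead run `echo INJECTED` as a second shell command.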

docs/source/guides/2_save_load.rst

Lines changed: 1 addition & 3 deletions
````diff
@@ -129,9 +129,7 @@ Here is the example workflow of restoring the ModelOpt-modified model architectu
     model = ...
 
     # Restore the model architecture using the saved `modelopt_state`
-    # Security NOTE: weights_only=False is used here on ModelOpt-generated state_dict, not on untrusted user input
-    modelopt_state = torch.load("modelopt_state.pth", weights_only=False)
-    model = mto.restore_from_modelopt_state(model, modelopt_state)
+    model = mto.restore_from_modelopt_state(model, modelopt_state_path="modelopt_state.pth")
 
     # Load the model weights separately after restoring the model architecture
     custom_method_to_load_model_weights(model)
````
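The `torch.load(..., weights_only=False)` call removed in this diff is risky because it falls back to Python's pickle protocol, and unpickling can run code. The mechanism can be shown with the stdlib alone (no torch required); the class name here is purely illustrative:

```python
import pickle


class Malicious:
    # __reduce__ tells pickle how to reconstruct the object. A hostile file
    # can make it return any callable, which then runs at load time.
    def __reduce__(self):
        return (print, ("code executed during unpickling!",))


payload = pickle.dumps(Malicious())
result = pickle.loads(payload)  # prints the message; no Malicious object is created
```

This is why the doc now routes loading through the ModelOpt API instead of asking every user to call `torch.load` with `weights_only=False` themselves.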

docs/source/reference/2_security.rst

Lines changed: 0 additions & 78 deletions
This file was deleted.
