refactor(jailbreak): Use onnx instead of pickle to load model #1715
erickgalinkin wants to merge 11 commits into develop from
Conversation
Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Greptile Summary
This PR replaces the pickle-based classifier loading with an onnxruntime InferenceSession. Two previously-flagged test issues remain open:
|
| Filename | Overview |
|---|---|
| nemoguardrails/library/jailbreak_detection/model_based/models.py | Replaced pickle/sklearn with onnxruntime InferenceSession; also adds JAILBREAK_CHECK_DEVICE env-var support and use_safetensors. Minor: stale comment referencing the removed [:2] slice on line 63. |
| nemoguardrails/library/jailbreak_detection/model_based/checks.py | Added mkdir, onnx file-presence check, and auto-download via hf_hub_download; import fixed to relative (.models). Logic looks correct. |
| tests/test_jailbreak_model_based.py | Most mocks updated for onnxruntime; new HF-hub download tests added using tmp_path. Two open issues: dead sklearn.ensemble mock in test_model_based_classifier_imports, and test_initialize_model_with_valid_path still lacks mocks for Path.mkdir/is_file and hf_hub_download (will attempt real FS/network calls). |
| nemoguardrails/library/jailbreak_detection/requirements.txt | Replaced sklearn, pickle-era numpy pin with onnxruntime, updated transformers/torch versions, added huggingface_hub; torchvision kept per known transformers/nomic-BERT dependency. |
| nemoguardrails/library/jailbreak_detection/Dockerfile | Updated wget target from snowflake.pkl to snowflake.onnx. Straightforward and correct. |
| nemoguardrails/library/jailbreak_detection/Dockerfile-GPU | Same pkl→onnx switch as Dockerfile; no other changes. |
Sequence Diagram
sequenceDiagram
participant Client
participant checks as checks.initialize_model()
participant FS as Filesystem
participant HF as HuggingFace Hub
participant models as JailbreakClassifier
participant ONNX as onnxruntime.InferenceSession
participant Embed as SnowflakeEmbed
Client->>checks: initialize_model()
checks->>FS: Path(classifier_path).mkdir()
checks->>FS: snowflake.onnx is_file()?
alt File missing
checks->>HF: hf_hub_download(snowflake.onnx)
HF-->>FS: write snowflake.onnx
end
checks->>models: JailbreakClassifier(path)
models->>Embed: SnowflakeEmbed()
models->>ONNX: InferenceSession(path, CPUExecutionProvider)
models-->>checks: classifier instance
checks-->>Client: JailbreakClassifier
Client->>models: classifier(text)
models->>Embed: embed(text) → numpy array
models->>ONNX: run(None, X=[e])
ONNX-->>models: [class_idx, [{0: p0, 1: p1}]]
models-->>Client: (bool(classification), float(score))
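The tail of this flow — unpacking the `run()` output `[class_idx, [{0: p0, 1: p1}]]` into `(bool, float)` — can be sketched as follows. `StubSession`, the class name, and the probability values are illustrative stand-ins, not the real `onnxruntime.InferenceSession` or the PR's actual class:

```python
from typing import List, Tuple


class StubSession:
    """Stands in for onnxruntime.InferenceSession in this sketch."""

    def run(self, output_names, feeds) -> List:
        # Same output contract as the diagram: class index, then per-class probabilities.
        return [[1], [{0: 0.03, 1: 0.97}]]


class JailbreakClassifierSketch:
    def __init__(self, session):
        self.classifier = session

    def __call__(self, e) -> Tuple[bool, float]:
        res = self.classifier.run(None, {"X": [e]})
        classification = res[0][0]         # predicted class index
        score = res[1][0][classification]  # probability of the predicted class
        return bool(classification), float(score)


clf = JailbreakClassifierSketch(StubSession())
print(clf([0.1, 0.2]))  # → (True, 0.97)
```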
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/library/jailbreak_detection/model_based/models.py
Line: 63
Comment:
**Stale comment references removed `[:2]` slice**
The comment still says "the slice `res[1][:2]` should have only one element," but the code was updated (per the prior review thread) to drop that slice entirely and use `res[1][0][classification]` directly. The comment now refers to code that does not exist, which is confusing for future readers.
```suggestion
# The second item is a list of dicts of probabilities; element [0] is the dict for the first (only) batch item.
# We access the dict entry for the class.
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: tests/test_jailbreak_model_based.py
Line: 49-53
Comment:
**Dead `sklearn.ensemble` mock left over from old code**
The refactor removed all `sklearn` usage from `models.py`, but this test still injects a `sklearn.ensemble` mock into `sys.modules`. It has no effect on the code under test and will mislead readers into thinking sklearn is still an active dependency.
```suggestion
monkeypatch.setitem(sys.modules, "onnxruntime", fake_onnx)
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (10): Last reviewed commit: "Update path for models.py, fix nits on e..."
📝 Walkthrough
This PR migrates the jailbreak detection classifier from a pickle-based model to ONNX Runtime, updating the model initialization, inference mechanism, dependencies, and test expectations accordingly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemoguardrails/library/jailbreak_detection/model_based/models.py (2)
25-35: ⚠️ Potential issue | 🟠 Major — Pin the model revision to an immutable commit hash.
`trust_remote_code=True` executes code from the Hugging Face repository at load time. Without pinning `revision` to a specific commit, the application will execute any code updates published to that repository in the future, creating a supply-chain vulnerability.
Add `revision="<commit-sha>"` (full commit hash, not a branch or tag) to both `AutoTokenizer.from_pretrained()` and `AutoModel.from_pretrained()` calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 25 - 35, The AutoTokenizer and AutoModel loads (AutoTokenizer.from_pretrained and AutoModel.from_pretrained) currently use trust_remote_code=True without pinning a revision; update both calls to include revision="<commit-sha>" (a full immutable commit hash string) so the tokenizer and model load a specific commit, keeping trust_remote_code=True and preserving existing args (safe_serialization, add_pooling_layer) while preventing future remote code changes from being executed.
25-35: ⚠️ Potential issue | 🟠 Major — Replace `safe_serialization` with `use_safetensors` for load-time safetensors configuration.
`safe_serialization` is a save-time flag, not a load-time parameter. On `AutoTokenizer.from_pretrained()` and `AutoModel.from_pretrained()`, use `use_safetensors=True` (or `use_safetensors=None` to prefer safetensors if available) instead. The current parameter will be ignored or may cause errors depending on the transformers version.
Suggested fix

```python
self.tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
    use_safetensors=True,
)
self.model = AutoModel.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
    add_pooling_layer=False,
    use_safetensors=True,
)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 25 - 35, The calls to AutoTokenizer.from_pretrained and AutoModel.from_pretrained in the model initialization use the save-time parameter safe_serialization; replace safe_serialization with the load-time option use_safetensors (e.g., use_safetensors=True or use_safetensors=None to prefer safetensors) in both self.tokenizer and self.model constructor calls (keep trust_remote_code and add_pooling_layer as-is) so the transformers loader actually uses safetensors at load time.
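Taken together, the two findings above suggest a load path like the following sketch. The repo id comes from this PR's diff; `PINNED_REVISION` is a placeholder to be replaced with a real 40-character commit hash, and the deferred import just keeps the snippet light:

```python
def load_snowflake_components():
    # Deferred import keeps this sketch importable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo = "Snowflake/snowflake-arctic-embed-m-long"
    pinned = "<commit-sha>"  # placeholder: full immutable commit hash

    tokenizer = AutoTokenizer.from_pretrained(
        repo,
        trust_remote_code=True,
        revision=pinned,  # pin remote code to one audited commit
    )
    model = AutoModel.from_pretrained(
        repo,
        trust_remote_code=True,
        add_pooling_layer=False,
        revision=pinned,
        use_safetensors=True,  # load-time flag; safe_serialization is save-time only
    )
    return tokenizer, model
```

Note that `use_safetensors` is set only on the model here: tokenizers don't load weight files, so the flag has no effect on `AutoTokenizer`.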
🧹 Nitpick comments (3)
nemoguardrails/library/jailbreak_detection/model_based/models.py (1)
56-57: Build `X` as an explicit batched ndarray before calling ONNX Runtime.
`{"X": [e]}` passes a Python list of vectors into ORT. Exported sklearn ONNX models usually declare `X` as a 2-D float tensor, so this is relying on implicit coercion in a pretty fragile place.
🧪 Suggested change

```diff
 def __call__(self, text: str) -> Tuple[bool, float]:
     e = self.embed(text)
-    res = self.classifier.run(None, {"X": [e]})
+    x = np.asarray([e], dtype=np.float32)
+    res = self.classifier.run(None, {"X": x})
```

Also add at module scope: `import numpy as np`
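The suggested change can be demonstrated in isolation; the 768 dimension is illustrative, not necessarily the width the Snowflake embedding actually produces:

```python
import numpy as np

e = np.zeros(768, dtype=np.float32)    # stand-in for self.embed(text)
x = np.asarray([e], dtype=np.float32)  # explicit 2-D batch of one

print(x.shape, x.dtype)  # → (1, 768) float32
```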
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 56 - 57, The code passes a Python list to ONNX Runtime (self.classifier.run(None, {"X": [e]}) ), which relies on implicit coercion; instead, import numpy as np at module scope and build an explicit 2-D batched ndarray for X (e.g., X = np.asarray([e], dtype=np.float32) or np.expand_dims with dtype float32) before calling self.classifier.run(None, {"X": X}) so the input matches the exported sklearn ONNX model's 2-D float tensor expectation.
nemoguardrails/library/jailbreak_detection/model_based/checks.py (1)
46-46: Fail fast if `snowflake.onnx` is missing.
A bad `EMBEDDING_CLASSIFIER_PATH` will currently bubble up as a low-level ONNX Runtime error. Checking `is_file()` here would make startup failures much easier to diagnose, especially since this migration depends on a new artifact being present.
🛠 Suggested change

```diff
-jailbreak_classifier = JailbreakClassifier(str(Path(classifier_path).joinpath("snowflake.onnx")))
+model_path = Path(classifier_path).joinpath("snowflake.onnx")
+if not model_path.is_file():
+    raise FileNotFoundError(f"Jailbreak classifier not found: {model_path}")
+jailbreak_classifier = JailbreakClassifier(str(model_path))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/checks.py` at line 46, Check for the presence of the ONNX file before instantiating JailbreakClassifier: compute the path from classifier_path and "snowflake.onnx" (reference the variable classifier_path and the filename "snowflake.onnx"), use Path(...).is_file() and if it returns False raise a clear, early error (e.g., ValueError or RuntimeError) that includes the missing path, then only call JailbreakClassifier with that path; this ensures startup fails fast instead of surfacing a low-level ONNX Runtime error.
tests/test_jailbreak_model_based.py (1)
230-257: Please add a unit test for the ONNX session output shape.
This assertion only locks down the filename change. The fragile part of the migration is in nemoguardrails/library/jailbreak_detection/model_based/models.py, Lines 54-62, where `InferenceSession.run()` output is unpacked and scored; a mocked ORT test would catch regressions there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_jailbreak_model_based.py` around lines 230 - 257, Add a unit test that mocks onnxruntime.InferenceSession.run to return the exact tuple/array shapes your model code expects and assert the classifier handles the unpacking and produces the expected output; specifically, in the new test mock models.InferenceSession (or the attribute used inside nemoguardrails.library.jailbreak_detection.model_based.models) so that InferenceSession.run returns a tuple of numpy arrays with the same shapes used by JailbreakClassifier scoring, then instantiate or call JailbreakClassifier (via initialize_model or directly) and assert no exceptions and that the returned score/label shapes/values match expectations; focus on covering InferenceSession.run, JailbreakClassifier, and the unpacking logic so regressions in the run() output shape will fail the test.
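A sketch of such a test, under the assumption that the classifier unpacks `run()` output as `res[0][0]` and `res[1][0][class]`. `FakeSession` and `classify` are illustrative names; the real test would patch `InferenceSession` inside `models.py`:

```python
import numpy as np


class FakeSession:
    """Returns the exact shapes the classifier code is assumed to unpack."""

    def run(self, output_names, feeds):
        # [array of class indices, list of {class: probability} dicts]
        return [np.array([1]), [{0: 0.1, 1: 0.9}]]


def classify(session, embedding):
    # Mirrors the unpacking logic under test.
    res = session.run(None, {"X": np.asarray([embedding], dtype=np.float32)})
    classification = res[0][0]
    return bool(classification), float(res[1][0][classification])


label, score = classify(FakeSession(), np.zeros(4, dtype=np.float32))
assert label is True
assert abs(score - 0.9) < 1e-9
```

Any change in the `run()` output contract — for example ORT returning a 2-D probability array instead of a list of dicts — would fail this test immediately.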
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemoguardrails/library/jailbreak_detection/requirements.txt`:
- Line 14: The file ending for the requirements list is missing a trailing
newline; edit the requirements.txt entry that contains "onnxruntime>=1.24.3" and
ensure the file ends with a single newline character (add a newline after the
last line) so the end-of-file fixer / linters stop failing.
- Around line 7-14: The requirements file is missing a trailing newline which
causes lint failures; open
nemoguardrails/library/jailbreak_detection/requirements.txt and add a single
newline character at the end of the file (after the last entry
onnxruntime>=1.24.3) so the file ends with a newline; save and commit the
change.
---
Outside diff comments:
In `@nemoguardrails/library/jailbreak_detection/model_based/models.py`:
- Around line 25-35: The AutoTokenizer and AutoModel loads
(AutoTokenizer.from_pretrained and AutoModel.from_pretrained) currently use
trust_remote_code=True without pinning a revision; update both calls to include
revision="<commit-sha>" (a full immutable commit hash string) so the tokenizer
and model load a specific commit, keeping trust_remote_code=True and preserving
existing args (safe_serialization, add_pooling_layer) while preventing future
remote code changes from being executed.
- Around line 25-35: The calls to AutoTokenizer.from_pretrained and
AutoModel.from_pretrained in the model initialization use the save-time
parameter safe_serialization; replace safe_serialization with the load-time
option use_safetensors (e.g., use_safetensors=True or use_safetensors=None to
prefer safetensors) in both self.tokenizer and self.model constructor calls
(keep trust_remote_code and add_pooling_layer as-is) so the transformers loader
actually uses safetensors at load time.
---
Nitpick comments:
In `@nemoguardrails/library/jailbreak_detection/model_based/checks.py`:
- Line 46: Check for the presence of the ONNX file before instantiating
JailbreakClassifier: compute the path from classifier_path and "snowflake.onnx"
(reference the variable classifier_path and the filename "snowflake.onnx"), use
Path(...).is_file() and if it returns False raise a clear, early error (e.g.,
ValueError or RuntimeError) that includes the missing path, then only call
JailbreakClassifier with that path; this ensures startup fails fast instead of
surfacing a low-level ONNX Runtime error.
In `@nemoguardrails/library/jailbreak_detection/model_based/models.py`:
- Around line 56-57: The code passes a Python list to ONNX Runtime
(self.classifier.run(None, {"X": [e]}) ), which relies on implicit coercion;
instead, import numpy as np at module scope and build an explicit 2-D batched
ndarray for X (e.g., X = np.asarray([e], dtype=np.float32) or np.expand_dims
with dtype float32) before calling self.classifier.run(None, {"X": X}) so the
input matches the exported sklearn ONNX model's 2-D float tensor expectation.
In `@tests/test_jailbreak_model_based.py`:
- Around line 230-257: Add a unit test that mocks
onnxruntime.InferenceSession.run to return the exact tuple/array shapes your
model code expects and assert the classifier handles the unpacking and produces
the expected output; specifically, in the new test mock models.InferenceSession
(or the attribute used inside
nemoguardrails.library.jailbreak_detection.model_based.models) so that
InferenceSession.run returns a tuple of numpy arrays with the same shapes used
by JailbreakClassifier scoring, then instantiate or call JailbreakClassifier
(via initialize_model or directly) and assert no exceptions and that the
returned score/label shapes/values match expectations; focus on covering
InferenceSession.run, JailbreakClassifier, and the unpacking logic so
regressions in the run() output shape will fail the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 35bd8e9f-9b1d-410e-a34a-4e98c023a745
📒 Files selected for processing (4)
- nemoguardrails/library/jailbreak_detection/model_based/checks.py
- nemoguardrails/library/jailbreak_detection/model_based/models.py
- nemoguardrails/library/jailbreak_detection/requirements.txt
- tests/test_jailbreak_model_based.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Erick Galinkin <erick.galinkin@gmail.com>
Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…ix requirements.txt
…wnload`. Create `classifier_path` if it does not exist. Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Pouyanpi left a comment:
Thank you Erick, looks good 👍🏻
should be good to merge after resolving following issues:
https://github.com/NVIDIA-NeMo/Guardrails/pull/1715/changes#r3130139678
https://github.com/NVIDIA-NeMo/Guardrails/pull/1715/changes#r3130212541
the other comments are not a blocker but nice to do.
```python
    repo_id="nvidia/NemoGuard-JailbreakDetect", filename="snowflake.onnx", local_dir=classifier_path
)
```
```python
from model_based.models import JailbreakClassifier
```
will crash at runtime?
```suggestion
from nemoguardrails.library.jailbreak_detection.model_based.models import JailbreakClassifier
```
like `ModuleNotFoundError: No module named 'model_based'`
This approach forces us to use all of nemoguardrails as a dependency. This works in the Docker container but I see how it could be a problem in a non-docker dependency.
`from .models import JailbreakClassifier` works for me locally.
```python
self.device = os.getenv("JAILBREAK_CHECK_DEVICE")
self.tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
```
Which part? All of that is contained in the model card instructions on how to use the model.
```python
hf_hub_download(
    repo_id="nvidia/NemoGuard-JailbreakDetect", filename="snowflake.onnx", local_dir=classifier_path
)
```
We certainly could. It won't hurt anything.
Dead `sklearn` monkeypatch and stale test intent, as noted below.
No test covers the new hf_hub_download branch in initialize_model(). It would be great to add a patched test that asserts:
- no download when the file exists
- one call to hf_hub_download with the expected args when it does not.
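Those two assertions can be sketched without pytest by injecting the downloader; `ensure_model` and `fake_download` are hypothetical stand-ins — a real test would patch `hf_hub_download` and use `tmp_path`:

```python
import tempfile
from pathlib import Path


def ensure_model(classifier_path: str, download) -> Path:
    # Illustrative version of the download branch under test.
    target = Path(classifier_path) / "snowflake.onnx"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.is_file():
        download(repo_id="nvidia/NemoGuard-JailbreakDetect",
                 filename="snowflake.onnx", local_dir=classifier_path)
    return target


calls = []

def fake_download(**kwargs):
    calls.append(kwargs)
    Path(kwargs["local_dir"], kwargs["filename"]).touch()


with tempfile.TemporaryDirectory() as tmp:
    ensure_model(tmp, fake_download)  # file missing → exactly one download
    ensure_model(tmp, fake_download)  # file present → no second download
    assert len(calls) == 1
    assert calls[0]["filename"] == "snowflake.onnx"
```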
```python
if not Path(classifier_path).exists():
    Path(classifier_path).mkdir(parents=True, exist_ok=True)
```
New mkdir here runs real filesystem I/O before any mock can intercept, breaking test_initialize_model_with_valid_path (uses /fake/path/to/model):
PermissionError: [Errno 13] Permission denied: '/fake'
see https://github.com/NVIDIA-NeMo/Guardrails/actions/runs/24830251922/job/72676623185#step:16:3679
we can extract the mkdir + hf_hub_download into a mockable helper (e.g. _ensure_model_downloaded), and switch the test to tmp_path.
Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com> Signed-off-by: Erick Galinkin <erick.galinkin@gmail.com>
Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com> Signed-off-by: Erick Galinkin <erick.galinkin@gmail.com>
…. Fix tests. Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Description
Remove use of `pickle` by migrating to `onnxruntime`.
NB: depends on push of `onnx` model to HuggingFace.