refactor(jailbreak): Use onnx instead of pickle to load model #1715
erickgalinkin wants to merge 11 commits into develop from
Conversation
Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Greptile Summary
This PR replaces the pickle-based classifier loading with an onnxruntime InferenceSession. Two previously-flagged test issues remain open:
|
| Filename | Overview |
|---|---|
| nemoguardrails/library/jailbreak_detection/model_based/models.py | Replaced pickle/sklearn with onnxruntime InferenceSession; also adds JAILBREAK_CHECK_DEVICE env-var support and use_safetensors. Minor: stale comment referencing the removed [:2] slice on line 63. |
| nemoguardrails/library/jailbreak_detection/model_based/checks.py | Added mkdir, onnx file-presence check, and auto-download via hf_hub_download; import fixed to relative (.models). Logic looks correct. |
| tests/test_jailbreak_model_based.py | Most mocks updated for onnxruntime; new HF-hub download tests added using tmp_path. Two open issues: dead sklearn.ensemble mock in test_model_based_classifier_imports, and test_initialize_model_with_valid_path still lacks mocks for Path.mkdir/is_file and hf_hub_download (will attempt real FS/network calls). |
| nemoguardrails/library/jailbreak_detection/requirements.txt | Replaced sklearn, pickle-era numpy pin with onnxruntime, updated transformers/torch versions, added huggingface_hub; torchvision kept per known transformers/nomic-BERT dependency. |
| nemoguardrails/library/jailbreak_detection/Dockerfile | Updated wget target from snowflake.pkl to snowflake.onnx. Straightforward and correct. |
| nemoguardrails/library/jailbreak_detection/Dockerfile-GPU | Same pkl→onnx switch as Dockerfile; no other changes. |
Sequence Diagram
sequenceDiagram
participant Client
participant checks as checks.initialize_model()
participant FS as Filesystem
participant HF as HuggingFace Hub
participant models as JailbreakClassifier
participant ONNX as onnxruntime.InferenceSession
participant Embed as SnowflakeEmbed
Client->>checks: initialize_model()
checks->>FS: Path(classifier_path).mkdir()
checks->>FS: snowflake.onnx is_file()?
alt File missing
checks->>HF: hf_hub_download(snowflake.onnx)
HF-->>FS: write snowflake.onnx
end
checks->>models: JailbreakClassifier(path)
models->>Embed: SnowflakeEmbed()
models->>ONNX: InferenceSession(path, CPUExecutionProvider)
models-->>checks: classifier instance
checks-->>Client: JailbreakClassifier
Client->>models: classifier(text)
models->>Embed: embed(text) → numpy array
models->>ONNX: run(None, X=[e])
ONNX-->>models: [class_idx, [{0: p0, 1: p1}]]
models-->>Client: (bool(classification), float(score))
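The tail of this flow — unpacking the `run()` output `[class_idx, [{0: p0, 1: p1}]]` into `(bool, float)` — can be sketched as follows. `StubSession`, the class name, and the probability values are illustrative stand-ins, not the real `onnxruntime.InferenceSession` or the PR's actual class:

```python
from typing import List, Tuple


class StubSession:
    """Stands in for onnxruntime.InferenceSession in this sketch."""

    def run(self, output_names, feeds) -> List:
        # Same output contract as the diagram: class index, then per-class probabilities.
        return [[1], [{0: 0.03, 1: 0.97}]]


class JailbreakClassifierSketch:
    def __init__(self, session):
        self.classifier = session

    def __call__(self, e) -> Tuple[bool, float]:
        res = self.classifier.run(None, {"X": [e]})
        classification = res[0][0]         # predicted class index
        score = res[1][0][classification]  # probability of the predicted class
        return bool(classification), float(score)


clf = JailbreakClassifierSketch(StubSession())
print(clf([0.1, 0.2]))  # → (True, 0.97)
```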
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/library/jailbreak_detection/model_based/models.py
Line: 63
Comment:
**Stale comment references removed `[:2]` slice**
The comment still says "the slice `res[1][:2]` should have only one element," but the code was updated (per the prior review thread) to drop that slice entirely and use `res[1][0][classification]` directly. The comment now refers to code that does not exist, which is confusing for future readers.
```suggestion
# The second item is a list of dicts of probabilities; element [0] is the dict for the first (only) batch item.
# We access the dict entry for the class.
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: tests/test_jailbreak_model_based.py
Line: 49-53
Comment:
**Dead `sklearn.ensemble` mock left over from old code**
The refactor removed all `sklearn` usage from `models.py`, but this test still injects a `sklearn.ensemble` mock into `sys.modules`. It has no effect on the code under test and will mislead readers into thinking sklearn is still an active dependency.
```suggestion
monkeypatch.setitem(sys.modules, "onnxruntime", fake_onnx)
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (10): Last reviewed commit: "Update path for models.py, fix nits on e..."
📝 Walkthrough
This PR migrates the jailbreak detection classifier from a pickle-based model to ONNX Runtime, updating the model initialization, inference mechanism, dependencies, and test expectations accordingly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemoguardrails/library/jailbreak_detection/model_based/models.py (2)
25-35: ⚠️ Potential issue | 🟠 Major — Pin the model revision to an immutable commit hash.
`trust_remote_code=True` executes code from the Hugging Face repository at load time. Without pinning `revision` to a specific commit, the application will execute any code updates published to that repository in the future, creating a supply-chain vulnerability.
Add `revision="<commit-sha>"` (full commit hash, not a branch or tag) to both `AutoTokenizer.from_pretrained()` and `AutoModel.from_pretrained()` calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 25 - 35, The AutoTokenizer and AutoModel loads (AutoTokenizer.from_pretrained and AutoModel.from_pretrained) currently use trust_remote_code=True without pinning a revision; update both calls to include revision="<commit-sha>" (a full immutable commit hash string) so the tokenizer and model load a specific commit, keeping trust_remote_code=True and preserving existing args (safe_serialization, add_pooling_layer) while preventing future remote code changes from being executed.
25-35: ⚠️ Potential issue | 🟠 Major — Replace `safe_serialization` with `use_safetensors` for load-time safetensors configuration.
`safe_serialization` is a save-time flag, not a load-time parameter. On `AutoTokenizer.from_pretrained()` and `AutoModel.from_pretrained()`, use `use_safetensors=True` (or `use_safetensors=None` to prefer safetensors if available) instead. The current parameter will be ignored or may cause errors depending on the transformers version.
Suggested fix

```python
self.tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
    use_safetensors=True,
)
self.model = AutoModel.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
    add_pooling_layer=False,
    use_safetensors=True,
)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 25 - 35, The calls to AutoTokenizer.from_pretrained and AutoModel.from_pretrained in the model initialization use the save-time parameter safe_serialization; replace safe_serialization with the load-time option use_safetensors (e.g., use_safetensors=True or use_safetensors=None to prefer safetensors) in both self.tokenizer and self.model constructor calls (keep trust_remote_code and add_pooling_layer as-is) so the transformers loader actually uses safetensors at load time.
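Taken together, the two findings above suggest a load path like the following sketch. The repo id comes from this PR's diff; `PINNED_REVISION` is a placeholder to be replaced with a real 40-character commit hash, and the deferred import just keeps the snippet light:

```python
def load_snowflake_components():
    # Deferred import keeps this sketch importable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo = "Snowflake/snowflake-arctic-embed-m-long"
    pinned = "<commit-sha>"  # placeholder: full immutable commit hash

    tokenizer = AutoTokenizer.from_pretrained(
        repo,
        trust_remote_code=True,
        revision=pinned,  # pin remote code to one audited commit
    )
    model = AutoModel.from_pretrained(
        repo,
        trust_remote_code=True,
        add_pooling_layer=False,
        revision=pinned,
        use_safetensors=True,  # load-time flag; safe_serialization is save-time only
    )
    return tokenizer, model
```

Note that `use_safetensors` is set only on the model here: tokenizers don't load weight files, so the flag has no effect on `AutoTokenizer`.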
🧹 Nitpick comments (3)
nemoguardrails/library/jailbreak_detection/model_based/models.py (1)
56-57: Build `X` as an explicit batched ndarray before calling ONNX Runtime.
`{"X": [e]}` passes a Python list of vectors into ORT. Exported sklearn ONNX models usually declare `X` as a 2-D float tensor, so this is relying on implicit coercion in a pretty fragile place.
🧪 Suggested change

```diff
 def __call__(self, text: str) -> Tuple[bool, float]:
     e = self.embed(text)
-    res = self.classifier.run(None, {"X": [e]})
+    x = np.asarray([e], dtype=np.float32)
+    res = self.classifier.run(None, {"X": x})
```

Also add at module scope: `import numpy as np`
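The suggested change can be demonstrated in isolation; the 768 dimension is illustrative, not necessarily the width the Snowflake embedding actually produces:

```python
import numpy as np

e = np.zeros(768, dtype=np.float32)    # stand-in for self.embed(text)
x = np.asarray([e], dtype=np.float32)  # explicit 2-D batch of one

print(x.shape, x.dtype)  # → (1, 768) float32
```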
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 56 - 57, The code passes a Python list to ONNX Runtime (self.classifier.run(None, {"X": [e]}) ), which relies on implicit coercion; instead, import numpy as np at module scope and build an explicit 2-D batched ndarray for X (e.g., X = np.asarray([e], dtype=np.float32) or np.expand_dims with dtype float32) before calling self.classifier.run(None, {"X": X}) so the input matches the exported sklearn ONNX model's 2-D float tensor expectation.
nemoguardrails/library/jailbreak_detection/model_based/checks.py (1)
46-46: Fail fast if `snowflake.onnx` is missing.
A bad `EMBEDDING_CLASSIFIER_PATH` will currently bubble up as a low-level ONNX Runtime error. Checking `is_file()` here would make startup failures much easier to diagnose, especially since this migration depends on a new artifact being present.
🛠 Suggested change

```diff
-jailbreak_classifier = JailbreakClassifier(str(Path(classifier_path).joinpath("snowflake.onnx")))
+model_path = Path(classifier_path).joinpath("snowflake.onnx")
+if not model_path.is_file():
+    raise FileNotFoundError(f"Jailbreak classifier not found: {model_path}")
+jailbreak_classifier = JailbreakClassifier(str(model_path))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoguardrails/library/jailbreak_detection/model_based/checks.py` at line 46, Check for the presence of the ONNX file before instantiating JailbreakClassifier: compute the path from classifier_path and "snowflake.onnx" (reference the variable classifier_path and the filename "snowflake.onnx"), use Path(...).is_file() and if it returns False raise a clear, early error (e.g., ValueError or RuntimeError) that includes the missing path, then only call JailbreakClassifier with that path; this ensures startup fails fast instead of surfacing a low-level ONNX Runtime error.
tests/test_jailbreak_model_based.py (1)
230-257: Please add a unit test for the ONNX session output shape.
This assertion only locks down the filename change. The fragile part of the migration is in nemoguardrails/library/jailbreak_detection/model_based/models.py, Lines 54-62, where `InferenceSession.run()` output is unpacked and scored; a mocked ORT test would catch regressions there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_jailbreak_model_based.py` around lines 230 - 257, Add a unit test that mocks onnxruntime.InferenceSession.run to return the exact tuple/array shapes your model code expects and assert the classifier handles the unpacking and produces the expected output; specifically, in the new test mock models.InferenceSession (or the attribute used inside nemoguardrails.library.jailbreak_detection.model_based.models) so that InferenceSession.run returns a tuple of numpy arrays with the same shapes used by JailbreakClassifier scoring, then instantiate or call JailbreakClassifier (via initialize_model or directly) and assert no exceptions and that the returned score/label shapes/values match expectations; focus on covering InferenceSession.run, JailbreakClassifier, and the unpacking logic so regressions in the run() output shape will fail the test.
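A sketch of such a test, under the assumption that the classifier unpacks `run()` output as `res[0][0]` and `res[1][0][class]`. `FakeSession` and `classify` are illustrative names; the real test would patch `InferenceSession` inside `models.py`:

```python
import numpy as np


class FakeSession:
    """Returns the exact shapes the classifier code is assumed to unpack."""

    def run(self, output_names, feeds):
        # [array of class indices, list of {class: probability} dicts]
        return [np.array([1]), [{0: 0.1, 1: 0.9}]]


def classify(session, embedding):
    # Mirrors the unpacking logic under test.
    res = session.run(None, {"X": np.asarray([embedding], dtype=np.float32)})
    classification = res[0][0]
    return bool(classification), float(res[1][0][classification])


label, score = classify(FakeSession(), np.zeros(4, dtype=np.float32))
assert label is True
assert abs(score - 0.9) < 1e-9
```

Any change in the `run()` output contract — for example ORT returning a 2-D probability array instead of a list of dicts — would fail this test immediately.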
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemoguardrails/library/jailbreak_detection/requirements.txt`:
- Line 14: The file ending for the requirements list is missing a trailing
newline; edit the requirements.txt entry that contains "onnxruntime>=1.24.3" and
ensure the file ends with a single newline character (add a newline after the
last line) so the end-of-file fixer / linters stop failing.
- Around line 7-14: The requirements file is missing a trailing newline which
causes lint failures; open
nemoguardrails/library/jailbreak_detection/requirements.txt and add a single
newline character at the end of the file (after the last entry
onnxruntime>=1.24.3) so the file ends with a newline; save and commit the
change.
---
Outside diff comments:
In `@nemoguardrails/library/jailbreak_detection/model_based/models.py`:
- Around line 25-35: The AutoTokenizer and AutoModel loads
(AutoTokenizer.from_pretrained and AutoModel.from_pretrained) currently use
trust_remote_code=True without pinning a revision; update both calls to include
revision="<commit-sha>" (a full immutable commit hash string) so the tokenizer
and model load a specific commit, keeping trust_remote_code=True and preserving
existing args (safe_serialization, add_pooling_layer) while preventing future
remote code changes from being executed.
- Around line 25-35: The calls to AutoTokenizer.from_pretrained and
AutoModel.from_pretrained in the model initialization use the save-time
parameter safe_serialization; replace safe_serialization with the load-time
option use_safetensors (e.g., use_safetensors=True or use_safetensors=None to
prefer safetensors) in both self.tokenizer and self.model constructor calls
(keep trust_remote_code and add_pooling_layer as-is) so the transformers loader
actually uses safetensors at load time.
---
Nitpick comments:
In `@nemoguardrails/library/jailbreak_detection/model_based/checks.py`:
- Line 46: Check for the presence of the ONNX file before instantiating
JailbreakClassifier: compute the path from classifier_path and "snowflake.onnx"
(reference the variable classifier_path and the filename "snowflake.onnx"), use
Path(...).is_file() and if it returns False raise a clear, early error (e.g.,
ValueError or RuntimeError) that includes the missing path, then only call
JailbreakClassifier with that path; this ensures startup fails fast instead of
surfacing a low-level ONNX Runtime error.
In `@nemoguardrails/library/jailbreak_detection/model_based/models.py`:
- Around line 56-57: The code passes a Python list to ONNX Runtime
(self.classifier.run(None, {"X": [e]}) ), which relies on implicit coercion;
instead, import numpy as np at module scope and build an explicit 2-D batched
ndarray for X (e.g., X = np.asarray([e], dtype=np.float32) or np.expand_dims
with dtype float32) before calling self.classifier.run(None, {"X": X}) so the
input matches the exported sklearn ONNX model's 2-D float tensor expectation.
In `@tests/test_jailbreak_model_based.py`:
- Around line 230-257: Add a unit test that mocks
onnxruntime.InferenceSession.run to return the exact tuple/array shapes your
model code expects and assert the classifier handles the unpacking and produces
the expected output; specifically, in the new test mock models.InferenceSession
(or the attribute used inside
nemoguardrails.library.jailbreak_detection.model_based.models) so that
InferenceSession.run returns a tuple of numpy arrays with the same shapes used
by JailbreakClassifier scoring, then instantiate or call JailbreakClassifier
(via initialize_model or directly) and assert no exceptions and that the
returned score/label shapes/values match expectations; focus on covering
InferenceSession.run, JailbreakClassifier, and the unpacking logic so
regressions in the run() output shape will fail the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 35bd8e9f-9b1d-410e-a34a-4e98c023a745
📒 Files selected for processing (4)
- nemoguardrails/library/jailbreak_detection/model_based/checks.py
- nemoguardrails/library/jailbreak_detection/model_based/models.py
- nemoguardrails/library/jailbreak_detection/requirements.txt
- tests/test_jailbreak_model_based.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Erick Galinkin <erick.galinkin@gmail.com>
Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…ix requirements.txt
…wnload`. Create `classifier_path` if it does not exist. Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Pouyanpi left a comment:
Thank you Erick, looks good 👍🏻
should be good to merge after resolving following issues:
https://github.com/NVIDIA-NeMo/Guardrails/pull/1715/changes#r3130139678
https://github.com/NVIDIA-NeMo/Guardrails/pull/1715/changes#r3130212541
the other comments are not a blocker but nice to do.
```python
    repo_id="nvidia/NemoGuard-JailbreakDetect", filename="snowflake.onnx", local_dir=classifier_path
)
```
```python
from model_based.models import JailbreakClassifier
```
will crash at runtime?
```suggestion
from nemoguardrails.library.jailbreak_detection.model_based.models import JailbreakClassifier
```
like `ModuleNotFoundError: No module named 'model_based'`
This approach forces us to use all of nemoguardrails as a dependency. This works in the Docker container but I see how it could be a problem in a non-docker dependency.
`from .models import JailbreakClassifier` works for me locally.
```python
self.device = os.getenv("JAILBREAK_CHECK_DEVICE")
self.tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
```
Which part? All of that is contained in the model card instructions on how to use the model.
```python
hf_hub_download(
    repo_id="nvidia/NemoGuard-JailbreakDetect", filename="snowflake.onnx", local_dir=classifier_path
)
```
We certainly could. It won't hurt anything.
Dead `sklearn` monkeypatch and stale test intent, as noted below.
No test covers the new hf_hub_download branch in initialize_model(). It would be great to add a patched test that asserts:
- no download when the file exists
- one call to hf_hub_download with the expected args when it does not.
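Those two assertions can be sketched without pytest by injecting the downloader; `ensure_model` and `fake_download` are hypothetical stand-ins — a real test would patch `hf_hub_download` and use `tmp_path`:

```python
import tempfile
from pathlib import Path


def ensure_model(classifier_path: str, download) -> Path:
    # Illustrative version of the download branch under test.
    target = Path(classifier_path) / "snowflake.onnx"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.is_file():
        download(repo_id="nvidia/NemoGuard-JailbreakDetect",
                 filename="snowflake.onnx", local_dir=classifier_path)
    return target


calls = []

def fake_download(**kwargs):
    calls.append(kwargs)
    Path(kwargs["local_dir"], kwargs["filename"]).touch()


with tempfile.TemporaryDirectory() as tmp:
    ensure_model(tmp, fake_download)  # file missing → exactly one download
    ensure_model(tmp, fake_download)  # file present → no second download
    assert len(calls) == 1
    assert calls[0]["filename"] == "snowflake.onnx"
```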
```python
if not Path(classifier_path).exists():
    Path(classifier_path).mkdir(parents=True, exist_ok=True)
```
New mkdir here runs real filesystem I/O before any mock can intercept, breaking test_initialize_model_with_valid_path (uses /fake/path/to/model):
PermissionError: [Errno 13] Permission denied: '/fake'
see https://github.com/NVIDIA-NeMo/Guardrails/actions/runs/24830251922/job/72676623185#step:16:3679
we can extract the mkdir + hf_hub_download into a mockable helper (e.g. _ensure_model_downloaded), and switch the test to tmp_path.
Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com> Signed-off-by: Erick Galinkin <erick.galinkin@gmail.com>
Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com> Signed-off-by: Erick Galinkin <erick.galinkin@gmail.com>
…. Fix tests. Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Description
Remove use of `pickle` by migrating to `onnxruntime`.
NB: depends on push of `onnx` model to HuggingFace.