Feat/nemoguard content safety #192
Conversation
Content safety guardrail using NVIDIA NeMo Guard (llama-3.1-nemoguard-8b-content-safety). Buffers request and response bodies, extracts user/assistant text via configurable JSONPath expressions, and forwards it to the NeMo Guard inference endpoint for classification across 23 safety categories. Unsafe requests are blocked; unsafe responses are replaced with a sanitised error message.

- Migrated from `wso2_gateway_policy_sdk` to `apip_sdk_core`
- Single `NemoGuardContentSafetyPolicy` class implementing both `RequestPolicy` and `ResponsePolicy`
- Typed frozen dataclasses: `SystemParams`, `PhaseParams`, `RequestParams`
- `normalize_system_params` / `normalize_request_params` with safe coercion helpers
- Per-phase (request/response) configuration: `enabled`, `jsonPath`, `blockStatusCode`, `categories` filter, `passthroughOnError`, `showAssessment`
- `_blocked_codes` maps boolean category toggles to an S-code `frozenset` (`None` = block all)
- `_resolve_jsonpath` with a `try`/`except` guard for unclosed brackets
- 55 unit tests covering normalisation, prompt building, blocking, error handling, category filtering, and HTTP call details
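Two of the helpers listed above, the category-toggle mapping and the safe boolean coercion, can be sketched as follows. This is a sketch, not the policy's implementation: the category names and S-code assignments in `CATEGORY_CODES` are illustrative assumptions, and the helper names mirror but may not match the real signatures.

```python
from typing import FrozenSet, Optional

# Illustrative subset of the 23 safety categories; the actual
# category -> S-code assignments are assumptions, not taken from this PR.
CATEGORY_CODES = {
    "violence": "S1",
    "criminal_planning": "S3",
    "illegal_activity": "S8",
}


def blocked_codes(categories: Optional[dict]) -> Optional[FrozenSet[str]]:
    """Map per-category boolean toggles to an S-code frozenset.

    Returns None when no filter is configured, which the policy treats
    as "block on any unsafe category".
    """
    if categories is None:
        return None
    return frozenset(
        code for name, code in CATEGORY_CODES.items() if categories.get(name, False)
    )


def coerce_bool(value, default=False):
    """Coerce loosely typed config values to bool without the classic
    bool("false") == True pitfall: strings are matched explicitly and
    anything unrecognised falls back to the default."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
    return default
```

The `None` sentinel keeps "no filter configured" distinct from "empty filter", which would otherwise both collapse to an empty set and block nothing.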
📝 Walkthrough

This pull request introduces a new NVIDIA NeMo Guard Content Safety policy as a complete feature addition. The changes include the policy implementation in Python, configuration schema definitions in YAML, documentation covering functionality and configuration, build and packaging configuration, and comprehensive unit tests. The policy validates request and response content by extracting text via configurable JSONPath, sending it to an external NeMo Guard inference endpoint, and blocking or passing through based on safety classification across 23 configurable categories.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Gateway
    participant Policy as NeMo Guard<br/>Policy
    participant NeMoGuard as NeMo Guard<br/>Inference Service
    participant Backend
    Client->>Gateway: HTTP Request
    Gateway->>Policy: Buffer & check request body
    alt Request check enabled
        Policy->>Policy: Extract content via JSONPath
        Policy->>NeMoGuard: POST /v1/chat/completions<br/>(with safety prompt)
        NeMoGuard-->>Policy: Safety classification response
        alt Content unsafe & blocking enabled
            Policy-->>Gateway: ImmediateResponse<br/>(configurable status code)
            Gateway-->>Client: Blocked response
        else Content safe or passthrough
            Policy-->>Gateway: Continue
        end
    end
    Gateway->>Backend: Forward request
    Backend-->>Gateway: Response
    Gateway->>Policy: Buffer & check response body
    alt Response check enabled
        Policy->>Policy: Extract content via JSONPath
        Policy->>NeMoGuard: POST /v1/chat/completions<br/>(with safety prompt)
        NeMoGuard-->>Policy: Safety classification response
        alt Content unsafe & blocking enabled
            Policy-->>Gateway: ImmediateResponse<br/>(replacement response)
            Gateway-->>Client: Sanitized response
        else Content safe or passthrough
            Policy-->>Gateway: Continue
        end
    end
    Gateway-->>Client: Response
```
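The unsafe/blocking branches in the diagram reduce to a small decision table. A minimal sketch (the function name and action strings are illustrative, not from the policy source):

```python
def decide(verdict_unsafe: bool, blocking_enabled: bool, phase: str) -> str:
    """Map a safety verdict to the gateway action shown in the diagram.

    'immediate_response' blocks a request with the configured status code;
    'replace_response' substitutes the sanitised error body on the way out.
    """
    if verdict_unsafe and blocking_enabled:
        return "immediate_response" if phase == "request" else "replace_response"
    # Safe content, or unsafe with passthrough configured: let traffic flow.
    return "continue"
```

Both phases share the same classifier call; only the blocking action differs between the request and response legs.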
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/nvidia-nemoguard-content-safety/v0.1/docs/nvidia-nemoguard-content-safety.md`:
- Around line 202-203: Example 2's description claims only violence and illegal
activity are blocked but the YAML for both request and response includes the
criminal_planning category; update either the prose or the YAML so they match:
either change the descriptive sentence in "Example 2" to list violence, illegal
activity, and criminal_planning, or remove "criminal_planning" from the YAML
category lists in both the request and response examples so the text and samples
are consistent (look for the "Example 2" heading and the YAML block containing
"criminal_planning" to make the edit).
In `@policies/nvidia-nemoguard-content-safety/policy-definition.yaml`:
- Around line 24-31: The schema currently allows unknown keys in configurable
objects (e.g., the request phase object under properties.request and similar
phase/parameters/category objects), so add additionalProperties: false to each
of those object schemas (including the parameters objects, each phase object
like request/response, and category objects referenced in the
policy-definition.yaml) to ensure typos and unexpected keys fail validation;
locate the object schemas (symbols: properties.request, parameters, category)
and insert additionalProperties: false directly under those object definitions.
In `@policies/nvidia-nemoguard-content-safety/pyproject.toml`:
- Around line 12-14: The dependency declaration for "requests" is unbounded;
update the pyproject.toml dependencies list to pin a safe supported range by
replacing "requests" with a constrained spec such as "requests>=2.33.1,<3.0.0"
so the project uses the patched 2.33.1+ releases while avoiding major-breaking
3.x versions; locate the dependency entry in pyproject.toml (the dependencies =
[...] block) and make this change.
In `@policies/nvidia-nemoguard-content-safety/src/nvidia_nemoguard_content_safety_v0/policy.py`:
- Around line 229-237: The response flow currently treats missing or unparsable
request bodies as safe because on_response_body() only adds user_text from
res_ctx.request_body and then _call_nemoguard() returns False if no user message
exists; change this by detecting when request extraction
(res_ctx.request_body/res_ctx.request_body.content ->
req_params.request.json_path -> _resolve_jsonpath) fails and either (A)
construct a response-only moderation payload from the response content and
append it to messages before calling _call_nemoguard(), or (B) invoke the
configured error-handling path instead of proceeding as safe; ensure the call
site in on_response_body() still invokes _call_nemoguard(messages, ...) even
when request_text is absent, and apply the same fix to the similar block around
the code referenced at lines 467-470 so responses cannot bypass moderation when
request extraction fails.
In `@policies/nvidia-nemoguard-content-safety/tests/test_policy.py`:
- Around line 100-123: The test currently installs global stubs via
install_dependency_stubs() and mutates sys.path in load_policy_module(), leaving
requests/apip_sdk_core and the added src path in the process state; change this
so the stubs and path changes are scoped and cleaned up: move the call to
load_policy_module() out of module import-time and into test setup and teardown
(or use a context manager/pytest fixture) that (1) saves original sys.modules
entries for "requests" and "apip_sdk_core", (2) calls
install_dependency_stubs(), (3) inserts src_dir into sys.path, (4) imports the
policy via importlib.import_module("nvidia_nemoguard_content_safety_v0.policy"),
and finally (5) restores sys.modules (replacing or deleting the fake entries)
and restores sys.path to its prior state in teardown; update references to
install_dependency_stubs and load_policy_module accordingly so no global process
state is left modified after each test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc9d5b0b-f9d2-4dff-a9ff-8a37571d76dd
📒 Files selected for processing (11)
- docs/README.md
- docs/nvidia-nemoguard-content-safety/v0.1/docs/nvidia-nemoguard-content-safety.md
- docs/nvidia-nemoguard-content-safety/v0.1/metadata.json
- policies/nvidia-nemoguard-content-safety/README.md
- policies/nvidia-nemoguard-content-safety/policy-definition.yaml
- policies/nvidia-nemoguard-content-safety/pyproject.toml
- policies/nvidia-nemoguard-content-safety/requirements.txt
- policies/nvidia-nemoguard-content-safety/src/nvidia_nemoguard_content_safety_v0/__init__.py
- policies/nvidia-nemoguard-content-safety/src/nvidia_nemoguard_content_safety_v0/policy.py
- policies/nvidia-nemoguard-content-safety/tests/__init__.py
- policies/nvidia-nemoguard-content-safety/tests/test_policy.py
> Enable response-phase checking and restrict blocking to a specific subset of categories. This example blocks only violence and illegal activity in both directions, ignoring all other categories:
Example 2 description contradicts the YAML category set.
The text says only violence and illegal activity are blocked, but `criminal_planning` is also enabled in both the request and response examples. Please align the prose or remove that category from the sample.
Also applies to: 217-227
```yaml
type: object
properties:
  request:
    type: object
    x-wso2-policy-advanced-param: false
    description: Configuration for request-phase content safety checks.
    properties:
      enabled:
```
Harden schema by rejecting unknown keys in configurable objects.
`parameters`, phase objects, and category objects currently accept extra keys. Add `additionalProperties: false` so typos fail fast instead of being silently ignored.
Suggested schema hardening:

```diff
 parameters:
   type: object
+  additionalProperties: false
   properties:
     request:
       type: object
+      additionalProperties: false
@@
       categories:
         type: object
+        additionalProperties: false
@@
     response:
       type: object
+      additionalProperties: false
@@
       categories:
         type: object
+        additionalProperties: false
```

Also applies to: 53-61, 193-217
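The failure mode is easy to demonstrate without a schema library. The sketch below uses a hypothetical `unknown_keys` helper; the allowed key sets mirror the per-phase options listed in the PR description, and it shows how a casing typo slips through unless unknown keys are rejected:

```python
# Allowed keys mirroring the policy's parameters and per-phase objects.
PARAM_KEYS = {"request", "response"}
PHASE_KEYS = {
    "enabled", "jsonPath", "blockStatusCode",
    "categories", "passthroughOnError", "showAssessment",
}


def unknown_keys(params: dict) -> list:
    """Return dotted paths that a schema with additionalProperties: false
    would reject; an empty list means the config validates."""
    errors = [key for key in params if key not in PARAM_KEYS]
    for phase in ("request", "response"):
        for key in params.get(phase, {}):
            if key not in PHASE_KEYS:
                errors.append(f"{phase}.{key}")
    return errors
```

Without the hardening, a key like `jsonpath` (wrong case) is silently ignored and the phase quietly falls back to its default extraction path.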
```toml
dependencies = [
    "requests",
]
```
Constrain `requests` to a supported safe range.

`requests` is currently unbounded. Add an explicit version range to improve reproducibility and ensure all known security advisories are addressed. Since this package requires Python >=3.10, use version 2.33.1 or later (the current stable release with all CVEs patched).

Suggested update:

```diff
 dependencies = [
-    "requests",
+    "requests>=2.33.1,<4",
 ]
```
```python
if res_ctx.request_body and res_ctx.request_body.present and res_ctx.request_body.content:
    req_json_path = req_params.request.json_path
    try:
        req_data = json.loads(res_ctx.request_body.content)
        user_text = _resolve_jsonpath(req_data, req_json_path)
        if user_text and isinstance(user_text, str):
            messages.append({"role": "user", "content": user_text})
    except (json.JSONDecodeError, UnicodeDecodeError):
        pass
```
Response checks should not silently depend on request extraction succeeding.

`on_response_body()` will continue when the original request body is missing or cannot be parsed, but `_call_nemoguard()` then returns False as soon as no user message is present. In practice, that makes `response.enabled=true` bypass moderation for any response whose paired request body is unavailable or uses a different shape than `request.jsonPath`. Please either support a response-only prompt here or route this case through the configured error-handling path instead of treating it as safe.

Also applies to: 467-470
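One way to structure the fix, covering both options from the comment. This is a sketch: the helper name, the tuple return shape, and the error sentinel are illustrative, not the policy's API.

```python
def messages_for_response_check(request_text, response_text, response_only_fallback=True):
    """Build moderation messages for the response phase without ever
    treating a failed request extraction as 'safe'.

    Returns (messages, error): when error is not None, the caller should
    run the configured error-handling path instead of the classifier.
    """
    if request_text:
        # Normal case: moderate the full user/assistant exchange.
        return ([{"role": "user", "content": request_text},
                 {"role": "assistant", "content": response_text}], None)
    if response_only_fallback:
        # Option A: fall back to a response-only moderation payload.
        return ([{"role": "user", "content": response_text}], None)
    # Option B: surface the extraction failure to the error path.
    return (None, "request-extraction-failed")
```

Either branch ensures the response content is always inspected or explicitly routed through error handling, rather than passing unmoderated.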
```python
def install_dependency_stubs() -> None:
    sdk_module = types.ModuleType("apip_sdk_core")
    sdk_module.BodyProcessingMode = BodyProcessingMode
    sdk_module.ProcessingMode = ProcessingMode
    sdk_module.RequestPolicy = RequestPolicy
    sdk_module.ResponsePolicy = ResponsePolicy
    sdk_module.UpstreamRequestModifications = UpstreamRequestModifications
    sdk_module.DownstreamResponseModifications = DownstreamResponseModifications
    sdk_module.ImmediateResponse = ImmediateResponse
    sys.modules["apip_sdk_core"] = sdk_module

    requests_module = types.ModuleType("requests")
    requests_module.post = FakeRequests.post
    sys.modules["requests"] = requests_module


def load_policy_module():
    install_dependency_stubs()
    src_dir = Path(__file__).resolve().parent.parent / "src"
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    sys.modules.pop("nvidia_nemoguard_content_safety_v0", None)
    sys.modules.pop("nvidia_nemoguard_content_safety_v0.policy", None)
    return importlib.import_module("nvidia_nemoguard_content_safety_v0.policy")
```
Scope the dependency stubs to this test module.

These helpers replace `sys.modules["requests"]` / `sys.modules["apip_sdk_core"]` and prepend to `sys.path`, but nothing restores the original process state. Because `policy = load_policy_module()` runs at import time, later tests can end up importing the fake modules or resolving from the added src path. Please move the load into setup/teardown that restores `sys.modules` and `sys.path`, or keep the overrides inside a scoped patch.

Also applies to: 126-126
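A scoped variant can be sketched with a context manager that snapshots and restores `sys.modules` and `sys.path`. This is a sketch under the assumption that a fixture would also handle the policy-module re-import:

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def dependency_stubs(stubs, extra_path=None):
    """Install fake modules and an extra sys.path entry, restoring the
    original process state on exit so later tests see real imports."""
    saved_modules = {name: sys.modules.get(name) for name in stubs}
    saved_path = list(sys.path)
    try:
        sys.modules.update(stubs)
        if extra_path is not None and extra_path not in sys.path:
            sys.path.insert(0, extra_path)
        yield
    finally:
        sys.path[:] = saved_path
        for name, original in saved_modules.items():
            if original is None:
                # The module did not exist before; drop the stub entirely.
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = original
```

A pytest fixture can simply `yield` from inside this context manager so every test gets clean process state.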
```python
response = http_client.post(
    f"{params.endpoint}/v1/chat/completions",
    headers=headers,
    json=payload,
    timeout=params.timeout,
)
response.raise_for_status()
```
Are we handling any Python-specific logic here? Otherwise, let's go with a Go policy.
cc: @Arshardh
Suggested change:

```diff
@@ -0,0 +1,572 @@
-# Copyright (c) 2025, WSO2 LLC. (https://www.wso2.com).
+# Copyright (c) 2026, WSO2 LLC. (https://www.wso2.com).
```
Fixed in latest commit
```
Validates request and/or response content using NVIDIA NeMo Guard
(llama-3.1-nemoguard-8b-content-safety). Request bodies are checked before
they reach the upstream LLM; response bodies are checked before they are
delivered to the client. Unsafe requests are rejected with a configurable
status code; unsafe responses are replaced with a sanitised error message.
```
We can defer adding new lines until a sentence reaches 120 characters.
Fixed in latest commit
Purpose
Adds a new AI gateway policy — `nvidia-nemoguard-content-safety` — to protect LLM-backed APIs by screening both request and response content using NVIDIA NeMo Guard (llama-3.1-nemoguard-8b-content-safety). This addresses the need for broad, multi-category content safety enforcement at the gateway layer without requiring changes to the upstream LLM service.

Goals

- Follow the conventions of the existing `granite-guardian-prompt-injection` Python policy.

Approach
Implemented as a Python policy package using `apip_sdk_core`. The policy class `NemoGuardContentSafetyPolicy` implements both `RequestPolicy` and `ResponsePolicy` and buffers both request and response bodies. On each call it extracts the configured text via JSONPath, builds the safety prompt, and posts it to the NeMo Guard inference endpoint (`/v1/chat/completions`).

Key design decisions:
- `mode()` always buffers both; the response handler returns passthrough immediately when `response.enabled=false`.
- Typed frozen dataclasses (`SystemParams`, `PhaseParams`, `RequestParams`) for parameter handling.
- `_blocked_codes()` maps per-category boolean toggles to an S-code `frozenset`; `None` means block all categories.
- `_coerce_bool()` and `_coerce_int_in_range()` prevent string `"false"`/`"true"` from being misinterpreted.
- `_resolve_jsonpath()` includes a `try`/`except` guard for malformed bracket expressions.

User stories
Release note
Added `nvidia-nemoguard-content-safety` policy (v0.1.0): a Python gateway policy that validates request and/or response content using NVIDIA NeMo Guard (llama-3.1-nemoguard-8b-content-safety). Supports 23 configurable safety categories, per-phase JSONPath targeting, confidence-independent verdict blocking, and both fail-open and fail-closed error handling.
Added policy documentation at `docs/nvidia-nemoguard-content-safety/v0.1/`:

- `metadata.json`
- `docs/nvidia-nemoguard-content-safety.md` — covers Overview, Features, Configuration (system + per-phase user parameters, safety category reference table, `build.yaml` integration), and three Reference Scenarios.

Training
N/A
Certification
N/A - This PR introduces a new standalone gateway policy package and does not currently impact existing certification exam content.
Marketing
N/A
Automation tests
Security checks
Samples
See Reference Scenarios in the policy documentation:
Related PRs
`feat/granite-guardian-prompt-injection` — the existing Python policy this one is modelled after.

Migrations (if applicable)
N/A
Test environment
- Unit tests run with `python -m unittest`

Learning