| 1 | +--- |
| 2 | +title: "Overview" |
| 3 | +--- |
| 4 | +# NeMo Guard Content Safety |
| 5 | + |
| 6 | +## Overview |
| 7 | + |
| 8 | +The NeMo Guard Content Safety policy validates request and/or response content using NVIDIA NeMo Guard (llama-3.1-nemoguard-8b-content-safety). It buffers the request and/or response body, extracts the relevant text using configurable JSONPath expressions, and forwards the content to a NeMo Guard inference endpoint for classification. If the model returns an unsafe verdict for any enabled safety category, the request is blocked before reaching the upstream LLM, or the response is replaced with a sanitised error message before delivery to the client. |
| 9 | + |
| 10 | +The model is a LoRA adapter on meta-llama/Llama-3.1-8B-Instruct served via vLLM with `--enable-lora`. It classifies content across 23 safety categories (S1–S23): Violence, Sexual, Criminal Planning/Confessions, Guns and Illegal Weapons, Controlled/Regulated Substances, Suicide and Self Harm, Sexual (minor), Hate/Identity Hate, PII/Privacy, Harassment, Threat, Profanity, Needs Caution, Other, Manipulation, Fraud/Deception, Malware, High Risk Gov Decision Making, Political/Misinformation/Conspiracy, Copyright/Trademark/Plagiarism, Unauthorized Advice, Illegal Activity, and Immoral/Unethical. |
| 11 | + |
| 12 | +Use this policy to screen user input, LLM output, or both for unsafe content across a broad range of harm categories, without modifying the upstream service. |
| 13 | + |
| 14 | +## Features |
| 15 | + |
| 16 | +- Checks request bodies before they reach the upstream LLM (enabled by default) |
| 17 | +- Checks response bodies before they are delivered to the client (opt-in) |
| 18 | +- Classifies content across 23 safety categories (S1–S23) |
| 19 | +- Per-category blocking toggles — enable or disable individual categories independently |
| 20 | +- Blocks all categories by default when no category filter is configured |
| 21 | +- Unsafe requests are rejected with a configurable HTTP status code (400–599 range) |
| 22 | +- Unsafe responses are replaced with a sanitised 200 error body (preserves HTTP contract with the client) |
| 23 | +- When checking responses, includes the original user message as conversation context for the model |
| 24 | +- Optional assessment details in the block response (detected safety category codes) |
| 25 | +- Fail-closed by default on inference service errors; configurable to fail-open |
| 26 | +- Passes through requests unchanged when the body is not JSON, the JSONPath target is missing, or the body is absent |
| 27 | +- Targets any string field in the JSON request or response body via configurable JSONPath expressions |
| 28 | + |
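The extraction and pass-through behaviour listed above can be sketched as follows. This is an illustrative sketch only, not the policy's implementation: `extract_text` is a hypothetical helper, and it assumes a tiny JSONPath subset (dotted keys and integer indexes) that is just enough for the default paths.

```python
import json
import re
from typing import Any, Optional

# Hypothetical, minimal path syntax: ".key" and "[index]" segments only.
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(-?\d+)\]")


def extract_text(body: Optional[bytes], json_path: str) -> Optional[str]:
    """Return the string at json_path, or None when the body should pass through."""
    if not body:
        return None  # absent body
    try:
        node: Any = json.loads(body)
    except ValueError:
        return None  # body is not JSON (also covers invalid UTF-8)
    if not json_path.startswith("$"):
        return None
    pos = 1
    while pos < len(json_path):
        match = _SEGMENT.match(json_path, pos)
        if match is None:
            return None  # syntax outside this sketch's subset
        key, index = match.group(1), match.group(2)
        try:
            node = node[key] if key is not None else node[int(index)]
        except (KeyError, IndexError, TypeError):
            return None  # JSONPath target is missing
        pos = match.end()
    # Only string targets are checked; anything else passes through.
    return node if isinstance(node, str) else None
```

A `None` result means "skip the check and forward the body unchanged"; any string result is what would be sent for classification.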
| 29 | +## Configuration |
| 30 | + |
| 31 | +The NeMo Guard Content Safety policy uses a two-level configuration: system parameters that identify the NeMo Guard inference endpoint, and per-route user parameters that control detection behaviour for each phase (request and response). |
| 32 | + |
| 33 | +### System Parameters (From config.toml) |
| 34 | + |
| 35 | +These parameters are set at the gateway level and identify the NeMo Guard inference endpoint. Default values can be configured in `config.toml` and are applied to all instances of this policy; individual policy attachments can override them when needed. |
| 36 | + |
| 37 | +| Parameter | Type | Required | Default | Description | |
| 38 | +|-----------|------|----------|---------|-------------| |
| 39 | +| `endpoint` | string (URI) | Yes | — | Base URL of the OpenAI-compatible inference endpoint serving the NeMo Guard model (e.g., `http://nemoguard:8101`). The policy appends `/v1/chat/completions` automatically. | |
| 40 | +| `apiKey` | string | No | — | Bearer token used to authenticate with the inference endpoint. Leave empty if the endpoint does not require authentication. | |
| 41 | +| `model` | string | No | `nemoguard` | Model identifier forwarded in the API request. Must match the `--lora-modules` alias used when serving via vLLM. | |
| 42 | +| `timeout` | integer | No | `30` | Per-request timeout in seconds for calls to the NeMo Guard endpoint. Must be between `1` and `120`. | |
| 43 | + |
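To illustrate how the system parameters combine, the sketch below builds (but does not send) a call to the inference endpoint. The function name and the message layout are assumptions made for illustration; the actual policy may wrap the content in a model-specific prompt template.

```python
import json
from typing import Optional, Tuple
from urllib.request import Request


def build_inference_request(
    endpoint: str,
    user_text: str,
    assistant_text: Optional[str] = None,
    model: str = "nemoguard",
    api_key: str = "",
    timeout: int = 30,
) -> Tuple[Request, int]:
    """Build a POST to <endpoint>/v1/chat/completions (hypothetical helper)."""
    if not 1 <= timeout <= 120:
        raise ValueError("timeout must be between 1 and 120 seconds")
    # Assumed layout: the user message, plus the assistant reply when a
    # response is being checked with the user message as context.
    messages = [{"role": "user", "content": user_text}]
    if assistant_text is not None:
        messages.append({"role": "assistant", "content": assistant_text})
    headers = {"Content-Type": "application/json"}
    if api_key:  # an empty apiKey means no authentication header
        headers["Authorization"] = f"Bearer {api_key}"
    url = endpoint.rstrip("/") + "/v1/chat/completions"
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return Request(url, data=payload, headers=headers, method="POST"), timeout
```

The returned pair could then be passed to `urllib.request.urlopen(request, timeout=timeout)`.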
| 44 | +#### Sample System Configuration |
| 45 | + |
| 46 | +Add the following entries to your `config.toml` file: |
| 47 | + |
| 48 | +```toml |
| 49 | +nemoguard_endpoint = "http://nemoguard:8101" |
| 50 | +nemoguard_api_key = "" |
| 51 | +nemoguard_model = "nemoguard" |
| 52 | +nemoguard_timeout = 30 |
| 53 | +``` |
| 54 | + |
| 55 | +### User Parameters (API Definition) |
| 56 | + |
| 57 | +Parameters are nested under `request` and `response` objects to configure each phase independently. |
| 58 | + |
| 59 | +#### Request Phase (`request`) |
| 60 | + |
| 61 | +| Parameter | Type | Required | Default | Description | |
| 62 | +|-----------|------|----------|---------|-------------| |
| 63 | +| `request.enabled` | boolean | No | `true` | Enables content safety checks on incoming requests. | |
| 64 | +| `request.jsonPath` | string | No | `$.messages[-1].content` | JSONPath expression used to extract the user message from the JSON request body. Non-JSON bodies and requests where the path does not resolve to a string are passed through unchanged. | |
| 65 | +| `request.blockStatusCode` | integer | No | `400` | HTTP status code returned when a request is blocked. Must be in the range `400`–`599`. | |
| 66 | +| `request.categories` | object | No | all enabled | Per-category boolean toggles. When omitted, all 23 categories are blocked. When provided, only categories set to `true` are blocked; categories set to `false` are passed through even if the model flags them. | |
| 67 | +| `request.passthroughOnError` | boolean | No | `false` | When `true`, allows the request to proceed if the NeMo Guard API call fails (fail-open). When `false`, a `503` is returned on API errors (fail-closed). | |
| 68 | +| `request.showAssessment` | boolean | No | `false` | When `true`, includes the detected safety category codes in the blocked-request error response body. | |
| 69 | + |
| 70 | +#### Response Phase (`response`) |
| 71 | + |
| 72 | +| Parameter | Type | Required | Default | Description | |
| 73 | +|-----------|------|----------|---------|-------------| |
| 74 | +| `response.enabled` | boolean | No | `false` | Enables content safety checks on upstream responses before they are delivered to the client. | |
| 75 | +| `response.jsonPath` | string | No | `$.choices[0].message.content` | JSONPath expression used to extract the assistant reply from the response body. | |
| 76 | +| `response.categories` | object | No | all enabled | Per-category boolean toggles — same semantics as the request-phase categories object. | |
| 77 | +| `response.passthroughOnError` | boolean | No | `false` | When `true`, allows the response to proceed if the NeMo Guard API call fails (fail-open). When `false`, a `503` is returned on API errors (fail-closed). | |
| 78 | +| `response.showAssessment` | boolean | No | `false` | When `true`, includes the detected safety category codes in the replaced-response error body. | |
| 79 | + |
| 80 | +#### Safety Categories |
| 81 | + |
| 82 | +The `categories` object supports the following boolean keys. When the object is omitted, all categories are blocked; when it is present, only keys set to `true` are blocked: |
| 83 | + |
| 84 | +| Key | Category | |
| 85 | +|-----|----------| |
| 86 | +| `violence` | S1 — Violence | |
| 87 | +| `sexual` | S2 — Sexual | |
| 88 | +| `criminal_planning` | S3 — Criminal Planning/Confessions | |
| 89 | +| `guns_weapons` | S4 — Guns and Illegal Weapons | |
| 90 | +| `regulated_substances` | S5 — Controlled/Regulated Substances | |
| 91 | +| `suicide_self_harm` | S6 — Suicide and Self Harm | |
| 92 | +| `sexual_minor` | S7 — Sexual (minor) | |
| 93 | +| `hate_identity` | S8 — Hate/Identity Hate | |
| 94 | +| `pii_privacy` | S9 — PII/Privacy | |
| 95 | +| `harassment` | S10 — Harassment | |
| 96 | +| `threat` | S11 — Threat | |
| 97 | +| `profanity` | S12 — Profanity | |
| 98 | +| `needs_caution` | S13 — Needs Caution | |
| 99 | +| `other` | S14 — Other | |
| 100 | +| `manipulation` | S15 — Manipulation | |
| 101 | +| `fraud_deception` | S16 — Fraud/Deception | |
| 102 | +| `malware` | S17 — Malware | |
| 103 | +| `high_risk_gov` | S18 — High Risk Gov Decision Making | |
| 104 | +| `misinformation` | S19 — Political/Misinformation/Conspiracy | |
| 105 | +| `copyright` | S20 — Copyright/Trademark/Plagiarism | |
| 106 | +| `unauthorized_advice` | S21 — Unauthorized Advice | |
| 107 | +| `illegal_activity` | S22 — Illegal Activity | |
| 108 | +| `immoral_unethical` | S23 — Immoral/Unethical | |
| 109 | + |
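The category toggles can be read as a filter over the codes the model reports. The sketch below is a hypothetical illustration of that filter; it assumes that keys left out of a provided `categories` object are treated as not blocked, which matches Example 2 but is not confirmed by the parameter table.

```python
from typing import Dict, Iterable, List, Optional

# Mapping from model category codes to configuration keys (from the table above).
CATEGORY_KEYS = {
    "S1": "violence", "S2": "sexual", "S3": "criminal_planning",
    "S4": "guns_weapons", "S5": "regulated_substances",
    "S6": "suicide_self_harm", "S7": "sexual_minor", "S8": "hate_identity",
    "S9": "pii_privacy", "S10": "harassment", "S11": "threat",
    "S12": "profanity", "S13": "needs_caution", "S14": "other",
    "S15": "manipulation", "S16": "fraud_deception", "S17": "malware",
    "S18": "high_risk_gov", "S19": "misinformation", "S20": "copyright",
    "S21": "unauthorized_advice", "S22": "illegal_activity",
    "S23": "immoral_unethical",
}


def blocked_categories(
    detected: Iterable[str], categories: Optional[Dict[str, bool]]
) -> List[str]:
    """Return the detected codes that should cause a block."""
    if categories is None:
        # No filter configured: every known category blocks.
        return [code for code in detected if code in CATEGORY_KEYS]
    # Filter configured: only keys explicitly set to true block (assumption).
    return [
        code for code in detected
        if categories.get(CATEGORY_KEYS.get(code, ""), False)
    ]
```

An empty result means the content is allowed even if the model flagged categories that are switched off.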
| 110 | +#### build.yaml Integration |
| 111 | + |
| 112 | +Inside the `api-platform` repository, add the policy package under `policies:` in `/gateway/build.yaml`: |
| 113 | + |
| 114 | +```yaml |
| 115 | +- name: nvidia-nemoguard-content-safety |
| 116 | +  pipPackage: github.com/wso2/gateway-controllers/policies/nvidia-nemoguard-content-safety@v0 |
| 117 | +``` |
| 118 | + |
| 119 | +## Reference Scenarios |
| 120 | + |
| 121 | +### Example 1: Protect a Chat Completions Route with Default Request Checking |
| 122 | + |
| 123 | +Attach the policy to an LLM provider route to block unsafe requests using the default configuration (all 23 categories enabled): |
| 124 | + |
| 125 | +```yaml |
| 126 | +apiVersion: gateway.api-platform.wso2.com/v1alpha1 |
| 127 | +kind: LlmProvider |
| 128 | +metadata: |
| 129 | +  name: protected-chat-provider |
| 130 | +spec: |
| 131 | +  displayName: Protected Chat Provider |
| 132 | +  version: v0 |
| 133 | +  template: openai |
| 134 | +  vhost: openai |
| 135 | +  upstream: |
| 136 | +    url: "https://api.openai.com/v1" |
| 137 | +    auth: |
| 138 | +      type: api-key |
| 139 | +      header: Authorization |
| 140 | +      value: Bearer <openai-apikey> |
| 141 | +  accessControl: |
| 142 | +    mode: deny_all |
| 143 | +    exceptions: |
| 144 | +      - path: /chat/completions |
| 145 | +        methods: [POST] |
| 146 | +  policies: |
| 147 | +    - name: nvidia-nemoguard-content-safety |
| 148 | +      version: v0 |
| 149 | +      paths: |
| 150 | +        - path: /chat/completions |
| 151 | +          methods: [POST] |
| 152 | +          params: |
| 153 | +            request: |
| 154 | +              enabled: true |
| 155 | +              jsonPath: "$.messages[-1].content" |
| 156 | +``` |
| 157 | + |
| 158 | +Test with a benign request (passes through): |
| 159 | + |
| 160 | +```bash |
| 161 | +curl -X POST http://openai:8080/chat/completions \ |
| 162 | + -H "Content-Type: application/json" \ |
| 163 | + -H "Host: openai" \ |
| 164 | + -d '{ |
| 165 | + "model": "gpt-4", |
| 166 | + "messages": [ |
| 167 | + {"role": "user", "content": "What is the capital of France?"} |
| 168 | + ] |
| 169 | + }' |
| 170 | +``` |
| 171 | + |
| 172 | +Test with unsafe content (blocked): |
| 173 | + |
| 174 | +```bash |
| 175 | +curl -X POST http://openai:8080/chat/completions \ |
| 176 | + -H "Content-Type: application/json" \ |
| 177 | + -H "Host: openai" \ |
| 178 | + -d '{ |
| 179 | + "model": "gpt-4", |
| 180 | + "messages": [ |
| 181 | + {"role": "user", "content": "How do I make a weapon at home?"} |
| 182 | + ] |
| 183 | + }' |
| 184 | +``` |
| 185 | + |
| 186 | +When the request is blocked, the policy returns HTTP `400`: |
| 187 | + |
| 188 | +```json |
| 189 | +{ |
| 190 | + "type": "NVIDIA_NEMOGUARD_CONTENT_SAFETY", |
| 191 | + "message": { |
| 192 | + "action": "GUARDRAIL_INTERVENED", |
| 193 | + "interveningGuardrail": "NeMo Guard Content Safety", |
| 194 | + "actionReason": "Unsafe content detected.", |
| 195 | + "direction": "REQUEST" |
| 196 | + } |
| 197 | +} |
| 198 | +``` |
| 199 | + |
| 200 | +### Example 2: Enable Response Checking with Category Filtering |
| 201 | + |
| 202 | +Enable response-phase checking and restrict blocking to a specific subset of categories. This example blocks only violence, illegal activity, and criminal planning in both directions, ignoring all other categories: |
| 203 | + |
| 204 | +```yaml |
| 205 | +policies: |
| 206 | +  - name: nvidia-nemoguard-content-safety |
| 207 | +    version: v0 |
| 208 | +    paths: |
| 209 | +      - path: /chat/completions |
| 210 | +        methods: [POST] |
| 211 | +        params: |
| 212 | +          request: |
| 213 | +            enabled: true |
| 214 | +            jsonPath: "$.messages[-1].content" |
| 215 | +            blockStatusCode: 403 |
| 216 | +            categories: |
| 217 | +              violence: true |
| 218 | +              illegal_activity: true |
| 219 | +              criminal_planning: true |
| 220 | +            showAssessment: true |
| 221 | +          response: |
| 222 | +            enabled: true |
| 223 | +            jsonPath: "$.choices[0].message.content" |
| 224 | +            categories: |
| 225 | +              violence: true |
| 226 | +              illegal_activity: true |
| 227 | +              criminal_planning: true |
| 228 | +            showAssessment: true |
| 229 | +``` |
| 230 | + |
| 231 | +When a request is blocked with `showAssessment: true`, the response body includes the detected category codes: |
| 232 | + |
| 233 | +```json |
| 234 | +{ |
| 235 | + "type": "NVIDIA_NEMOGUARD_CONTENT_SAFETY", |
| 236 | + "message": { |
| 237 | + "action": "GUARDRAIL_INTERVENED", |
| 238 | + "interveningGuardrail": "NeMo Guard Content Safety", |
| 239 | + "actionReason": "Unsafe content detected.", |
| 240 | + "direction": "REQUEST", |
| 241 | + "assessments": { |
| 242 | + "categories": ["S1", "S22"] |
| 243 | + } |
| 244 | + } |
| 245 | +} |
| 246 | +``` |
| 247 | + |
| 248 | +When a response is replaced due to unsafe content, the policy returns HTTP `200` with the guardrail body (preserving the HTTP contract with the client): |
| 249 | + |
| 250 | +```json |
| 251 | +{ |
| 252 | + "type": "NVIDIA_NEMOGUARD_CONTENT_SAFETY", |
| 253 | + "message": { |
| 254 | + "action": "GUARDRAIL_INTERVENED", |
| 255 | + "interveningGuardrail": "NeMo Guard Content Safety", |
| 256 | + "actionReason": "Unsafe content detected.", |
| 257 | + "direction": "RESPONSE" |
| 258 | + } |
| 259 | +} |
| 260 | +``` |
| 261 | + |
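The two intervention bodies above differ only in `direction`, the optional `assessments` field, and the HTTP status. A minimal sketch of how such a body could be assembled is shown below; `guardrail_response` is a hypothetical helper, not the policy's code.

```python
import json
from typing import Iterable, Tuple


def guardrail_response(
    direction: str,
    detected: Iterable[str],
    show_assessment: bool = False,
    block_status_code: int = 400,
) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for an intervention."""
    message = {
        "action": "GUARDRAIL_INTERVENED",
        "interveningGuardrail": "NeMo Guard Content Safety",
        "actionReason": "Unsafe content detected.",
        "direction": direction,
    }
    if show_assessment:
        message["assessments"] = {"categories": list(detected)}
    # Blocked requests use the configured status; replaced responses use 200.
    status = block_status_code if direction == "REQUEST" else 200
    body = json.dumps({"type": "NVIDIA_NEMOGUARD_CONTENT_SAFETY", "message": message})
    return status, body
```

For instance, `guardrail_response("REQUEST", ["S1", "S22"], show_assessment=True, block_status_code=403)` yields status `403` with the assessment included.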
| 262 | +### Example 3: Fail-Open for High Availability |
| 263 | + |
| 264 | +When the NeMo Guard service is unavailable, allow traffic to proceed rather than returning an error. Use this configuration only when availability takes priority over strict safety enforcement: |
| 265 | + |
| 266 | +```yaml |
| 267 | +policies: |
| 268 | +  - name: nvidia-nemoguard-content-safety |
| 269 | +    version: v0 |
| 270 | +    paths: |
| 271 | +      - path: /chat/completions |
| 272 | +        methods: [POST] |
| 273 | +        params: |
| 274 | +          request: |
| 275 | +            enabled: true |
| 276 | +            passthroughOnError: true |
| 277 | +          response: |
| 278 | +            enabled: true |
| 279 | +            passthroughOnError: true |
| 280 | +``` |
| 281 | + |
| 282 | +When the NeMo Guard endpoint is unreachable and `passthroughOnError` is `false` (the default), the policy returns HTTP `503`: |
| 283 | + |
| 284 | +```json |
| 285 | +{ |
| 286 | + "type": "NVIDIA_NEMOGUARD_CONTENT_SAFETY", |
| 287 | + "message": { |
| 288 | + "action": "SERVICE_UNAVAILABLE", |
| 289 | + "actionReason": "Content safety service unavailable." |
| 290 | + } |
| 291 | +} |
| 292 | +``` |
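
The fail-open versus fail-closed choice can be summarised in a few lines. The sketch below is illustrative only: `check_with_fallback` and the `classify` callable are hypothetical names standing in for the call to the NeMo Guard endpoint.

```python
from typing import Callable, List, Optional, Tuple


def check_with_fallback(
    classify: Callable[[str], List[str]],
    text: str,
    passthrough_on_error: bool = False,
) -> Tuple[str, Optional[int]]:
    """Return (decision, status): 'allow', 'block', or 'error' with 503."""
    try:
        detected = classify(text)  # may raise on timeout or connection failure
    except Exception:
        if passthrough_on_error:
            return "allow", None  # fail-open: traffic proceeds unchecked
        return "error", 503  # fail-closed: service unavailable
    return ("block", None) if detected else ("allow", None)
```

With the default `passthrough_on_error=False`, any inference failure maps to the `503` body shown above; setting it to `True` trades safety enforcement for availability.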