Skip to content

Commit f2e7beb

Browse files
authored
docs: document release notes for 0.21 and additional details (#1726)
1 parent c390e89 commit f2e7beb

6 files changed

Lines changed: 183 additions & 93 deletions

File tree

docs/about/release-notes.md

Lines changed: 107 additions & 79 deletions
Original file line numberDiff line numberDiff line change
@@ -25,104 +25,132 @@ For a complete record of changes in a release, refer to the
2525

2626
---
2727

28-
(v0-20-0)=
28+
(v0-21-0)=
2929

30-
## 0.20.0
30+
## 0.21.0
3131

32-
(v0-20-0-features)=
32+
(v0-21-0-features)=
3333

3434
### Key Features
3535

36-
- Added support for multilingual content safety models such as [NVIDIA Nemotron Safety Guard 8B v3](https://build.nvidia.com/nvidia/llama-3_1-nemotron-safety-guard-8b-v3). This feature uses the [fast-langdetect package](https://github.com/LlmKira/fast-langdetect) to detect the user's input language and return refusal messages in the appropriate language. To use this feature, install the NeMo Guardrails library with the `multilingual` extra.
36+
- Added the `IORails` class, a new optimized execution engine that runs NemoGuard input and output rails, such as
37+
content-safety, topic-safety, and jailbreak detection, in parallel. The engine is opt-in:
38+
set `NEMO_GUARDRAILS_IORAILS_ENGINE=1` to enable it. When enabled, the configuration is
39+
validated for compatibility and falls back to LLMRails if unsupported flows are detected.
40+
For more information, refer to [](../configure-rails/yaml-schema/guardrails-configuration/parallel-rails.md#iorails-engine).
3741

38-
```bash
39-
pip install nemoguardrails[multilingual]
40-
```
42+
- Added the `check_async()` and `check()` methods on `LLMRails` to enable validating messages against input and output rails without triggering full LLM generation.
43+
Returns a `RailsResult` with `PASSED`, `MODIFIED`, or `BLOCKED` status.
44+
For more information, refer to [](../run-rails/using-python-apis/check-messages.md).
4145

42-
- Added support for configuring custom refusal messages per language to complement multilingual content safety models. You can enable multilingual refusal messages and specify custom refusal messages in the `rails.config.content_safety` section of the `config.yml` file.
43-
44-
```yaml
45-
rails:
46-
config:
47-
content_safety:
48-
multilingual:
49-
enabled: true
50-
refusal_messages:
51-
en: "Sorry, I cannot help with that request."
52-
es: "Lo siento, no puedo ayudar con esa solicitud."
53-
zh: "抱歉,我无法处理该请求。"
54-
# Add other languages as needed
55-
```
46+
- The guardrails server now exposes a fully OpenAI-compatible
47+
REST API. The `/v1/chat/completions` endpoint accepts standard `ChatCompletion` requests with a
48+
`guardrails` field for config selection. A new `/v1/models` endpoint lists available models from the
49+
configured provider. The `openai` package is now a required component of the optional `server` extra ([#1623](https://github.com/NVIDIA-NeMo/Guardrails/pull/1623)).
50+
For more information, refer to [](../run-rails/using-fastapi-server/overview.md).
51+
52+
- Added the `GuardrailsMiddleware` class, a new middleware that integrates with
53+
LangChain's Agent Middleware protocol, applying input and output rail checks before and after
54+
every model call in the agent loop. It includes the `InputRailsMiddleware` and `OutputRailsMiddleware`
55+
convenience subclasses.
56+
For more information, refer to [](../integration/langchain/agent-middleware.md).
57+
58+
- Added three new community rails:
59+
[PolicyAI](../configure-rails/guardrail-catalog/community/policyai.md) for policy-based content moderation,
60+
[CrowdStrike AIDR](../configure-rails/guardrail-catalog/community/crowdstrike-aidr.md) for AI-powered detection and response, and
61+
[Regex Detection](../configure-rails/guardrail-catalog/community/regex.md) for pattern-based content filtering on input, output, and retrieval.
62+
63+
- Jailbreak detection configuration is now validated at
64+
create-time. Invalid thresholds and malformed URLs raise errors immediately.
65+
For more information, refer to [](../configure-rails/guardrail-catalog/jailbreak-protection.md#configuration-validation).
5666

57-
For more information, refer to [](../configure-rails/guardrail-catalog/content-safety.md#multilingual-refusal-messages).
58-
- Added support for [NVIDIA GLiNER-PII](https://huggingface.co/nvidia/gliner-PII) for detecting entities such as names, email addresses, phone numbers, social security numbers, and more. For more information, refer to [](../configure-rails/guardrail-catalog/community/gliner.md).
67+
- Embedding indexes are now initialized lazily.
68+
FastEmbed models are only downloaded when semantic search is needed, reducing startup time for
69+
configurations that use only input and output rails.
70+
71+
(v0-21-0-breaking-changes)=
5972

6073
### Breaking Changes
6174

62-
- A breaking change removes redundant streaming configuration for output rails. Prior to the change, streaming had to be enabled in two places: `streaming` and `rails.output.streaming.enabled`. This change removes the top-level `streaming` configuration.
63-
- Example `config.yml` before the change:
64-
65-
```{code-block} yaml
66-
:emphasize-lines: 21
67-
68-
models:
69-
- type: main
70-
engine: nvidia_ai_endpoints
71-
model: meta/llama-3.3-70b-instruct
72-
- type: content_safety
73-
engine: nvidia_ai_endpoints
74-
model: nvidia/llama-3.1-nemoguard-8b-content-safety
75-
76-
rails:
77-
input:
78-
flows:
79-
- content safety check input $model=content_safety
80-
output:
81-
flows:
82-
- content safety check output $model=content_safety
83-
streaming:
84-
enabled: True
85-
chunk_size: 200
86-
context_size: 50
87-
88-
streaming: True # No longer needed starting from v0.20.0
89-
```
90-
91-
- Example `config.yml` after the change:
92-
93-
```yaml
94-
models:
95-
- type: main
96-
engine: nvidia_ai_endpoints
97-
model: meta/llama-3.3-70b-instruct
98-
99-
- type: content_safety
100-
engine: nvidia_ai_endpoints
101-
model: nvidia/llama-3.1-nemoguard-8b-content-safety
102-
103-
rails:
104-
input:
105-
flows:
106-
- content safety check input $model=content_safety
107-
output:
108-
flows:
109-
- content safety check output $model=content_safety
110-
streaming:
111-
enabled: True
112-
chunk_size: 200
113-
context_size: 50
114-
```
115-
116-
For more information, refer to [](../run-rails/using-python-apis/streaming.md).
75+
- Streaming metadata parameter renamed. The `include_generation_metadata` parameter on
76+
`LLMRails.stream_async()` and `StreamingHandler` is deprecated in favor of `include_metadata`.
77+
The `generation_info` field in streaming chunk dicts is renamed to `metadata`.
78+
The deprecated parameter still works and emits a `DeprecationWarning`.
79+
80+
```python
81+
# Before (deprecated)
82+
async for chunk in rails.stream_async(messages=messages, include_generation_metadata=True):
83+
info = chunk["generation_info"]
84+
85+
# After
86+
async for chunk in rails.stream_async(messages=messages, include_metadata=True):
87+
info = chunk["metadata"]
88+
```
89+
90+
- `StreamingHandler` no longer inherits from LangChain `AsyncCallbackHandler`.
91+
Streaming now uses `llm.astream()` with direct `push_chunk()` calls.
92+
If your code depends on `StreamingHandler` as a LangChain callback, update it to use the
93+
new `push_chunk()` interface.
94+
95+
- Removed the `stream_usage` parameter. The `stream_usage=True` parameter is no longer
96+
automatically added to LLM call kwargs. Streaming metadata is now captured through
97+
`response_metadata` and `usage_metadata` on final chunks.
98+
99+
- Server request and response format changed. The `/v1/chat/completions` endpoint now uses
100+
OpenAI-compatible request and response schemas. The previous `RequestBody` and `ResponseBody`
101+
classes are removed. For the new format, refer to
102+
[](../run-rails/using-fastapi-server/overview.md).
103+
104+
- ChatNVIDIA streaming patch removed. The custom
105+
`_langchain_nvidia_ai_endpoints_patch.py` module is removed.
106+
The standard `ChatNVIDIA` from `langchain_nvidia_ai_endpoints` is used directly.
107+
108+
(v0-21-0-bug-fixes)=
109+
110+
### Bug Fixes
111+
112+
- Fixed a naming mismatch where the `generate_next_step` action did not match the
113+
`generate_next_steps` task enum value, which prevented task-specific LLM configuration
114+
from working correctly ([#1603](https://github.com/NVIDIA-NeMo/Guardrails/pull/1603)).
115+
- Added the `valid` alias to action results in the GuardrailsAI integration so that
116+
Colang flows checking `$result["valid"]` work as expected ([#1611](https://github.com/NVIDIA-NeMo/Guardrails/pull/1611)).
117+
- Filtered the `stop` parameter for OpenAI reasoning models (such as GPT-5) that do not
118+
accept it, preventing `400` errors during dialogue rail execution ([#1653](https://github.com/NVIDIA-NeMo/Guardrails/pull/1653)).
119+
- Fixed GLiNER PII detection to use "bot refuse to respond" instead of
120+
"bot inform answer unknown", which returned a misleading "I don't know" message ([#1671](https://github.com/NVIDIA-NeMo/Guardrails/pull/1671)).
121+
- Fixed a `TypeError` when `stop=None` is passed to `StreamingHandler` by coercing
122+
`None` to an empty list ([#1685](https://github.com/NVIDIA-NeMo/Guardrails/pull/1685)).
123+
- Fixed a `TypeError` in `RollingBuffer.format_chunks` when `include_metadata=True` is used
124+
with output rail streaming enabled. Dict chunks are now normalized to strings at the
125+
input boundary ([#1687](https://github.com/NVIDIA-NeMo/Guardrails/pull/1687)).
126+
- Fixed `GuardrailsMiddleware` silently dropping content when rails return `MODIFIED` status.
127+
Input rails now replace the last user message and output rails replace the last AI
128+
message with the sanitized content ([#1714](https://github.com/NVIDIA-NeMo/Guardrails/pull/1714)).
129+
- Cache hit statistics are now visible in the Stats log line. Cache stats are also
130+
visible in verbose mode ([#1666](https://github.com/NVIDIA-NeMo/Guardrails/pull/1666), [#1667](https://github.com/NVIDIA-NeMo/Guardrails/pull/1667)).
131+
132+
(v0-21-0-other-changes)=
117133

118134
### Other Changes
119135

120-
- Restructured the documentation with improved navigation, clearer content organization, and updated configuration reference and user guides.
136+
- Updated the Fiddler Guardrails API to match the new specification: the `prompt` field is
137+
renamed to `input`, faithfulness uses strings instead of lists, and a new `fdl_roleplaying`
138+
category is added ([#1619](https://github.com/NVIDIA-NeMo/Guardrails/pull/1619)).
139+
- Updated the Trend Micro Vision One AI Guard integration from the beta endpoint to the
140+
officially released GA endpoint. A required `TMV1-Application-Name` header is added and the
141+
request key is changed from `guard` to `prompt` ([#1546](https://github.com/NVIDIA-NeMo/Guardrails/pull/1546)).
142+
- Added a Locust stress-test benchmark for load testing ([#1629](https://github.com/NVIDIA-NeMo/Guardrails/pull/1629)).
143+
- Removed the `multi_kb` example ([#1673](https://github.com/NVIDIA-NeMo/Guardrails/pull/1673)).
144+
- Removed the AI Virtual Assistant Blueprint notebook ([#1682](https://github.com/NVIDIA-NeMo/Guardrails/pull/1682)).
145+
- Updated the Pangea User-Agent repo URL ([#1610](https://github.com/NVIDIA-NeMo/Guardrails/pull/1610)).
146+
- Updated dependencies for the jailbreak detection Docker container ([#1596](https://github.com/NVIDIA-NeMo/Guardrails/pull/1596)).
147+
- Major documentation revamp with improved structure and navigation.
121148

122149
---
123150

124151
## Previous Release Notes
125152

153+
- [0.20.0](https://docs.nvidia.com/nemo/guardrails/0.20.0/release-notes.html)
126154
- [0.19.0](https://docs.nvidia.com/nemo/guardrails/0.19.0/release-notes.html)
127155
- [0.18.0](https://docs.nvidia.com/nemo/guardrails/0.18.0/release-notes.html)
128156
- [0.17.0](https://docs.nvidia.com/nemo/guardrails/0.17.0/release-notes.html)

docs/configure-rails/guardrail-catalog/jailbreak-protection.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,18 @@ rails:
4848
If the `server_endpoint` parameter is not set, the checks will run in-process. This is useful for TESTING PURPOSES ONLY and **IS NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS**.
4949
```
5050

51+
### Configuration Validation
52+
53+
The jailbreak detection configuration is validated at create-time. Invalid values raise errors
54+
immediately instead of failing silently at runtime. The following validation rules apply:
55+
56+
| Parameter | Rule |
57+
|-----------|------|
58+
| `length_per_perplexity_threshold` | Must be greater than 0 |
59+
| `prefix_suffix_perplexity_threshold` | Must be greater than 0 |
60+
| `nim_base_url` | Must start with `http://` or `https://` |
61+
| `server_endpoint` | Must start with `http://` or `https://` |
62+
5163
### Heuristics
5264

5365
#### Length per Perplexity

docs/configure-rails/yaml-schema/guardrails-configuration/parallel-rails.md

Lines changed: 51 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,24 +7,70 @@ description: Configure input and output rails to run in parallel for improved la
77

88
You can configure input and output rails to run in parallel. This can improve latency and throughput.
99

10-
## When to Use Parallel Rails Execution
10+
## IORails Engine
1111

12-
Use parallel execution:
12+
The IORails engine is an optimized execution engine that runs NemoGuard input and output rails in
13+
parallel with dedicated model management. The IORails engine is an opt-in feature. By default, the
14+
NeMo Guardrails library uses the LLMRails engine.
15+
16+
:::{note}
17+
IORails is an early-release feature and currently does not support streaming, reasoning models, and telemetry as in LLMRails.
18+
:::
19+
20+
### Supported Flows
21+
22+
The IORails engine supports the following flows:
23+
24+
- `content safety check input` / `content safety check output`
25+
- `topic safety check input`
26+
- `jailbreak detection model`
27+
28+
When IORails is enabled and the configuration uses only these flows, the engine runs them in parallel.
29+
Configurations that include custom flows, dialog rails, or other unsupported flows
30+
raise an error at initialization.
31+
32+
### Enabling IORails
33+
34+
To enable the IORails engine, set the `NEMO_GUARDRAILS_IORAILS_ENGINE` environment variable to `1`:
35+
36+
```bash
37+
NEMO_GUARDRAILS_IORAILS_ENGINE=1 nemoguardrails chat --config examples/configs/content_safety
38+
```
39+
40+
When using the Python API, import the `Guardrails` class directly and pass `use_iorails=True`:
41+
42+
```python
43+
from nemoguardrails import RailsConfig
44+
from nemoguardrails.guardrails.guardrails import Guardrails
45+
46+
config = RailsConfig.from_path("./config")
47+
guardrails = Guardrails(config, use_iorails=True)
48+
```
49+
50+
## YAML-Based Parallel Execution
51+
52+
You can also configure existing LLMRails flows to run in parallel using the `parallel: True`
53+
option in the `config.yml` file. This approach works with any flow type and does not require
54+
the IORails engine.
55+
56+
### When to Use
57+
58+
Use YAML-based parallel execution:
1359

1460
- For I/O-bound rails such as external API calls to LLMs or third-party integrations.
1561
- If you have two or more independent input or output rails without shared state dependencies.
1662
- In production environments where response latency affects user experience and business metrics.
1763

18-
## When Not to Use Parallel Rails Execution
64+
### When Not to Use
1965

2066
Avoid parallel execution:
2167

2268
- For CPU-bound rails; it might not improve performance and can introduce overhead.
2369
- During development and testing for debugging and simpler workflows.
2470

25-
## Configuration Example
71+
### Configuration Example
2672

27-
To enable parallel execution, set `parallel: True` in the `rails.input` and `rails.output` sections in the `config.yml` file. The following configuration example is tested by NVIDIA and shows how to enable parallel execution for input and output rails.
73+
To enable parallel execution, set `parallel: True` in the `rails.input` and `rails.output` sections in the `config.yml` file.
2874

2975
```{note}
3076
Input rail mutations can lead to erroneous results during parallel execution because of race conditions arising from the execution order and timing of parallel operations. This can result in output divergence compared to sequential execution. For such cases, use sequential mode.
@@ -60,5 +106,4 @@ rails:
60106
chunk_size: 200
61107
context_size: 50
62108
stream_first: True
63-
streaming: True
64109
```

docs/getting-started/installation-guide.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,7 @@ You can install the NeMo Guardrails library with optional extra packages to add
122122
|-------|-------------|
123123
| `nvidia` | NVIDIA-hosted model integration through [build.nvidia.com](https://build.nvidia.com/) |
124124
| `openai` | OpenAI-hosted model integration |
125+
| `server` | [Guardrails API server](../run-rails/using-fastapi-server/overview.md) dependencies (aiofiles for async file handling, openai for API schemas). FastAPI is a core dependency. Required to run `nemoguardrails server`. |
125126
| `sdd` | [Sensitive data detection](../configure-rails/guardrail-catalog/pii-detection.md#presidio-based-sensitive-data-detection) using Presidio |
126127
| `eval` | [Evaluation tools](../evaluation/evaluate-guardrails.md) for testing guardrails |
127128
| `tracing` | OpenTelemetry tracing support |

docs/project.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
{ "name": "nemo-guardrails-toolkit", "version": "0.20.0" }
1+
{ "name": "nemo-guardrails-toolkit", "version": "0.21.0" }

docs/versions1.json

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,35 @@
11
[
22
{
33
"preferred": true,
4+
"version": "0.21.0",
5+
"url": "https://docs.nvidia.com/nemo/guardrails/0.21.0/"
6+
},
7+
{
48
"version": "0.20.0",
5-
"url": "../0.20.0/"
9+
"url": "https://docs.nvidia.com/nemo/guardrails/0.20.0/"
610
},
711
{
812
"version": "0.19.0",
9-
"url": "../0.19.0/"
13+
"url": "https://docs.nvidia.com/nemo/guardrails/0.19.0/"
1014
},
1115
{
1216
"version": "0.18.0",
13-
"url": "../0.18.0/"
17+
"url": "https://docs.nvidia.com/nemo/guardrails/0.18.0/"
1418
},
1519
{
1620
"version": "0.17.0",
17-
"url": "../0.17.0/"
21+
"url": "https://docs.nvidia.com/nemo/guardrails/0.17.0/"
1822
},
1923
{
2024
"version": "0.16.0",
21-
"url": "../0.16.0/"
25+
"url": "https://docs.nvidia.com/nemo/guardrails/0.16.0/"
2226
},
2327
{
2428
"version": "0.15.0",
25-
"url": "../0.15.0/"
29+
"url": "https://docs.nvidia.com/nemo/guardrails/0.15.0/"
2630
},
2731
{
2832
"version": "0.14.1",
29-
"url": "../0.14.1/"
33+
"url": "https://docs.nvidia.com/nemo/guardrails/0.14.1/"
3034
}
3135
]

0 commit comments

Comments
 (0)