
Commit 528a87a

Merge remote-tracking branch 'upstream/develop' into sync-upstream-develop-20260324
2 parents 87247b4 + f2e7beb commit 528a87a

43 files changed

Lines changed: 5461 additions & 5111 deletions

.coderabbit.yaml

Lines changed: 70 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,70 @@
1+
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
2+
# https://docs.coderabbit.ai/getting-started/configure-coderabbit/
3+
# Validator https://docs.coderabbit.ai/configuration/yaml-validator#yaml-validator
4+
# In PR, comment "@coderabbitai configuration" to get the full config including defaults
5+
# Set the language for reviews by using the corresponding ISO language code.
6+
# Default: "en-US"
7+
language: "en-US"
8+
# Settings related to reviews.
9+
# Default: {}
10+
reviews:
11+
# Set the profile for reviews. Assertive profile yields more feedback, that may be considered nitpicky.
12+
# Options: chill, assertive
13+
# Default: "chill"
14+
profile: chill
15+
# Add this keyword in the PR/MR title to auto-generate the title.
16+
# Default: "@coderabbitai"
17+
auto_title_placeholder: "@coderabbitai title"
18+
# Auto Title Instructions - Custom instructions for auto-generating the PR/MR title.
19+
# Default: ""
20+
auto_title_instructions: 'Format: "<category>: <title>". Category must be one of: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, cp. The category must be followed by a colon. Title should be concise (<= 80 chars). Example: "feat: Add logit_bias support".' # current: ''
21+
# Set the commit status to 'pending' when the review is in progress and 'success' when it is complete.
22+
# Default: true
23+
commit_status: false
24+
# Generate walkthrough in a markdown collapsible section.
25+
# Default: false
26+
collapse_walkthrough: true
27+
# Generate an assessment of how well the changes address the linked issues in the walkthrough.
28+
# Default: true
29+
assess_linked_issues: true
30+
# Include possibly related issues in the walkthrough.
31+
# Default: true
32+
related_issues: true
33+
# Related PRs - Include possibly related pull requests in the walkthrough.
34+
# Default: true
35+
related_prs: true
36+
# Suggest labels based on the changes in the pull request in the walkthrough.
37+
# Default: true
38+
suggested_labels: true
39+
# Suggest reviewers based on the changes in the pull request in the walkthrough.
40+
# Default: true
41+
suggested_reviewers: true
42+
# Generate a poem in the walkthrough comment.
43+
# Default: true
44+
poem: false # current: true
45+
# Post review details on each review. Additionally, post a review status when a review is skipped in certain cases.
46+
# Default: true
47+
review_status: false # current: true
48+
# Configuration for pre merge checks
49+
# Default: {}
50+
pre_merge_checks:
51+
# Custom Pre-merge Checks - Add unique checks to enforce your team's standards before merging a pull request. Each check must have a unique name (up to 50 characters) and clear instructions (up to 10000 characters). Use these to automatically verify coding, security, documentation, or business rules and maintain code quality.
52+
# Default: []
53+
custom_checks:
54+
- name: "Test Results for Major Changes"
55+
mode: "warning" # or "error" to block merges
56+
instructions: |
57+
If this PR contains major changes (such as new features, breaking changes, or significant refactoring), verify that the PR description includes test results or testing information.
58+
If a change could affect numerics or convergence, the PR description should include information demonstrating that there is no regression.
59+
If a change could affect performance, the PR description should include before-and-after performance numbers, as well as the configuration and context in which they apply.
60+
Pass if test results are documented or if the changes are minor.
61+
auto_review:
62+
# Configuration for auto review
63+
# Default: {}
64+
# Automatic Incremental Review - Automatic incremental code review on each push
65+
# Default: true
66+
auto_incremental_review: false # current: true
67+
# Review draft PRs/MRs.
68+
# Default: false
69+
drafts: false
70+
# Base branches (other than the default branch) to review. Accepts regex patterns. Use '.*' to match all branches.

CHANGELOG.md

Lines changed: 50 additions & 0 deletions
@@ -9,6 +9,56 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
99
>
1010
> The changes related to the Colang language and runtime have moved to [CHANGELOG-Colang](./CHANGELOG-Colang.md) file.
1111
12+
## [0.21.0] - 2026-03-12
13+
14+
### 🚀 Features
15+
16+
- *(library)* Update Trend Micro Vision One AI Guard official endpoint ([#1546](https://github.com/NVIDIA-NeMo/Guardrails/issues/1546))
17+
- *(llmrails)* Add check_async method for input/output rails validation ([#1605](https://github.com/NVIDIA-NeMo/Guardrails/issues/1605))
18+
- *(server)* Make guardrails server OpenAI compatible ([#1340](https://github.com/NVIDIA-NeMo/Guardrails/issues/1340))
19+
- New top-level scaffold ([#1613](https://github.com/NVIDIA-NeMo/Guardrails/issues/1613))
20+
- Add Async work queue ([#1620](https://github.com/NVIDIA-NeMo/Guardrails/issues/1620))
21+
- *(integration)* Add GuardrailsMiddleware for LangChain agent ([#1606](https://github.com/NVIDIA-NeMo/Guardrails/issues/1606))
22+
- *(library)* Update Fiddler Guardrails API to match new specification ([#1619](https://github.com/NVIDIA-NeMo/Guardrails/issues/1619))
23+
- *(library)* Add CrowdStrike AIDR community integration ([#1601](https://github.com/NVIDIA-NeMo/Guardrails/issues/1601))
24+
- *(iorails)* Introduce IORails optimized Input/Output rail engine. Supports non-streaming parallel nemoguard input/output rails (content-safety, topic-safety, jailbreak detection) ([#1638](https://github.com/NVIDIA-NeMo/Guardrails/issues/1638), [#1649](https://github.com/NVIDIA-NeMo/Guardrails/issues/1649), [#1654](https://github.com/NVIDIA-NeMo/Guardrails/issues/1654), [#1656](https://github.com/NVIDIA-NeMo/Guardrails/issues/1656), [#1658](https://github.com/NVIDIA-NeMo/Guardrails/issues/1658), [#1660](https://github.com/NVIDIA-NeMo/Guardrails/issues/1660), [#1661](https://github.com/NVIDIA-NeMo/Guardrails/issues/1661), [#1674](https://github.com/NVIDIA-NeMo/Guardrails/issues/1674))
25+
- *(server)* Add OpenAI compatible v1/models endpoint ([#1637](https://github.com/NVIDIA-NeMo/Guardrails/issues/1637))
26+
- *(benchmark)* Add Locust stress-test ([#1629](https://github.com/NVIDIA-NeMo/Guardrails/issues/1629))
27+
- *(jailbreak)* Validate Jailbreak Detection config at create-time ([#1675](https://github.com/NVIDIA-NeMo/Guardrails/issues/1675))
28+
- *(library)* Add PolicyAI Integration for Content Moderation ([#1576](https://github.com/NVIDIA-NeMo/Guardrails/issues/1576))
29+
30+
### 🐛 Bug Fixes
31+
32+
- *(server)* Make openai an optional server-only dependency ([#1623](https://github.com/NVIDIA-NeMo/Guardrails/issues/1623))
33+
- *(actions)* Rename generate_next_step to generate_next_steps for task-specific LLM support ([#1603](https://github.com/NVIDIA-NeMo/Guardrails/issues/1603))
34+
- *(library)* Add `valid` alias to action results in GuardrailsAI integration ([#1578](https://github.com/NVIDIA-NeMo/Guardrails/issues/1578)) ([#1611](https://github.com/NVIDIA-NeMo/Guardrails/issues/1611))
35+
- *(llm)* Filter stop parameter for OpenAI reasoning models ([#1653](https://github.com/NVIDIA-NeMo/Guardrails/issues/1653))
36+
- *(logging)* Show cache hits in Stats log and fix duplicate metadata restore ([#1666](https://github.com/NVIDIA-NeMo/Guardrails/issues/1666))
37+
- *(cache)* Make cache stats log visible in verbose mode ([#1667](https://github.com/NVIDIA-NeMo/Guardrails/issues/1667))
38+
- *(library)* Use bot refuse to respond in gliner PII detection flows ([#1671](https://github.com/NVIDIA-NeMo/Guardrails/issues/1671))
39+
- *(streaming)* Handle None stop tokens in streaming handler ([#1685](https://github.com/NVIDIA-NeMo/Guardrails/issues/1685))
40+
- *(streaming)* Handle dict chunks in RollingBuffer.format_chunks ([#1687](https://github.com/NVIDIA-NeMo/Guardrails/issues/1687))
41+
- *(middleware)* Handle MODIFIED status in GuardrailsMiddleware instead of silently dropping it ([#1714](https://github.com/NVIDIA-NeMo/Guardrails/issues/1714))
42+
43+
### 🚜 Refactor
44+
45+
- *(streaming)* Remove LangChain callback dependencies from StreamingHandler ([#1547](https://github.com/NVIDIA-NeMo/Guardrails/issues/1547))
46+
- *(streaming)* Remove ChatNVIDIA streaming patch ([#1607](https://github.com/NVIDIA-NeMo/Guardrails/issues/1607))
47+
- *(streaming)* [**breaking**] Remove stream_usage and fix streaming metadata capture ([#1624](https://github.com/NVIDIA-NeMo/Guardrails/issues/1624))
48+
49+
### ⚡ Performance
50+
51+
- *(actions)* Lazy initialization of embedding indexes ([#1572](https://github.com/NVIDIA-NeMo/Guardrails/issues/1572))
52+
53+
### ⚙️ Miscellaneous Tasks
54+
55+
- Update Pangea User-Agent repo URL ([#1595](https://github.com/NVIDIA-NeMo/Guardrails/issues/1595)) ([#1610](https://github.com/NVIDIA-NeMo/Guardrails/issues/1610))
56+
- *(jailbreak)* Update dependencies for jailbreak detection docker container. ([#1596](https://github.com/NVIDIA-NeMo/Guardrails/issues/1596))
57+
- Remove multi_kb example ([#1673](https://github.com/NVIDIA-NeMo/Guardrails/issues/1673))
58+
- *(iorails)* Increase work queue concurrency and depth ([#1674](https://github.com/NVIDIA-NeMo/Guardrails/issues/1674))
59+
- *(docs)* Remove AI Virtual Assistant Blueprint notebook ([#1682](https://github.com/NVIDIA-NeMo/Guardrails/issues/1682))
60+
- Update dependencies ahead of v0.21 release ([#1617](https://github.com/NVIDIA-NeMo/Guardrails/issues/1617))
61+
1262
## [0.20.0] - 2026-01-22
1363

1464
### 🚀 Features

README.md

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@
1313
[![Downloads](https://static.pepy.tech/badge/nemoguardrails)](https://pepy.tech/project/nemoguardrails)
1414
[![Downloads](https://static.pepy.tech/badge/nemoguardrails/month)](https://pepy.tech/project/nemoguardrails)
1515

16-
> **LATEST RELEASE / DEVELOPMENT VERSION**: The [main](https://github.com/NVIDIA-NeMo/Guardrails/tree/main) branch tracks the latest released beta version: [0.20.0](https://github.com/NVIDIA-NeMo/Guardrails/tree/v0.20.0). For the latest development version, checkout the [develop](https://github.com/NVIDIA-NeMo/Guardrails/tree/develop) branch.
16+
> **LATEST RELEASE / DEVELOPMENT VERSION**: The [main](https://github.com/NVIDIA-NeMo/Guardrails/tree/main) branch tracks the latest released beta version: [0.21.0](https://github.com/NVIDIA-NeMo/Guardrails/tree/v0.21.0). For the latest development version, checkout the [develop](https://github.com/NVIDIA-NeMo/Guardrails/tree/develop) branch.
1717
1818
✨✨✨
1919

@@ -295,9 +295,9 @@ Evaluating the safety of a LLM-based conversational application is a complex tas
295295

296296
## How is this different?
297297

298-
There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence), critique chains (e.g. constitutional chain), parsing the output (e.g. guardrails.ai), individual guardrails (e.g., LLM-Guard), hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).
298+
There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence, PolicyAI), critique chains (e.g. constitutional chain), parsing the output (e.g. guardrails.ai), individual guardrails (e.g., LLM-Guard), hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).
299299

300-
NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, AlignScore and LangChain chains.
300+
NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, PolicyAI, AlignScore and LangChain chains.
301301

302302
To the best of our knowledge, NeMo Guardrails is the only guardrails toolkit that also offers a solution for modeling the dialog between the user and the LLM. This enables on one hand the ability to guide the dialog in a precise way. On the other hand it enables fine-grained control for when certain guardrails should be used, e.g., use fact-checking only for certain types of questions.
303303

benchmark/aiperf/README.md

Lines changed: 25 additions & 24 deletions
@@ -27,14 +27,15 @@ To use the provided configurations, you need to create accounts at <https://buil
2727
1. **Create a virtual environment in which to install AIPerf**
2828

2929
```bash
30-
$ mkdir ~/env
31-
$ python -m venv ~/env/aiperf
30+
mkdir ~/env
31+
python -m venv ~/env/aiperf
32+
source ~/env/aiperf/bin/activate
3233
```
3334

3435
2. **Install dependencies in the virtual environment**
3536

3637
```bash
37-
$ pip install aiperf huggingface_hub typer
38+
pip install aiperf huggingface_hub typer httpx
3839
```
3940

4041
3. **Login to Hugging Face:**
@@ -50,39 +51,39 @@ To use the provided configurations, you need to create accounts at <https://buil
5051
After creating a Personal API key, set the `NVIDIA_API_KEY` variable as below.
5152

5253
```bash
53-
$ export NVIDIA_API_KEY="your-api-key-here"
54+
export NVIDIA_API_KEY="your-api-key-here"
5455
```
5556

5657
## Running Benchmarks
5758

5859
Each benchmark is configured using the `AIPerfConfig` Pydantic model in [aiperf_models.py](aiperf_models.py).
5960
The configs are stored in YAML files, and converted to an `AIPerfConfig` object.
60-
There are two example configs included which can be extended for your use-cases. These both use Nvidia-hosted models, :
61+
There are two example configs included which can be extended for your use-cases. These both use Nvidia-hosted models:
6162

62-
- [`single_concurrency.yaml`](aiperf_configs/single_concurrency.yaml): Example single-run benchmark with a single concurrency value.
63-
- [`sweep_concurrency.yaml`](aiperf_configs/sweep_concurrency.yaml): Example multiple-run benchmark to sweep concurency values and run a new benchmark for each.
63+
- [`single_concurrency.yaml`](configs/single_concurrency.yaml): Example single-run benchmark with a single concurrency value.
64+
- [`sweep_concurrency.yaml`](configs/sweep_concurrency.yaml): Example multiple-run benchmark to sweep concurrency values and run a new benchmark for each.
6465

6566
To run a benchmark, use the following command:
6667

6768
```bash
68-
$ python -m benchmark.aiperf --config-file <path-to-config.yaml>
69+
python -m benchmark.aiperf --config-file <path-to-config.yaml>
6970
```
7071

7172
### Running a Single Benchmark
7273

7374
To run a single benchmark with fixed parameters, use the `single_concurrency.yaml` configuration:
7475

7576
```bash
76-
$ python -m benchmark.aiperf --config-file aiperf/configs/single_concurrency.yaml
77+
python -m benchmark.aiperf --config-file benchmark/aiperf/configs/single_concurrency.yaml
7778
```
7879

7980
**Example output:**
8081

81-
```text
82-
2025-12-01 10:35:17 INFO: Running AIPerf with configuration: aiperf/configs/single_concurrency.yaml
82+
```terminaloutput
83+
2025-12-01 10:35:17 INFO: Running AIPerf with configuration: benchmark/aiperf/configs/single_concurrency.yaml
8384
2025-12-01 10:35:17 INFO: Results root directory: aiperf_results/single_concurrency/20251201_103517
8485
2025-12-01 10:35:17 INFO: Sweeping parameters: None
85-
2025-12-01 10:35:17 INFO: Running AIPerf with configuration: aiperf/configs/single_concurrency.yaml
86+
2025-12-01 10:35:17 INFO: Running AIPerf with configuration: benchmark/aiperf/configs/single_concurrency.yaml
8687
2025-12-01 10:35:17 INFO: Output directory: aiperf_results/single_concurrency/20251201_103517
8788
2025-12-01 10:35:17 INFO: Single Run
8889
2025-12-01 10:36:54 INFO: Run completed successfully
@@ -97,13 +98,13 @@ $ python -m benchmark.aiperf --config-file aiperf/configs/single_concurrency.yam
9798
To run multiple benchmarks with different concurrency levels, use the `sweep_concurrency.yaml` configuration as below:
9899

99100
```bash
100-
$ python -m benchmark.aiperf --config-file aiperf/configs/sweep_concurrency.yaml
101+
python -m benchmark.aiperf --config-file benchmark/aiperf/configs/sweep_concurrency.yaml
101102
```
102103

103104
**Example output:**
104105

105-
```text
106-
2025-11-14 14:02:54 INFO: Running AIPerf with configuration: nemoguardrails/benchmark/aiperf/aiperf_configs/sweep_concurrency.yaml
106+
```terminaloutput
107+
2025-11-14 14:02:54 INFO: Running AIPerf with configuration: benchmark/aiperf/configs/sweep_concurrency.yaml
107108
2025-11-14 14:02:54 INFO: Results root directory: aiperf_results/sweep_concurrency/20251114_140254
108109
2025-11-14 14:02:54 INFO: Sweeping parameters: {'concurrency': [1, 2, 4]}
109110
2025-11-14 14:02:54 INFO: Running 3 benchmarks
@@ -134,7 +135,7 @@ The `--dry-run` option allows you to preview all benchmark commands without exec
134135
- Debugging configuration issues
135136

136137
```bash
137-
$ python -m benchmark.aiperf --config-file aiperf/configs/sweep_concurrency.yaml --dry-run
138+
python -m benchmark.aiperf --config-file benchmark/aiperf/configs/sweep_concurrency.yaml --dry-run
138139
```
139140

140141
When in dry-run mode, the script will:
@@ -150,7 +151,7 @@ When in dry-run mode, the script will:
150151
The `--verbose` option outputs more detailed debugging information to understand each step of the benchmarking process.
151152

152153
```bash
153-
$ python -m benchmark.aiperf --config-file <config.yaml> --verbose
154+
python -m benchmark.aiperf --config-file <config.yaml> --verbose
154155
```
155156

156157
Verbose mode provides:
@@ -165,14 +166,14 @@ Verbose mode provides:
165166

166167
## Configuration Files
167168

168-
Configuration files are YAML files located in [aiperf_configs](aiperf_configs). The configuration is validated using Pydantic models to catch errors early.
169+
Configuration files are YAML files located in [configs](configs). The configuration is validated using Pydantic models to catch errors early.
169170

170171
### Top-Level Configuration Fields
171172

172173
| Field | Type | Required | Description |
173174
|-------|------|----------|-------------|
174-
| `batch_name` | string | Yes | Name for this batch of benchmarks. Used in output directory naming (e.g., `aiperf_results/batch_name/timestamp/`) |
175-
| `output_base_dir` | string | Yes | Base directory where all benchmark results will be stored |
175+
| `batch_name` | string | No | Name for this batch of benchmarks. Used in output directory naming (e.g., `aiperf_results/batch_name/timestamp/`). Default: `benchmark` |
176+
| `output_base_dir` | string | No | Base directory where all benchmark results will be stored. Default: `aiperf_results` |
176177
| `base_config` | object | Yes | Base configuration parameters applied to all benchmark runs (see below) |
177178
| `sweeps` | object | No | Optional parameter sweeps for running multiple benchmarks with different values |
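
The top-level fields in the table above can be sketched in plain Python to show how the defaults, the results directory, and a parameter sweep fit together. This is a minimal illustration, not the project's `AIPerfConfig` model: the `BatchConfig` class, the `results_root` and `expand_sweeps` helpers, and the `concurrency` key inside `base_config` are hypothetical names chosen for the example, and combining several swept parameters as a Cartesian product is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import product
from pathlib import Path
from typing import Any


@dataclass
class BatchConfig:
    """Hypothetical stand-in for the top-level fields in the table."""

    base_config: dict[str, Any]                      # required
    batch_name: str = "benchmark"                    # default from the table
    output_base_dir: str = "aiperf_results"          # default from the table
    sweeps: dict[str, list[Any]] = field(default_factory=dict)  # optional


def results_root(cfg: BatchConfig, now: datetime) -> Path:
    # Documented layout: <output_base_dir>/<batch_name>/<timestamp>/
    return Path(cfg.output_base_dir) / cfg.batch_name / now.strftime("%Y%m%d_%H%M%S")


def expand_sweeps(cfg: BatchConfig) -> list[dict[str, Any]]:
    # No sweeps means a single run with the base configuration.
    if not cfg.sweeps:
        return [dict(cfg.base_config)]
    names = list(cfg.sweeps)
    # Assumption: multiple swept parameters combine as a Cartesian product.
    return [
        {**cfg.base_config, **dict(zip(names, values))}
        for values in product(*(cfg.sweeps[n] for n in names))
    ]


cfg = BatchConfig(
    base_config={"concurrency": 1},  # hypothetical base parameter
    batch_name="sweep_concurrency",
    sweeps={"concurrency": [1, 2, 4]},
)
runs = expand_sweeps(cfg)
root = results_root(cfg, datetime(2025, 11, 14, 14, 2, 54))
print(len(runs), root.as_posix())
```

With the sweep `{'concurrency': [1, 2, 4]}` the sketch produces three run configurations, in line with the "Running 3 benchmarks" message in the example output.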
178179

@@ -355,11 +356,11 @@ Each run directory contains multiple files with benchmark results and metadata.
355356
- **`inputs.json`**: Synthetic prompt data generated for the benchmark.
356357
- **`profile_export_aiperf.json`**: Main metrics file in JSON format containing aggregated statistics.
357358
- **`profile_export_aiperf.csv`**: Same metrics as the JSON file, but in CSV format for easy import into spreadsheet tools or data analysis libraries.
358-
- **`profile_export.jsonl`**: JSON Lines format file containing per-request metrics. Each line is a complete JSON object for one request with:
359-
- **`logs/aiperf.log`**: Detailed log file from AIPerf execution containing:
359+
- **`profile_export.jsonl`**: JSON Lines format file containing per-request metrics. Each line is a complete JSON object for one request.
360+
- **`logs/aiperf.log`**: Detailed log file from AIPerf execution.
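
Because `profile_export.jsonl` holds one JSON object per line, it can be read with the standard library alone. The sketch below shows one way to do that. The `load_jsonl` helper is a hypothetical name, and the keys in the sample data are placeholders rather than AIPerf's actual per-request schema.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank or trailing lines
                records.append(json.loads(line))
    return records


# Stand-in data; these keys are hypothetical, not AIPerf's real schema.
sample = '{"request": 1, "latency_ms": 120.5}\n{"request": 2, "latency_ms": 98.0}\n'
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "profile_export.jsonl"
    path.write_text(sample, encoding="utf-8")
    records = load_jsonl(path)

print(len(records))
```

Reading line by line keeps memory use proportional to one record at a time during parsing, which may matter for long benchmark runs.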
360361

361362
## Resources
362363

363-
- [AIPerf GitHub Repository](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf)
364-
- [AIPerf Documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client/src/c%2B%2B/perf_analyzer/genai-perf/README.html)
364+
- [AIPerf GitHub Repository](https://github.com/ai-dynamo/aiperf)
365+
- [AIPerf Documentation](https://docs.nvidia.com/nim/benchmarking/llm/latest/step-by-step.html)
365366
- [NVIDIA API Catalog](https://build.nvidia.com/)
