trustyai-explainability
diff --git a/.coderabbit.yaml b/.coderabbit.yaml
Lines changed: 70 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 50 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/benchmark/aiperf/README.md b/benchmark/aiperf/README.md
Lines changed: 25 additions & 24 deletions
@@ -0,0 +1,70 @@
+# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
+# https://docs.coderabbit.ai/getting-started/configure-coderabbit/
+# Validator https://docs.coderabbit.ai/configuration/yaml-validator#yaml-validator
+# In PR, comment "@coderabbitai configuration" to get the full config including defaults
+# Set the language for reviews by using the corresponding ISO language code.
+# Default: "en-US"
+language: "en-US"
+# Settings related to reviews.
+# Default: {}
+reviews:
+  # Set the profile for reviews. Assertive profile yields more feedback, that may be considered nitpicky.
+  # Options: chill, assertive
+  # Default: "chill"
+  profile: chill
+  # Add this keyword in the PR/MR title to auto-generate the title.
+  # Default: "@coderabbitai"
+  auto_title_placeholder: "@coderabbitai title"
+  # Auto Title Instructions - Custom instructions for auto-generating the PR/MR title.
+  # Default: ""
+  auto_title_instructions: 'Format: "<category>: <title>". Category must be one of: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, cp. The category must be followed by a colon. Title should be concise (<= 80 chars). Example: "feat: Add logit_bias support".' # current: ''
+  # Set the commit status to 'pending' when the review is in progress and 'success' when it is complete.
+  # Default: true
+  commit_status: false
+  # Generate walkthrough in a markdown collapsible section.
+  # Default: false
+  collapse_walkthrough: true
+  # Generate an assessment of how well the changes address the linked issues in the walkthrough.
+  # Default: true
+  assess_linked_issues: true
+  # Include possibly related issues in the walkthrough.
+  # Default: true
+  related_issues: true
+  # Related PRs - Include possibly related pull requests in the walkthrough.
+  # Default: true
+  related_prs: true
+  # Suggest labels based on the changes in the pull request in the walkthrough.
+  # Default: true
+  suggested_labels: true
+  # Suggest reviewers based on the changes in the pull request in the walkthrough.
+  # Default: true
+  suggested_reviewers: true
+  # Generate a poem in the walkthrough comment.
+  # Default: true
+  poem: false # current: true
+  # Post review details on each review. Additionally, post a review status when a review is skipped in certain cases.
+  # Default: true
+  review_status: false # current: true
+  # Configuration for pre merge checks
+  # Default: {}
+  pre_merge_checks:
+    # Custom Pre-merge Checks - Add unique checks to enforce your team's standards before merging a pull request. Each check must have a unique name (up to 50 characters) and clear instructions (up to 10000 characters). Use these to automatically verify coding, security, documentation, or business rules and maintain code quality.
+    # Default: []
+    custom_checks:
+      - name: "Test Results for Major Changes"
+        mode: "warning" # or "error" to block merges
+        instructions: |
+          If this PR contains major changes (such as new features, breaking changes, or significant refactoring), verify that the PR description includes test results or testing information.
+          If a change could affect numerics or convergence, the PR description should include information demonstrating that there is no regression.
+          If a change could affect performance, the PR description should include before-and-after performance numbers, as well as the configuration and context in which they apply.
+          Pass if test results are documented or if the changes are minor.
+  auto_review:
+    # Configuration for auto review
+    # Default: {}
+    # Automatic Incremental Review - Automatic incremental code review on each push
+    # Default: true
+    auto_incremental_review: false # current: true
+    # Review draft PRs/MRs.
+    # Default: false
+    drafts: false
+    # Base branches (other than the default branch) to review. Accepts regex patterns. Use '.*' to match all branches.
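
Reviewer note: the `auto_title_instructions` value in the config above describes a title format that can be checked mechanically. The following is a minimal sketch of such a check, not something CodeRabbit or this repository provides. The function name `is_valid_pr_title` is hypothetical, and two points are assumptions: that exactly one space follows the colon (as in the config's own example), and that the 80-character limit applies to the text after the colon.

```python
import re

# Categories copied from auto_title_instructions in the config above.
CATEGORIES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert", "cp",
)

# Assumption: "<category>: <title>" uses exactly one space after the colon.
TITLE_RE = re.compile(
    r"^(?P<category>" + "|".join(CATEGORIES) + r"): (?P<title>\S.*)$"
)

# Assumption: the limit applies to the title text, not the whole line.
MAX_TITLE_CHARS = 80


def is_valid_pr_title(text: str) -> bool:
    """Return True if text matches the "<category>: <title>" convention."""
    match = TITLE_RE.match(text)
    return bool(match) and len(match.group("title")) <= MAX_TITLE_CHARS


if __name__ == "__main__":
    print(is_valid_pr_title("feat: Add logit_bias support"))
    print(is_valid_pr_title("feature: Add logit_bias support"))
```

A check like this could run in CI alongside the auto-generated title, so that manually edited titles still follow the same convention.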
@@ -9,6 +9,56 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 >
 > The changes related to the Colang language and runtime have moved to [CHANGELOG-Colang](./CHANGELOG-Colang.md) file.
 
+## [0.21.0] - 2026-03-12
+
+### 🚀 Features
+
+- *(library)* Update Trend Micro Vision One AI Guard official endpoint ([#1546](https://github.com/NVIDIA-NeMo/Guardrails/issues/1546))
+- *(llmrails)* Add check_async method for input/output rails validation ([#1605](https://github.com/NVIDIA-NeMo/Guardrails/issues/1605))
+- *(server)* Make guardrails server OpenAI compatible ([#1340](https://github.com/NVIDIA-NeMo/Guardrails/issues/1340))
+- New top-level scaffold ([#1613](https://github.com/NVIDIA-NeMo/Guardrails/issues/1613))
+- Add Async work queue ([#1620](https://github.com/NVIDIA-NeMo/Guardrails/issues/1620))
+- *(integration)* Add GuardrailsMiddleware for LangChain agent ([#1606](https://github.com/NVIDIA-NeMo/Guardrails/issues/1606))
+- *(library)* Update Fiddler Guardrails API to match new specification ([#1619](https://github.com/NVIDIA-NeMo/Guardrails/issues/1619))
+- *(library)* Add CrowdStrike AIDR community integration ([#1601](https://github.com/NVIDIA-NeMo/Guardrails/issues/1601))
+- *(iorails)* Introduce IORails optimized Input/Output rail engine. Supports non-streaming parallel nemoguard input/output rails (content-safety, topic-safety, jailbreak detection) ([#1638](https://github.com/NVIDIA-NeMo/Guardrails/issues/1638), [#1649](https://github.com/NVIDIA-NeMo/Guardrails/issues/1649), [#1654](https://github.com/NVIDIA-NeMo/Guardrails/issues/1654), [#1656](https://github.com/NVIDIA-NeMo/Guardrails/issues/1656), [#1658](https://github.com/NVIDIA-NeMo/Guardrails/issues/1658), [#1660](https://github.com/NVIDIA-NeMo/Guardrails/issues/1660), [#1661](https://github.com/NVIDIA-NeMo/Guardrails/issues/1661), [#1674](https://github.com/NVIDIA-NeMo/Guardrails/issues/1674))
+- *(server)* Add OpenAI compatible v1/models endpoint ([#1637](https://github.com/NVIDIA-NeMo/Guardrails/issues/1637))
+- *(benchmark)* Add Locust stress-test ([#1629](https://github.com/NVIDIA-NeMo/Guardrails/issues/1629))
+- *(jailbreak)* Validate Jailbreak Detection config at create-time ([#1675](https://github.com/NVIDIA-NeMo/Guardrails/issues/1675))
+- *(library)* Add PolicyAI Integration for Content Moderation ([#1576](https://github.com/NVIDIA-NeMo/Guardrails/issues/1576))
+
+### 🐛 Bug Fixes
+
+- *(server)* Make openai an optional server-only dependency ([#1623](https://github.com/NVIDIA-NeMo/Guardrails/issues/1623))
+- *(actions)* Rename generate_next_step to generate_next_steps for task-specific LLM support ([#1603](https://github.com/NVIDIA-NeMo/Guardrails/issues/1603))
+- *(library)* Add `valid` alias to action results in GuardrailsAI integration ([#1578](https://github.com/NVIDIA-NeMo/Guardrails/issues/1578)) ([#1611](https://github.com/NVIDIA-NeMo/Guardrails/issues/1611))
+- *(llm)* Filter stop parameter for OpenAI reasoning models ([#1653](https://github.com/NVIDIA-NeMo/Guardrails/issues/1653))
+- *(logging)* Show cache hits in Stats log and fix duplicate metadata restore ([#1666](https://github.com/NVIDIA-NeMo/Guardrails/issues/1666))
+- *(cache)* Make cache stats log visible in verbose mode ([#1667](https://github.com/NVIDIA-NeMo/Guardrails/issues/1667))
+- *(library)* Use bot refuse to respond in gliner PII detection flows ([#1671](https://github.com/NVIDIA-NeMo/Guardrails/issues/1671))
+- *(streaming)* Handle None stop tokens in streaming handler ([#1685](https://github.com/NVIDIA-NeMo/Guardrails/issues/1685))
+- *(streaming)* Handle dict chunks in RollingBuffer.format_chunks ([#1687](https://github.com/NVIDIA-NeMo/Guardrails/issues/1687))
+- *(middleware)* Handle MODIFIED status in GuardrailsMiddleware instead of silently dropping it ([#1714](https://github.com/NVIDIA-NeMo/Guardrails/issues/1714))
+
+### 🚜 Refactor
+
+- *(streaming)* Remove LangChain callback dependencies from StreamingHandler ([#1547](https://github.com/NVIDIA-NeMo/Guardrails/issues/1547))
+- *(streaming)* Remove ChatNVIDIA streaming patch ([#1607](https://github.com/NVIDIA-NeMo/Guardrails/issues/1607))
+- *(streaming)* [**breaking**] Remove stream_usage and fix streaming metadata capture ([#1624](https://github.com/NVIDIA-NeMo/Guardrails/issues/1624))
+
+### ⚡ Performance
+
+- *(actions)* Lazy initialization of embedding indexes ([#1572](https://github.com/NVIDIA-NeMo/Guardrails/issues/1572))
+
+### ⚙️ Miscellaneous Tasks
+
+- Update Pangea User-Agent repo URL ([#1595](https://github.com/NVIDIA-NeMo/Guardrails/issues/1595)) ([#1610](https://github.com/NVIDIA-NeMo/Guardrails/issues/1610))
+- *(jailbreak)* Update dependencies for jailbreak detection docker container. ([#1596](https://github.com/NVIDIA-NeMo/Guardrails/issues/1596))
+- Remove multi_kb example ([#1673](https://github.com/NVIDIA-NeMo/Guardrails/issues/1673))
+- *(iorails)* Increase work queue concurrency and depth ([#1674](https://github.com/NVIDIA-NeMo/Guardrails/issues/1674))
+- *(docs)* Remove AI Virtual Assistant Blueprint notebook ([#1682](https://github.com/NVIDIA-NeMo/Guardrails/issues/1682))
+- Update dependencies ahead of v0.21 release ([#1617](https://github.com/NVIDIA-NeMo/Guardrails/issues/1617))
+
 ## [0.20.0] - 2026-01-22
 
 ### 🚀 Features
 
@@ -13,7 +13,7 @@
 [![Downloads](https://static.pepy.tech/badge/nemoguardrails)](https://pepy.tech/project/nemoguardrails)
 [![Downloads](https://static.pepy.tech/badge/nemoguardrails/month)](https://pepy.tech/project/nemoguardrails)
 
-> **LATEST RELEASE / DEVELOPMENT VERSION**: The [main](https://github.com/NVIDIA-NeMo/Guardrails/tree/main) branch tracks the latest released beta version: [0.20.0](https://github.com/NVIDIA-NeMo/Guardrails/tree/v0.20.0). For the latest development version, checkout the [develop](https://github.com/NVIDIA-NeMo/Guardrails/tree/develop) branch.
+> **LATEST RELEASE / DEVELOPMENT VERSION**: The [main](https://github.com/NVIDIA-NeMo/Guardrails/tree/main) branch tracks the latest released beta version: [0.21.0](https://github.com/NVIDIA-NeMo/Guardrails/tree/v0.21.0). For the latest development version, checkout the [develop](https://github.com/NVIDIA-NeMo/Guardrails/tree/develop) branch.
 
 ✨✨✨
 
@@ -295,9 +295,9 @@ Evaluating the safety of a LLM-based conversational application is a complex tas
 
 ## How is this different?
 
-There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence), critique chains (e.g. constitutional chain), parsing the output (e.g. guardrails.ai), individual guardrails (e.g., LLM-Guard), hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).
+There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence, PolicyAI), critique chains (e.g. constitutional chain), parsing the output (e.g. guardrails.ai), individual guardrails (e.g., LLM-Guard), hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).
 
-NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, AlignScore and LangChain chains.
+NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, PolicyAI, AlignScore and LangChain chains.
 
 To the best of our knowledge, NeMo Guardrails is the only guardrails toolkit that also offers a solution for modeling the dialog between the user and the LLM. This enables on one hand the ability to guide the dialog in a precise way. On the other hand it enables fine-grained control for when certain guardrails should be used, e.g., use fact-checking only for certain types of questions.
 
 
@@ -27,14 +27,15 @@ To use the provided configurations, you need to create accounts at <https://buil
 1. **Create a virtual environment in which to install AIPerf**
 
    ```bash
-   $ mkdir ~/env
-   $ python -m venv ~/env/aiperf
+   mkdir ~/env
+   python -m venv ~/env/aiperf
+   source ~/env/aiperf/bin/activate
    ```
 
 2. **Install dependencies in the virtual environment**
 
    ```bash
-   $ pip install aiperf huggingface_hub typer
+   pip install aiperf huggingface_hub typer httpx
    ```
 
 3. **Login to Hugging Face:**
@@ -50,39 +51,39 @@ To use the provided configurations, you need to create accounts at <https://buil
    After creating a Personal API key, set the `NVIDIA_API_KEY` variable as below.
 
    ```bash
-   $ export NVIDIA_API_KEY="your-api-key-here"
+   export NVIDIA_API_KEY="your-api-key-here"
    ```
 
 ## Running Benchmarks
 
 Each benchmark is configured using the `AIPerfConfig` Pydantic model in [aiperf_models.py](aiperf_models.py).
 The configs are stored in YAML files, and converted to an `AIPerfConfig` object.
-There are two example configs included which can be extended for your use-cases. These both use Nvidia-hosted models, :
+There are two example configs included which can be extended for your use-cases. These both use Nvidia-hosted models:
 
-- [`single_concurrency.yaml`](aiperf_configs/single_concurrency.yaml): Example single-run benchmark with a single concurrency value.
-- [`sweep_concurrency.yaml`](aiperf_configs/sweep_concurrency.yaml): Example multiple-run benchmark to sweep concurency values and run a new benchmark for each.
+- [`single_concurrency.yaml`](configs/single_concurrency.yaml): Example single-run benchmark with a single concurrency value.
+- [`sweep_concurrency.yaml`](configs/sweep_concurrency.yaml): Example multiple-run benchmark to sweep concurrency values and run a new benchmark for each.
 
 To run a benchmark, use the following command:
 
 ```bash
-$ python -m benchmark.aiperf --config-file <path-to-config.yaml>
+python -m benchmark.aiperf --config-file <path-to-config.yaml>
 ```
 
 ### Running a Single Benchmark
 
 To run a single benchmark with fixed parameters, use the `single_concurrency.yaml` configuration:
 
 ```bash
-$ python -m benchmark.aiperf --config-file aiperf/configs/single_concurrency.yaml
+python -m benchmark.aiperf --config-file benchmark/aiperf/configs/single_concurrency.yaml
 ```
 
 **Example output:**
 
-```text
-2025-12-01 10:35:17 INFO: Running AIPerf with configuration: aiperf/configs/single_concurrency.yaml
+```terminaloutput
+2025-12-01 10:35:17 INFO: Running AIPerf with configuration: benchmark/aiperf/configs/single_concurrency.yaml
 2025-12-01 10:35:17 INFO: Results root directory: aiperf_results/single_concurrency/20251201_103517
 2025-12-01 10:35:17 INFO: Sweeping parameters: None
-2025-12-01 10:35:17 INFO: Running AIPerf with configuration: aiperf/configs/single_concurrency.yaml
+2025-12-01 10:35:17 INFO: Running AIPerf with configuration: benchmark/aiperf/configs/single_concurrency.yaml
 2025-12-01 10:35:17 INFO: Output directory: aiperf_results/single_concurrency/20251201_103517
 2025-12-01 10:35:17 INFO: Single Run
 2025-12-01 10:36:54 INFO: Run completed successfully
@@ -97,13 +98,13 @@ $ python -m benchmark.aiperf --config-file aiperf/configs/single_concurrency.yam
 To run multiple benchmarks with different concurrency levels, use the `sweep_concurrency.yaml` configuration as below:
 
 ```bash
-$ python -m benchmark.aiperf --config-file aiperf/configs/sweep_concurrency.yaml
+python -m benchmark.aiperf --config-file benchmark/aiperf/configs/sweep_concurrency.yaml
 ```
 
 **Example output:**
 
-```text
-2025-11-14 14:02:54 INFO: Running AIPerf with configuration: nemoguardrails/benchmark/aiperf/aiperf_configs/sweep_concurrency.yaml
+```terminaloutput
+2025-11-14 14:02:54 INFO: Running AIPerf with configuration: benchmark/aiperf/configs/sweep_concurrency.yaml
 2025-11-14 14:02:54 INFO: Results root directory: aiperf_results/sweep_concurrency/20251114_140254
 2025-11-14 14:02:54 INFO: Sweeping parameters: {'concurrency': [1, 2, 4]}
 2025-11-14 14:02:54 INFO: Running 3 benchmarks
@@ -134,7 +135,7 @@ The `--dry-run` option allows you to preview all benchmark commands without exec
 - Debugging configuration issues
 
 ```bash
-$ python -m benchmark.aiperf --config-file aiperf/configs/sweep_concurrency.yaml --dry-run
+python -m benchmark.aiperf --config-file benchmark/aiperf/configs/sweep_concurrency.yaml --dry-run
 ```
 
 When in dry-run mode, the script will:
@@ -150,7 +151,7 @@ When in dry-run mode, the script will:
 The `--verbose` option outputs more detailed debugging information to understand each step of the benchmarking process.
 
 ```bash
-$ python -m benchmark.aiperf --config-file <config.yaml> --verbose
+python -m benchmark.aiperf --config-file <config.yaml> --verbose
 ```
 
 Verbose mode provides:
@@ -165,14 +166,14 @@ Verbose mode provides:
 
 ## Configuration Files
 
-Configuration files are YAML files located in [aiperf_configs](aiperf_configs). The configuration is validated using Pydantic models to catch errors early.
+Configuration files are YAML files located in [configs](configs). The configuration is validated using Pydantic models to catch errors early.
 
 ### Top-Level Configuration Fields
 
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
-| `batch_name` | string | Yes | Name for this batch of benchmarks. Used in output directory naming (e.g., `aiperf_results/batch_name/timestamp/`) |
-| `output_base_dir` | string | Yes | Base directory where all benchmark results will be stored |
+| `batch_name` | string | No | Name for this batch of benchmarks. Used in output directory naming (e.g., `aiperf_results/batch_name/timestamp/`). Default: `benchmark` |
+| `output_base_dir` | string | No | Base directory where all benchmark results will be stored. Default: `aiperf_results` |
 | `base_config` | object | Yes | Base configuration parameters applied to all benchmark runs (see below) |
 | `sweeps` | object | No | Optional parameter sweeps for running multiple benchmarks with different values |
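
Reviewer note: the table above now documents defaults for `batch_name` and `output_base_dir`, and the example logs show a sweep of `{'concurrency': [1, 2, 4]}` producing 3 runs under `aiperf_results/<batch_name>/<timestamp>`. The sketch below illustrates that behaviour with a standard-library dataclass. It is not the real `AIPerfConfig` Pydantic model: the class `BenchmarkBatch`, its methods, and the `model` key are hypothetical, and combining several swept parameters as a Cartesian product is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import product
from pathlib import Path
from typing import Any, Optional


@dataclass
class BenchmarkBatch:
    """Hypothetical stand-in for the top-level config fields in the table."""

    base_config: dict[str, Any]                     # required
    batch_name: str = "benchmark"                   # documented default
    output_base_dir: str = "aiperf_results"         # documented default
    sweeps: Optional[dict[str, list[Any]]] = None   # optional

    def results_root(self, now: datetime) -> Path:
        # Documented layout: <output_base_dir>/<batch_name>/<timestamp>/
        stamp = now.strftime("%Y%m%d_%H%M%S")
        return Path(self.output_base_dir) / self.batch_name / stamp

    def expand_runs(self) -> list[dict[str, Any]]:
        # Without sweeps there is a single run using base_config as-is.
        if not self.sweeps:
            return [dict(self.base_config)]
        # Assumption: every combination of swept values becomes one run,
        # with the swept values overriding the matching base_config keys.
        names = list(self.sweeps)
        return [
            {**self.base_config, **dict(zip(names, values))}
            for values in product(*(self.sweeps[name] for name in names))
        ]


if __name__ == "__main__":
    batch = BenchmarkBatch(
        base_config={"model": "example-model", "concurrency": 1},
        sweeps={"concurrency": [1, 2, 4]},
    )
    for run in batch.expand_runs():
        print(run)
    print(batch.results_root(datetime(2025, 12, 1, 10, 35, 17)))
```

Because only `base_config` lacks a default here, a config that omits `batch_name` and `output_base_dir` still resolves to a usable results directory, which matches the change of both fields from required to optional.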
 
@@ -355,11 +356,11 @@ Each run directory contains multiple files with benchmark results and metadata.
 - **`inputs.json`**: Synthetic prompt data generated for the benchmark.
 - **`profile_export_aiperf.json`**: Main metrics file in JSON format containing aggregated statistics.
 - **`profile_export_aiperf.csv`**: Same metrics as the JSON file, but in CSV format for easy import into spreadsheet tools or data analysis libraries.
-- **`profile_export.jsonl`**: JSON Lines format file containing per-request metrics. Each line is a complete JSON object for one request with:
-- **`logs/aiperf.log`**: Detailed log file from AIPerf execution containing:
+- **`profile_export.jsonl`**: JSON Lines format file containing per-request metrics. Each line is a complete JSON object for one request.
+- **`logs/aiperf.log`**: Detailed log file from AIPerf execution.
 
 ## Resources
 
-- [AIPerf GitHub Repository](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf)
-- [AIPerf Documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client/src/c%2B%2B/perf_analyzer/genai-perf/README.html)
+- [AIPerf GitHub Repository](https://github.com/ai-dynamo/aiperf)
+- [AIPerf Documentation](https://docs.nvidia.com/nim/benchmarking/llm/latest/step-by-step.html)
 - [NVIDIA API Catalog](https://build.nvidia.com/)