fix(output_parsers): use correct JSON key "Violated Categories" in nemoguard parsers by nac7 · Pull Request #2011 · NVIDIA-NeMo/Guardrails

nac7 · 2026-06-08T20:30:48Z

Fixes #2010.

Problem

nemoguard_parse_prompt_safety and nemoguard_parse_response_safety in nemoguardrails/llm/output_parsers.py looked for the key "Safety Categories" when extracting violation categories from the NemoGuard ContentSafety model response. However, the NemoGuard model returns the key "Violated Categories", as documented in each function's own docstring.

This caused violation categories to be silently dropped on every unsafe response:

response = '{"User Safety": "unsafe", "Violated Categories": "violence, hate_speech"}'
result = nemoguard_parse_prompt_safety(response)
# Was:    [False]                           ← categories silently dropped
# Now:    [False, 'violence', 'hate_speech']

Impact:

- Audit logs that record which policy categories were violated always received an empty list.
- Downstream guardrail logic that dispatches on violation type never received category information.
- Compliance reporting showed "unsafe" with no details.

Fix

Change "Safety Categories" → "Violated Categories" on lines 163 and 202 of output_parsers.py to match the key the NemoGuard model actually emits (and the key documented in the function docstrings).
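The effect of the key change can be illustrated with a minimal sketch. The helper `extract_categories` below is hypothetical and is not the code in output_parsers.py; it only assumes that categories arrive as a comma-separated string under a single key, as in the example response in the Problem section.

```python
import json


def extract_categories(parsed: dict, key: str) -> list[str]:
    """Hypothetical helper: split a comma-separated category string stored under `key`."""
    raw = parsed.get(key)
    if not isinstance(raw, str):
        # Key absent (or not a string): no categories are reported.
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


response = '{"User Safety": "unsafe", "Violated Categories": "violence, hate_speech"}'
parsed = json.loads(response)

old_lookup = extract_categories(parsed, "Safety Categories")    # key the parsers used to read
new_lookup = extract_categories(parsed, "Violated Categories")  # key the model emits

print(old_lookup)  # []
print(new_lookup)  # ['violence', 'hate_speech']
```

Because a missing key is treated as "no categories" rather than as an error, the mismatch produced no exception or warning, which is why it went unnoticed.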

Tests

Updated tests/test_content_safety_output_parsers.py:

- Renamed test cases that used the old, wrong key so they use the correct "Violated Categories" key.
- Added test_wrong_key_safety_categories_yields_no_categories regression tests for both parsers to confirm that the old, wrong key no longer extracts categories.

Summary by CodeRabbit

Bug Fixes
- Updated the content safety system to correctly parse and identify violated policy categories during prompt and response safety screening.
Tests
- Expanded test suite with comprehensive coverage for safety violation detection, category extraction, and handling of edge cases.

…moguard parsers

Both nemoguard_parse_prompt_safety and nemoguard_parse_response_safety checked for key "Safety Categories" when extracting violation categories from NemoGuard ContentSafety model output, but the model actually returns key "Violated Categories" (as documented in each function's own docstring). This caused violation categories to be silently dropped on every unsafe response, breaking audit logging, granular guardrail policies, and compliance reporting that depend on knowing which policy categories were flagged.

Fixes NVIDIA-NeMo#2010

Existing tests provided mock NemoGuard JSON responses with the wrong key, "Safety Categories". Now that the parser correctly reads "Violated Categories", update all mock response fixtures to match what the real model emits. The intentional regression tests in test_content_safety_output_parsers.py that verify "Safety Categories" no longer extracts data are left unchanged.

greptile-apps · 2026-06-08T21:41:47Z

Greptile Summary

This PR fixes a key mismatch in nemoguard_parse_prompt_safety and nemoguard_parse_response_safety: both functions were reading "Safety Categories" from the NemoGuard model response, but the model actually emits "Violated Categories" (which also matches the functions' own docstrings). The result was that violation categories were silently dropped on every unsafe response.

- output_parsers.py: Two-line fix replacing "Safety Categories" with "Violated Categories" in both parsers.
- All changed test files correctly update JSON fixtures and add regression tests confirming the old key no longer extracts categories.
- Not updated: benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env still emits the old key in its UNSAFE_TEXT, and at least 8 example prompts.yml files still instruct the model to output "Safety Categories", creating a parser/prompt mismatch for users who copy those configs.

Confidence Score: 3/5

The parser fix itself is correct, but the change is incomplete: the benchmark mock server and at least 8 example prompt configs were not updated and still reference the old key.

The two-line change to output_parsers.py is correct and well-tested. However, the benchmark mock server (nvidia-llama-3.1-nemoguard-8b-content-safety.env) still emits "Safety Categories" in its UNSAFE_TEXT, so benchmarks will now silently drop categories — precisely the defect this PR aims to fix. Additionally, eight example prompts.yml files still tell the model to output "Safety Categories", meaning users who copy those configs with instruction-following models will reproduce the original bug.

benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env and all example prompts.yml/prompts.yaml files under examples/configs/ still reference the old key and were not included in the PR changeset.
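One way to find the remaining references is a small scan for the old key. The sketch below is not part of the PR; the function name `find_stale_key` and the choice of file suffixes are assumptions made for illustration.

```python
from pathlib import Path

STALE_KEY = "Safety Categories"
# Assumed set of file suffixes worth checking; adjust for the repository at hand.
SUFFIXES = {".yml", ".yaml", ".env"}


def find_stale_key(root: Path, needle: str = STALE_KEY) -> list[tuple[Path, int]]:
    """Return (path, line_number) pairs for every line under `root` containing `needle`."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, number))
    return hits
```

Calling `find_stale_key(Path("examples/configs"))` and `find_stale_key(Path("benchmark"))` from the repository root would list the candidate lines. Each hit still needs a manual look, since a regression test or a changelog entry may mention the old key on purpose.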

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemoguardrails/llm/output_parsers.py | Correct fix: both `nemoguard_parse_prompt_safety` and `nemoguard_parse_response_safety` now look for `"Violated Categories"` matching the actual NemoGuard model output and their own docstrings. |
| tests/test_content_safety_output_parsers.py | Tests updated to use `"Violated Categories"` throughout; two new regression tests added that confirm the old wrong key no longer extracts categories. |
| tests/guardrails/test_data.py | Expected prompt strings in test fixtures updated from `"Safety Categories"` to `"Violated Categories"` to match the corrected parser. |
| tests/guardrails/test_content_safety_iorails_actions.py | Test JSON stubs updated to use `"Violated Categories"` key for both prompt and response safety test cases. |
| tests/test_content_safety_integration.py | Integration test JSON responses updated to use `"Violated Categories"`, now correctly asserting that categories are extracted. |
| benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env | Not updated in this PR; mock UNSAFE_TEXT still uses old `"Safety Categories"` key, causing violation categories to be silently dropped during benchmark runs. |
| tests/guardrails/test_iorails_telemetry.py | Telemetry test fixture `UNSAFE_INPUT_JSON` updated to use `"Violated Categories"` key. |
| tests/guardrails/test_rails_manager.py | Rails manager test stubs updated to use `"Violated Categories"` for both input and output unsafe JSON. |

Sequence Diagram

sequenceDiagram
    participant App
    participant Parser as output_parsers.py
    participant NemoGuard as NemoGuard Model

    App->>NemoGuard: Safety check request
    NemoGuard-->>Parser: "{"User Safety": "unsafe", "Violated Categories": "S1, S8"}"
    
    Note over Parser: Before fix: looked for "Safety Categories" → not found → []
    Note over Parser: After fix: looks for "Violated Categories" → found → ["S1", "S8"]
    
    Parser-->>App: [False, "S1", "S8"]
    App->>App: Dispatch on violation type ✓
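
The last step in the diagram, dispatching on violation type, might look like the sketch below. The handler functions and the mapping from category codes to actions are invented for illustration; only the result shape `[False, "S1", "S8"]` comes from the diagram.

```python
from typing import Callable, Sequence, Union

ParserResult = Sequence[Union[bool, str]]


def escalate(category: str) -> str:
    return f"escalated:{category}"


def log_only(category: str) -> str:
    return f"logged:{category}"


# Hypothetical policy table; the category codes come from the diagram, the actions do not.
HANDLERS: dict[str, Callable[[str], str]] = {"S1": escalate}


def dispatch(result: ParserResult) -> list[str]:
    """Split a parser result into verdict and categories, then run one handler per category."""
    is_safe, *categories = result
    if is_safe:
        return []
    if not categories:
        # This is what callers saw before the fix: unsafe, but nothing to dispatch on.
        return ["unsafe:no-category"]
    return [HANDLERS.get(cat, log_only)(cat) for cat in categories]


print(dispatch([False, "S1", "S8"]))  # ['escalated:S1', 'logged:S8']
print(dispatch([False]))              # ['unsafe:no-category']
```

With the old key, every unsafe result took the `not categories` branch, so category-specific handlers could never run.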

Reviews (1): Last reviewed commit: "test: update mock responses to use corre..."

coderabbitai · 2026-06-08T21:43:05Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ab61cde9-1100-49e2-950b-523f1d846993

📥 Commits

Reviewing files that changed from the base of the PR and between 7285f2c and 542ab9e.

📒 Files selected for processing (7)

nemoguardrails/llm/output_parsers.py
tests/guardrails/test_content_safety_iorails_actions.py
tests/guardrails/test_data.py
tests/guardrails/test_iorails_telemetry.py
tests/guardrails/test_rails_manager.py
tests/test_content_safety_integration.py
tests/test_content_safety_output_parsers.py

📝 Walkthrough

This PR fixes a silent data-loss bug in NemoGuard content-safety response parsing. The parser functions were reading from the wrong JSON field name ("Safety Categories" instead of "Violated Categories"), causing violated policy categories to be discarded. The fix updates both parser functions and all corresponding test fixtures and test cases across the test suite.

Changes

NemoGuard JSON Key Fix

| Layer / File(s) | Summary |
| --- | --- |
| Core parsing logic update `nemoguardrails/llm/output_parsers.py` | `nemoguard_parse_prompt_safety` and `nemoguard_parse_response_safety` now read violated categories from `"Violated Categories"` instead of the incorrect `"Safety Categories"` key. Missing field returns empty list; JSON parse errors still return `["JSON parsing failed"]`. |
| Test data schemas and prompt templates `tests/guardrails/test_data.py` | `CONTENT_SAFETY_INPUT_PROMPT` and `CONTENT_SAFETY_OUTPUT_PROMPT` updated to document the correct `"Violated Categories"` field in the JSON schema for unsafe content. |
| Comprehensive output parser test coverage `tests/test_content_safety_output_parsers.py` | Regression tests added confirming `"Safety Categories"` yields no categories; expanded coverage for `"Violated Categories"` parsing (single, complex, whitespace-trimmed, empty). Real-world scenario tests updated for both prompt and response safety with colon-delimited category formats (e.g., `"S1: Violence"`). |
| Integration and system test fixtures `tests/guardrails/test_content_safety_iorails_actions.py`, `tests/guardrails/test_iorails_telemetry.py`, `tests/guardrails/test_rails_manager.py`, `tests/test_content_safety_integration.py` | Updated `UNSAFE_INPUT_JSON` and `UNSAFE_OUTPUT_JSON` fixtures to use `"Violated Categories"` instead of `"Safety Categories"`. All integration test scenarios for prompt and response safety now use the corrected JSON key. |
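
The behaviours summarized in the table (a missing field yields no categories, parse failures yield "JSON parsing failed", whitespace is trimmed, the old key is ignored) can be pinned down with a table-driven check. `parse_sketch` is a stand-in written for this example, not the function in output_parsers.py, and its handling of an empty category string is an assumption.

```python
import json


def parse_sketch(response: str, verdict_key: str = "User Safety") -> list:
    """Stand-in parser: [True] when safe, otherwise [False, *categories]."""
    try:
        parsed = json.loads(response)
        verdict = parsed[verdict_key].lower()
        raw = parsed.get("Violated Categories", "")
        categories = [c.strip() for c in raw.split(",") if c.strip()]
    except Exception:
        # Treat anything unparseable as unsafe and say why.
        verdict, categories = "unsafe", ["JSON parsing failed"]
    return [True] if verdict == "safe" else [False, *categories]


CASES = [
    ('{"User Safety": "safe"}', [True]),
    ('{"User Safety": "unsafe"}', [False]),
    ('{"User Safety": "unsafe", "Violated Categories": " S1: Violence ,S8 "}',
     [False, "S1: Violence", "S8"]),
    # Old key only: the verdict survives, the categories do not.
    ('{"User Safety": "unsafe", "Safety Categories": "S1"}', [False]),
    ("not json", [False, "JSON parsing failed"]),
]

for response, expected in CASES:
    assert parse_sketch(response) == expected, response
```

Keeping the old-key case in the table documents that ignoring "Safety Categories" is now the intended behaviour rather than an accident.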

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main fix: correcting the JSON key from "Safety Categories" to "Violated Categories" in nemoguard parsers. |
| Linked Issues check | ✅ Passed | All code changes address the requirements in `#2010`: parser functions now check for "Violated Categories" key, ensuring violation categories are extracted from NemoGuard responses and delivered to callers for audit logging and compliance reporting. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing the JSON key mismatch: output_parsers.py lines updated, test fixtures and constants synchronized to use "Violated Categories", and regression tests added per `#2010` requirements. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Test Results For Major Changes | ✅ Passed | This is a minor bug fix (2 lines changed) not a major feature/refactor. PR includes comprehensive test coverage with new positive and regression tests validating the fix works correctly. |

codecov · 2026-06-08T22:11:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

nac7 changed the base branch from main to develop June 8, 2026 21:37

nac7 added 2 commits June 8, 2026 16:38

nac7 force-pushed the fix/nemoguard-violated-categories-key branch from 7ede184 to 542ab9e Compare June 8, 2026 21:38

github-actions Bot added the needs: signing label Jun 17, 2026

nac7 force-pushed the fix/nemoguard-violated-categories-key branch from 542ab9e to 2727e79 Compare June 18, 2026 00:51

github-actions Bot added size: S and removed needs: signing labels Jun 18, 2026

nac7 wants to merge 2 commits into NVIDIA-NeMo:develop from nac7:fix/nemoguard-violated-categories-key
