fix(output_parsers): use correct JSON key "Violated Categories" in nemoguard parsers#2011
fix(output_parsers): use correct JSON key "Violated Categories" in nemoguard parsers#2011nac7 wants to merge 2 commits into
Conversation
…moguard parsers Both nemoguard_parse_prompt_safety and nemoguard_parse_response_safety checked for key "Safety Categories" when extracting violation categories from NemoGuard ContentSafety model output, but the model actually returns key "Violated Categories" (as documented in each function's own docstring). This caused violation categories to be silently dropped on every unsafe response, breaking audit logging, granular guardrail policies, and compliance reporting that depend on knowing which policy categories were flagged. Fixes NVIDIA-NeMo#2010
Existing tests provided mock NemoGuard JSON responses with the wrong key Safety Categories. Now that the parser correctly reads Violated Categories, update all mock response fixtures to match what the real model emits. The intentional regression tests in test_content_safety_output_parsers.py that verify Safety Categories no longer extracts data are left unchanged.
7ede184 to
542ab9e
Compare
Greptile SummaryThis PR fixes a key mismatch in
|
| Filename | Overview |
|---|---|
| nemoguardrails/llm/output_parsers.py | Correct fix: both nemoguard_parse_prompt_safety and nemoguard_parse_response_safety now look for "Violated Categories" matching the actual NemoGuard model output and their own docstrings. |
| tests/test_content_safety_output_parsers.py | Tests updated to use "Violated Categories" throughout; two new regression tests added that confirm the old wrong key no longer extracts categories. |
| tests/guardrails/test_data.py | Expected prompt strings in test fixtures updated from "Safety Categories" to "Violated Categories" to match the corrected parser. |
| tests/guardrails/test_content_safety_iorails_actions.py | Test JSON stubs updated to use "Violated Categories" key for both prompt and response safety test cases. |
| tests/test_content_safety_integration.py | Integration test JSON responses updated to use "Violated Categories", now correctly asserting that categories are extracted. |
| benchmark/mock_llm_server/configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env | Not updated in this PR — mock UNSAFE_TEXT still uses old "Safety Categories" key, causing violation categories to be silently dropped during benchmark runs. |
| tests/guardrails/test_iorails_telemetry.py | Telemetry test fixture UNSAFE_INPUT_JSON updated to use "Violated Categories" key. |
| tests/guardrails/test_rails_manager.py | Rails manager test stubs updated to use "Violated Categories" for both input and output unsafe JSON. |
Sequence Diagram
sequenceDiagram
participant App
participant Parser as output_parsers.py
participant NemoGuard as NemoGuard Model
App->>NemoGuard: Safety check request
NemoGuard-->>Parser: "{"User Safety": "unsafe", "Violated Categories": "S1, S8"}"
Note over Parser: Before fix: looked for "Safety Categories" → not found → []
Note over Parser: After fix: looks for "Violated Categories" → found → ["S1", "S8"]
Parser-->>App: [False, "S1", "S8"]
App->>App: Dispatch on violation type ✓
Reviews (1): Last reviewed commit: "test: update mock responses to use corre..." | Re-trigger Greptile
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThis PR fixes a silent data-loss bug in NemoGuard content-safety response parsing. The parser functions were reading from the wrong JSON field name ( ChangesNemoGuard JSON Key Fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
542ab9e to
2727e79
Compare
Fixes #2010.
Problem
nemoguard_parse_prompt_safetyandnemoguard_parse_response_safetyinnemoguardrails/llm/output_parsers.pylooked for key"Safety Categories"when extracting violation categories from the NemoGuard ContentSafety model response. However, the NemoGuard model returns key"Violated Categories"— as documented in each function's own docstring.This caused violation categories to be silently dropped on every unsafe response:
Impact:
Fix
Change
"Safety Categories"→"Violated Categories"on lines 163 and 202 ofoutput_parsers.pyto match the key the NemoGuard model actually emits (and the key documented in the function docstrings).Tests
Updated
tests/test_content_safety_output_parsers.py:"Violated Categories"keytest_wrong_key_safety_categories_yields_no_categoriesregression tests for both parsers to confirm the old wrong key no longer extracts categoriesSummary by CodeRabbit
Bug Fixes
Tests