feat: add context bloat detection rail by MuneezaAzmat · Pull Request #48 · trustyai-explainability/NeMo-Guardrails

MuneezaAzmat · 2026-05-13T12:21:43Z

Summary

Adds a new context_bloat_detection guardrail that detects context-manipulation attacks (padded, oversized, or repetitive content in tool outputs, RAG chunks, or user input)
Checks (cheapest first): size cap, Shannon entropy, longest char run, n-gram repetition
Supports reject, truncate, and warn actions via config
Registers ContextBloatDetectionConfig Pydantic model in RailsConfigData with sensible defaults
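
The cheapest-first ordering described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: every threshold, the minimum length for the statistical checks, the function names, and the returned labels are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Optional

# Illustrative thresholds only; the PR's actual defaults are not shown in this thread.
MAX_CHARS = 10_000
MIN_ANALYSIS_CHARS = 64      # assumed: skip statistical checks on very short text
MIN_ENTROPY_BITS = 2.0
MAX_CHAR_RUN = 50
MAX_REPETITION_RATIO = 0.5
NGRAM_SIZE = 3


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def longest_char_run(text: str) -> int:
    """Length of the longest run of a single repeated character."""
    longest = current = 0
    previous = None
    for ch in text:
        current = current + 1 if ch == previous else 1
        longest = max(longest, current)
        previous = ch
    return longest


def repetition_ratio(text: str, n: int = NGRAM_SIZE) -> float:
    """Share of word n-grams that duplicate another n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def detect_bloat(text: str) -> Optional[str]:
    """Return the name of the first failing check, or None if the text passes."""
    if len(text) > MAX_CHARS:                 # cheapest: a length comparison
        return "size_cap"
    if len(text) < MIN_ANALYSIS_CHARS:        # too short for meaningful statistics
        return None
    if shannon_entropy(text) < MIN_ENTROPY_BITS:
        return "low_entropy"
    if longest_char_run(text) > MAX_CHAR_RUN:
        return "char_run"
    if repetition_ratio(text) > MAX_REPETITION_RATIO:
        return "ngram_repetition"
    return None
```

Returning on the first failure means an oversized input never pays for the entropy or n-gram passes, which is the point of ordering the checks by cost.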

Test plan

Verify config loads with default values
Verify oversized, low-entropy, padded, and repetitive inputs are detected
Verify truncate mode truncates to max_chars
Verify warn mode logs but does not block
Verify normal text passes all checks

Add a new guardrail that detects context-manipulation attacks where attacker-controlled content is padded, oversized, or repetitively structured to cause system prompt forgetting or exhaust token budget. Checks (cheapest first): size cap, Shannon entropy, longest char run, n-gram repetition. Supports reject, truncate, and warn actions.

m-misiura

Thanks for the PR. To me the concept of using this rail to detect suspiciously long texts appears sound, although I think there might be a considerable overlap between the n-gram and the entropy method (I could be missing something though). You might also consider if you want to land this upstream.

In any case, from an engineering side, we'll need:

test files: there are none currently
config is always instantiated, which breaks the optional pattern: IIUC, this rail is configured even when the user didn't ask for it
there is no __init__.py, meaning the library won't be importable
truncation happens only after the full analysis has run

Happy to expand if anything is unclear
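
To make the truncation point concrete, here is a hedged sketch of short-circuiting on the size cap before any costlier analysis runs. The helper `apply_rail`, its signature, and the behaviour of each action on non-size violations are assumptions for illustration; only the action names (reject, truncate, warn) and `max_chars` come from the PR description.

```python
from typing import Callable, Optional, Tuple


def apply_rail(
    text: str,
    max_chars: int,
    action: str,
    analyze: Callable[[str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Return (text_to_forward, violation). Hypothetical helper, not the PR's API."""
    if len(text) > max_chars:
        if action == "truncate":
            # Short-circuit: cut to max_chars and skip the costlier checks entirely.
            return text[:max_chars], "size_cap"
        if action == "reject":
            return "", "size_cap"
        # "warn": report the violation but forward the text unchanged.
        return text, "size_cap"

    # Only inputs within the size cap reach the statistical analysis.
    violation = analyze(text)
    if violation is not None and action == "reject":
        return "", violation
    # Assumed: "truncate" and "warn" forward the text and only report the violation.
    return text, violation
```

With this shape, an oversized input in truncate mode costs one length comparison and one slice, rather than a full pass of entropy and n-gram counting over text that is about to be cut anyway.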

MuneezaAzmat · 2026-05-14T12:57:05Z

Thanks for the comments, @m-misiura:

Entropy catches character-level padding, while n-gram catches phrase-level repetition.
Example of what entropy will catch but n-gram won't, and vice versa:
───────────────
Text: "ababab..."
Entropy: 1.000 🚩
Repetition Ratio: 0.000
───────────────
Text: "The quick brown fox jumps over" x30
Entropy: 4.109
Repetition Ratio: 0.966 🚩
───────────────
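
For readers who want to check these figures, the following sketch computes both metrics. The function names, the word-level trigram definition of the repetition ratio, and joining the repeated phrase with single spaces are assumptions rather than the PR's code; under those assumptions the output matches the numbers quoted above.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    total = len(text)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def word_ngram_repetition(text: str, n: int = 3) -> float:
    """1 - unique/total over word n-grams; 0.0 when the text has no n-grams."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


padded = "ab" * 500                                           # one long "word"
phrase = " ".join(["The quick brown fox jumps over"] * 30)    # repeated phrase

for label, text in (("padded", padded), ("phrase", phrase)):
    print(f"{label}: entropy={char_entropy(text):.3f} "
          f"repetition={word_ngram_repetition(text):.3f}")
# padded: entropy=1.000 repetition=0.000
# phrase: entropy=4.109 repetition=0.966
```

The padded string has only two equally frequent characters, so its entropy is exactly 1 bit, but it contains no whitespace and therefore no word n-grams to repeat. The repeated phrase has a natural-looking character distribution, yet only 6 distinct trigrams out of 178.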

Pushed the test file I used to verify; not sure if more tests are needed.
Kept default_factory to match every other rail in RailsConfigData. AFAIU it doesn't activate the rail; the rail only runs when the user explicitly wires it into the flow.
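
The two configuration styles under discussion can be contrasted with a small sketch. It uses standard-library dataclasses as a stand-in for the Pydantic models, and the class names, field defaults, flow name, and activation rule are all hypothetical; it only illustrates the difference between "config object always exists" and "rail actually runs".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BloatConfig:
    """Hypothetical stand-in for ContextBloatDetectionConfig."""
    max_chars: int = 10_000   # assumed default
    action: str = "reject"    # assumed default


@dataclass
class AlwaysPresentConfigData:
    """default_factory style: the section exists even if the user never wrote it."""
    context_bloat_detection: BloatConfig = field(default_factory=BloatConfig)


@dataclass
class OptInConfigData:
    """Optional style: the section stays None until the user supplies it."""
    context_bloat_detection: Optional[BloatConfig] = None


def rail_is_active(flows: List[str]) -> bool:
    """Assumed activation rule: the rail runs only if a flow references it."""
    return "context bloat detection" in flows


# The config object exists in the first style, yet nothing runs without a flow entry.
always = AlwaysPresentConfigData()
opt_in = OptInConfigData()
print(always.context_bloat_detection, opt_in.context_bloat_detection, rail_is_active([]))
```

Under these assumptions both positions in the thread hold at once: the default_factory style does instantiate the config unconditionally, and the rail still stays inactive until a flow names it.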

m-misiura requested changes May 13, 2026

fix: address PR review feedback

0b85b02

- Add __init__.py so the library is importable
- Short-circuit truncation after size_cap instead of running full analysis
- Added unit tests covering config, detection paths, action modes
- Fix typo in config.yml

MuneezaAzmat wants to merge 2 commits into trustyai-explainability:develop from MuneezaAzmat:feat/context-bloat-detection
