feat: add context bloat detection rail #48

Open
MuneezaAzmat wants to merge 2 commits into trustyai-explainability:develop from MuneezaAzmat:feat/context-bloat-detection

@MuneezaAzmat

Summary

  • Adds a new context_bloat_detection guardrail that detects context-manipulation attacks (padded, oversized, or repetitive content in tool outputs, RAG chunks, or user input)
  • Checks (cheapest first): size cap, Shannon entropy, longest char run, n-gram repetition
  • Supports reject, truncate, and warn actions via config
  • Registers a ContextBloatDetectionConfig Pydantic model in RailsConfigData with sensible defaults
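The checks and actions listed above can be sketched as follows. This is a minimal, stdlib-only sketch, not the PR's code: `BloatConfig` is a hypothetical dataclass stand-in for the Pydantic `ContextBloatDetectionConfig`, and every field name, threshold, and helper (`find_bloat`, `apply_rail`, `min_chars_for_stats`) is an assumption. It also assumes the n-gram check runs over whitespace-separated word trigrams.

```python
import logging
import math
from collections import Counter
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

logger = logging.getLogger("context_bloat_sketch")


@dataclass
class BloatConfig:
    # Hypothetical stand-in for ContextBloatDetectionConfig: field names and
    # thresholds are illustrative, not the PR's actual defaults.
    max_chars: int = 20_000
    min_chars_for_stats: int = 200    # skip statistical checks on short text
    min_entropy: float = 2.5          # bits per character
    max_char_run: int = 200           # longest run of a single character
    max_ngram_repetition: float = 0.8
    ngram_size: int = 3
    action: Literal["reject", "truncate", "warn"] = "reject"


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(text)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def longest_char_run(text: str) -> int:
    """Length of the longest run of one repeated character."""
    best = run = 0
    prev = None
    for ch in text:
        run = run + 1 if ch == prev else 1
        prev = ch
        best = max(best, run)
    return best


def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """1 - unique/total over word n-grams; 0.0 when there are too few words."""
    words = text.split()
    total = len(words) - n + 1
    if total <= 0:
        return 0.0
    unique = len({tuple(words[i:i + n]) for i in range(total)})
    return 1.0 - unique / total


def find_bloat(text: str, cfg: BloatConfig) -> Optional[str]:
    """Run the checks cheapest first and name the first one that fires."""
    if len(text) > cfg.max_chars:             # constant-time length check
        return "size_cap"
    if len(text) < cfg.min_chars_for_stats:   # too short for statistics
        return None
    if shannon_entropy(text) < cfg.min_entropy:
        return "low_entropy"
    if longest_char_run(text) > cfg.max_char_run:
        return "char_run"
    if ngram_repetition_ratio(text, cfg.ngram_size) > cfg.max_ngram_repetition:
        return "ngram_repetition"
    return None


def apply_rail(text: str, cfg: BloatConfig) -> Tuple[bool, str]:
    """Return (allowed, text_to_forward) according to cfg.action."""
    # Short-circuit: in truncate mode an oversized input is cut to max_chars
    # without running the statistical checks at all.
    if cfg.action == "truncate" and len(text) > cfg.max_chars:
        return True, text[: cfg.max_chars]
    reason = find_bloat(text, cfg)
    if reason is None:
        return True, text
    if cfg.action == "warn":
        logger.warning("context bloat detected (%s); passing through", reason)
        return True, text
    # "reject", and (an assumption of this sketch) "truncate" when a
    # non-size check fired: block the input.
    return False, ""
```

For example, `apply_rail("x" * 50, BloatConfig(action="truncate", max_chars=10))` returns `(True, "xxxxxxxxxx")`. The `min_chars_for_stats` guard is this sketch's own addition: without it, short benign input would trip the entropy check, since "Hello" carries only about 1.9 bits per character.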

Test plan

  • Verify config loads with default values
  • Verify oversized, low-entropy, padded, and repetitive inputs are detected
  • Verify truncate mode truncates to max_chars
  • Verify warn mode logs but does not block
  • Verify normal text passes all checks

Add a new guardrail that detects context-manipulation attacks where
attacker-controlled content is padded, oversized, or repetitively
structured to cause system prompt forgetting or exhaust token budget.

Checks (cheapest first): size cap, Shannon entropy, longest char run,
n-gram repetition. Supports reject, truncate, and warn actions.
Collaborator

@m-misiura left a comment

Thanks for the PR. The concept of using this rail to detect suspiciously long texts seems sound to me, although I think there may be considerable overlap between the n-gram and entropy methods (I could be missing something, though). You might also consider whether you want to land this upstream.

In any case, on the engineering side, we'll need:

  • test files: there are none currently
  • config is always instantiated, which breaks the optional pattern: IIUC, this rail is configured even when the user didn't ask for it
  • there is no __init__.py, meaning the library won't be importable
  • truncation happens only after the full analysis has run

Happy to expand if anything is unclear.

- Add __init__.py so the library is importable
- Short-circuit truncation after size_cap instead of running the full analysis
- Add unit tests covering config, detection paths, and action modes
- Fix typo in config.yml
@MuneezaAzmat
Author

Thanks for the comments, @m-misiura:

Entropy catches character-level padding, while the n-gram check catches phrase-level repetition.
Here is an example of what entropy catches but the n-gram check misses, and vice versa:
───────────────
Text: "ababab..."
Entropy: 1.000 🚩
Repetition Ratio: 0.000
───────────────
Text: "The quick brown fox jumps over" x30
Entropy: 4.109
Repetition Ratio: 0.966 🚩
───────────────
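The two measures in the table above can be reproduced with a short sketch. The author's script isn't shown, so the details here are assumptions: entropy is computed over characters in bits, the repetition ratio is `1 - unique/total` over whitespace-separated word trigrams, and the 30 copies are joined with single spaces. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(text)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """1 - (unique word n-grams / total word n-grams); 0.0 if too few words."""
    words = text.split()
    total = len(words) - n + 1
    if total <= 0:
        return 0.0
    unique = len({tuple(words[i:i + n]) for i in range(total)})
    return 1.0 - unique / total


samples = {
    "ababab...": "ab" * 500,
    "quick brown fox x30": " ".join(["The quick brown fox jumps over"] * 30),
}
for label, text in samples.items():
    print(f"{label}: entropy={shannon_entropy(text):.3f} "
          f"repetition={ngram_repetition_ratio(text):.3f}")
```

Under these assumptions the loop prints `entropy=1.000 repetition=0.000` for the first sample and `entropy=4.109 repetition=0.966` for the second, matching the table. The "ababab..." sample is a single whitespace-free token, so it yields no word n-grams at all. Bigrams or 4-grams would also give 0.966 for the second sample, so these numbers don't pin down the n-gram size.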

  1. Pushed the test file I used for verification; not sure whether more tests are needed.
  2. Kept default_factory to match every other rail in RailsConfigData. AFAIU, it doesn't activate the rail; the rail only runs when the user explicitly wires it into a flow.
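Point 2 and the reviewer's second bullet come down to whether a default-constructed config object should count as "configured". The sketch below contrasts the two patterns with stdlib dataclasses; `ContextBloatSettings`, `RailsConfigWithFactory`, and `RailsConfigOptional` are hypothetical names, not the classes in RailsConfigData. Pydantic's `Field(default_factory=...)` follows the same idea: a fresh default object is built whenever the field is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextBloatSettings:
    # Hypothetical stand-in for ContextBloatDetectionConfig.
    max_chars: int = 20_000


@dataclass
class RailsConfigWithFactory:
    # default_factory: a settings object always exists, even if the user
    # never mentioned the rail in config.yml.
    context_bloat_detection: ContextBloatSettings = field(
        default_factory=ContextBloatSettings
    )


@dataclass
class RailsConfigOptional:
    # Optional: None means "not configured", so callers can tell the difference.
    context_bloat_detection: Optional[ContextBloatSettings] = None


a = RailsConfigWithFactory()
b = RailsConfigOptional()
print(a.context_bloat_detection)  # ContextBloatSettings(max_chars=20000)
print(b.context_bloat_detection)  # None
```

With the factory pattern, code inspecting the config alone cannot distinguish "user omitted the rail" from "user accepted the defaults", which is the reviewer's concern; as the author notes, that may not matter in practice if the rail only runs when explicitly referenced in a flow.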

Labels

None yet

Projects

None yet

2 participants