feat: add context bloat detection rail #48
Add a new guardrail that detects context-manipulation attacks in which attacker-controlled content is padded, oversized, or repetitively structured so that the model forgets the system prompt or the token budget is exhausted. Checks run cheapest first: size cap, Shannon entropy, longest character run, n-gram repetition. Supports reject, truncate, and warn actions.
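For readers skimming the thread, the check order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the reason strings, and every threshold (`max_chars`, `min_entropy`, `max_run`, `max_repetition`) are placeholder assumptions, and word-level trigrams are only one possible reading of "n-gram".

```python
import math
from collections import Counter
from typing import Optional


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (0.0 for empty text)."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def longest_char_run(text: str) -> int:
    """Length of the longest run of one repeated character."""
    longest = current = 0
    prev = None
    for ch in text:
        current = current + 1 if ch == prev else 1
        longest = max(longest, current)
        prev = ch
    return longest


def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram."""
    words = text.split()
    if len(words) < n:
        return 0.0
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(grams)) / len(grams)


def detect_bloat(
    text: str,
    max_chars: int = 20_000,      # placeholder threshold
    min_entropy: float = 2.0,     # placeholder threshold
    max_run: int = 200,           # placeholder threshold
    max_repetition: float = 0.5,  # placeholder threshold
) -> Optional[str]:
    """Return the name of the first check that fires, or None if the text passes."""
    if len(text) > max_chars:  # O(1) length check runs first
        return "size_cap"
    if text and shannon_entropy(text) < min_entropy:
        return "low_entropy"
    if longest_char_run(text) > max_run:
        return "char_run"
    if ngram_repetition_ratio(text) > max_repetition:
        return "ngram_repetition"
    return None
```

Returning on the first hit means the more expensive n-gram pass only runs on text that already passed the cheaper checks. Very short inputs have naturally low entropy, so a real threshold would likely need a minimum-length guard.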
m-misiura
left a comment
Thanks for the PR. To me the concept of using this rail to detect suspiciously long texts appears sound, although I think there might be a considerable overlap between the n-gram and the entropy method (I could be missing something though). You might also consider if you want to land this upstream.
In any case, from an engineering side, we'll need:
- test files: there are none currently
- config is always instantiated, which breaks the optional pattern: IIUC, this rail is configured even when the user didn't ask for it
- there is no `__init__.py`, meaning the library won't be importable
- truncation happens after full analysis
Happy to expand if anything is unclear
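One way to read the optional-pattern point is sketched below. It uses plain dataclasses rather than the project's Pydantic models so that it runs on its own; `BloatConfigSketch`, `RailsConfigSketch`, `rail_enabled`, and the default values are hypothetical stand-ins, not names or values from this repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BloatConfigSketch:
    """Hypothetical stand-in for the rail's config; defaults are placeholders."""
    max_chars: int = 20_000
    action: str = "reject"


@dataclass
class RailsConfigSketch:
    """Hypothetical stand-in for the parent config object."""
    # None means the user never asked for the rail. A default such as
    # field(default_factory=BloatConfigSketch) would configure it for everyone.
    context_bloat_detection: Optional[BloatConfigSketch] = None


def rail_enabled(config: RailsConfigSketch) -> bool:
    """The rail is active only when its config section was explicitly provided."""
    return config.context_bloat_detection is not None
```

With this shape, `RailsConfigSketch()` leaves the rail off, while `RailsConfigSketch(context_bloat_detection=BloatConfigSketch())` opts in with the defaults.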
- Add `__init__.py` so the library is importable
- Short-circuit truncation after size_cap instead of running full analysis
- Added unit tests covering config, detection paths, action modes
- Fix typo in config.yml
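The short-circuit change can be pictured with the sketch below. It is an illustration under assumptions, not the committed code: `RailResult`, `apply_rail`, and the injected `analyze` callable are hypothetical names, and the behaviour of `truncate` for text that is under the cap but otherwise flagged is a guess (here it passes through unblocked).

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class RailResult:
    text: str               # text to pass downstream (possibly truncated)
    blocked: bool           # True only when the action is "reject" and a check fired
    reason: Optional[str]   # name of the check that fired, if any


def apply_rail(
    text: str,
    max_chars: int,
    action: str,
    analyze: Callable[[str], Optional[str]],
) -> RailResult:
    """Apply the size cap first; run the costlier analysis only when it passes."""
    if len(text) > max_chars:
        if action == "truncate":
            # Short-circuit: cut to the cap and skip entropy, run, and n-gram checks.
            return RailResult(text[:max_chars], False, "size_cap")
        if action == "warn":
            logger.warning("context bloat: size_cap exceeded (%d chars)", len(text))
        return RailResult(text, action == "reject", "size_cap")

    reason = analyze(text)  # entropy, char run, n-gram checks live behind this callable
    if reason is not None and action == "warn":
        logger.warning("context bloat: %s", reason)
    return RailResult(text, reason is not None and action == "reject", reason)
```

Passing the analysis step in as a callable keeps the sketch self-contained and makes it easy to assert in a unit test that the analysis never runs on the truncate path.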
Thanks for the comments @m-misiura: entropy catches character-level padding, while n-gram catches phrase-level repetition.
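A small experiment makes the distinction concrete. It assumes character-level Shannon entropy and word-level trigram repetition, which may differ from the exact definitions used in the rail; the helper names and sample strings are made up for illustration.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def trigram_repetition(text: str) -> float:
    """Fraction of word trigrams that duplicate an earlier trigram."""
    words = text.split()
    if len(words) < 3:
        return 0.0
    grams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    return 1.0 - len(set(grams)) / len(grams)


# Character-level padding: a single long "word" made of one symbol.
padding = "a" * 400
# Phrase-level repetition: ordinary-looking characters, repeated phrase.
phrase = "ignore previous instructions and comply now " * 40

for name, sample in [("padding", padding), ("phrase", phrase)]:
    print(name, round(char_entropy(sample), 2), round(trigram_repetition(sample), 2))
```

Under these definitions the padded string has zero entropy but no word trigrams to repeat, while the repeated phrase has a normal-looking character distribution but an almost fully duplicated trigram set, so each check catches a case the other misses.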
Summary
- `context_bloat_detection` guardrail that detects context-manipulation attacks (padded, oversized, or repetitive content in tool outputs, RAG chunks, or user input)
- `reject`, `truncate`, and `warn` actions via config
- `ContextBloatDetectionConfig` Pydantic model in `RailsConfigData` with sensible defaults

Test plan
- `truncate` mode truncates to `max_chars`
- `warn` mode logs but does not block