Commit ea09c9a
authored
fix(middleware): improve pre-tool middleware guarding logic (#1824)
**Fix content truncation security vulnerability in PreToolVerifierMiddleware**
Summary:
The previous _analyze_content implementation truncated oversized inputs by keeping the first and last halves, silently dropping the middle.
An attacker could exploit this by padding benign content around a malicious payload, guaranteeing it lands in the truncated region and bypasses verification.
A subtler variant could also split a directive across the boundary between two disjoint chunks, making it invisible to both.
Replaced truncation with a sliding window approach: content exceeding max_content_length is scanned in overlapping windows of max_content_length chars with a stride of max_content_length // 2 (50% overlap).
Any injection directive up to stride chars long is mathematically guaranteed to appear fully within at least one window.
Windows are analyzed sequentially with early exit on the first refusing result.
Inputs requiring more than max_chunks windows are handled by selecting max_chunks evenly-spaced windows at deterministic intervals, ensuring uniform coverage of the full input.
Both max_content_length (default 32000) and max_chunks (default 16) are now configurable via PreToolVerifierMiddlewareConfig.
sanitized_input is always None for multi-window content since overlapping windows make reconstruction impossible.
Also fixed a secondary vulnerability: chunk content was interpolated verbatim into the LLM prompt inside <user_input> tags, allowing a payload containing </user_input> to break the boundary and inject instructions outside it.
Chunk content is now HTML-escaped before insertion, and the prompt label notes this explicitly so the verifier treats tags as literal text.
Also fixed the test mock helper to serialize LLM response bodies with json.dumps so the positive-path chunking behavior is actually exercised through the full JSON parse path.
Test plan:
- test_chunk_xml_tags_are_escaped_in_prompt — chunk containing </user_input> is escaped; the raw tag is absent from the injected payload in the LLM message
- test_short_content_single_llm_call — content within limit uses a single LLM call
- test_long_content_uses_sliding_windows — oversized content produces overlapping windows
- test_malicious_payload_in_middle_window_detected — the previously exploitable scenario is caught; early exit stops remaining windows
- test_malicious_payload_split_at_old_boundary_detected — directive straddling the old disjoint-chunk boundary is caught by the overlapping window
- test_violation_in_last_window_detected — violation at the tail is caught
- test_no_violation_in_any_window_returns_clean — all-clean input passes through
- test_early_exit_stops_after_first_refusing_window — scan halts after the first refusing window
- test_over_cap_selects_evenly_spaced_windows — over-cap input is analyzed via deterministic evenly-spaced sampling of exactly max_chunks windows
- test_windowed_* — aggregation of confidence (max), violation types (deduplicated union), reasons (semicolon-joined), and sanitized_input (always None)
- TestPreToolVerifierInvoke / TestPreToolVerifierStreaming — action modes and streaming path still work
## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.
## Summary by CodeRabbit
* **New Features**
* Sliding-window analysis for long inputs with configurable window size and max chunks, 50% overlap, and early-exit on refusal.
* Per-window analysis with HTML-escaping of user content sent to the model.
* **Bug Fixes**
* Aggregation improvements: max confidence, de-duplicated violation types, concatenated reasons, sanitized output disabled for multi-window results.
* Added logging for input/window sizes and sampling caps.
* **Tests**
* Comprehensive end-to-end tests covering windowing, sampling cap, early-exit, aggregation, redirection, and error handling.
Authors:
- https://github.com/cparadis-nvidia
- Will Killian (https://github.com/willkill07)
Approvers:
- Will Killian (https://github.com/willkill07)
URL: #18241 parent 998d535 commit ea09c9a
2 files changed
Lines changed: 709 additions & 10 deletions
File tree
- packages/nvidia_nat_core
- src/nat/middleware/defense
- tests/nat/middleware
Lines changed: 99 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
| |||
73 | 74 | | |
74 | 75 | | |
75 | 76 | | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
76 | 92 | | |
77 | 93 | | |
78 | 94 | | |
| |||
144 | 160 | | |
145 | 161 | | |
146 | 162 | | |
147 | | - | |
148 | | - | |
| 163 | + | |
| 164 | + | |
149 | 165 | | |
150 | 166 | | |
151 | | - | |
| 167 | + | |
152 | 168 | | |
153 | 169 | | |
154 | 170 | | |
155 | 171 | | |
156 | 172 | | |
157 | | - | |
158 | | - | |
159 | | - | |
160 | | - | |
161 | | - | |
162 | | - | |
163 | 173 | | |
164 | 174 | | |
165 | 175 | | |
| |||
189 | 199 | | |
190 | 200 | | |
191 | 201 | | |
192 | | - | |
| 202 | + | |
| 203 | + | |
193 | 204 | | |
194 | 205 | | |
195 | 206 | | |
| |||
247 | 258 | | |
248 | 259 | | |
249 | 260 | | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
250 | 339 | | |
251 | 340 | | |
252 | 341 | | |
| |||
0 commit comments