security: scan ML classifiers in overlapping windows (bypass via content padding)#1154
Open
garagon wants to merge 1 commit intogarrytan:mainfrom
Open
security: scan ML classifiers in overlapping windows (bypass via content padding)#1154garagon wants to merge 1 commit intogarrytan:mainfrom
garagon wants to merge 1 commit intogarrytan:mainfrom
Conversation
…t 4000 chars The ML classifiers (TestSavantAI and DeBERTa) only scanned the first 4000 characters of page content. An injection payload placed after 4000 chars of benign content was invisible to both classifiers. Fix: scan in overlapping windows of 4000 chars with 1000-char overlap, take the maximum confidence across all windows. A 12K-char page now produces 3 windows instead of silently dropping 8K of unscanned content. Also raises the Haiku transcript classifier's tool_output cap from 4000 to 8000 chars (~2K Haiku tokens, ~$0.001 extra per scan) so the LLM classifier sees more context.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The ML prompt injection classifiers (TestSavantAI L4 and DeBERTa L4c) only scan the first 4000 characters of content. An attacker can bypass both classifiers by placing benign content before the injection payload:
What happens today
scanPageContent()callshtmlToPlainText(text)thenplain.slice(0, 4000).The same truncation affects the Haiku transcript classifier's
tool_outputparameter (checkTranscriptat line 440), which also caps at 4000 chars.Why this matters
The 4000-char cap was described in comments as "just a cheap upper bound" because "real-world injection signals land in the first few hundred tokens anyway." This is true for direct injection but not for indirect injection, where the attacker controls page content and can pad arbitrarily. A malicious page that puts 4K of real article content before a hidden injection div defeats the entire ML defense stack.
Fix
Windowed scanning (TestSavantAI + DeBERTa)
Instead of
plain.slice(0, 4000), scan in overlapping windows:A 12K-char page produces 3 windows. A 4K-or-shorter page produces 1 window (no regression).
Haiku transcript classifier
Raise
tool_outputcap from 4000 to 8000 chars. Haiku is an LLM, not a BERT model — no 512-token limit. The extra ~2K tokens cost ~$0.001 per scan and give the transcript classifier meaningful coverage of longer tool outputs.Performance impact
For pages under 4K chars (the common case): zero change —
windowedSlices()returns a single slice.For a 12K-char page: 3x classifier invocations per scan. TestSavantAI runs in ~50ms per window on CPU, so worst case adds ~100ms. DeBERTa (opt-in ensemble) adds another ~100ms. Both run in parallel with Haiku, so wall-clock impact is bounded by whichever is slower.
Test plan
windowedSlicesunit tests: 6 cases covering short text, exact boundary, overlap correctness, tail coverage, and injection-at-5000-chars detectionbun test browse/test/security-classifier.test.ts— 15/15 passbun test browse/test/security.test.ts browse/test/content-security.test.ts— 102/102 pass