feat(prompt): cross-source seam injection scan for context files (#395)#396
Merged
Merged
Conversation
Closes the pre-existing scanner gap surfaced by the #388 security review: the context-file injection scanner is regex-on-contiguous-text, so a payload split across the seam between two concatenated fragments slips through (each fragment scans clean; the structural markers between them — ## headers, > imported-from markers — break the regex). Add a cross-seam pass: join the fragment BODIES with those markers removed and scan as one stream. Wired into both multi-source loaders: - _load_agents_md: joins nested AGENTS.md/override bodies (## headers dropped via _section_body, internal markdown headings preserved). - _load_claude_md: strips > imported-from markers from the import- resolved blob. A cross-seam hit blocks the whole source (fail-safe — the combination is the attack). Per-fragment scanning is unchanged; this is an added gate. Tests: AGENTS.md cross-file split, CLAUDE.md body->import and import->import splits all blocked; benign two-file merge with internal headings not blocked. 167 passed.
Contributor
🔎 Lint report:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #395
What
Closes the pre-existing scanner gap surfaced during the #388 security review (PR #393): the context-file injection scanner (
_scan_context_content→threat_patterns"context" scope) is regex-on-contiguous-text, so a payload split across the seam between two concatenated fragments evades it — each fragment scans clean, and the structural markers between them (## labelheaders,> imported from …markers) insert non-word chars that break a contiguous regex.Fix
Add a cross-seam pass (
_scan_context_seams): join the fragment bodies with those markers removed and scan as one stream. Wired into both multi-source loaders:_load_agents_md— joins nestedAGENTS.md/AGENTS.override.mdbodies; the## labelheader of each section is dropped via_section_body, but the document's own internal markdown headings are preserved (no within-doc false bridges)._load_claude_md— strips the> imported from …markers from the import-resolved blob.A cross-seam hit blocks the whole source (fail-safe: the combination is the attack). Per-fragment scanning is unchanged — this is an added gate, not a replacement.
Tradeoff
This is the bounded mitigation, not the "move off regex" rewrite. Joining bodies could in principle create a rare false-positive across an innocent boundary; blocking is fail-safe and the message is explicit, and real repos are extremely unlikely to hit it. A full structural/non-regex detector remains the longer-term option.
Tests
tests/agent/test_prompt_builder.py:##headings → not blockedFull suite: 174 passed, 1 skipped (prompt_builder + system_prompt).