Skip to content

feat(prompt): cross-source seam injection scan for context files (#395)#396

Merged
Lexus2016 merged 1 commit into
mainfrom
evolution/issue-395-cross-seam-context-scan
Jun 20, 2026
Merged

feat(prompt): cross-source seam injection scan for context files (#395)#396
Lexus2016 merged 1 commit into
mainfrom
evolution/issue-395-cross-seam-context-scan

Conversation

@Lexus2016

Copy link
Copy Markdown
Owner

Closes #395

What

Closes the pre-existing scanner gap surfaced during the #388 security review (PR #393): the context-file injection scanner (_scan_context_contentthreat_patterns "context" scope) is regex-on-contiguous-text, so a payload split across the seam between two concatenated fragments evades it — each fragment scans clean, and the structural markers between them (## label headers, > imported from … markers) insert non-word chars that break a contiguous regex.

Fix

Add a cross-seam pass (_scan_context_seams): join the fragment bodies with those markers removed and scan as one stream. Wired into both multi-source loaders:

  • _load_agents_md — joins nested AGENTS.md/AGENTS.override.md bodies; the ## label header of each section is dropped via _section_body, but the document's own internal markdown headings are preserved (no within-doc false bridges).
  • _load_claude_md — strips the > imported from … markers from the import-resolved blob.

A cross-seam hit blocks the whole source (fail-safe: the combination is the attack). Per-fragment scanning is unchanged — this is an added gate, not a replacement.

Tradeoff

This is the bounded mitigation, not the "move off regex" rewrite. Joining bodies could in principle create a rare false-positive across an innocent boundary; blocking is fail-safe and the message is explicit, and real repos are extremely unlikely to hit it. A full structural/non-regex detector remains the longer-term option.

Tests

tests/agent/test_prompt_builder.py:

  • AGENTS.md cross-file split payload → blocked
  • CLAUDE.md body→import split → blocked (the whole re-scan alone would miss this; proves the seam path)
  • CLAUDE.md import→import split → blocked
  • benign two-file merge with internal ## headings → not blocked

Full suite: 174 passed, 1 skipped (prompt_builder + system_prompt).

Closes the pre-existing scanner gap surfaced by the #388 security review:
the context-file injection scanner is regex-on-contiguous-text, so a
payload split across the seam between two concatenated fragments slips
through (each fragment scans clean; the structural markers between them —
## headers, > imported-from markers — break the regex).

Add a cross-seam pass: join the fragment BODIES with those markers
removed and scan as one stream. Wired into both multi-source loaders:
- _load_agents_md: joins nested AGENTS.md/override bodies (## headers
  dropped via _section_body, internal markdown headings preserved).
- _load_claude_md: strips > imported-from markers from the import-
  resolved blob.

A cross-seam hit blocks the whole source (fail-safe — the combination is
the attack). Per-fragment scanning is unchanged; this is an added gate.

Tests: AGENTS.md cross-file split, CLAUDE.md body->import and
import->import splits all blocked; benign two-file merge with internal
headings not blocked. 167 passed.
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: evolution/issue-395-cross-seam-context-scan vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 11312 on HEAD, 11312 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5941 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@Lexus2016 Lexus2016 merged commit 436fa7f into main Jun 20, 2026
39 checks passed
@Lexus2016 Lexus2016 deleted the evolution/issue-395-cross-seam-context-scan branch June 20, 2026 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SECURITY] Context-file injection scanner misses payloads split across concatenated sources

1 participant