
fix(indexer): treat escaped "\!" gitignore lines as literal, not negation#193

Open
jichaowang02-lang wants to merge 2 commits into cocoindex-io:main from jichaowang02-lang:fix/gitignore-escaped-bang

Conversation

@jichaowang02-lang


Summary

_normalize_gitignore_lines mishandles a .gitignore line that escapes a leading
! (\!name). Per gitignore semantics, \!name means "ignore a file literally
named !name"; it is not a negation. The function strips the backslash first and
only then runs its negation check, so \!name becomes !name and is misread as a
re-include rule.
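To make the ordering problem concrete, here is a minimal single-line sketch of the pre-fix logic. `normalize_line_buggy` is a hypothetical name, not the real `_normalize_gitignore_lines`; the sketch omits the subdirectory prefix and comment/blank-line handling, and simply assumes that any pattern without "/" gets a "**/" prefix.

```python
def normalize_line_buggy(line: str) -> str:
    """Hypothetical sketch of the pre-fix ordering (not the actual cocoindex code)."""
    # Step 1: unescape a leading "\#" or "\!" by dropping the backslash.
    if line.startswith("\\#") or line.startswith("\\!"):
        line = line[1:]
    # Step 2: the negation check runs on the already-unescaped text, so
    # "\!name" (now "!name") is wrongly treated as a re-include rule.
    negated = line.startswith("!")
    if negated:
        line = line[1:]
    # Step 3 (simplified): a slash-free pattern matches at any depth.
    if "/" not in line:
        line = "**/" + line
    return ("!" if negated else "") + line


print(normalize_line_buggy("\\!important"))  # !**/important
```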

Impact

A literal entry like \!important is normalized to !**/important (a negation),
which cancels the unrelated "important" ignore rule: files the user meant to
ignore start getting indexed.

```gitignore
# .gitignore
# ignore files named "important"
important
# ignore the file literally named "!important"
\!important
```

| File | Before | After |
|---|---|---|
| important | ❌ no longer ignored | ✅ ignored |
| !important | ❌ not ignored | ✅ ignored |
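The table follows from gitignore's "last matching rule wins" evaluation. The toy evaluator below reproduces both columns; `is_ignored` is hypothetical, matches bare file names with `fnmatch` only, and ignores directories, anchoring and escaping, so treat it as an illustration of the rule order rather than a model of pathspec.

```python
from fnmatch import fnmatchcase


def is_ignored(name: str, rules: list[tuple[bool, str]]) -> bool:
    """Toy evaluation: each rule is (negated, glob); the last match wins."""
    ignored = False
    for negated, glob in rules:
        if fnmatchcase(name, glob):
            ignored = not negated
    return ignored


# Before: "\!important" was turned into a negation of "important".
buggy = [(False, "important"), (True, "important")]
# After: "\!important" is a literal pattern for the file "!important".
fixed = [(False, "important"), (False, "!important")]

for name in ["important", "!important"]:
    print(name, is_ignored(name, buggy), is_ignored(name, fixed))
# important False True
# !important False True
```

Outside a bracket expression `fnmatch` treats "!" literally, which is why the fixed rule set matches the file named "!important".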

Fix

Detect the \# / \! escape before the negation check and skip negation
handling for escaped lines. \!important now normalizes to **/!important; since
! is no longer the leading character, it is treated literally. Ordinary negation
(!build/keep.txt) and escaped-# behavior are unchanged.
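A minimal sketch of this reordering, under the same assumptions as before: `normalize_line` is a hypothetical single-line helper, not the real function, and the "**/" prefix rule is simplified to "no slash in the pattern".

```python
def normalize_line(line: str) -> str:
    """Hypothetical sketch of the first-commit ordering (not the actual cocoindex code)."""
    # Detect a leading "\#" or "\!" escape *before* looking for negation.
    escaped = line.startswith("\\#") or line.startswith("\\!")
    if escaped:
        line = line[1:]  # drop the backslash; an escaped line is never a negation
        negated = False
    else:
        negated = line.startswith("!")
        if negated:
            line = line[1:]
    # Simplified: a slash-free pattern matches at any depth.
    if "/" not in line:
        line = "**/" + line
    return ("!" if negated else "") + line


for raw in ["\\!important", "\\#notacomment", "!build/keep.txt", "\\!dir/file"]:
    print(raw, "->", normalize_line(raw))
```

The last case prints `\!dir/file -> !dir/file`: a slash-bearing escaped pattern gets no "**/" prefix, so stripping the backslash puts "!" back at the front. That is the weakness the review feedback later pointed out.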

Tests

Adds tests/test_indexer_gitignore.py (the function had no tests): plain, negated,
escaped-#, escaped-!, and subdirectory-prefix cases, plus an end-to-end
GitIgnoreSpec assertion that the escaped line no longer re-includes unrelated
matches. Verified against pathspec==1.1.1.


`_normalize_gitignore_lines` unescaped a leading "\#"/"\!" and only *then*
checked for negation. For "\!name" the unescape produced "!name", which the
negation check misread as a re-include rule, emitting "!**/name". A literal
ignore such as `\!important` therefore cancelled an unrelated `important`
rule instead of ignoring the file named "!important".

Detect the escape before the negation check and skip negation handling for
escaped lines, so "\!important" normalizes to "**/!important" (the '!' is no
longer pattern-leading, so it is literal). Ordinary negation and escaped "\#"
behavior are unchanged.

Add tests/test_indexer_gitignore.py covering plain/negated/escaped patterns
and an end-to-end GitIgnoreSpec check that the escaped line no longer
re-includes unrelated matches.
Copilot AI review requested due to automatic review settings June 18, 2026 23:19

Copilot AI left a comment



Pull request overview

Fixes .gitignore normalization so escaped leading \! patterns are treated as literal ! entries (not negations), preventing unintended re-includes during indexing.

Changes:

  • Adjust _normalize_gitignore_lines to detect escaped \! / \# before applying negation logic.
  • Add new unit tests covering plain patterns, negation, escaped #, escaped !, subdirectory prefixing, and an end-to-end GitIgnoreSpec regression.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/cocoindex_code/indexer.py | Updates .gitignore line normalization to avoid misclassifying escaped bang as a negation. |
| tests/test_indexer_gitignore.py | Adds regression/unit coverage for .gitignore normalization (including the escaped \! scenario). |


Comment on lines +56 to +63
```python
escaped = line.startswith("\\#") or line.startswith("\\!")
if escaped:
    line = line[1:]
    negated = False
else:
    negated = line.startswith("!")
    if negated:
        line = line[1:]
```
Comment on lines +25 to +35
```python
def test_escaped_hash_is_literal_not_comment() -> None:
    # "\#notacomment" -> a file literally named "#notacomment".
    assert _normalize_gitignore_lines(["\\#notacomment"], ROOT) == ["**/#notacomment"]


def test_escaped_bang_is_literal_not_negation() -> None:
    # Regression: "\!important" means "ignore a file literally named '!important'",
    # NOT a negation, so it must not become a "!"-prefixed (negation) pattern.
    assert _normalize_gitignore_lines(["\\!important"], ROOT) == ["**/!important"]
```


Addressing review feedback: the previous fix stripped the backslash from an
escaped "\!"/"\#" line and relied on the "**/" prefix to keep the leading
"!"/"#" from being misread. But a pattern that contains a "/" is anchored and
gets no "**/" prefix, so "\!dir/file" normalized to "!dir/file", which
GitIgnoreSpec reads back as a negation (and "\#dir/file" as a comment),
dropping the rule.

Keep the backslash in the emitted pattern so pathspec parses the "!"/"#"
literally in both the prefixed ("**/\!foo") and anchored ("\!dir/file") cases.

Adds an end-to-end test for path-bearing escaped patterns; updates the
exact-form assertions to the now-escaped output.
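A hypothetical sketch of the ec2ce38 behavior, paired with a toy `classify` helper that only models how a gitignore parser reads a pattern's first character ("#" means comment, "!" means negation). Neither function is cocoindex or pathspec code, and the "**/" prefix rule is simplified as in the earlier sketches.

```python
def classify(pattern: str) -> str:
    """Toy model of how a gitignore parser reads a pattern's first character."""
    if pattern.startswith("#"):
        return "comment"
    if pattern.startswith("!"):
        return "negation"
    return "ignore"


def normalize_line_keep_escape(line: str) -> str:
    """Hypothetical sketch: escaped lines keep their backslash."""
    escaped = line.startswith("\\#") or line.startswith("\\!")
    negated = not escaped and line.startswith("!")
    if negated:
        line = line[1:]
    # The backslash stays, so "!"/"#" is literal whether or not "**/" is added.
    if "/" not in line:
        line = "**/" + line
    return ("!" if negated else "") + line


for raw in ["\\!foo", "\\!dir/file", "\\#dir/other", "!build/keep.txt"]:
    out = normalize_line_keep_escape(raw)
    print(raw, "->", out, classify(out))

# What the first commit emitted for the slash-bearing cases:
print(classify("!dir/file"), classify("#dir/other"))  # negation comment
```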
@jichaowang02-lang

Author

Good catch, fixed in ec2ce38. Path-bearing escaped patterns (\!dir/file, \#dir/other) get no **/ prefix, so once the backslash was stripped the !/# landed at the start of the pattern and pathspec read it back as a negation or comment. I now keep the backslash in the emitted pattern so pathspec parses it literally in both the prefixed and anchored cases, and added an end-to-end test covering the slash case.
