Commit 72b6ec7

perf(watcher): optimize wildcard matching
Split wildcard ignore patterns into simple (no slashes) and compound (with slashes) regexes. This prevents redundant evaluations of simple wildcards against cumulative path prefixes, and compound wildcards against simple directory components, thereby improving performance in the file event hot path.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: shenald-dev <245350826+shenald-dev@users.noreply.github.com>
1 parent 9f2b4f1 commit 72b6ec7

3 files changed

Lines changed: 23 additions & 8 deletions

.jules/bolt.md

Lines changed: 8 additions & 0 deletions
@@ -141,3 +141,11 @@ Inside the `_is_ignored_impl` hot path, `os.path.relpath` is computationally exp
 Action:
 In `watchdog` event path normalization, bypass the computationally expensive `os.path.relpath` for the common case where `base_path` is `.` and the path is already relative by adding a fast-path condition: `elif self.base_path == "." and not os.path.isabs(path) and not path.startswith(".."): pass`.
 To optimize ignore pattern matching in hot loops, pre-compute a flag during initialization (e.g., `self._has_compound_ignores = any('/' in p for p in self.ignore_patterns)`) and use it to short-circuit the evaluation of compound directory paths if no slash-based ignore patterns exist.
+
+## 2026-05-01 — Wildcard Regex Split Optimization
+
+Learning:
+Inside the file watcher's `_is_ignored_impl` hot path, applying a combined wildcard regex that includes both simple patterns (e.g. `*.tmp`) and compound patterns (e.g. `src/*.tmp`) to individual path segments (`parts`) and cumulative directory prefixes (`prefix`) is redundant and computationally wasteful. A simple wildcard pattern incorrectly evaluated against a cumulative prefix path loop wastes time, and a compound wildcard will never match a simple directory segment.
+
+Action:
+Split wildcard patterns into `simple_wildcards` (no slashes) and `compound_wildcards` (contains slashes), and compile them into separate regular expressions (`simple_wildcard_regex` and `compound_wildcard_regex`). Only apply the simple regex when iterating over individual parts, and apply the compound regex when accumulating the directory prefix. This optimization prevents unnecessary regex checks in the hot path.
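
The split described in the Action note can be sketched in isolation. This is a minimal illustration, not the repository's code: `compile_wildcard_regexes` and `_compile` are hypothetical helper names, and only the standard-library calls `fnmatch.translate` and `re.compile` are taken from the diff.

```python
import fnmatch
import re

WILDCARD_CHARS = ("*", "?", "[")


def _compile(patterns):
    """Join fnmatch-translated patterns into one alternation, or return None if empty."""
    if not patterns:
        return None
    return re.compile("|".join(f"(?:{fnmatch.translate(p)})" for p in patterns))


def compile_wildcard_regexes(ignore_patterns):
    """Hypothetical helper: return (simple_regex, compound_regex)."""
    # Patterns without any wildcard character are handled elsewhere as exact matches.
    wildcards = [p for p in ignore_patterns if any(c in p for c in WILDCARD_CHARS)]
    simple = [p for p in wildcards if "/" not in p]    # e.g. "*.tmp"
    compound = [p for p in wildcards if "/" in p]      # e.g. "src/*.tmp"
    return _compile(simple), _compile(compound)


simple_re, compound_re = compile_wildcard_regexes(["*.tmp", "src/*.tmp", "node_modules"])
```

Either regex may be `None` when its category is empty, which is why the call sites in the watcher guard each match with a truthiness check.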

src/echo/watcher.py

Lines changed: 14 additions & 7 deletions
@@ -32,11 +32,18 @@ def __init__(self, command: str, base_path: str = ".", ignore_patterns: list[str
         # Pre-compute exact vs wildcard patterns for faster matching
         self.exact_ignores = {p for p in self.ignore_patterns if not any(c in p for c in ('*', '?', '['))}
         wildcard_ignores = [p for p in self.ignore_patterns if any(c in p for c in ('*', '?', '['))]
-        self.wildcard_regex = None
+
+        simple_wildcards = [p for p in wildcard_ignores if '/' not in p]
+        compound_wildcards = [p for p in wildcard_ignores if '/' in p]
+
+        self.simple_wildcard_regex = None
+        self.compound_wildcard_regex = None
         self._has_compound_ignores = any('/' in p for p in self.ignore_patterns)
-        if wildcard_ignores:
-            regex_str = "|".join(f"(?:{fnmatch.translate(p)})" for p in wildcard_ignores)
-            self.wildcard_regex = re.compile(regex_str)
+
+        if simple_wildcards:
+            self.simple_wildcard_regex = re.compile("|".join(f"(?:{fnmatch.translate(p)})" for p in simple_wildcards))
+        if compound_wildcards:
+            self.compound_wildcard_regex = re.compile("|".join(f"(?:{fnmatch.translate(p)})" for p in compound_wildcards))
 
         self.current_process = None
         self.process_lock = threading.Lock()
@@ -188,9 +195,9 @@ def _is_ignored_impl(self, path: str) -> bool:
         if not self.exact_ignores.isdisjoint(parts):
             return True
 
-        if self.wildcard_regex:
+        if self.simple_wildcard_regex:
             for part in parts:
-                if self.wildcard_regex.match(part):
+                if self.simple_wildcard_regex.match(part):
                     return True
 
         # Check for exact and wildcard ignore patterns matching cumulative prefix directories
@@ -203,7 +210,7 @@ def _is_ignored_impl(self, path: str) -> bool:
             prefix = f"{prefix}/{part}"
             if prefix in self.exact_ignores:
                 return True
-            if self.wildcard_regex and self.wildcard_regex.match(prefix):
+            if self.compound_wildcard_regex and self.compound_wildcard_regex.match(prefix):
                 return True
 
         return False
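
Read together, the two `_is_ignored_impl` hunks amount to a two-phase check. The standalone function below is a rough sketch of that flow under stated assumptions: the hunks do not show how `parts` is derived or how the prefix loop is set up and guarded, so the `path.split("/")` call, the `prefix` initialization, and the function name `is_ignored` are all illustrative, and the `_has_compound_ignores` short-circuit is left out.

```python
import fnmatch
import re


def is_ignored(path, exact_ignores, simple_regex, compound_regex):
    """Sketch of the two-phase check; path is assumed relative with '/' separators."""
    parts = path.split("/")

    # Phase 1: individual segments against exact names and slash-free wildcards.
    if not exact_ignores.isdisjoint(parts):
        return True
    if simple_regex and any(simple_regex.match(part) for part in parts):
        return True

    # Phase 2: cumulative prefixes ("a", "a/b", ...) against exact entries
    # and slash-containing wildcards.
    prefix = ""
    for part in parts:
        prefix = f"{prefix}/{part}" if prefix else part
        if prefix in exact_ignores:
            return True
        if compound_regex and compound_regex.match(prefix):
            return True
    return False


simple_regex = re.compile(fnmatch.translate("*.tmp"))
compound_regex = re.compile(fnmatch.translate("build/*.log"))
exact = {"node_modules", "docs/generated"}
```

With these inputs, `src/a.tmp` is caught in phase 1 by the simple regex, while `build/out.log` is only caught in phase 2 once the prefix reaches `build/out.log`.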

tests/test_ignore.py

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ def test_character_class_wildcard_match():
     handler = CommandRunnerHandler("echo 1", ignore_patterns=["[a-z].tmp"])
 
     # Must correctly categorize as wildcard and compile regex
-    assert handler.wildcard_regex is not None
+    assert handler.simple_wildcard_regex is not None
     assert "[a-z].tmp" not in handler.exact_ignores
 
     assert handler._is_ignored("a.tmp") is True
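
The updated test only exercises a slash-free pattern against a single segment. One case that may be worth covering: in `fnmatch`-style matching, `*` also matches `/`, so a slash-free pattern such as `a*b` can match the cumulative prefix `a/b` without matching either segment on its own. Where the prefix loop runs, moving prefix checks to the compound regex could therefore change the result for such patterns. The sketch below shows the underlying matching behavior with `fnmatch` and `re` only. It does not use `CommandRunnerHandler`, and the commit does not say whether the earlier behavior was intended.

```python
import fnmatch
import re

pattern = "a*b"  # wildcard pattern with no slash, so the split classifies it as "simple"
regex = re.compile(fnmatch.translate(pattern))

parts = ["a", "b"]        # segments of the directory path "a/b"
prefixes = ["a", "a/b"]   # cumulative prefixes of the same path

# Per-segment check: what the simple regex is applied to after the split.
matches_any_part = any(regex.match(p) for p in parts)

# Cumulative-prefix check: what the combined regex was also applied to before the split.
matches_any_prefix = any(regex.match(p) for p in prefixes)
```

Here `matches_any_part` is false and `matches_any_prefix` is true, so a regression test along these lines would pin down which behavior the watcher is meant to have.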
