perf(grep): up to 14.5× speedup via preFilter extensions and matcher reuse #248
Open
Hazzng wants to merge 5 commits into
Switch all UserRegex methods to acquireMatcher(input) to avoid per-call allocations, and propagate preFilter into searchContentMultiline so files with no needle are skipped without scanning every line.
…tions
- matchAll(): revert to fresh `_re2.matcher(input)`. As a generator that
suspends at yield, sharing the cached `_matcher` risks corruption if a
caller interleaves any other UserRegex method (test/exec/search/replace)
between two next() calls — acquireMatcher would reset/repoint the shared
matcher, breaking the in-progress iteration. All other synchronous
methods continue to use acquireMatcher.
- matcher.test.ts: replace two toContain assertions in the multiline
preFilter tests with full output equality (per AGENTS.md guidance), so
regressions in line numbering or group separators surface.
- grep.ts: add file-level preFilter check right after readFile — skips
searchContent (and the content.split("\n")) entirely when no needle
exists in the file. Handles countOnly correctly (emits "0\n" or
"filename:0\n" without entering the line loop).
- matcher.ts searchContentMultiline: fix preFilter early return to emit
"0\n"/"filename:0\n" in count-only mode instead of empty string.
- user-regex.ts replace() callback path: capture matcher.start(0) and
matcher.end(0) before invoking the callback — acquireMatcher mutates
charSequence in-place, so a re-entrant call would corrupt those reads.
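The file-level check and the count-only handling described in the `grep.ts` and `matcher.ts` items can be illustrated with a minimal sketch. This is not the PR's code: `grepContent`, `GrepOptions`, and the use of a plain `RegExp` in place of the RE2-backed `UserRegex` are assumptions made for illustration only.

```typescript
// Hypothetical, simplified model of a file-level pre-filter; not the PR's code.
interface GrepOptions {
  countOnly: boolean;
  filename?: string; // set when output lines are prefixed with the file name
}

function grepContent(
  content: string,
  regex: RegExp, // stand-in for UserRegex; must not carry the "g" flag
  needle: string | null, // literal that every matching line must contain
  opts: GrepOptions,
): string {
  const prefix = opts.filename ? `${opts.filename}:` : "";

  // Fast path: if the needle is absent from the whole file, no line can match.
  // Count-only mode must still report a zero count rather than nothing.
  if (needle !== null && !content.includes(needle)) {
    return opts.countOnly ? `${prefix}0\n` : "";
  }

  // Slow path: split into lines and run the regex on each one.
  const lines = content.split("\n");
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();

  let count = 0;
  let out = "";
  for (const line of lines) {
    if (regex.test(line)) {
      count++;
      if (!opts.countOnly) out += `${prefix}${line}\n`;
    }
  }
  return opts.countOnly ? `${prefix}${count}\n` : out;
}
```

The point of the early return is that the `split("\n")` allocation and the per-line regex calls are both skipped for files that cannot match.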
@cramforce I've made some code optimizations for grep, so it is now quite a bit faster than in the last PR. Could you have a look, please? Thanks.
cramforce
reviewed
May 24, 2026
The callback may re-enter the same UserRegex instance, which would route through acquireMatcher and repoint the shared matcher's charSequence, causing the next matcher.find(pos) to advance through the wrong input. Mirrors the matchAll() fix.
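The hazard behind both the `matchAll()` revert and this `replace()` fix can be reproduced with a toy model. The sketch below does not use RE2JS; `SharedCursor` and `ToyRegex` are hypothetical stand-ins for a cached matcher that is reset and repointed on every acquire, and the "pattern" is hard-coded to the letter `a`.

```typescript
// Toy stand-in for a cached, resettable matcher; all names are hypothetical.
class SharedCursor {
  private input = "";
  private pos = 0;

  reset(input: string): this {
    this.input = input;
    this.pos = 0;
    return this;
  }

  // Returns the index of the next "a" at or after the cursor, or -1.
  findNext(): number {
    const i = this.input.indexOf("a", this.pos);
    if (i === -1) return -1;
    this.pos = i + 1;
    return i;
  }
}

class ToyRegex {
  private cached = new SharedCursor();

  // Synchronous and non-re-entrant: reusing the cached cursor is safe.
  test(input: string): boolean {
    return this.cached.reset(input).findNext() !== -1;
  }

  // Unsafe: the generator suspends at yield while still holding the shared cursor.
  *matchAllShared(input: string): Generator<number> {
    const m = this.cached.reset(input);
    for (let i = m.findNext(); i !== -1; i = m.findNext()) yield i;
  }

  // Safe: each iteration owns its own cursor.
  *matchAllFresh(input: string): Generator<number> {
    const m = new SharedCursor().reset(input);
    for (let i = m.findNext(); i !== -1; i = m.findNext()) yield i;
  }
}

// Calls another method on the same instance between two next() calls.
function collectInterleaved(gen: Generator<number>, re: ToyRegex): number[] {
  const out: number[] = [];
  let step = gen.next();
  while (!step.done) {
    out.push(step.value);
    re.test("zzz"); // repoints the shared cursor at a different input
    step = gen.next();
  }
  return out;
}
```

In this model, interleaving `test()` with `matchAllShared()` ends the iteration after the first match, while `matchAllFresh()` is unaffected. A callback passed to `replace()` is exposed to the same interleaving whenever it calls back into the same instance.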
Summary

- `regex.ts`: `extractPreFilter` now strips leading `^` / trailing `$` from each alternative before extracting the literal needle. Patterns like `^def \|^async def` (previously unoptimized) now get the `String.indexOf` fast-path instead of running the full RE2 NFA against every line.
- `user-regex.ts`: `match()`, `replace()` (both string and callback paths), `search()`, and `matchAll()` now route through `acquireMatcher()`. Previously only `test()` and `exec()` used the cached matcher, leaving awk/sed hot-paths allocating a new `RE2JS.Matcher` on every call.
- `matcher.ts`: `searchContentMultiline()` now receives `preFilter` and performs a whole-file `content.includes(needle)` check before splitting lines or invoking RE2. Files with no matching needle are rejected in an O(n) string scan instead of O(n·m) NFA.
- `grep.ts`: after `readFile`, a file-level needle check now skips `searchContent` (and `content.split("\n")`) entirely for files with no match. Count-only mode (`-c`) emits `0\n` correctly without entering the line loop.

Benchmark results

100 files × 100 lines, 5 runs, median reported. Baseline: `just-bash@3.0.1` (npm latest).

- `^def |^async def` (anchored BRE, the key case)
- `def` (simple literal, baseline)
- `def |async def` (unanchored alternation)

The baseline itself dropped 2.3× because `acquireMatcher` now covers `match()`, `search()`, and `replace()`, reducing GC pressure for all regex operations, not just the grep hot path.

Root cause (the `^def \|^async def` case)

`literalFromAlternative` in `regex.ts` rejected any alternative containing `^` or `$`, treating them as regex metacharacters. Since `^L` implies the line contains `L`, and `L$` implies the line contains `L`, stripping outer anchors before needle extraction is provably sound (false positives are safe; RE2 re-checks the match). The fix is a 17-line anchor-strip in `literalFromAlternative` before the existing character loop.

Tests added

- `src/commands/search-engine/regex.test.ts`
- `src/regex/user-regex.test.ts`: `acquireMatcher` reuse for `match`, `replace` (string + callback), `search`, `matchAll`; 1000-call state leak check
- `src/commands/search-engine/matcher.test.ts`