perf(grep): up to 14.5× speedup via preFilter extensions and matcher reuse#248

Open
Hazzng wants to merge 5 commits into
vercel-labs:mainfrom
Hazzng:perf/grep-matcher

@Hazzng Hazzng commented May 18, 2026

Summary

  • Anchored alternation preFilter (regex.ts): extractPreFilter now strips leading ^ / trailing $ from each alternative before extracting the literal needle. Patterns like ^def \|^async def (previously unoptimized) now get the String.indexOf fast-path instead of running the full RE2 NFA against every line.
  • Matcher reuse extended (user-regex.ts): match(), replace() (both string and callback paths), search(), and matchAll() now route through acquireMatcher(). Previously only test() and exec() used the cached matcher, leaving awk/sed hot-paths allocating a new RE2JS.Matcher on every call.
  • Multiline file-level preFilter (matcher.ts): searchContentMultiline() now receives preFilter and performs a whole-file content.includes(needle) check before splitting lines or invoking RE2. Files that do not contain the needle are rejected by a single O(n) string scan instead of an O(n·m) NFA pass.
  • grep loop preFilter short-circuit (grep.ts): after readFile, a file-level needle check now skips searchContent (and content.split("\n")) entirely for files with no match. Count-only mode (-c) emits 0\n correctly without entering the line loop.
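
The file-level short-circuit described in the last two bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in grep.ts or matcher.ts: the `PreFilter` shape, `fileMayMatch`, and `grepFile` are hypothetical names, and a plain JavaScript `RegExp` stands in for the RE2 matcher.

```typescript
// Hypothetical preFilter shape: a line can only match if it contains
// at least one of these literal needles.
type PreFilter = { needles: string[] };

interface GrepOptions {
  countOnly?: boolean;
  filename?: string;
  preFilter?: PreFilter;
}

// Cheap whole-file check. A "true" result may be a false positive;
// the regex still has the final say on every line.
function fileMayMatch(content: string, preFilter?: PreFilter): boolean {
  if (!preFilter) return true; // no needle known, so the regex must run
  return preFilter.needles.some((needle) => content.includes(needle));
}

// `regex` must not carry the g or y flag, since test() would then be stateful.
// An inverted match (-v) would have to bypass the preFilter; not modeled here.
function grepFile(content: string, regex: RegExp, opts: GrepOptions = {}): string {
  const prefix = opts.filename ? `${opts.filename}:` : "";

  if (!fileMayMatch(content, opts.preFilter)) {
    // No needle anywhere in the file: skip split("\n") and the regex entirely,
    // but still report a zero count in count-only mode.
    return opts.countOnly ? `${prefix}0\n` : "";
  }

  let count = 0;
  let out = "";
  for (const line of content.split("\n")) {
    if (regex.test(line)) {
      count++;
      if (!opts.countOnly) out += `${prefix}${line}\n`;
    }
  }
  return opts.countOnly ? `${prefix}${count}\n` : out;
}
```

The point of the sketch is that the count-only branch has to be handled inside the early return; returning an empty string there would drop the `0` line that `grep -c` prints for non-matching files.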

Benchmark results

100 files × 100 lines, 5 runs, median reported. Baseline: just-bash@3.0.1 (npm latest).

| Pattern | Before (npm) | After (branch) | Speedup |
| --- | --- | --- | --- |
| ^def \|^async def (anchored BRE, the key case) | 139.5 ms | 9.6 ms | 14.5× |
| def (simple literal, baseline) | 17.6 ms | 7.7 ms | 2.3× |
| def \|async def (unanchored alternation) | 29.9 ms | 9.3 ms | 3.2× |

The simple-literal baseline itself runs 2.3× faster because acquireMatcher now covers match(), search(), and replace(), which reduces GC pressure for all regex operations, not just the grep hot path.
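
To make the allocation argument concrete, here is a minimal sketch of the matcher-reuse idea. It assumes a matcher object that can be repointed at a new input; `CountingMatcher` and `CachedRegex` are hypothetical stand-ins written for this illustration and are not the RE2JS or UserRegex APIs.

```typescript
// Stand-in for an expensive matcher object; counts how many get allocated.
class CountingMatcher {
  static allocations = 0;
  private input = "";
  private readonly re: RegExp;

  constructor(re: RegExp) {
    this.re = re;
    CountingMatcher.allocations++;
  }

  // Repoint the existing matcher at a new input instead of allocating.
  reset(input: string): this {
    this.input = input;
    return this;
  }

  find(): boolean {
    return this.re.test(this.input); // `re` is assumed to have no g/y flag
  }
}

class CachedRegex {
  private matcher: CountingMatcher | null = null;
  private readonly re: RegExp;

  constructor(re: RegExp) {
    this.re = re;
  }

  // Lazily create one matcher, then reuse it for every later call.
  private acquireMatcher(input: string): CountingMatcher {
    if (this.matcher === null) this.matcher = new CountingMatcher(this.re);
    return this.matcher.reset(input);
  }

  test(input: string): boolean {
    return this.acquireMatcher(input).find();
  }
}
```

With this shape, a thousand `test()` calls on one `CachedRegex` allocate a single matcher, which is the effect the paragraph above attributes to routing more methods through acquireMatcher.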

Root cause (the ^def \|^async def case)

literalFromAlternative in regex.ts rejected any alternative containing ^ or $, treating them as regex metacharacters. A line that matches ^L or L$ necessarily contains L, so stripping the outer anchors before needle extraction is sound: the preFilter can only produce false positives (a line that contains L but not at the anchored position), and those are safe because RE2 re-checks every candidate line. The fix is a 17-line anchor-strip in literalFromAlternative before the existing character loop.
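
A rough sketch of the anchor-strip idea follows. It is not the 17-line change from regex.ts: `literalNeedle` and `extractNeedles` are hypothetical names, and the sketch assumes the alternatives have already been split and use ERE/JavaScript-style escaping (so an unescaped `+`, `(` or `|` is a metacharacter).

```typescript
const META = new Set([".", "[", "]", "(", ")", "*", "+", "?", "{", "}", "|", "^", "$"]);

// True when the string ends in "$" that is not escaped by a backslash.
function endsWithUnescapedDollar(s: string): boolean {
  if (!s.endsWith("$")) return false;
  let backslashes = 0;
  for (let i = s.length - 2; i >= 0 && s.charAt(i) === "\\"; i--) backslashes++;
  return backslashes % 2 === 0;
}

// Returns the literal that any matching line must contain, or null when the
// alternative is not a plain (optionally anchored) literal.
function literalNeedle(alternative: string): string | null {
  let body = alternative;
  if (body.startsWith("^")) body = body.slice(1); // ^L implies the line contains L
  if (endsWithUnescapedDollar(body)) body = body.slice(0, -1); // so does L$

  let needle = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch === "\\") {
      const next = body.charAt(i + 1); // "" when the backslash is the last char
      // Escaped letters/digits (\d, \b, \1, ...) are classes or assertions.
      if (next === "" || /[A-Za-z0-9]/.test(next)) return null;
      needle += next; // escaped punctuation is a literal character
      i++;
    } else if (META.has(ch)) {
      return null; // any remaining metacharacter disables the fast path
    } else {
      needle += ch;
    }
  }
  return needle.length > 0 ? needle : null; // anchor-only patterns give no needle
}

// Every alternative needs a needle; otherwise the preFilter is unusable.
function extractNeedles(alternatives: string[]): string[] | null {
  const needles: string[] = [];
  for (const alt of alternatives) {
    const needle = literalNeedle(alt);
    if (needle === null) return null;
    needles.push(needle);
  }
  return needles;
}
```

For example, `extractNeedles(["^def ", "^async def "])` yields the needles `"def "` and `"async def "`, while an anchor-only alternative such as `"^$"` yields `null`, so the caller falls back to running the regex on every line.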

Tests added

| File | New tests |
| --- | --- |
| src/commands/search-engine/regex.test.ts | anchored single / alternation / mixed-anchor / anchor-only / escaped-anchor cases |
| src/regex/user-regex.test.ts | acquireMatcher reuse for match, replace (string + callback), search, matchAll, 1000-call state leak check |
| src/commands/search-engine/matcher.test.ts | multiline preFilter: no-needle early exit, needle-present normal run, invertMatch bypass |

Hazzng added 4 commits May 16, 2026 17:43
Switch all UserRegex methods to acquireMatcher(input) to avoid
per-call allocations, and propagate preFilter into searchContentMultiline
so files with no needle are skipped without scanning every line.
…tions

- matchAll(): revert to fresh `_re2.matcher(input)`. As a generator that
  suspends at yield, sharing the cached `_matcher` risks corruption if a
  caller interleaves any other UserRegex method (test/exec/search/replace)
  between two next() calls — acquireMatcher would reset/repoint the shared
  matcher, breaking the in-progress iteration. All other synchronous
  methods continue to use acquireMatcher.
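
The hazard described in this commit message can be reproduced with a small stand-in. The sketch below assumes nothing about RE2JS: `SharedCursor` and `LiteralFinder` are hypothetical classes that search for a literal, with one cursor shared across methods in the same way the cached matcher is shared.

```typescript
// A stateful cursor, playing the role of the shared matcher.
class SharedCursor {
  private input = "";
  private pos = 0;

  reset(input: string): this {
    this.input = input;
    this.pos = 0;
    return this;
  }

  // Index of the next occurrence at or after the cursor, or -1.
  findNext(needle: string): number {
    const i = this.input.indexOf(needle, this.pos);
    if (i >= 0) this.pos = i + needle.length;
    return i;
  }
}

class LiteralFinder {
  private readonly shared = new SharedCursor();
  private readonly needle: string;

  constructor(needle: string) {
    if (needle.length === 0) throw new Error("needle must be non-empty");
    this.needle = needle;
  }

  // Synchronous method: safe to use the shared cursor.
  test(input: string): boolean {
    return this.shared.reset(input).findNext(this.needle) >= 0;
  }

  // Unsafe: the generator suspends at yield while holding the shared cursor.
  *matchAllShared(input: string): Generator<number> {
    const cursor = this.shared.reset(input);
    for (let i = cursor.findNext(this.needle); i >= 0; i = cursor.findNext(this.needle)) {
      yield i;
    }
  }

  // Safe: each iteration owns a private cursor.
  *matchAllFresh(input: string): Generator<number> {
    const cursor = new SharedCursor().reset(input);
    for (let i = cursor.findNext(this.needle); i >= 0; i = cursor.findNext(this.needle)) {
      yield i;
    }
  }
}

// Calls test() between every pair of next() calls, as a caller might.
function collectInterleaved(finder: LiteralFinder, iter: Iterable<number>): number[] {
  const out: number[] = [];
  for (const index of iter) {
    out.push(index);
    finder.test("zzz"); // repoints the shared cursor at unrelated input
  }
  return out;
}
```

Iterating `matchAllShared("a-a-a")` through `collectInterleaved` stops after the first match because the interleaved `test("zzz")` call resets the shared cursor, whereas `matchAllFresh` reports all three positions. That trade of one extra allocation per iteration for isolation is the same one the commit makes for matchAll().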

- matcher.test.ts: replace two toContain assertions in the multiline
  preFilter tests with full output equality (per AGENTS.md guidance), so
  regressions in line numbering or group separators surface.

- grep.ts: add file-level preFilter check right after readFile — skips
  searchContent (and the content.split("\n")) entirely when no needle
  exists in the file. Handles countOnly correctly (emits "0\n" or
  "filename:0\n" without entering the line loop).

- matcher.ts searchContentMultiline: fix preFilter early return to emit
  "0\n"/"filename:0\n" in count-only mode instead of empty string.

- user-regex.ts replace() callback path: capture matcher.start(0) and
  matcher.end(0) before invoking the callback — acquireMatcher mutates
  charSequence in-place, so a re-entrant call would corrupt those reads.
@Hazzng Hazzng requested a review from cramforce as a code owner May 18, 2026 13:27

vercel Bot commented May 18, 2026

@Hazzng is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.


Hazzng commented May 18, 2026

@cramforce I've made some code optimizations for grep, so it is now quite a bit faster than in the last PR. Could you have a look, please? Thanks.

Comment thread packages/just-bash/src/regex/user-regex.ts Outdated
The callback may re-enter the same UserRegex instance, which would route
through acquireMatcher and repoint the shared matcher's charSequence,
causing the next matcher.find(pos) to advance through the wrong input.
Mirrors the matchAll() fix.