Commit c8538c6
search: pre-filter literal regex patterns with case-insensitive substring search
The new substring fast path costs ~5-15 ms (~2-3% of wall time), a
~40-60x reduction in string-matching cost.
`nix search nixpkgs gemma` against a warm eval cache spends ~52% of
its wall time inside libc++'s std::regex matcher. Every call into
regex_search / sregex_iterator is CPU-heavy (full NFA walk of the
input) and allocation-heavy (backtracking state on the heap, per
call). The overwhelming majority of derivations searched do not match
the user's pattern, so the regex engine does this work only to return
false.
When the pattern contains no POSIX-extended regex metacharacters
(e.g. plain words like 'gemma'), a case-insensitive substring search
is equivalent and orders of magnitude cheaper. Use it as a
pre-filter; if the literal isn't present in any of (path, name,
description), the regex cannot match either and we skip it. If it
*is* present, we fall through to the existing regex iterator so that
hiliteMatches still receives proper smatch objects.
Same treatment for excludeRegexes: literal exclude patterns are
skipped entirely if the literal is absent, avoiding the regex_search.
Measured on `nix search nixpkgs gemma`, hyperfine 30 runs (3 warmup),
warm cache:
before (vkwzvmrt): 1.193 s ± 0.012 s
after (this commit): 0.568 s ± 0.006 s
That is a 2.10x speedup, taking ~625 ms (~52%) off wall time. A cold
cache also benefits, though less (14.690 s -> 13.738 s, ~6.5%), since
cold time is dominated by Nix expression evaluation rather than regex.
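The headline figures follow from the hyperfine means above. The warm-cache saving also roughly matches the ~52% regex share from the profile, which is consistent with the pre-filter removing most of the regex cost:

```latex
\begin{aligned}
\text{speedup} &= \frac{1.193\ \text{s}}{0.568\ \text{s}} \approx 2.10 \\
\Delta t_{\text{warm}} &= 1.193\ \text{s} - 0.568\ \text{s} = 0.625\ \text{s},
  \qquad \frac{0.625}{1.193} \approx 52\% \\
t_{\text{regex}} &\approx 0.52 \times 1.193\ \text{s} \approx 0.620\ \text{s} \\
\Delta t_{\text{cold}} &= 14.690\ \text{s} - 13.738\ \text{s} = 0.952\ \text{s},
  \qquad \frac{0.952}{14.690} \approx 6.5\%
\end{aligned}
```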
Output is byte-for-byte identical to the previous build for both literal
and non-literal patterns; verified with 'gemma', '^gem', '[Gg]emma', and
'gemma openllm' (AND-of-regexes).
1 parent 616df97 commit c8538c6
1 file changed
Lines changed: 91 additions & 24 deletions
| |||