Commit b02afdb
Optimize find_react_components
Runtime improvement: the optimized code reduces end-to-end runtime from ~7.34 ms to ~5.82 ms — a 26% speedup — by removing Python-level work and repeated allocations in the hot path.
What changed (concrete optimizations)
- Cached source bytes: added an lru_cache-backed _encode_source(source) so repeated source.encode("utf-8") calls reuse the same bytes object instead of allocating/encoding every time.
- Faster hook extraction: replaced the Python-level regex iteration + seen-set loop with HOOK_EXTRACT_RE.findall(...) then list(dict.fromkeys(...)) to deduplicate while preserving first-seen order. This shifts most work into C (re.findall and dict construction) and removes per-match Python bookkeeping.
- Cheap early-exit for memo checks: added a fast substring check ("memo(" and "React.memo") to skip the more expensive AST-parent walk and repeated slice+decode operations when memo is not present in the source.
- Minor micro-alloc reduction: switched some ephemeral lists to tuples where appropriate (e.g., memo_patterns) and removed duplicated encode calls elsewhere.
Why these changes speed things up
- Avoiding repeated .encode calls eliminates expensive per-function memory allocations and Python function-call overhead. The original profiler showed significant time in source.encode() sites (e.g., _extract_props_type, _function_returns_jsx). Caching the encoded bytes eliminates these hotspots when the same source string is inspected multiple times (typical when scanning many functions in one file).
- Using the compiled pattern's findall and dict.fromkeys moves the heavy lifting into C implementations (the re engine and dict internals), cutting Python loop/branch overhead. The line profiler shows _extract_hooks_used time dropped substantially.
- The substring check for memo presence is O(n) at C speed and avoids the common-case cost of doing tree/parent inspection and repeated byte-slicing/decoding for every function when memo is not used in the file.
- Together these changes reduce per-function overhead in the main loop of find_react_components, which is where most time is spent for large files.
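The memo early exit described above might look something like this minimal sketch. The names MEMO_MARKERS, may_use_memo, and is_memoized are hypothetical, and expensive_check stands in for the AST-parent walk and slice+decode work that the commit does not show. It also assumes the fast path is taken only when neither marker appears in the source.

```python
from typing import Callable

# Tuple rather than list: a fixed, immutable set of markers (hypothetical name).
MEMO_MARKERS = ("memo(", "React.memo")


def may_use_memo(source: str) -> bool:
    """Cheap pre-filter: substring search runs in C and allocates nothing."""
    return any(marker in source for marker in MEMO_MARKERS)


def is_memoized(source: str, expensive_check: Callable[[str], bool]) -> bool:
    """Run the expensive check only when a memo marker is present.

    expensive_check is a placeholder for the real tree inspection.
    """
    if not may_use_memo(source):
        return False  # common case: no memo anywhere in the file
    return expensive_check(source)
```

The pre-filter can only produce false positives (for example, "memo(" inside a comment), in which case the full check still runs and decides the result, so the outcome matches running the full check unconditionally.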
How this affects real workloads / hot paths
- find_react_components is used during project-wide discovery and in downstream analyzers (see integration tests). When scanning large files with many functions (the realistic hot path), per-function overhead dominates; these changes reduce that overhead, so the largest wins are for big files or many functions in a single source (the annotated large-scale tests show the biggest improvement: ~34% in that test).
- Small files or single-function files still benefit (microsecond-level wins) but the biggest impact is when the analyzer processes hundreds of functions in one source — exactly the scenario exercised by the large-scale annotated test and the integration flows that call find_react_components.
Which tests / cases benefit most
- Large-scale detection and deduping tests (thousands of functions, many repeated hook patterns) get the largest absolute wins because of eliminated allocations and cheaper hook extraction.
- Any test or real workload that repeatedly slices/decodes source bytes for props/memo detection benefits from the cached encoded bytes.
- Small, early-exit scenarios (files with "use server") are unaffected functionally and still return quickly.
Behavioral/implementation notes and trade-offs
- Semantics preserved: the changes do not change detection logic; they only change how data is extracted (same regex, same tree checks).
- Memory trade-off: lru_cache(maxsize=32) keeps the encoded bytes of up to 32 recently seen sources alive. The bound is on the number of entries, not on bytes, so the extra memory scales with the size of the files being scanned. This is an intentional and reasonable trade-off for eliminating repeated encodings in the common case of scanning many functions from the same file.
- The early substring check is conservative: it only avoids the AST/decoding work when memo-like identifiers are absent; when present, the full checks still run so detection correctness is unchanged.
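To make the memory trade-off concrete, here is a small sketch of how a cache with maxsize=32 behaves when more than 32 distinct sources are scanned. The _encode_source definition mirrors the one named in the commit message; scan_file and demo are hypothetical helpers, and the five lookups per file are an arbitrary stand-in for per-function work.

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def _encode_source(source: str) -> bytes:
    return source.encode("utf-8")


def scan_file(source: str, function_count: int) -> None:
    """Stand-in for per-function analysis: each function asks for the bytes."""
    for _ in range(function_count):
        _encode_source(source)


def demo():
    """Scan 40 distinct sources, 5 lookups each, and report cache statistics."""
    _encode_source.cache_clear()
    for i in range(40):
        scan_file(f"// file {i}\nfunction C{i}() {{}}", 5)
    return _encode_source.cache_info()
```

Under these assumptions, demo() reports 40 misses (one encode per file), 160 hits, and a current size capped at 32, since the least recently used entries are evicted once the limit is reached.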
Summary
- Primary benefit: 26% speedup (7.34 ms → 5.82 ms, roughly 21% less runtime) by cutting Python-level loops and repeated allocations in the hot path.
- Changes are low-risk, preserve behavior, and give the biggest improvements on large files and workloads that scan many functions in the same source (the common case for project analysis).
1 parent 6ee3458 commit b02afdb
1 file changed
Lines changed: 41 additions & 18 deletions
[Diff body not captured; only the line-number gutter survived. Changed regions, by original file line: 11-16 (1 addition), 168-179 (1 deletion, 1 addition), 194-213 (10 deletions, 9 additions), 238-261 (7 deletions, 30 additions).]