⚡️ Speed up function `_parse_and_collect_imports` by 69% in PR #1498 (`cf-simplify-context-extraction`) #1499
The optimization achieves a **68% runtime improvement** (23.5ms → 14.0ms) by replacing the expensive `ast.walk()` traversal with a targeted recursive collection strategy.

**Key Performance Improvement:**

The original code uses `ast.walk(tree)`, which visits **every single node** in the AST (12,947 hits in the line profiler) and accounts for 71.7% of total runtime. That includes nodes such as expressions, literals, and operators, which can never contain `ImportFrom` statements.

The optimized version implements a custom `collect_imports()` function that:

1. **Only traverses the module body and compound statements** where imports can legally appear (function/class definitions, if/while/for/with blocks, try/except)
2. **Skips irrelevant AST nodes** such as expressions, literals, and operators entirely
3. **Recursively processes nested bodies** (`body`, `orelse`, `finalbody`, `handlers`) depth-first

**Why This Works:**

In Python, `from X import Y` is a statement, so it can only appear:

- At module level
- Inside function/class definitions
- Within compound statement blocks (if/while/for/with/try, and match/case on Python 3.10+)

By checking `isinstance()` for only these container node types and recursively descending into their statement lists, the traversal never enters expression subtrees. This greatly reduces the number of nodes visited while returning the same imports for every construct it handles.

**Test Case Performance:**

The optimization is faster at every scale tested:

- **Small imports** (single statements): 60-77% faster
- **Large import lists** (100-500 items): 74-104% faster
- **Many code blocks** (500-1000 lines): 70-77% faster
- **Mixed code/imports** at scale: 70% faster

The gain is largest when the AST contains a lot of non-import code (functions, classes, expressions): `test_mixed_imports_and_code_large_scale` improved from 9.31ms to 5.45ms (70.8% faster).

**Impact on Workloads:**

Since the function references show this is used in code context extraction benchmarks, this optimization should speed up any workflow that analyzes Python imports across large codebases or repeats import analysis during development.
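A minimal sketch of the targeted traversal described above, for illustration only: `collect_imports`, `CONTAINERS`, and `BODY_FIELDS` here are hypothetical stand-ins, not the code merged in this PR. The container tuple mirrors the `isinstance()` check from the diff and, like that check, does not cover `match`/`case`. The function also returns how many statements it examined, so that count can be compared with the number of nodes `ast.walk()` yields for the same tree.

```python
import ast
from typing import Iterable, List

# Compound statements whose nested statement lists can hold imports
# (same node types as the isinstance() tuple in the diff).
CONTAINERS = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler,
)
# Attribute names that hold lists of statements (or except handlers).
BODY_FIELDS = ("body", "orelse", "finalbody", "handlers")


def collect_imports(nodes: Iterable[ast.AST], out: List[ast.ImportFrom]) -> int:
    """Depth-first walk over statement lists only; returns statements examined."""
    examined = 0
    for node in nodes:
        examined += 1
        if isinstance(node, ast.ImportFrom):
            out.append(node)
        elif isinstance(node, CONTAINERS):
            # Fields a node type lacks (e.g. ClassDef has no orelse) default to [].
            for field in BODY_FIELDS:
                examined += collect_imports(getattr(node, field, []), out)
    return examined


SAMPLE = """
from os import path

def f():
    if True:
        from json import loads
    x = [i * 2 for i in range(10)]

class C:
    try:
        from typing import Any
    except ImportError:
        from collections import OrderedDict
    finally:
        pass
"""

tree = ast.parse(SAMPLE)
found: List[ast.ImportFrom] = []
examined = collect_imports(tree.body, found)

# Baseline: ast.walk() yields every node, expressions and literals included.
walked = [n for n in ast.walk(tree) if isinstance(n, ast.ImportFrom)]
walk_total = sum(1 for _ in ast.walk(tree))

print(sorted(n.module for n in found))  # ['collections', 'json', 'os', 'typing']
print(sorted(n.module for n in walked) == sorted(n.module for n in found))  # True
print(f"{examined} statements examined vs {walk_total} nodes from ast.walk()")
```

Both approaches find the same four imports here, but the recursive version never enters the list comprehension or any other expression subtree, which is where the savings come from.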
```python
elif isinstance(
    node,
    (
        ast.FunctionDef,
        ast.AsyncFunctionDef,
        ast.ClassDef,
        ast.If,
        ast.For,
        ast.AsyncFor,
        ast.While,
        ast.With,
        ast.AsyncWith,
        ast.Try,
        ast.ExceptHandler,
    ),
):
```
**Bug: Missing `ast.Match` (Python 3.10+) node type**

The `collect_imports` recursive traversal handles `Try`, `If`, `For`, `While`, `With`, etc., but does not handle `ast.Match` (match/case statements, introduced in Python 3.10). If someone writes:

```python
match value:
    case 1:
        from module import something
```

the `ImportFrom` inside the `match_case` body won't be found by this optimized traversal, whereas the original `ast.walk()` would find it.
Since the project targets Python 3.9+, this is a low-probability edge case (imports inside match/case are very uncommon), but it's a correctness gap vs the original implementation.
Consider adding ast.Match and match_case to the isinstance check (guarded by a version check or hasattr).
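A minimal sketch of the `hasattr`-guarded fix suggested above. It assumes the same field-name traversal the PR description outlines (`body`, `orelse`, `finalbody`, `handlers`) and adds a `cases` field for `ast.Match`, whose branches are `ast.match_case` nodes with their own `body`. Including `ast.TryStar` (Python 3.11+ `try`/`except*`) is an extra assumption of this sketch, not part of the review. `collect_imports` is again a hypothetical stand-in for the real function.

```python
import ast
import sys
from typing import Iterable, List

BASE_CONTAINERS = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler,
)
# Only add node types the running interpreter defines: ast.Match and
# ast.match_case are absent on Python 3.9, ast.TryStar before 3.11.
OPTIONAL_CONTAINERS = tuple(
    getattr(ast, name) for name in ("Match", "match_case", "TryStar") if hasattr(ast, name)
)
CONTAINERS = BASE_CONTAINERS + OPTIONAL_CONTAINERS
# "cases" holds the match_case branches of an ast.Match node.
BODY_FIELDS = ("body", "orelse", "finalbody", "handlers", "cases")


def collect_imports(nodes: Iterable[ast.AST], out: List[ast.ImportFrom]) -> None:
    """Append every ImportFrom reachable through statement lists to `out`."""
    for node in nodes:
        if isinstance(node, ast.ImportFrom):
            out.append(node)
        elif isinstance(node, CONTAINERS):
            # Missing fields (e.g. Match has no "body") fall back to [].
            for field in BODY_FIELDS:
                collect_imports(getattr(node, field, []), out)


if sys.version_info >= (3, 10):
    # The reviewer's example; match syntax only parses on 3.10+.
    tree = ast.parse("match value:\n    case 1:\n        from module import something\n")
    found: List[ast.ImportFrom] = []
    collect_imports(tree.body, found)
    print([n.module for n in found])  # ['module']
```

On Python 3.9 the optional tuple is simply empty, so the same code keeps working on the oldest supported version without a separate version check.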
@claude

> Bug: Missing ast.Match (Python 3.10+) node type. The collect_imports recursive traversal handles Try, If, For, While, With, etc., but does not handle ast.Match (match/case statements, introduced in Python 3.10). If someone writes:

fix this bug
**PR Review Summary**

**Prek Checks**

✅ All checks passing after auto-fixes:

**Code Review**

1 issue found (low severity): Missing …

No other critical bugs, security issues, or breaking API changes found. The optimization logic is sound: replacing …

**Test Coverage**

Last updated: 2026-02-16T21:10 UTC
The optimized `collect_imports` missed match/case statements, where imports can legally appear. Add hasattr-guarded handling for `ast.Match` nodes.

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
⚡️ This pull request contains optimizations for PR #1498

If you approve this dependent PR, these changes will be merged into the original PR branch `cf-simplify-context-extraction`.

📄 69% (0.69x) speedup for `_parse_and_collect_imports` in `codeflash/languages/python/context/code_context_extractor.py`

⏱️ Runtime: `23.5 milliseconds` → `14.0 milliseconds` (best of `30` runs)
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-pr1498-2026-02-16T20.49.33` and push.