⚡️ Speed up function `_parse_and_collect_imports` by 69% in PR #1498 (`cf-simplify-context-extraction`) by codeflash-ai[bot] · Pull Request #1499 · codeflash-ai/codeflash

codeflash-ai · 2026-02-16T20:49:39Z

⚡️ This pull request contains optimizations for PR #1498

If you approve this dependent PR, these changes will be merged into the original PR branch cf-simplify-context-extraction.

This PR will be automatically closed if the original PR is merged.

📄 69% (0.69x) speedup for `_parse_and_collect_imports` in `codeflash/languages/python/context/code_context_extractor.py`

⏱️ Runtime: 23.5 milliseconds → 14.0 milliseconds (best of 30 runs)
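The report does not show codeflash's timing harness. To illustrate what a best-of-N figure means, here is a minimal sketch using the standard library's `timeit.repeat`. The workload `parse_and_walk` is a hypothetical stand-in, not the function under test:

```python
import ast
import timeit

# Hypothetical workload: parse a synthetic module and visit every node.
SOURCE = "\n".join(f"from module{i} import item{i}\nx{i} = {i}" for i in range(500))

def parse_and_walk(source: str) -> int:
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))

# timeit.repeat returns one total time per repetition. Taking the minimum
# ("best of N") filters out noise from other processes and GC pauses.
timings = timeit.repeat(lambda: parse_and_walk(SOURCE), repeat=30, number=1)
best = min(timings)
print(f"best of {len(timings)} runs: {best * 1000:.2f} ms")
```

Reporting the minimum rather than the mean is a common choice for micro-benchmarks, because it is the least affected by unrelated system activity.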

📝 Explanation and details

The optimization achieves a 68% speedup (23.5ms → 14.0ms, about 40% less runtime) by replacing the exhaustive ast.walk() traversal with a targeted recursive collection strategy.

Key Performance Improvement:

The original code uses ast.walk(tree), which visits every node in the AST (12,947 hits in the line profiler) and accounts for 71.7% of total runtime. Most of those nodes are expressions, literals, and operators, which can never contain ImportFrom statements.
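To see why ast.walk() dominates the profile, compare how many nodes it yields with how many statement nodes exist. This is a small sketch on a made-up snippet; the exact counts depend on the input:

```python
import ast

SOURCE = """
from os import path
x = [i * 2 for i in range(10) if i % 3]
def f(a, b):
    return a + b * 2
"""

tree = ast.parse(SOURCE)

# ast.walk yields every node: statements, expressions, operators, contexts, ...
all_nodes = list(ast.walk(tree))
# ...but ImportFrom is a statement, so only ast.stmt nodes can ever match.
stmt_nodes = [n for n in all_nodes if isinstance(n, ast.stmt)]
import_from = [n for n in all_nodes if isinstance(n, ast.ImportFrom)]

print(len(all_nodes), len(stmt_nodes), len(import_from))
```

Even in this tiny snippet, only four nodes are statements, while ast.walk() yields several times that many.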

The optimized version implements a custom collect_imports() function that:

Only traverses the module body and the compound statements where imports can legally appear (function/class definitions, if/while/for/with blocks, try/except)
Skips irrelevant AST nodes like expressions, literals, and operators entirely
Recursively processes nested bodies (body, orelse, finalbody, handlers) in a depth-first manner
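The PR's actual `collect_imports` body is only partly visible (the `isinstance` tuple quoted in the review below). The following is a minimal sketch of the same idea. It assumes the result maps each imported name (or its `as` alias) to its module and skips `*` imports and relative imports without a module, as the generated tests describe. `collect_from_imports` is a hypothetical name:

```python
import ast

# Statement types whose nested statement lists can contain an ImportFrom.
_CONTAINERS = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler,
)

def collect_from_imports(tree: ast.Module) -> dict[str, str]:
    """Hypothetical sketch: map imported name (or alias) -> module."""
    imports: dict[str, str] = {}

    def visit(nodes: list) -> None:
        for node in nodes:
            if isinstance(node, ast.ImportFrom):
                if node.module is None:  # e.g. "from . import x"
                    continue
                for alias in node.names:
                    if alias.name != "*":
                        imports[alias.asname or alias.name] = node.module
            elif isinstance(node, _CONTAINERS):
                # Descend only into statement lists, never into expressions.
                for field in ("body", "orelse", "finalbody", "handlers"):
                    visit(getattr(node, field, []))

    visit(tree.body)
    return imports

tree = ast.parse("from os import path as p\nif True:\n    from sys import argv\nfrom json import *")
print(collect_from_imports(tree))  # {'p': 'os', 'argv': 'sys'}
```

Because later imports simply overwrite earlier keys, a name imported twice maps to the last module it was imported from.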

Why This Works:

In Python, from X import Y statements can only appear:

At module level
Inside function/class definitions
Within compound-statement blocks (if/while/for/with/try, and match/case on Python 3.10+)

By checking isinstance() only against these container node types and recursing into their statement lists, the traversal never descends into expression subtrees. This sharply reduces the number of nodes visited while still reaching every ImportFrom in those statement lists.
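To check the list above against the parser, the sketch below records the parent node type of every ImportFrom found by a full ast.walk(). It also shows one restriction worth knowing: in Python 3, `from X import *` is only allowed at module level, so compiling a star import inside a function raises SyntaxError. The snippet is made up for illustration:

```python
import ast

SOURCE = """
from a import b
def f():
    from c import d
class K:
    from e import g
if True:
    from h import i
with lock:
    from j import k
try:
    from l import m
except ImportError:
    from n import o
"""

tree = ast.parse(SOURCE)

# Build a child -> parent map with a full walk, then look up each ImportFrom.
parents = {}
for parent in ast.walk(tree):
    for child in ast.iter_child_nodes(parent):
        parents[child] = parent

found = sorted(
    (node.module, type(parents[node]).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.ImportFrom)
)
print(found)

# Star imports are rejected by the compiler outside module level.
try:
    compile("def g():\n    from os import *\n", "<demo>", "exec")
    star_rejected = False
except SyntaxError:
    star_rejected = True
print(star_rejected)  # True
```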

Test Case Performance:

The speedup holds across input sizes:

Small imports (single statements): 60-77% faster
Large import lists (100-500 items): 74-104% faster
Many code blocks (500-1000 lines): 70-77% faster
Mixed code/imports at scale: 70% faster

The performance gain is particularly pronounced when the AST contains large amounts of non-import code (functions, classes, expressions), as shown by the test_mixed_imports_and_code_large_scale case improving from 9.31ms to 5.45ms (70.8% faster).

Impact on Workloads:

Because function_references show this function is used in the code context extraction benchmarks, the optimization should speed up workflows that analyze Python imports across large codebases or repeatedly during development.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 44 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	84.6%

🌀 Generated Regression Tests

import ast  # used to assert the returned tree type and inspect nodes
from typing import List

# imports
import pytest  # used for our unit tests
from codeflash.languages.python.context.code_context_extractor import \
    _parse_and_collect_imports
from codeflash.models.models import CodeString, CodeStringsMarkdown

def test_single_from_import_no_alias():
    # Create a single CodeString containing a simple "from module import name"
    cs = CodeString(code="from math import sqrt", file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    # Call the function under test
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 17.7μs -> 10.6μs (67.0% faster)
    tree, imports = result

def test_from_import_with_alias_and_multiple_entries():
    # Code containing: a multi-alias ImportFrom, a star import (ignored), and a normal import (ignored)
    code = "\n".join(
        [
            "from os.path import join as pjoin, dirname",
            "from sys import *",
            "import collections",
        ]
    )
    cs = CodeString(code=code, file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 28.8μs -> 18.5μs (55.6% faster)
    tree, imports = result
    # We expect two entries from the os.path import, with alias respected for join
    expected = {"pjoin": "os.path", "dirname": "os.path"}
    assert imports == expected

def test_empty_code_strings_list_returns_empty_mapping_and_tree():
    # An empty CodeStringsMarkdown (no code strings) should parse to an empty AST module and empty mapping
    context = CodeStringsMarkdown(code_strings=[])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 14.4μs -> 9.65μs (48.9% faster)
    tree, imports = result

def test_relative_import_with_no_module_is_skipped_and_relative_with_module_included():
    # "from . import name" has node.module == None and should be skipped.
    # "from ..mod import x" will have module 'mod' (level > 0 but module present) and should be included.
    code = "\n".join(
        [
            "from . import rel",        # module is None -> should be ignored
            "from ..mod import x",     # module is 'mod' -> should be included
        ]
    )
    cs = CodeString(code=code, file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 23.3μs -> 15.1μs (54.1% faster)
    _, imports = result

def test_import_name_overwrite_by_later_import():
    # When the same imported name appears twice from different modules, the later import should overwrite
    code = "\n".join(
        [
            "from a import item",
            "from b import item as item",  # explicit asname same as name; this should overwrite
        ]
    )
    cs = CodeString(code=code, file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 22.8μs -> 13.6μs (68.0% faster)
    _, imports = result

def test_many_imports_across_many_code_strings():
    # Build a large list of CodeString objects, each declaring a From import.
    # Use 500 entries to be large enough to exercise scale without being excessive.
    n = 500
    code_strings: List[CodeString] = []
    for i in range(n):
        # Each code string imports a uniquely named symbol with an alias
        code_strings.append(
            CodeString(code=f"from mod{i} import name{i} as alias{i}", file_path=None, language="python")
        )
    context = CodeStringsMarkdown(code_strings=code_strings)
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 2.36ms -> 1.35ms (74.1% faster)
    _, imports = result

def test_many_empty_code_strings_are_handled_quickly():
    # Create many (1000) code strings that are empty; ensure the function returns an empty mapping
    # and does not error out. This checks handling of many inputs even when content is trivial.
    n = 1000
    code_strings = [CodeString(code="", file_path=None, language="python") for _ in range(n)]
    context = CodeStringsMarkdown(code_strings=code_strings)
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 94.6μs -> 85.0μs (11.4% faster)
    _, imports = result
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import ast
from pathlib import Path

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    _parse_and_collect_imports
from codeflash.models.models import CodeString, CodeStringsMarkdown

def test_single_import_from_statement():
    """Test parsing a simple 'from X import Y' statement."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 19.2μs -> 11.6μs (65.3% faster)
    tree, imports = result

def test_multiple_imports_from_same_module():
    """Test parsing multiple imports from the same module."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path, environ, getcwd", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.6μs -> 12.7μs (69.5% faster)
    tree, imports = result

def test_import_with_alias():
    """Test parsing imports with 'as' aliases."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path as p", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.5μs -> 10.9μs (60.8% faster)
    tree, imports = result

def test_multiple_import_statements():
    """Test parsing multiple import statements from different modules."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\nfrom sys import argv\nfrom json import loads", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 26.6μs -> 16.1μs (64.9% faster)
    tree, imports = result

def test_star_import_ignored():
    """Test that 'from X import *' is ignored and not added to imports."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import *", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.1μs -> 9.78μs (74.6% faster)
    tree, imports = result

def test_empty_code_strings():
    """Test parsing with an empty code_strings list."""
    code_markdown = CodeStringsMarkdown(code_strings=[])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 14.5μs -> 9.52μs (52.6% faster)
    tree, imports = result

def test_multiple_code_blocks_joined():
    """Test parsing multiple code blocks that are joined together."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=None, language="python"),
        CodeString(code="from sys import argv", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 24.5μs -> 15.2μs (60.8% faster)
    tree, imports = result

def test_import_with_nested_module():
    """Test parsing imports from nested module paths."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from collections.abc import Iterable", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 18.8μs -> 11.5μs (64.1% faster)
    tree, imports = result

def test_non_import_code_ignored():
    """Test that non-import code is parsed but doesn't affect imports dict."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="x = 5\nprint(x)\nfrom os import path", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 37.3μs -> 23.0μs (62.4% faster)
    tree, imports = result

def test_ast_tree_structure_valid():
    """Test that returned AST tree is a valid Module with parseable body."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\nx = 1", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 26.2μs -> 16.0μs (63.8% faster)
    tree, imports = result

def test_very_long_import_list():
    """Test parsing imports with many items from a single module."""
    import_items = ", ".join([f"item{i}" for i in range(100)])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=f"from module import {import_items}", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 156μs -> 76.9μs (104% faster)
    tree, imports = result
    for i in range(100):
        assert imports[f"item{i}"] == "module"

def test_all_imports_with_aliases():
    """Test parsing imports where all items have aliases."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path as p, environ as e, getcwd as g", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 22.6μs -> 13.2μs (70.5% faster)
    tree, imports = result

def test_whitespace_variations():
    """Test parsing imports with various whitespace patterns."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from    os    import    path  ,  environ", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 19.9μs -> 11.6μs (70.8% faster)
    tree, imports = result

def test_single_code_string_newline():
    """Test parsing a single code block with only a newline."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 10.2μs -> 5.78μs (77.3% faster)
    tree, imports = result

def test_import_from_with_no_module():
    """Test that ImportFrom nodes without a module are skipped."""
    # Note: module is None only for relative imports such as "from . import X";
    # this test covers the normal path, where module is set and the guard passes.
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.7μs -> 10.8μs (64.7% faster)
    tree, imports = result

def test_duplicate_imports_last_wins():
    """Test behavior when the same name is imported multiple times."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\nfrom sys import path", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.8μs -> 13.1μs (66.2% faster)
    tree, imports = result

def test_underscore_prefixed_imports():
    """Test parsing imports with underscore prefixes."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import _exit, _getfullpathname", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 19.7μs -> 11.2μs (75.8% faster)
    tree, imports = result

def test_uppercase_module_names():
    """Test parsing imports from modules with uppercase names."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from XML import ElementTree as ET", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.8μs -> 10.6μs (67.8% faster)
    tree, imports = result

def test_deeply_nested_module_path():
    """Test parsing imports from deeply nested module paths."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from a.b.c.d.e.f import something", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 18.8μs -> 12.0μs (57.0% faster)
    tree, imports = result

def test_code_with_multiline_import():
    """Test parsing multiline import statements."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import (\n    path,\n    environ,\n    getcwd\n)", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.5μs -> 12.5μs (72.7% faster)
    tree, imports = result

def test_code_with_comments_and_imports():
    """Test parsing code that includes comments alongside imports."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="# This is a comment\nfrom os import path  # inline comment", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.5μs -> 10.3μs (69.3% faster)
    tree, imports = result

def test_import_after_non_import_code():
    """Test that imports are found even when placed after other code."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="x = 1\ny = 2\ndef foo():\n    pass\nfrom os import path", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 42.0μs -> 26.9μs (56.2% faster)
    tree, imports = result

def test_code_with_class_and_function_definitions():
    """Test that imports are correctly collected alongside class/function definitions."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\n\nclass MyClass:\n    pass\n\ndef my_func():\n    pass", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 34.5μs -> 24.1μs (43.2% faster)
    tree, imports = result

def test_import_names_with_numbers():
    """Test parsing imports that have numbers in their names."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from module import item1, item2, item3", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.3μs -> 12.4μs (72.0% faster)
    tree, imports = result

def test_multiple_blocks_with_path_info():
    """Test parsing multiple code blocks with file path information."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=Path("file1.py"), language="python"),
        CodeString(code="from sys import argv", file_path=Path("file2.py"), language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 22.7μs -> 13.9μs (62.7% faster)
    tree, imports = result

def test_empty_string_code():
    """Test parsing an empty string code block."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 10.1μs -> 5.72μs (77.1% faster)
    tree, imports = result

def test_only_whitespace_code():
    """Test parsing code block with only whitespace."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="   \n  \n   ", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 9.87μs -> 5.52μs (78.8% faster)
    tree, imports = result

def test_indented_import_statement():
    """Test that indented import statements are correctly parsed."""
    # The import is nested inside an if block, so the traversal must descend into If.body
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="if True:\n    from os import path", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 25.0μs -> 16.8μs (49.2% faster)
    tree, imports = result

def test_large_number_of_import_statements():
    """Test parsing a very large number of import statements (500)."""
    import_lines = "\n".join([f"from module{i} import item{i}" for i in range(500)])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=import_lines, file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 1.96ms -> 1.14ms (71.4% faster)
    tree, imports = result
    for i in range(500):
        assert imports[f"item{i}"] == f"module{i}"

def test_large_number_of_code_blocks():
    """Test parsing with 100 separate code blocks."""
    code_strings_list = [
        CodeString(code=f"from module{i} import item{i}", file_path=None, language="python")
        for i in range(100)
    ]
    code_markdown = CodeStringsMarkdown(code_strings=code_strings_list)
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 404μs -> 239μs (69.1% faster)
    tree, imports = result
    for i in range(100):
        assert imports[f"item{i}"] == f"module{i}"

def test_very_long_code_block():
    """Test parsing a very large single code block with many lines."""
    # Create 1000 lines of mixed code and imports
    lines = []
    for i in range(1000):
        if i % 3 == 0:
            lines.append(f"from module{i} import item{i}")
        else:
            lines.append(f"x{i} = {i}")
    
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join(lines), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 4.94ms -> 2.79ms (77.2% faster)
    tree, imports = result
    # Count how many imports we expect: i % 3 == 0 holds for 334 values of i in range(1000)
    expected_imports = len([i for i in range(1000) if i % 3 == 0])
    assert len(imports) == expected_imports

def test_many_aliases_in_single_statement():
    """Test parsing a single import with 500 aliased items."""
    import_items = ", ".join([f"item{i} as alias{i}" for i in range(500)])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=f"from module import {import_items}", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 816μs -> 450μs (81.3% faster)
    tree, imports = result
    for i in range(500):
        assert imports[f"alias{i}"] == "module"

def test_large_ast_tree_processing():
    """Test that AST tree processing works efficiently with large code."""
    # Create a large code block with functions, classes, and imports
    lines = []
    lines.append("from os import path")
    for i in range(100):
        lines.append(f"def func{i}():\n    pass")
        lines.append(f"class Class{i}:\n    pass")
    lines.append("from sys import argv")
    
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join(lines), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 1.13ms -> 787μs (43.6% faster)
    tree, imports = result

def test_many_nested_modules_in_imports():
    """Test parsing imports from many deeply nested module paths."""
    import_lines = "\n".join([
        f"from a{i}.b{i}.c{i}.d{i}.e{i}.f{i} import item{i}" for i in range(200)
    ])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=import_lines, file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 1.28ms -> 934μs (36.5% faster)
    tree, imports = result
    for i in range(200):
        assert imports[f"item{i}"] == f"a{i}.b{i}.c{i}.d{i}.e{i}.f{i}"

def test_mixed_imports_and_code_large_scale():
    """Test parsing with alternating imports and code statements at large scale."""
    lines = []
    for i in range(500):
        lines.append(f"from module{i} import item{i}")
        lines.append(f"x{i} = {i}")
        lines.append(f"def func{i}():\n    return x{i}")
    
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join(lines), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 9.31ms -> 5.45ms (70.8% faster)
    tree, imports = result
    for i in range(500):
        assert imports[f"item{i}"] == f"module{i}"

def test_imports_dict_always_string_values():
    """Test that all values in imports dict are strings (module names)."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join([
            f"from module{i} import item{i}" for i in range(100)
        ]), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 396μs -> 225μs (75.5% faster)
    tree, imports = result
    for key, value in imports.items():
        assert isinstance(value, str)

def test_imports_dict_keys_never_star():
    """Test that star imports never appear as keys in the imports dict."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join([
            "from os import *",
            "from sys import *",
            "from json import *",
        ]), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 26.2μs -> 15.2μs (72.1% faster)
    tree, imports = result
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-pr1498-2026-02-16T20.49.33 and push.


claude · 2026-02-16T20:54:48Z

+            elif isinstance(
+                node,
+                (
+                    ast.FunctionDef,
+                    ast.AsyncFunctionDef,
+                    ast.ClassDef,
+                    ast.If,
+                    ast.For,
+                    ast.AsyncFor,
+                    ast.While,
+                    ast.With,
+                    ast.AsyncWith,
+                    ast.Try,
+                    ast.ExceptHandler,
+                ),


Bug: Missing ast.Match (Python 3.10+) node type

The collect_imports recursive traversal handles Try, If, For, While, With, etc., but does not handle ast.Match (match/case statements, introduced in Python 3.10). If someone writes:

match value:
    case 1:
        from module import something

The ImportFrom inside the match_case body won't be found by this optimized traversal, whereas the original ast.walk() would find it.

The project targets Python 3.9+, so ast.Match exists only on some supported versions, and imports inside match/case are very uncommon; this is a low-probability edge case. It is still a correctness gap relative to the original implementation.

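Consider handling ast.Match (guarded by a version check or hasattr) by recursing into the body of each match_case in node.cases. match_case is not a statement and never appears in a body, orelse, finalbody, or handlers list, so adding it to the isinstance tuple alone would have no effect.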
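A minimal sketch of the suggested fix, assuming the traversal shape described in the PR (recursing into body/orelse/finalbody/handlers). Because match_case lives in Match.cases, the recursion has to descend into node.cases explicitly. The sketch also covers ast.TryStar (try/except*, Python 3.11+), which the tuple quoted above does not list. `collect_import_froms` is a hypothetical name, and it collects ImportFrom nodes rather than building the name mapping:

```python
import ast
import sys

# Optional node types, looked up with getattr so the code also runs on 3.9.
_MATCH = getattr(ast, "Match", None)        # match statement, Python 3.10+
_TRY_STAR = getattr(ast, "TryStar", None)   # try/except*, Python 3.11+

_CONTAINERS = tuple(
    t for t in (
        ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
        ast.If, ast.For, ast.AsyncFor, ast.While,
        ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler, _TRY_STAR,
    ) if t is not None
)

def collect_import_froms(nodes: list, out: list) -> None:
    """Append every ImportFrom reachable through statement lists to out."""
    for node in nodes:
        if isinstance(node, ast.ImportFrom):
            out.append(node)
        elif _MATCH is not None and isinstance(node, _MATCH):
            # Each match_case holds its own statement list in .body.
            for case in node.cases:
                collect_import_froms(case.body, out)
        elif isinstance(node, _CONTAINERS):
            for field in ("body", "orelse", "finalbody", "handlers"):
                collect_import_froms(getattr(node, field, []), out)

if sys.version_info >= (3, 10):
    found: list = []
    collect_import_froms(
        ast.parse("match value:\n    case 1:\n        from module import something\n").body,
        found,
    )
    print([n.module for n in found])  # ['module']
```

Using getattr on the ast module keeps the sketch importable on 3.9, where neither class exists; an explicit sys.version_info check, as the review suggests, would work equally well.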

KRRT7 · 2026-02-16T21:00:24Z

@claude Bug: Missing ast.Match (Python 3.10+) node type

The collect_imports recursive traversal handles Try, If, For, While, With, etc., but does not handle ast.Match (match/case statements, introduced in Python 3.10). If someone writes:

fix this bug

claude · 2026-02-16T21:01:01Z

PR Review Summary

Prek Checks

✅ All checks passing after auto-fixes:

Fixed 8 ruff errors in code_context_extractor.py: 4× bad-quotes-inline-string, 2× trailing-whitespace, 2× blank-line-with-whitespace
Fixed ruff format issues (1 file reformatted)
Fixed mypy type error: added type annotation to collect_imports(nodes: list[ast.stmt]) -> None
Committed and pushed: style: auto-fix linting issues + fix: resolve mypy type errors in collect_imports

Code Review

1 issue found (low severity):

Missing ast.Match node type (Python 3.10+) — The optimized collect_imports recursion handles Try, If, For, While, With, etc., but omits ast.Match (match/case statements). An ImportFrom inside a match/case block would be missed by the new traversal but was found by the original ast.walk(). This is a correctness gap, though imports inside match/case are extremely uncommon in practice. See inline comment.

No other critical bugs, security issues, or breaking API changes found. The optimization logic is sound — replacing ast.walk() with targeted recursive traversal that only visits statement-level nodes where ImportFrom can legally appear.

Test Coverage

File	Stmts	Miss	Coverage
`codeflash/languages/python/context/code_context_extractor.py`	644	54	92%

✅ The optimized _parse_and_collect_imports function (lines 559-593) is fully covered by tests
The only uncovered lines near the change (553-554) are the pre-existing SyntaxError exception handler
2374 tests passed, 8 failed (all pre-existing failures in test_tracer.py, unrelated to this PR)

Last updated: 2026-02-16T21:10 UTC


codeflash-ai Bot added the ⚡️ codeflash and 🎯 Quality: High labels Feb 16, 2026

codeflash-ai Bot mentioned this pull request Feb 16, 2026

refactor: move context extraction to languages/python/context/ #1498

Merged

3 tasks

github-actions Bot added 2 commits February 16, 2026 20:51

style: auto-fix linting issues

73e71d0

fix: resolve mypy type errors in collect_imports

29c0a66

claude Bot reviewed Feb 16, 2026


fix: handle ast.Match (Python 3.10+) in collect_imports traversal

bfa55cb

The optimized collect_imports missed match/case statements where imports can legally appear. Add hasattr-guarded handling for ast.Match nodes. Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>

KRRT7 merged commit 82b4002 into cf-simplify-context-extraction Feb 16, 2026
25 of 27 checks passed

KRRT7 deleted the codeflash/optimize-pr1498-2026-02-16T20.49.33 branch February 16, 2026 21:03

KRRT7 merged 4 commits into cf-simplify-context-extraction from codeflash/optimize-pr1498-2026-02-16T20.49.33
