⚡️ Speed up function _parse_and_collect_imports by 69% in PR #1498 (cf-simplify-context-extraction) #1499

Merged
KRRT7 merged 4 commits into cf-simplify-context-extraction from codeflash/optimize-pr1498-2026-02-16T20.49.33
Feb 16, 2026
@codeflash-ai codeflash-ai Bot commented Feb 16, 2026

⚡️ This pull request contains optimizations for PR #1498

If you approve this dependent PR, these changes will be merged into the original PR branch cf-simplify-context-extraction.

This PR will be automatically closed if the original PR is merged.


📄 69% (0.69x) speedup for _parse_and_collect_imports in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime: 23.5 milliseconds → 14.0 milliseconds (best of 30 runs)

📝 Explanation and details

The optimization achieves a 68% runtime improvement (23.5ms → 14.0ms) by replacing the expensive ast.walk() traversal with a targeted recursive collection strategy.
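As a quick check on the figures, the percentage can be recomputed from the two rounded runtimes. The formula below, (old - new) / new, is an assumption about how the speedup is defined; it is used here because it reproduces the per-test numbers in this report, such as 70.8% for 9.31ms → 5.45ms. The helper name `speedup` is illustrative.

```python
def speedup(old: float, new: float) -> float:
    """Fraction by which `new` is faster than `old` (assumed definition)."""
    return (old - new) / new


# Overall runtimes reported for this PR, in milliseconds (rounded values).
overall = speedup(23.5, 14.0)
print(f"overall: {overall:.1%}")

# One per-test figure from the regression tests, also in milliseconds.
mixed = speedup(9.31, 5.45)
print(f"mixed large scale: {mixed:.1%}")
```

With the rounded inputs the overall figure comes out just under 68%, so the 69% in the title presumably comes from unrounded timings.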

Key Performance Improvement:

The original code uses ast.walk(tree), which visits every node in the AST (12,947 hits in the line profiler) and consumes 71.7% of total runtime. That includes expressions, literals, and operators, none of which can contain an ImportFrom statement.
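To make the overhead concrete, the sketch below counts how many nodes ast.walk() visits for a tiny module and how many of those are statements at all. The sample source and variable names are illustrative and not taken from the PR.

```python
import ast

# A small module: one import and one function containing arithmetic.
source = """
from os import path

def area(w, h):
    return w * h + (w - h) ** 2
"""

tree = ast.parse(source)

# ast.walk yields every node: statements, expressions, operators, contexts.
all_nodes = list(ast.walk(tree))
stmt_nodes = [n for n in all_nodes if isinstance(n, ast.stmt)]
import_nodes = [n for n in all_nodes if isinstance(n, ast.ImportFrom)]

print(f"visited by ast.walk: {len(all_nodes)}")
print(f"statements:          {len(stmt_nodes)}")
print(f"ImportFrom nodes:    {len(import_nodes)}")
```

Only the statement nodes can ever be an ImportFrom or contain one, so everything else ast.walk() yields is wasted work for this function.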

The optimized version implements a custom collect_imports() function that:

  1. Only traverses the module body and the compound statements where imports can appear (function/class definitions, if/while/for/with blocks, try/except)
  2. Skips irrelevant AST nodes like expressions, literals, and operators entirely
  3. Recursively processes nested bodies (body, orelse, finalbody, handlers) in a depth-first manner

Why This Works:

In Python, from X import Y statements can only appear:

  • At module level
  • Inside function/class definitions
  • Within compound statement blocks (if/while/for/with/try)

By checking isinstance() for only these container node types and recursively descending into their body attributes, we avoid traversing the entire AST subtree for each construct. This dramatically reduces the number of nodes visited while maintaining correctness.
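The traversal described above can be sketched as follows. This is a minimal illustration, not the code merged in the PR: the function name `collect_from_imports` and the constant `_CONTAINERS` are hypothetical, the container list is copied from the isinstance check quoted in the review, and the name → module mapping follows what the generated tests describe (aliases respected, star imports and module-less relative imports skipped).

```python
import ast

# Statement types whose bodies may hold nested statements.
_CONTAINERS = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler,
)


def collect_from_imports(tree: ast.Module) -> dict:
    """Map each imported name (or alias) to its source module."""
    imports = {}

    def visit(nodes) -> None:
        for node in nodes:
            if isinstance(node, ast.ImportFrom):
                # "from . import x" has module None and is skipped.
                if node.module:
                    for alias in node.names:
                        if alias.name != "*":
                            imports[alias.asname or alias.name] = node.module
            elif isinstance(node, _CONTAINERS):
                # Descend only into statement lists, never into expressions.
                for field in ("body", "orelse", "finalbody", "handlers"):
                    visit(getattr(node, field, []))

    visit(tree.body)
    return imports


example = "try:\n    from json import loads as parse\nexcept ImportError:\n    from simplejson import loads as parse\n"
print(collect_from_imports(ast.parse(example)))
```

Because later imports overwrite earlier ones, the handler's import wins in this example, which matches the "last import wins" behaviour the tests describe.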

Test Case Performance:

The generated tests show gains at every scale, with most cases falling in these ranges:

  • Small imports (single statements): 60-77% faster
  • Large import lists (100-500 items): 74-104% faster
  • Many code blocks (500-1000 lines): 70-77% faster
  • Mixed code/imports at scale: 70% faster

The performance gain is particularly pronounced when the AST contains large amounts of non-import code (functions, classes, expressions), as shown by the test_mixed_imports_and_code_large_scale case improving from 9.31ms to 5.45ms (70.8% faster).
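A comparison of this kind can be reproduced roughly with timeit, taking the best of several repeats in the same spirit as the "best of 30 runs" figure above. This is a hedged sketch: `count_with_walk` and `count_targeted` are hypothetical stand-ins that only count ImportFrom nodes, the container check is simplified to function and class definitions, and absolute timings will vary by machine.

```python
import ast
import timeit

# Workload shaped like test_mixed_imports_and_code_large_scale, but smaller.
lines = []
for i in range(200):
    lines.append(f"from module{i} import item{i}")
    lines.append(f"x{i} = {i}")
    lines.append(f"def func{i}():\n    return x{i}")
tree = ast.parse("\n".join(lines))


def count_with_walk(module: ast.Module) -> int:
    """Baseline: visit every node in the tree."""
    return sum(isinstance(n, ast.ImportFrom) for n in ast.walk(module))


def count_targeted(nodes) -> int:
    """Targeted: descend only into statement bodies (simplified container set)."""
    total = 0
    for node in nodes:
        if isinstance(node, ast.ImportFrom):
            total += 1
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            total += count_targeted(node.body)
    return total


# Best of 5 repeats, 20 calls per repeat.
best_walk = min(timeit.repeat(lambda: count_with_walk(tree), repeat=5, number=20))
best_targeted = min(timeit.repeat(lambda: count_targeted(tree.body), repeat=5, number=20))
print(f"ast.walk: {best_walk:.4f}s  targeted: {best_targeted:.4f}s")
```

Both functions must return the same count for the comparison to be meaningful, so it is worth asserting that before reading the timings.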

Impact on Workloads:

Because function_references shows this function is used in code context extraction benchmarks, the optimization should speed up workflows that analyze Python imports across large codebases or repeat import analysis during development.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 44 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 84.6% |

🌀 Generated Regression Tests
import ast  # used to assert the returned tree type and inspect nodes
from typing import List

# imports
import pytest  # used for our unit tests
from codeflash.languages.python.context.code_context_extractor import \
    _parse_and_collect_imports
from codeflash.models.models import CodeString, CodeStringsMarkdown

def test_single_from_import_no_alias():
    # Create a single CodeString containing a simple "from module import name"
    cs = CodeString(code="from math import sqrt", file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    # Call the function under test
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 17.7μs -> 10.6μs (67.0% faster)
    tree, imports = result

def test_from_import_with_alias_and_multiple_entries():
    # Code containing: a multi-alias ImportFrom, a star import (ignored), and a normal import (ignored)
    code = "\n".join(
        [
            "from os.path import join as pjoin, dirname",
            "from sys import *",
            "import collections",
        ]
    )
    cs = CodeString(code=code, file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 28.8μs -> 18.5μs (55.6% faster)
    tree, imports = result
    # We expect two entries from the os.path import, with alias respected for join
    expected = {"pjoin": "os.path", "dirname": "os.path"}
    assert imports == expected

def test_empty_code_strings_list_returns_empty_mapping_and_tree():
    # An empty CodeStringsMarkdown (no code strings) should parse to an empty AST module and empty mapping
    context = CodeStringsMarkdown(code_strings=[])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 14.4μs -> 9.65μs (48.9% faster)
    tree, imports = result

def test_relative_import_with_no_module_is_skipped_and_relative_with_module_included():
    # "from . import name" has node.module == None and should be skipped.
    # "from ..mod import x" will have module 'mod' (level > 0 but module present) and should be included.
    code = "\n".join(
        [
            "from . import rel",        # module is None -> should be ignored
            "from ..mod import x",     # module is 'mod' -> should be included
        ]
    )
    cs = CodeString(code=code, file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 23.3μs -> 15.1μs (54.1% faster)
    _, imports = result

def test_import_name_overwrite_by_later_import():
    # When the same imported name appears twice from different modules, the later import should overwrite
    code = "\n".join(
        [
            "from a import item",
            "from b import item as item",  # explicit asname same as name; this should overwrite
        ]
    )
    cs = CodeString(code=code, file_path=None, language="python")
    context = CodeStringsMarkdown(code_strings=[cs])
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 22.8μs -> 13.6μs (68.0% faster)
    _, imports = result

def test_many_imports_across_many_code_strings():
    # Build a large list of CodeString objects, each declaring a From import.
    # Use 500 entries to be large enough to exercise scale without being excessive.
    n = 500
    code_strings: List[CodeString] = []
    for i in range(n):
        # Each code string imports a uniquely named symbol with an alias
        code_strings.append(
            CodeString(code=f"from mod{i} import name{i} as alias{i}", file_path=None, language="python")
        )
    context = CodeStringsMarkdown(code_strings=code_strings)
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 2.36ms -> 1.35ms (74.1% faster)
    _, imports = result

def test_many_empty_code_strings_are_handled_quickly():
    # Create many (1000) code strings that are empty; ensure the function returns an empty mapping
    # and does not error out. This checks handling of many inputs even when content is trivial.
    n = 1000
    code_strings = [CodeString(code="", file_path=None, language="python") for _ in range(n)]
    context = CodeStringsMarkdown(code_strings=code_strings)
    codeflash_output = _parse_and_collect_imports(context); result = codeflash_output # 94.6μs -> 85.0μs (11.4% faster)
    _, imports = result
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import ast
from pathlib import Path

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    _parse_and_collect_imports
from codeflash.models.models import CodeString, CodeStringsMarkdown

def test_single_import_from_statement():
    """Test parsing a simple 'from X import Y' statement."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 19.2μs -> 11.6μs (65.3% faster)
    tree, imports = result

def test_multiple_imports_from_same_module():
    """Test parsing multiple imports from the same module."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path, environ, getcwd", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.6μs -> 12.7μs (69.5% faster)
    tree, imports = result

def test_import_with_alias():
    """Test parsing imports with 'as' aliases."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path as p", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.5μs -> 10.9μs (60.8% faster)
    tree, imports = result

def test_multiple_import_statements():
    """Test parsing multiple import statements from different modules."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\nfrom sys import argv\nfrom json import loads", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 26.6μs -> 16.1μs (64.9% faster)
    tree, imports = result

def test_star_import_ignored():
    """Test that 'from X import *' is ignored and not added to imports."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import *", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.1μs -> 9.78μs (74.6% faster)
    tree, imports = result

def test_empty_code_strings():
    """Test parsing with an empty code_strings list."""
    code_markdown = CodeStringsMarkdown(code_strings=[])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 14.5μs -> 9.52μs (52.6% faster)
    tree, imports = result

def test_multiple_code_blocks_joined():
    """Test parsing multiple code blocks that are joined together."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=None, language="python"),
        CodeString(code="from sys import argv", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 24.5μs -> 15.2μs (60.8% faster)
    tree, imports = result

def test_import_with_nested_module():
    """Test parsing imports from nested module paths."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from collections.abc import Iterable", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 18.8μs -> 11.5μs (64.1% faster)
    tree, imports = result

def test_non_import_code_ignored():
    """Test that non-import code is parsed but doesn't affect imports dict."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="x = 5\nprint(x)\nfrom os import path", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 37.3μs -> 23.0μs (62.4% faster)
    tree, imports = result

def test_ast_tree_structure_valid():
    """Test that returned AST tree is a valid Module with parseable body."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\nx = 1", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 26.2μs -> 16.0μs (63.8% faster)
    tree, imports = result

def test_very_long_import_list():
    """Test parsing imports with many items from a single module."""
    import_items = ", ".join([f"item{i}" for i in range(100)])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=f"from module import {import_items}", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 156μs -> 76.9μs (104% faster)
    tree, imports = result
    for i in range(100):
        pass

def test_all_imports_with_aliases():
    """Test parsing imports where all items have aliases."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path as p, environ as e, getcwd as g", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 22.6μs -> 13.2μs (70.5% faster)
    tree, imports = result

def test_whitespace_variations():
    """Test parsing imports with various whitespace patterns."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from    os    import    path  ,  environ", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 19.9μs -> 11.6μs (70.8% faster)
    tree, imports = result

def test_single_code_string_newline():
    """Test parsing a single code block with only a newline."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 10.2μs -> 5.78μs (77.3% faster)
    tree, imports = result

def test_import_from_with_no_module():
    """Test that ImportFrom nodes without a module are skipped."""
    # Note: a module-less ImportFrom comes from relative imports such as "from . import x";
    # this test only exercises the guard condition through a normal import
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.7μs -> 10.8μs (64.7% faster)
    tree, imports = result

def test_duplicate_imports_last_wins():
    """Test behavior when the same name is imported multiple times."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\nfrom sys import path", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.8μs -> 13.1μs (66.2% faster)
    tree, imports = result

def test_underscore_prefixed_imports():
    """Test parsing imports with underscore prefixes."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import _exit, _getfullpathname", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 19.7μs -> 11.2μs (75.8% faster)
    tree, imports = result

def test_uppercase_module_names():
    """Test parsing imports from modules with uppercase names."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from XML import ElementTree as ET", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.8μs -> 10.6μs (67.8% faster)
    tree, imports = result

def test_deeply_nested_module_path():
    """Test parsing imports from deeply nested module paths."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from a.b.c.d.e.f import something", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 18.8μs -> 12.0μs (57.0% faster)
    tree, imports = result

def test_code_with_multiline_import():
    """Test parsing multiline import statements."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import (\n    path,\n    environ,\n    getcwd\n)", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.5μs -> 12.5μs (72.7% faster)
    tree, imports = result

def test_code_with_comments_and_imports():
    """Test parsing code that includes comments alongside imports."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="# This is a comment\nfrom os import path  # inline comment", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 17.5μs -> 10.3μs (69.3% faster)
    tree, imports = result

def test_import_after_non_import_code():
    """Test that imports are found even when placed after other code."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="x = 1\ny = 2\ndef foo():\n    pass\nfrom os import path", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 42.0μs -> 26.9μs (56.2% faster)
    tree, imports = result

def test_code_with_class_and_function_definitions():
    """Test that imports are correctly collected alongside class/function definitions."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path\n\nclass MyClass:\n    pass\n\ndef my_func():\n    pass", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 34.5μs -> 24.1μs (43.2% faster)
    tree, imports = result

def test_import_names_with_numbers():
    """Test parsing imports that have numbers in their names."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from module import item1, item2, item3", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 21.3μs -> 12.4μs (72.0% faster)
    tree, imports = result

def test_multiple_blocks_with_path_info():
    """Test parsing multiple code blocks with file path information."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="from os import path", file_path=Path("file1.py"), language="python"),
        CodeString(code="from sys import argv", file_path=Path("file2.py"), language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 22.7μs -> 13.9μs (62.7% faster)
    tree, imports = result

def test_empty_string_code():
    """Test parsing an empty string code block."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 10.1μs -> 5.72μs (77.1% faster)
    tree, imports = result

def test_only_whitespace_code():
    """Test parsing code block with only whitespace."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="   \n  \n   ", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 9.87μs -> 5.52μs (78.8% faster)
    tree, imports = result

def test_indented_import_statement():
    """Test that indented import statements are correctly parsed."""
    # Note: This should work if inside a valid block like if statement
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="if True:\n    from os import path", 
                   file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 25.0μs -> 16.8μs (49.2% faster)
    tree, imports = result

def test_large_number_of_import_statements():
    """Test parsing a very large number of import statements (500)."""
    import_lines = "\n".join([f"from module{i} import item{i}" for i in range(500)])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=import_lines, file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 1.96ms -> 1.14ms (71.4% faster)
    tree, imports = result
    for i in range(500):
        pass

def test_large_number_of_code_blocks():
    """Test parsing with 100 separate code blocks."""
    code_strings_list = [
        CodeString(code=f"from module{i} import item{i}", file_path=None, language="python")
        for i in range(100)
    ]
    code_markdown = CodeStringsMarkdown(code_strings=code_strings_list)
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 404μs -> 239μs (69.1% faster)
    tree, imports = result
    for i in range(100):
        pass

def test_very_long_code_block():
    """Test parsing a very large single code block with many lines."""
    # Create 1000 lines of mixed code and imports
    lines = []
    for i in range(1000):
        if i % 3 == 0:
            lines.append(f"from module{i} import item{i}")
        else:
            lines.append(f"x{i} = {i}")
    
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join(lines), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 4.94ms -> 2.79ms (77.2% faster)
    tree, imports = result
    # Count how many imports we expect (multiples of 3 in range(1000), i.e. 334)
    expected_imports = len([i for i in range(1000) if i % 3 == 0])
    assert len(imports) == expected_imports

def test_many_aliases_in_single_statement():
    """Test parsing a single import with 500 aliased items."""
    import_items = ", ".join([f"item{i} as alias{i}" for i in range(500)])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=f"from module import {import_items}", file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 816μs -> 450μs (81.3% faster)
    tree, imports = result
    for i in range(500):
        pass

def test_large_ast_tree_processing():
    """Test that AST tree processing works efficiently with large code."""
    # Create a large code block with functions, classes, and imports
    lines = []
    lines.append("from os import path")
    for i in range(100):
        lines.append(f"def func{i}():\n    pass")
        lines.append(f"class Class{i}:\n    pass")
    lines.append("from sys import argv")
    
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join(lines), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 1.13ms -> 787μs (43.6% faster)
    tree, imports = result

def test_many_nested_modules_in_imports():
    """Test parsing imports from many deeply nested module paths."""
    import_lines = "\n".join([
        f"from a{i}.b{i}.c{i}.d{i}.e{i}.f{i} import item{i}" for i in range(200)
    ])
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code=import_lines, file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 1.28ms -> 934μs (36.5% faster)
    tree, imports = result
    for i in range(200):
        pass

def test_mixed_imports_and_code_large_scale():
    """Test parsing with alternating imports and code statements at large scale."""
    lines = []
    for i in range(500):
        lines.append(f"from module{i} import item{i}")
        lines.append(f"x{i} = {i}")
        lines.append(f"def func{i}():\n    return x{i}")
    
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join(lines), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 9.31ms -> 5.45ms (70.8% faster)
    tree, imports = result
    for i in range(500):
        pass

def test_imports_dict_always_string_values():
    """Test that all values in imports dict are strings (module names)."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join([
            f"from module{i} import item{i}" for i in range(100)
        ]), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 396μs -> 225μs (75.5% faster)
    tree, imports = result
    for key, value in imports.items():
        pass

def test_imports_dict_keys_never_star():
    """Test that star imports never appear as keys in the imports dict."""
    code_markdown = CodeStringsMarkdown(code_strings=[
        CodeString(code="\n".join([
            "from os import *",
            "from sys import *",
            "from json import *",
        ]), file_path=None, language="python")
    ])
    codeflash_output = _parse_and_collect_imports(code_markdown); result = codeflash_output # 26.2μs -> 15.2μs (72.1% faster)
    tree, imports = result
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-pr1498-2026-02-16T20.49.33, commit your edits, and push.

codeflash-ai Bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Feb 16, 2026
Comment on lines +567 to +581
elif isinstance(
    node,
    (
        ast.FunctionDef,
        ast.AsyncFunctionDef,
        ast.ClassDef,
        ast.If,
        ast.For,
        ast.AsyncFor,
        ast.While,
        ast.With,
        ast.AsyncWith,
        ast.Try,
        ast.ExceptHandler,
    ),
Bug: Missing ast.Match (Python 3.10+) node type

The collect_imports recursive traversal handles Try, If, For, While, With, etc., but does not handle ast.Match (match/case statements, introduced in Python 3.10). If someone writes:

match value:
    case 1:
        from module import something

The ImportFrom inside the match_case body won't be found by this optimized traversal, whereas the original ast.walk() would find it.

This is a low-probability edge case, since imports inside match/case are very uncommon, but it is a correctness gap versus the original implementation. Because the project targets Python 3.9+ and ast.Match only exists from Python 3.10, any fix has to tolerate its absence.

Consider adding ast.Match and match_case to the isinstance check (guarded by a version check or hasattr).
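One way the suggestion could look is sketched below. It is an illustration under assumptions, not the committed fix: `find_from_imports` and `_CONTAINERS` are hypothetical names, and the hasattr guard is used so the module still imports on Python 3.9. A match statement stores its bodies under node.cases, where each match_case has its own body list.

```python
import ast
import sys

_containers = [
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.With, ast.AsyncWith, ast.Try, ast.ExceptHandler,
]
# ast.Match is only defined on Python 3.10+, so add it conditionally.
if hasattr(ast, "Match"):
    _containers.append(ast.Match)
_CONTAINERS = tuple(_containers)


def find_from_imports(nodes) -> list:
    """Return ImportFrom nodes reachable through statement bodies."""
    found = []
    for node in nodes:
        if isinstance(node, ast.ImportFrom):
            found.append(node)
        elif isinstance(node, _CONTAINERS):
            for field in ("body", "orelse", "finalbody", "handlers"):
                found.extend(find_from_imports(getattr(node, field, [])))
            # Only ast.Match has `cases`; other nodes yield an empty list here.
            for case in getattr(node, "cases", []):
                found.extend(find_from_imports(case.body))
    return found


if sys.version_info >= (3, 10):
    # Parsed only on interpreters that understand match/case syntax.
    src = "match value:\n    case 1:\n        from module import something\n"
    print([n.module for n in find_from_imports(ast.parse(src).body)])
```

Appending to the tuple keeps a single isinstance check on the hot path, and the getattr default keeps the `cases` loop harmless for every other node type.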

KRRT7 commented Feb 16, 2026

@claude Bug: Missing ast.Match (Python 3.10+) node type

The collect_imports recursive traversal handles Try, If, For, While, With, etc., but does not handle ast.Match (match/case statements, introduced in Python 3.10). If someone writes:

fix this bug

claude Bot commented Feb 16, 2026

PR Review Summary

Prek Checks

All checks passing after auto-fixes:

  • Fixed 8 ruff errors in code_context_extractor.py: 4× bad-quotes-inline-string, 2× trailing-whitespace, 2× blank-line-with-whitespace
  • Fixed ruff format issues (1 file reformatted)
  • Fixed mypy type error: added type annotation to collect_imports(nodes: list[ast.stmt]) -> None
  • Committed and pushed: style: auto-fix linting issues + fix: resolve mypy type errors in collect_imports

Code Review

1 issue found (low severity):

Missing ast.Match node type (Python 3.10+) — The optimized collect_imports recursion handles Try, If, For, While, With, etc., but omits ast.Match (match/case statements). An ImportFrom inside a match/case block would be missed by the new traversal but was found by the original ast.walk(). This is a correctness gap, though imports inside match/case are extremely uncommon in practice. See inline comment.

No other critical bugs, security issues, or breaking API changes found. The optimization logic is sound — replacing ast.walk() with targeted recursive traversal that only visits statement-level nodes where ImportFrom can legally appear.

Test Coverage

| File | Stmts | Miss | Coverage |
| --- | --- | --- | --- |
| codeflash/languages/python/context/code_context_extractor.py | 644 | 54 | 92% |

  • ✅ The optimized _parse_and_collect_imports function (lines 559-593) is fully covered by tests
  • The only uncovered lines near the change (553-554) are the pre-existing SyntaxError exception handler
  • 2374 tests passed, 8 failed (all pre-existing failures in test_tracer.py, unrelated to this PR)

Last updated: 2026-02-16T21:10 UTC

The optimized collect_imports missed match/case statements where imports
can legally appear. Add hasattr-guarded handling for ast.Match nodes.

Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>
@KRRT7 KRRT7 merged commit 82b4002 into cf-simplify-context-extraction Feb 16, 2026
25 of 27 checks passed
@KRRT7 KRRT7 deleted the codeflash/optimize-pr1498-2026-02-16T20.49.33 branch February 16, 2026 21:03