⚡️ Speed up method `AsyncCallInstrumenter.visit_AsyncFunctionDef` by 123% in PR #769 (`clean-async-branch`) by codeflash-ai[bot] · Pull Request #780 · codeflash-ai/codeflash

codeflash-ai · 2025-09-27T02:50:09Z

⚡️ This pull request contains optimizations for PR #769

If you approve this dependent PR, these changes will be merged into the original PR branch clean-async-branch.

This PR will be automatically closed if the original PR is merged.

📄 123% (1.23x) speedup for `AsyncCallInstrumenter.visit_AsyncFunctionDef` in `codeflash/code_utils/instrument_existing_tests.py`

⏱️ Runtime : 9.25 milliseconds → 4.14 milliseconds (best of 186 runs)
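The arithmetic behind the headline number can be checked directly. The sketch below assumes the percentage is the time saved relative to the optimized runtime, which is consistent with the two runtimes reported above; the function name `speedup_percent` is hypothetical and not part of Codeflash.

```python
def speedup_percent(original_ms: float, optimized_ms: float) -> float:
    """Time saved, expressed as a percentage of the optimized runtime."""
    return (original_ms - optimized_ms) / optimized_ms * 100


pct = speedup_percent(9.25, 4.14)
ratio = 9.25 / 4.14
print(f"{pct:.0f}% faster, {ratio:.2f}x runtime ratio")
# prints: 123% faster, 2.23x runtime ratio
```

Under this reading, a 123% speedup means the original took about 2.23 times as long as the optimized version.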

📝 Explanation and details

The optimized code achieves a 123% speedup by replacing expensive AST traversal operations with more efficient alternatives:

Key Optimizations:

1. Decorator Search Optimization: Replaced the `any()` generator expression with a simple loop that breaks as soon as it finds `timeout_decorator.timeout`. Because `any()` also stops at the first match, the saving most likely comes from avoiding the generator's per-item overhead rather than from fewer iterations.
2. AST Traversal Replacement: The most significant optimization replaces `ast.walk(stmt)` with a manual stack-based depth-first search in `_optimized_instrument_statement()`. `ast.walk()` is a generator that yields every node in the subtree, queueing each node's children as it goes, so a caller that consumes it fully visits many irrelevant nodes. The optimized version:
   - Uses a stack to traverse nodes manually
   - Only explores child nodes via `_fields` attribute access
   - Returns immediately on finding an `ast.Await` node that matches the criteria
   - Avoids creating intermediate collections
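The traversal described above can be sketched as follows. This is a minimal illustration and not the PR's actual `_optimized_instrument_statement()`: the helper name `find_await` and the `matches` predicate are hypothetical stand-ins for whatever criteria the real code applies.

```python
import ast
from typing import Callable, Optional


def find_await(
    stmt: ast.AST,
    matches: Callable[[ast.Await], bool] = lambda node: True,
) -> Optional[ast.Await]:
    """Depth-first search for an ast.Await under stmt; None if there is none."""
    stack = [stmt]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Await) and matches(node):
            return node  # early exit: nodes still on the stack are never visited
        # Read children straight from _fields instead of going through ast.walk().
        for field in node._fields:
            child = getattr(node, field, None)
            if isinstance(child, ast.AST):
                stack.append(child)
            elif isinstance(child, list):
                stack.extend(item for item in child if isinstance(item, ast.AST))
    return None


tree = ast.parse("async def f():\n    if x:\n        await g()\n    y = 1")
func = tree.body[0]
print(find_await(func.body[0]) is not None)  # the If statement contains an await
print(find_await(func.body[1]) is None)      # the assignment does not
```

One design consequence worth noting: a stack visits nodes depth-first, while `ast.walk()` is breadth-first, so when a statement contains several matching awaits the two approaches may return a different one first.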

Performance Impact by Test Case:

- Large-scale tests see the biggest improvements (125-129% faster) because they have many statements to traverse
- Nested structures benefit significantly (57-93% faster) as the optimization avoids deep, unnecessary traversals
- Simple test cases still see 29-48% improvements from the decorator optimization
- Functions with many await calls show excellent scaling (123-127% faster) due to reduced per-statement traversal costs

The line profiler shows the critical bottleneck was in _instrument_statement() (96.4% of time originally), which is now reduced to 93.3% but with much lower absolute time, demonstrating the effectiveness of the AST traversal optimization.
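For the decorator search mentioned under Key Optimizations, the two forms can be compared side by side. This is a sketch under assumptions: the function names are hypothetical, and the match condition mirrors the check used in the generated tests (an `ast.Call` whose `func` is an `ast.Name` with id `"timeout_decorator.timeout"`) rather than the instrumenter's confirmed source.

```python
import ast

TIMEOUT_ID = "timeout_decorator.timeout"


def has_timeout_any(decorator_list: list) -> bool:
    """Generator-expression form of the check."""
    return any(
        isinstance(d, ast.Call) and isinstance(d.func, ast.Name) and d.func.id == TIMEOUT_ID
        for d in decorator_list
    )


def has_timeout_loop(decorator_list: list) -> bool:
    """Plain-loop form; same result, without creating a generator object."""
    for d in decorator_list:
        if isinstance(d, ast.Call) and isinstance(d.func, ast.Name) and d.func.id == TIMEOUT_ID:
            return True
    return False


# A decorator node built by hand, in the shape the tests look for.
timeout_call = ast.Call(
    func=ast.Name(id=TIMEOUT_ID, ctx=ast.Load()),
    args=[ast.Constant(value=15)],
    keywords=[],
)
other = ast.Name(id="dec0", ctx=ast.Load())

print(has_timeout_loop([other, timeout_call]))  # True
print(has_timeout_loop([other]))                # False
```

Both functions stop at the first match, so they differ only in constant overhead per call.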

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 59 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import ast
import sys
import types

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


# Dummy imports for FunctionToOptimize, CodePosition, TestingMode
class CodePosition:
    def __init__(self, lineno, col_offset):
        self.lineno = lineno
        self.col_offset = col_offset

class TestingMode:
    BEHAVIOR = "behavior"
    COVERAGE = "coverage"

class Parent:
    def __init__(self, type_, name):
        self.type = type_
        self.name = name

class FunctionToOptimize:
    def __init__(self, function_name, parents, top_level_parent_name):
        self.function_name = function_name
        self.parents = parents
        self.top_level_parent_name = top_level_parent_name

# Helper to parse source and return ast.AsyncFunctionDef node
def get_async_func_node(source: str) -> ast.AsyncFunctionDef:
    mod = ast.parse(source)
    for node in ast.walk(mod):
        if isinstance(node, ast.AsyncFunctionDef):
            return node
    raise ValueError("No AsyncFunctionDef found")

# Helper to extract os.environ assignment from body
def find_os_environ_assignment(body):
    for stmt in body:
        if isinstance(stmt, ast.Assign):
            target = stmt.targets[0]
            if (
                isinstance(target, ast.Subscript)
                and isinstance(target.value, ast.Attribute)
                and target.value.attr == "environ"
                and isinstance(target.slice, ast.Constant)
                and target.slice.value == "CODEFLASH_CURRENT_LINE_ID"
            ):
                return stmt
    return None

# ------------------- Unit Tests -------------------

# 1. Basic Test Cases

def test_non_test_function_untouched():
    """Function name does not start with test_: should be returned unchanged."""
    src = "async def foo():\n    await target_func()"
    node = get_async_func_node(src)
    func = FunctionToOptimize("foo", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)
    orig_dump = ast.dump(node)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 892ns -> 892ns (0.000% faster)

def test_test_function_no_await_no_instrument():
    """test_ function, but no await: should not instrument."""
    src = "async def test_bar():\n    x = 1"
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_bar", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 8.18μs -> 6.33μs (29.1% faster)

def test_test_function_await_non_target_call():
    """test_ function, await on non-target call: should not instrument."""
    src = "async def test_bar():\n    await other_func()"
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_bar", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 8.99μs -> 6.23μs (44.2% faster)

def test_test_function_await_target_call_in_position():
    """test_ function, await on target call at target position: should instrument."""
    src = "async def test_bar():\n    await target_func()"
    node = get_async_func_node(src)
    # The call is at line 2, col_offset 4
    func = FunctionToOptimize("test_bar", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [CodePosition(2, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 8.74μs -> 5.91μs (47.8% faster)
    # Should have assignment to os.environ["CODEFLASH_CURRENT_LINE_ID"]
    assign = find_os_environ_assignment(result.body)

def test_test_function_multiple_target_calls():
    """test_ function, multiple target calls: should instrument each with incrementing index."""
    src = (
        "async def test_bar():\n"
        "    await target_func()\n"
        "    await target_func()\n"
        "    await other_func()\n"
        "    await target_func()"
    )
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_bar", [Parent("Module", "")], "")
    # Target calls at lines 2, 3, 5, col_offset 4
    instr = AsyncCallInstrumenter(
        func, "mod.py", "pytest",
        [CodePosition(2, 4), CodePosition(3, 4), CodePosition(5, 4)],
        TestingMode.BEHAVIOR,
    )
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 22.5μs -> 11.8μs (90.0% faster)
    # Find all os.environ assignments
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]
    for i, assign in enumerate(assigns):
        pass

def test_unittest_adds_timeout_decorator():
    """If test_framework is unittest, should add timeout_decorator if not present."""
    src = "async def test_bar():\n    await target_func()"
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_bar", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "unittest", [CodePosition(2, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 12.2μs -> 8.60μs (42.4% faster)
    found = False
    for dec in result.decorator_list:
        if (
            isinstance(dec, ast.Call)
            and isinstance(dec.func, ast.Name)
            and dec.func.id == "timeout_decorator.timeout"
        ):
            found = True

def test_unittest_preserves_existing_timeout_decorator():
    """If timeout_decorator.timeout already present, do not add another."""
    # The decorator must precede the function definition for ast.parse to accept it.
    src = "@timeout_decorator.timeout(15)\nasync def test_bar():\n    await target_func()"
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_bar", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "unittest", [CodePosition(3, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 12.4μs -> 8.74μs (42.3% faster)
    # Should only be one timeout_decorator
    count = sum(
        1 for dec in result.decorator_list
        if isinstance(dec, ast.Call) and isinstance(dec.func, ast.Name) and dec.func.id == "timeout_decorator.timeout"
    )

# 2. Edge Test Cases

def test_no_body():
    """test_ function with empty body: should not fail or instrument."""
    src = "async def test_empty():\n    pass"
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_empty", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 4.43μs -> 3.04μs (45.9% faster)

def test_nested_await_target_call():
    """test_ function with target call inside If: should instrument if position matches."""
    src = (
        "async def test_nested():\n"
        "    if True:\n"
        "        await target_func()"
    )
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_nested", [Parent("Module", "")], "")
    # Await at line 3, col_offset 8
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [CodePosition(3, 8)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 10.9μs -> 7.67μs (41.4% faster)
    # Should have assignment before the await
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]

def test_target_call_missing_lineno_col():
    """Target call missing lineno/col_offset: should not instrument."""
    # Manually create node with no lineno/col_offset
    node = ast.AsyncFunctionDef(
        name="test_missing",
        args=ast.arguments(posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]),
        body=[
            ast.Expr(value=ast.Await(value=ast.Call(func=ast.Name(id="target_func", ctx=ast.Load()), args=[], keywords=[])))
        ],
        decorator_list=[],
    )
    func = FunctionToOptimize("test_missing", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [CodePosition(2, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 10.2μs -> 5.71μs (79.5% faster)

def test_function_with_class_parent_sets_class_name():
    """If parent is ClassDef, class_name should be set."""
    func = FunctionToOptimize("test_bar", [Parent("ClassDef", "TestClass")], "TestClass")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)

def test_instrument_multiple_functions_independent_counters():
    """Instrument two test functions, counters should be independent."""
    src1 = "async def test_one():\n    await target_func()"
    src2 = "async def test_two():\n    await target_func()"
    node1 = get_async_func_node(src1)
    node2 = get_async_func_node(src2)
    func1 = FunctionToOptimize("test_one", [Parent("Module", "")], "")
    func2 = FunctionToOptimize("test_two", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func1, "mod.py", "pytest", [CodePosition(2, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node1); result1 = codeflash_output # 8.61μs -> 5.96μs (44.4% faster)
    instr2 = AsyncCallInstrumenter(func2, "mod.py", "pytest", [CodePosition(2, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr2.visit_AsyncFunctionDef(node2); result2 = codeflash_output # 6.27μs -> 3.44μs (82.5% faster)
    assign1 = find_os_environ_assignment(result1.body)
    assign2 = find_os_environ_assignment(result2.body)

def test_target_call_in_try_except_finally():
    """Instrument target call inside try/except/finally blocks."""
    src = (
        "async def test_try():\n"
        "    try:\n"
        "        await target_func()\n"
        "    except Exception:\n"
        "        await target_func()\n"
        "    finally:\n"
        "        await target_func()"
    )
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_try", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(
        func, "mod.py", "pytest",
        [CodePosition(3, 8), CodePosition(5, 8), CodePosition(7, 8)],
        TestingMode.BEHAVIOR,
    )
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 19.3μs -> 12.5μs (54.1% faster)
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]
    for i, assign in enumerate(assigns):
        pass

# 3. Large Scale Test Cases

def test_large_number_of_target_calls():
    """Instrument a function with hundreds of target calls, check counter increments."""
    lines = ["async def test_many():"] + [
        "    await target_func()" for _ in range(500)
    ]
    src = "\n".join(lines)
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_many", [Parent("Module", "")], "")
    positions = [CodePosition(i+2, 4) for i in range(500)]
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", positions, TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 2.05ms -> 905μs (127% faster)
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]
    # Check that each assignment is indexed correctly
    for i, assign in enumerate(assigns):
        pass

def test_large_number_of_non_target_calls():
    """Function with many non-target calls: should not instrument."""
    lines = ["async def test_many_non_targets():"] + [
        "    await other_func()" for _ in range(500)
    ]
    src = "\n".join(lines)
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_many_non_targets", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 2.05ms -> 894μs (129% faster)
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]

def test_large_mixed_calls():
    """Function with interleaved target and non-target calls, instrument only targets."""
    lines = ["async def test_mixed():"] + [
        "    await target_func()" if i % 3 == 0 else "    await other_func()" for i in range(300)
    ]
    src = "\n".join(lines)
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_mixed", [Parent("Module", "")], "")
    positions = [CodePosition(i+2, 4) for i in range(0, 300, 3)]
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", positions, TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 1.23ms -> 543μs (126% faster)
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]
    for i, assign in enumerate(assigns):
        pass

def test_large_body_no_targets():
    """Function with large body but no target calls: should not instrument."""
    lines = ["async def test_large():"] + [
        f"    x{i} = {i}" for i in range(900)
    ]
    src = "\n".join(lines)
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_large", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "pytest", [], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 3.17ms -> 1.41ms (125% faster)
    assigns = [stmt for stmt in result.body if find_os_environ_assignment([stmt])]

def test_large_decorator_list_preserved():
    """Function with many decorators: should preserve all and add timeout_decorator if needed."""
    decorators = "\n".join([f"@dec{i}" for i in range(20)])
    src = f"{decorators}\nasync def test_decs():\n    await target_func()"
    node = get_async_func_node(src)
    func = FunctionToOptimize("test_decs", [Parent("Module", "")], "")
    instr = AsyncCallInstrumenter(func, "mod.py", "unittest", [CodePosition(22, 4)], TestingMode.BEHAVIOR)
    codeflash_output = instr.visit_AsyncFunctionDef(node); result = codeflash_output # 14.0μs -> 10.2μs (37.6% faster)
    found = False
    for dec in result.decorator_list:
        if (
            isinstance(dec, ast.Call)
            and isinstance(dec.func, ast.Name)
            and dec.func.id == "timeout_decorator.timeout"
        ):
            found = True
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import ast
import os
import textwrap

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.instrument_existing_tests import \
    AsyncCallInstrumenter


class FunctionToOptimize:
    def __init__(self, function_name, parents=None, top_level_parent_name=None):
        self.function_name = function_name
        self.parents = parents if parents is not None else []
        self.top_level_parent_name = top_level_parent_name

class TestingMode:
    BEHAVIOR = "behavior"

# Helper to parse code and get AsyncFunctionDef node
def get_async_func_node(source_code: str, func_name: str = "test_func"):
    tree = ast.parse(textwrap.dedent(source_code))
    for node in ast.walk(tree):
        if isinstance(node, ast.AsyncFunctionDef) and node.name == func_name:
            return node
    raise ValueError("No async function found with name: " + func_name)

# Basic Test Cases

def test_returns_node_unmodified_for_non_test_function():
    # Should not instrument non-test async functions
    src = """
    async def not_a_test():
        await some_call()
    """
    node = get_async_func_node(src, "not_a_test")
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("not_a_test"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 692ns -> 701ns (1.28% slower)

def test_instruments_single_await_call():
    # Should insert env assignment before awaited call
    src = """
    async def test_func():
        await some_call()
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 8.64μs -> 5.98μs (44.4% faster)
    # Check env assignment target and value
    assign = out_node.body[0]

def test_instruments_multiple_await_calls():
    # Should insert env assignment before each awaited call
    src = """
    async def test_func():
        await call_a()
        await call_b()
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 13.3μs -> 8.09μs (63.9% faster)

def test_unittest_adds_timeout_decorator():
    # Should add timeout_decorator if using unittest and not present
    src = """
    async def test_func():
        await call_a()
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "unittest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 11.8μs -> 8.39μs (40.6% faster)
    # Should have a timeout_decorator in decorator_list
    found = False
    for d in out_node.decorator_list:
        if isinstance(d, ast.Call) and isinstance(d.func, ast.Name) and d.func.id == "timeout_decorator.timeout":
            found = True

def test_unittest_does_not_duplicate_timeout_decorator():
    # Should not add timeout_decorator if already present
    src = """
    @timeout_decorator.timeout(15)
    async def test_func():
        await call_a()
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "unittest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 11.9μs -> 8.41μs (41.7% faster)
    count = 0
    for d in out_node.decorator_list:
        if isinstance(d, ast.Call) and isinstance(d.func, ast.Name) and d.func.id == "timeout_decorator.timeout":
            count += 1

# Edge Test Cases

def test_handles_no_body():
    # Should handle function with empty body
    src = """
    async def test_func():
        pass
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 4.30μs -> 3.02μs (42.1% faster)

def test_handles_nested_await_calls():
    # Should instrument nested await calls (e.g., in If/For)
    src = """
    async def test_func():
        if True:
            await call_a()
        for i in range(2):
            await call_b()
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 21.3μs -> 13.5μs (57.9% faster)
    # Should have env assignment before each await in nested blocks
    # In If: body[0] is If, body[1] is For
    if_stmt = out_node.body[0]
    for_stmt = out_node.body[1]


def test_handles_non_await_statements():
    # Should not instrument non-await statements
    src = """
    async def test_func():
        x = 1
        print(x)
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 13.6μs -> 9.05μs (50.8% faster)

def test_handles_await_call_with_multiple_args():
    # Should instrument await call with multiple arguments
    src = """
    async def test_func():
        await some_call(1, 2, 3)
    """
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 11.6μs -> 8.07μs (43.2% faster)

# Large Scale Test Cases

def test_instruments_many_await_calls():
    # Should scale to instrument many await calls
    src_lines = ["async def test_func():"] + [f"    await call_{i}()" for i in range(100)]
    src = "\n".join(src_lines)
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 411μs -> 184μs (123% faster)
    for i in range(0, 200, 2):
        pass

def test_instruments_large_nested_structure():
    # Should instrument await calls in nested structures
    src = "async def test_func():\n"
    for i in range(10):
        src += f"    if x == {i}:\n        await call_{i}()\n"
    node = get_async_func_node(src)
    instrumenter = AsyncCallInstrumenter(
        FunctionToOptimize("test_func"), "mod.py", "pytest", [], TestingMode.BEHAVIOR
    )
    codeflash_output = instrumenter.visit_AsyncFunctionDef(node); out_node = codeflash_output # 89.3μs -> 46.3μs (92.6% faster)
    # Each If should have env assignment and await
    for idx, stmt in enumerate(out_node.body):
        pass

To edit these changes git checkout codeflash/optimize-pr769-2025-09-27T02.50.03 and push.


codeflash-ai Bot added the `⚡️ codeflash` label (Optimization PR opened by Codeflash AI) Sep 27, 2025

codeflash-ai Bot mentioned this pull request Sep 27, 2025

enable async function optimization #769

Merged

KRRT7 merged commit be2ffea into clean-async-branch Sep 27, 2025
18 of 19 checks passed

codeflash-ai Bot deleted the codeflash/optimize-pr769-2025-09-27T02.50.03 branch September 27, 2025 03:25
