Commit 48817d7

Optimize extract_dependent_function
The optimized code achieves a **197% speedup (28.5ms → 9.57ms)** through three optimizations that reduce expensive AST parsing:

## Key Optimizations

**1. Early String Filtering (AST parsing time 32.5ms → 9.9ms)**

The optimization adds a lightweight heuristic check, `if "def" not in code_string.code`, before calling `ast.parse()`. Since function definitions require the `def` keyword, strings without it can be skipped entirely. In the profiler results, this reduced AST parsing from 32.5ms (80.5% of original runtime) to 9.9ms (74.2% of optimized runtime), roughly a 70% reduction in parsing time.

The test results show the largest improvements in large-scale scenarios:
- `test_large_scale_many_code_strings_single_dependent_function`: **6839% faster** (4.45ms → 64.1μs)
- `test_large_scale_with_preexisting_objects_and_many_irrelevant_entries`: **4193% faster** (2.26ms → 52.7μs)

**2. Hoisted Main Function Name Computation**

Moving the `bare_main` calculation ahead of the loop makes the main function's bare name available while names are being collected, so the loop can skip it directly instead of adding it to the set and discarding it afterwards. The `rsplit()` call itself still runs once per invocation, as it did before.

**3. Early Exit on Multiple Dependencies**

The optimization checks `if len(dependent_functions) > 1: return False` immediately after adding each function name, rather than waiting until all code strings are processed. This lets the function short-circuit as soon as it detects the failure condition, avoiding unnecessary AST parsing of the remaining code strings.

## Why This Matters

Based on the function references, `extract_dependent_function` is called during test generation workflows where it processes potentially hundreds or thousands of code strings. The optimization is particularly effective when:
- Most code strings don't contain function definitions (common in test contexts with imports, variables, etc.)
- Multiple dependent functions exist (early exit prevents wasted parsing)
- Code bases have many test-related code strings that aren't function definitions

For code strings that parse successfully, the optimizations return the same result as before while avoiding expensive operations, which makes the function significantly more efficient in real-world usage where it processes large volumes of code strings.
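The early string filter can be illustrated in isolation. The following is a minimal sketch, not code from this commit: `has_top_level_function` and the sample strings are hypothetical names chosen for illustration. It shows that the substring check is conservative: a string without `def` cannot define a function, while a string that merely contains `def` inside another identifier (such as `default_timeout`) is still parsed and then correctly rejected.

```python
import ast


def has_top_level_function(code: str) -> bool:
    """Return True if `code` defines a top-level (async) function.

    Hypothetical helper for illustration; it mirrors the prefilter idea
    described above rather than the project's actual function.
    """
    # Cheap substring check: no "def" means no function definition,
    # so the comparatively expensive ast.parse() call can be skipped.
    if "def" not in code:
        return False
    tree = ast.parse(code)
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in tree.body
    )


samples = [
    "import os\nx = 1\n",                    # no "def": skipped without parsing
    "default_timeout = 30\n",                # contains "def" but defines nothing
    "def helper():\n    return 1\n",         # regular function
    "async def fetch():\n    return None\n", # async function
]

for code in samples:
    print(repr(code.splitlines()[0]), has_top_level_function(code))
```

The filter only saves time on strings that lack the substring entirely, so the benefit depends on how many code strings consist of imports, assignments, and similar statements.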
1 parent c4ed6e3 commit 48817d7

1 file changed

Lines changed: 20 additions & 7 deletions

codeflash/code_utils/coverage_utils.py

@@ -13,16 +13,29 @@
 def extract_dependent_function(main_function: str, code_context: CodeOptimizationContext) -> str | Literal[False]:
     """Extract the single dependent function from the code context excluding the main function."""
     dependent_functions = set()
-    for code_string in code_context.testgen_context.code_strings:
-        ast_tree = ast.parse(code_string.code)
-        dependent_functions.update(
-            {node.name for node in ast_tree.body if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}
-        )
 
     # Compare using bare name since AST extracts bare function names
     bare_main = main_function.rsplit(".", 1)[-1] if "." in main_function else main_function
-    if bare_main in dependent_functions:
-        dependent_functions.discard(bare_main)
+
+    for code_string in code_context.testgen_context.code_strings:
+        # Quick heuristic: skip parsing entirely if there is no 'def' token,
+        # since no function definitions can be present without it.
+        if "def" not in code_string.code:
+            continue
+
+        ast_tree = ast.parse(code_string.code)
+        # Add function names directly, skipping the bare main name.
+        for node in ast_tree.body:
+            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+                name = node.name
+                if name == bare_main:
+                    continue
+                dependent_functions.add(name)
+                # If more than one dependent function (other than the main) is found,
+                # we can return False early since the final result cannot be a single name.
+                if len(dependent_functions) > 1:
+                    return False
+
 
     if not dependent_functions:
         return False
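
To try the changed logic outside the repository, here is a self-contained sketch under stated assumptions. `StubCodeString`, `StubTestgenContext`, and `StubContext` are hypothetical stand-ins for the project's context types (only the `testgen_context.code_strings[*].code` attribute path is taken from the diff), and the final `return` is an assumption, since the diff does not show the end of the function.

```python
import ast
from dataclasses import dataclass, field
from typing import List, Literal, Union


@dataclass
class StubCodeString:
    """Hypothetical stand-in: only the `code` attribute is used."""
    code: str


@dataclass
class StubTestgenContext:
    """Hypothetical stand-in exposing `code_strings`."""
    code_strings: List[StubCodeString] = field(default_factory=list)


@dataclass
class StubContext:
    """Hypothetical stand-in for CodeOptimizationContext."""
    testgen_context: StubTestgenContext


def extract_dependent_function_sketch(
    main_function: str, code_context: StubContext
) -> Union[str, Literal[False]]:
    dependent_functions = set()
    # Bare name of the main function, computed once before the loop.
    bare_main = main_function.rsplit(".", 1)[-1]

    for code_string in code_context.testgen_context.code_strings:
        # Prefilter: strings without "def" cannot define a function.
        if "def" not in code_string.code:
            continue
        for node in ast.parse(code_string.code).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if node.name == bare_main:
                    continue
                dependent_functions.add(node.name)
                # Early exit: two distinct names can never be "the single one".
                if len(dependent_functions) > 1:
                    return False

    if not dependent_functions:
        return False
    # Assumption: the part of the function not shown in the diff
    # returns the one remaining name.
    return next(iter(dependent_functions))


def make_context(*snippets: str) -> StubContext:
    """Build a stub context from raw code snippets."""
    return StubContext(StubTestgenContext([StubCodeString(s) for s in snippets]))


single = make_context(
    "import os",
    "def main():\n    return helper()\n",
    "def helper():\n    return 1\n",
)
print(extract_dependent_function_sketch("pkg.mod.main", single))
```

One consequence of the early exit is that code strings after the second distinct helper are never parsed, so a malformed string late in the list no longer reaches `ast.parse()` in that case.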
