Skip to content

Commit 4ff9865

Browse files
Optimize collect_existing_class_names
The optimized code achieves a **350% speedup** (2.36ms → 523μs) by replacing the generic `ast.walk()` traversal with a targeted stack-based iteration that only visits nodes where class definitions can appear. **Key Performance Improvement:** The original implementation uses `ast.walk(tree)`, which performs an exhaustive depth-first traversal of **every single node** in the AST—including expressions, literals, operators, and other leaf nodes that can never contain class definitions. For a typical Python module, this means checking thousands of irrelevant nodes. The optimized version uses a stack-based approach that only descends into structural nodes (ClassDef, FunctionDef, If, For, While, With, Try blocks) where classes can actually be defined. This dramatically reduces the number of nodes visited and `isinstance()` checks performed. **Why This Matters:** From the test results, we see consistent 200-700% speedups across all scenarios: - Empty modules: 579% faster (5.37μs → 791ns) - minimal traversal overhead - Simple cases: 200-400% faster - fewer nodes to check - Complex nested structures: 405% faster (37.2μs → 7.37μs) - targeted descent pays off - Large modules (500 classes): 280% faster (869μs → 228μs) - scales better - Mixed workloads: 558% faster (799μs → 121μs) - avoids non-class nodes **Impact on Workloads:** Based on the function references showing this is called from `build_testgen_context`, this optimization benefits test generation workflows that analyze Python code structure. Since class extraction is likely performed repeatedly during code analysis, the 4x speedup directly improves overall test generation throughput. The optimization is particularly effective for large codebases with many classes and complex nesting patterns, as demonstrated by the benchmark results.
1 parent fadf6d4 commit 4ff9865

1 file changed

Lines changed: 22 additions & 1 deletion

File tree

codeflash/languages/python/context/code_context_extractor.py

Lines changed: 22 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -563,7 +563,28 @@ def _parse_and_collect_imports(code_context: CodeStringsMarkdown) -> tuple[ast.M
563563

564564

565565
def collect_existing_class_names(tree: ast.Module) -> set[str]:
566-
return {node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)}
566+
class_names = set()
567+
stack = list(tree.body)
568+
569+
while stack:
570+
node = stack.pop()
571+
if isinstance(node, ast.ClassDef):
572+
class_names.add(node.name)
573+
stack.extend(node.body)
574+
elif isinstance(node, ast.FunctionDef):
575+
stack.extend(node.body)
576+
elif isinstance(node, (ast.If, ast.For, ast.While, ast.With)):
577+
stack.extend(node.body)
578+
if hasattr(node, 'orelse'):
579+
stack.extend(node.orelse)
580+
elif isinstance(node, ast.Try):
581+
stack.extend(node.body)
582+
stack.extend(node.orelse)
583+
stack.extend(node.finalbody)
584+
for handler in node.handlers:
585+
stack.extend(handler.body)
586+
587+
return class_names
567588

568589

569590
def enrich_testgen_context(code_context: CodeStringsMarkdown, project_root_path: Path) -> CodeStringsMarkdown:

0 commit comments

Comments
 (0)