Commit ce46509

Optimize discover_functions_from_source
Runtime improvement: the optimized version reduces end-to-end execution time from ~10.7ms to ~8.24ms (~29% speedup), with the biggest wins on workloads that enumerate many methods (the 1,000-method tests run ~26–34% faster).

What changed (specific optimizations)
- Hoisted the default file path out of the per-method construction: resolved_file_path = file_path or Path("unknown.java") is now computed once, before the loop. When no file_path is supplied, this avoids constructing a Path object every time a FunctionToOptimize is created.
- Built the parents list in a single conditional expression instead of creating an empty list and possibly calling .append() per method.
- Bound frequently accessed method attributes in _should_include_method (name, class_name, return_type) to local variables to reduce repeated attribute lookups inside the hot predicate logic.

Why this speeds things up (mechanics)
- Path() allocation cost: the original code evaluated file_path or Path("unknown.java") inside each loop iteration when constructing FunctionToOptimize, which builds a new Path whenever file_path is falsy. The profiler shows that line as one of the dominant costs. Moving the work outside the loop removes an allocation and Python-call overhead from each iteration, so the saving scales with the number of methods.
- Fewer attribute lookups: reading method.name and other attributes repeatedly in the tight filter loop triggers a full attribute lookup each time. Binding them to local variables (fast loads) reduces that overhead for every conditional, which matters when the loop runs thousands of times.
- Fewer temporaries/operations: replacing the two-step parents creation (list + append) with a single expression reduces bytecode and small allocations per method.

Behavior / dependency changes
- No behavioral change: the filters are the same and the returned FunctionToOptimize objects are constructed with the same values; the code still uses the same analyzer and criteria. No dependencies were added or removed.
- Minor implementation detail: resolved_file_path is computed once rather than evaluating file_path or Path(...) repeatedly. This is purely a micro-optimization.

Impact on workloads and hot paths
- This function is in a hot path: discover_functions_from_source is called by code that parses Java files and then extracts contexts (see tests and function_references). For large files or projects (many methods per file), the per-method savings compound, so throughput and latency improve noticeably.
- Best-case scenarios: large-scale processing of many methods per file (the large tests show the biggest relative gains).
- Small inputs: for tiny inputs (zero or one method), the constant overhead of the extra assignments and micro-benchmark noise can make some individual tests appear slightly slower. The profiler and annotated tests show a few micro-test regressions, but these are small absolute changes and a reasonable trade-off for the large-scale improvements.

Test signal
- Unit tests and regression tests remain functionally equivalent in the provided suite; tests that exercise large numbers of methods show consistent speedups. A handful of very small-case tests report marginally slower times due to fixed per-call overheads, which is acceptable given the throughput gains on real workloads.

In short: the optimization reduces per-method CPU and allocation overhead in a hot loop (avoid repeated Path allocations, reduce attribute lookups, and remove small temporaries). Those reductions compound across many methods and produce the observed ~29% speedup.
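The headline figure is easier to interpret once the definition is explicit. The sketch below assumes that "speedup" means old time divided by new time, minus one, and contrasts it with the fraction of runtime removed. The helper names are illustrative and do not come from the commit.

```python
def speedup_percent(old_ms: float, new_ms: float) -> float:
    """Speedup expressed as (old / new - 1) * 100 (assumed definition)."""
    return (old_ms / new_ms - 1.0) * 100.0


def time_reduction_percent(old_ms: float, new_ms: float) -> float:
    """Share of the original runtime that was removed, in percent."""
    return (old_ms - new_ms) / old_ms * 100.0


# Rounded timings quoted in the commit message.
OLD_MS, NEW_MS = 10.7, 8.24

if __name__ == "__main__":
    print(f"speedup:        {speedup_percent(OLD_MS, NEW_MS):.1f}%")
    print(f"time reduction: {time_reduction_percent(OLD_MS, NEW_MS):.1f}%")
```

With the rounded inputs, the first value lands close to the quoted ~29% and the second is noticeably lower, so the two numbers should not be used interchangeably when comparing benchmarks.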
1 parent c8d4fd3 commit ce46509

1 file changed

Lines changed: 21 additions & 10 deletions

codeflash/languages/java/discovery.py

@@ -79,6 +79,10 @@ def discover_functions_from_source(
         include_static=True,
     )

+
+    # Cache the resolved file_path to avoid creating Path("unknown.java") repeatedly
+    resolved_file_path = file_path or Path("unknown.java")
+
     functions: list[FunctionToOptimize] = []

     for method in methods:
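The hunk above is a loop-invariant hoist. As a minimal sketch of the effect, assuming the caller passes no file_path, the two helpers below (hypothetical names, not part of the codebase) build the same values, but only the inline version constructs a new Path per iteration.

```python
import timeit
from pathlib import Path
from typing import List, Optional


def paths_inline(n: int, file_path: Optional[Path] = None) -> List[Path]:
    # The fallback is evaluated on every iteration; when file_path is None,
    # each iteration constructs a fresh Path object.
    return [file_path or Path("unknown.java") for _ in range(n)]


def paths_hoisted(n: int, file_path: Optional[Path] = None) -> List[Path]:
    # The fallback is resolved once and the same object is reused.
    resolved = file_path or Path("unknown.java")
    return [resolved for _ in range(n)]


if __name__ == "__main__":
    for fn in (paths_inline, paths_hoisted):
        seconds = timeit.timeit(lambda: fn(1000), number=50)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

When a real file_path is supplied, the `or` short-circuits and neither version allocates, so the gain is limited to the fallback case. Sharing one Path across all results is safe because Path objects are immutable.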
@@ -87,21 +91,22 @@ def discover_functions_from_source(
             continue

         # Build parents list
-        parents: list[FunctionParent] = []
-        if method.class_name:
-            parents.append(FunctionParent(name=method.class_name, type="ClassDef"))
+        parents: list[FunctionParent] = (
+            [FunctionParent(name=method.class_name, type="ClassDef")] if method.class_name else []
+        )
+

         functions.append(
             FunctionToOptimize(
                 function_name=method.name,
-                file_path=file_path or Path("unknown.java"),
+                file_path=resolved_file_path,
                 starting_line=method.start_line,
                 ending_line=method.end_line,
                 starting_col=method.start_col,
                 ending_col=method.end_col,
                 parents=parents,
                 is_async=False,  # Java doesn't have async keyword
-                is_method=method.class_name is not None,
+                is_method=(method.class_name is not None),
                 language="java",
                 doc_start_line=method.javadoc_start_line,
                 return_type=method.return_type,
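The parents rewrite in this hunk is meant to be behavior-preserving. A small sketch of that equivalence follows; the FunctionParent dataclass here is a hypothetical stand-in for the project's own type, and the two helper functions are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FunctionParent:
    """Hypothetical stand-in for the project's FunctionParent type."""
    name: str
    type: str


def parents_two_step(class_name: Optional[str]) -> List[FunctionParent]:
    # Original shape: start empty, append only for a truthy class name.
    parents: List[FunctionParent] = []
    if class_name:
        parents.append(FunctionParent(name=class_name, type="ClassDef"))
    return parents


def parents_single_expr(class_name: Optional[str]) -> List[FunctionParent]:
    # New shape: one conditional expression with the same truthiness test.
    return [FunctionParent(name=class_name, type="ClassDef")] if class_name else []


if __name__ == "__main__":
    for candidate in (None, "", "Calculator"):
        print(repr(candidate), parents_single_expr(candidate))
```

Both forms test truthiness rather than `is not None`, so an empty class name yields an empty list in either version.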
@@ -130,30 +135,36 @@ def _should_include_method(
         True if the method should be included.

     """
+    # Skip abstract methods (no implementation to optimize)
+    # Localize frequently used attributes to reduce attribute lookups in the hot path
+    name = method.name
+    class_name = method.class_name
+    return_type = method.return_type
+
     # Skip abstract methods (no implementation to optimize)
     if method.is_abstract:
         return False

     # Skip constructors (special case - could be optimized but usually not)
-    if method.name == method.class_name:
+    if name == class_name:
         return False

     # Check include patterns
-    if not criteria.matches_include_patterns(method.name):
+    if not criteria.matches_include_patterns(name):
         return False

     # Check exclude patterns
-    if criteria.matches_exclude_patterns(method.name):
+    if criteria.matches_exclude_patterns(name):
         return False

     # Check require_return - void methods are allowed (verified via test pass/fail),
     # but non-void methods must have an actual return statement
     if criteria.require_return:
-        if method.return_type != "void" and not analyzer.has_return_statement(method, source):
+        if return_type != "void" and not analyzer.has_return_statement(method, source):
             return False

     # Check include_methods - in Java, all functions in classes are methods
-    if not criteria.include_methods and method.class_name is not None:
+    if not criteria.include_methods and class_name is not None:
         return False

     # Check line count
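To make the attribute-localization idea concrete, the sketch below counts how often an attribute is read in each style. CountingMethod and both predicates are hypothetical, simplified stand-ins for the real method object and _should_include_method; the property exists only to make the reads observable.

```python
from typing import FrozenSet, Optional


class CountingMethod:
    """Hypothetical stand-in that counts reads of its name attribute."""

    def __init__(self, name: str, class_name: Optional[str]) -> None:
        self._name = name
        self.class_name = class_name
        self.name_reads = 0

    @property
    def name(self) -> str:
        self.name_reads += 1
        return self._name


def include_with_attributes(method: CountingMethod, excluded: FrozenSet[str]) -> bool:
    if method.name == method.class_name:  # first read of method.name
        return False
    if method.name in excluded:  # second read of method.name
        return False
    return True


def include_with_locals(method: CountingMethod, excluded: FrozenSet[str]) -> bool:
    name = method.name  # single read, then fast local loads
    if name == method.class_name:
        return False
    if name in excluded:
        return False
    return True


if __name__ == "__main__":
    excluded = frozenset({"toString"})
    a, b = CountingMethod("add", "Calculator"), CountingMethod("add", "Calculator")
    include_with_attributes(a, excluded)
    include_with_locals(b, excluded)
    print("attribute style reads:", a.name_reads)
    print("local style reads:    ", b.name_reads)
```

The saving only applies to attributes that are read more than once. In the diff above, return_type is used in a single condition, and all three bindings happen before the is_abstract early return, so for abstract methods they are extra work. That is consistent with the small-input regressions the commit message mentions, though only a profile can confirm the cause.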
