Skip to content

Commit 39ab73d

Browse files
Optimize _add_timing_instrumentation
The optimized code achieves a **103% speedup** (126ms → 62.2ms) by replacing an expensive O(n²) byte concatenation pattern with an efficient O(n) segment-based approach. **Key Optimization:** The original code assembled the instrumented source using repeated immutable byte concatenation in reverse order: ```python for start, end, new_bytes in sorted(replacements, key=lambda item: item[0], reverse=True): updated = updated[:start] + new_bytes + updated[end:] ``` This pattern consumed **76.7% of total execution time** because each concatenation creates a new bytes object, copying all existing data. With 1,340 replacements per test run, this quadratic behavior dominated performance. The optimized version builds a list of byte segments and joins them once: ```python segments: list[bytes] = [] prev_end = 0 for start, end, new_bytes in sorted(replacements, key=lambda item: item[0]): segments.append(source_bytes[prev_end:start]) segments.append(new_bytes) prev_end = end segments.append(source_bytes[prev_end:]) updated = b"".join(segments) ``` This reduces the replacement loop from 447ms to ~1ms (447× faster for this section), accounting for the majority of the overall 2× speedup. **Why This Works:** 1. **Linear memory copies**: Each byte range is copied exactly once into the segment list 2. **Single final join**: `b"".join()` allocates the exact final size and copies each segment once 3. **Forward iteration**: Eliminates the need for reverse sorting and repeated slicing operations **Impact on Workloads:** Based on the test results and function references, this optimization is particularly effective for: - **Large-scale instrumentation**: The `test_performance_with_massive_call_count` (1000 @test methods) shows a **176% speedup** (97.4ms → 35.3ms) - **Files with many test methods**: `test_many_test_methods_with_calls` (200 methods) achieves **23.3% speedup** - **Files with many statements**: `test_large_number_of_statements_is_handled_quickly_and_correctly` benefits from the reduced per-replacement overhead The function is called during Java test instrumentation for JIT warmup timing, so this speedup directly reduces the overhead of preparing performance benchmarks in the hot path of the Codeflash workflow.
1 parent 32bac21 commit 39ab73d

1 file changed

Lines changed: 8 additions & 3 deletions

File tree

codeflash/languages/java/instrumentation.py

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1235,9 +1235,14 @@ def build_instrumented_body(
12351235
wrapper_id = method_ordinal if new_wrapper_id == next_wrapper_id else new_wrapper_id
12361236
replacements.append((body_start, body_end, new_body.encode("utf8")))
12371237

1238-
updated = source_bytes
1239-
for start, end, new_bytes in sorted(replacements, key=lambda item: item[0], reverse=True):
1240-
updated = updated[:start] + new_bytes + updated[end:]
1238+
segments: list[bytes] = []
1239+
prev_end = 0
1240+
for start, end, new_bytes in sorted(replacements, key=lambda item: item[0]):
1241+
segments.append(source_bytes[prev_end:start])
1242+
segments.append(new_bytes)
1243+
prev_end = end
1244+
segments.append(source_bytes[prev_end:])
1245+
updated = b"".join(segments)
12411246
return updated.decode("utf8")
12421247

12431248

0 commit comments

Comments
 (0)