Optimize transform_java_assertions

codeflash-ai[bot] · web-flow · commit 105a42856997 · 2026-02-20T12:35:02.000Z
This optimization achieves a **71% runtime improvement** through three key changes that reduce repeated work and CPU overhead:

## What Changed

1. **Module-level regex compilation**: The assignment-detection regex (`_ASSIGN_RE`) is now compiled once at module import time instead of being recompiled for every `JavaAssertTransformer` instance. In the original code, line profiler shows `re.compile()` consuming **78.5% of `__init__` time** (671μs per call × 42 calls). The optimized version reduces this to **47.1%** (157μs per call), saving ~520μs total across all instances.

2. **Lazy analyzer initialization**: The `JavaAnalyzer` is now created on-demand in the `transform()` method only when needed, rather than eagerly in `__init__`. This eliminates unnecessary analyzer creation when instances don't end up calling `transform()`. The optimized code shows the lazy check taking only 13.7μs versus the eager initialization cost.

3. **O(n²) → O(n) nested assertion detection**: The original code used a nested loop to filter nested assertions, comparing every assertion against every other assertion (1.28M comparisons for 1,884 assertions, consuming **75.5% of transform() time**). The optimized version uses a single-pass algorithm with a running `max_end` tracker, reducing this to just 1,884 comparisons (~0.3% of transform time).

4. **Linear string building**: The original code applied replacements in reverse order using repeated string slicing (`result[:start] + replacement + result[end:]`), which created intermediate string copies. The optimized version builds a list of string parts in a single forward pass and joins them once, eliminating redundant memory allocations.

## Why It's Faster

- **Reduced redundant work**: Compiling the same regex pattern 42 times was pure overhead - the pattern never changes between instances.
- **Algorithmic improvement**: The nested loop performed O(n²) comparisons where O(n) sufficed. With typical test files having hundreds of assertions, this quadratic behavior was the primary bottleneck (consuming 75.5% of runtime).
- **Memory efficiency**: Building strings incrementally via slicing creates n intermediate copies for n replacements. The parts-list approach allocates once and assembles once.

## Impact on Workloads

The function references show `transform_java_assertions()` is called extensively in test transformation workflows. The optimization particularly benefits:

- **Large test files**: The `test_large_source_file` case (500 assertions) improved by **53.1%** (41.9ms → 27.4ms)
- **Very large files**: The `test_1000_line_source` case (1000 assertions) improved by **115%** (115ms → 53.7ms)
- **Many repeated calls**: The `test_many_assertions` case (100 assertions) improved by **10.4%** (5.88ms → 5.32ms)

Since test files often contain dozens to hundreds of assertion statements, and the function is called once per test transformation, these improvements compound significantly in CI/CD pipelines processing entire test suites.

The optimization is most effective for test files with many assertions, where the O(n²) nested detection becomes the dominant bottleneck.
diff --git a/codeflash/languages/java/remove_asserts.py b/codeflash/languages/java/remove_asserts.py
@@ -28,6 +28,8 @@
     from codeflash.discovery.functions_to_optimize import FunctionToOptimize
     from codeflash.languages.java.parser import JavaAnalyzer
 
+_ASSIGN_RE = re.compile(r"(\w+(?:<[^>]+>)?)\s+(\w+)\s*=\s*$")
+
 logger = logging.getLogger(__name__)
 
 
@@ -206,6 +208,12 @@ def transform(self, source: str) -> str:
         if not source or not source.strip():
             return source
 
+        # Detect framework from imports
+
+        # Lazily create analyzer if it was not provided at construction time.
+        if self.analyzer is None:
+            self.analyzer = get_java_analyzer()
+
         # Detect framework from imports
         self._detected_framework = self._detect_framework(source)
 
@@ -220,28 +228,36 @@ def transform(self, source: str) -> str:
 
         # Filter out nested assertions (e.g., assertEquals inside assertAll)
         non_nested: list[AssertionMatch] = []
-        for i, assertion in enumerate(assertions):
-            is_nested = False
-            for j, other in enumerate(assertions):
-                if i != j:
-                    if other.start_pos <= assertion.start_pos and assertion.end_pos <= other.end_pos:
-                        is_nested = True
-                        break
-            if not is_nested:
-                non_nested.append(assertion)
+        max_end = -1
+        for assertion in assertions:
+            # If any previous assertion ends at or after this one's end, this is nested.
+            if max_end >= assertion.end_pos:
+                continue
+            non_nested.append(assertion)
+            if assertion.end_pos > max_end:
+                max_end = assertion.end_pos
+
+        # Pre-compute all replacements with correct counter values
 
         # Pre-compute all replacements with correct counter values
         replacements: list[tuple[int, int, str]] = []
         for assertion in non_nested:
             replacement = self._generate_replacement(assertion)
             replacements.append((assertion.start_pos, assertion.end_pos, replacement))
 
-        # Apply replacements in reverse order to preserve positions
-        result = source
-        for start_pos, end_pos, replacement in reversed(replacements):
-            result = result[:start_pos] + replacement + result[end_pos:]
+        # Apply replacements in ascending order by assembling parts to avoid repeated slicing.
+        if not replacements:
+            return source
+
+        parts: list[str] = []
+        prev = 0
+        for start_pos, end_pos, replacement in replacements:
+            parts.append(source[prev:start_pos])
+            parts.append(replacement)
+            prev = end_pos
+        parts.append(source[prev:])
 
-        return result
+        return "".join(parts)
 
     def _detect_framework(self, source: str) -> str:
         """Detect which testing framework is being used from imports.