Commit 9075ad2

misrasaurabh1 and claude committed
fix: continue benchmark looping when some tests fail but timing markers exist
Previously, the benchmark loop stopped immediately when Maven returned non-zero (any test failure). This was too aggressive because:
- Generated tests may have some failures
- Passing tests still produce valid timing markers
- We need multiple loops for accurate measurements

Now the loop continues if timing markers are present, only stopping when:
- No timing markers are found (all tests failed)
- Target duration is reached
- Max loops is reached

This allows proper multi-loop benchmarking even when some generated tests fail, improving measurement accuracy.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 79fbd2b commit 9075ad2
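
The three stop conditions listed in the commit message can be sketched as a small standalone loop. This is only an illustration: the regex is copied from the diff, but `run_benchmark_loops`, `run_once`, the returned reason strings, and the order in which the conditions are checked are hypothetical and do not come from `test_runner.py`.

```python
import re
import time
from typing import Callable, List, Tuple

# Pattern copied verbatim from the diff; everything else here is a hypothetical sketch.
TIMING_PATTERN = re.compile(r"!######[^:]*:[^:]*:[^:]*:[^:]*:[^:]+:[^:]+######!")


def run_benchmark_loops(
    run_once: Callable[[int], Tuple[int, str]],
    max_loops: int,
    target_duration_s: float,
) -> Tuple[List[str], str]:
    """Run up to max_loops iterations; return collected stdout and the stop reason."""
    all_stdout: List[str] = []
    start = time.monotonic()
    for loop_idx in range(1, max_loops + 1):
        # run_once stands in for one Maven invocation: (returncode, stdout).
        returncode, stdout = run_once(loop_idx)
        all_stdout.append(stdout)
        # A non-zero exit code only stops the loop when no timing marker was printed.
        if returncode != 0 and not TIMING_PATTERN.search(stdout or ""):
            return all_stdout, "no_timing_markers"
        # Assumed check order: the real code may test the duration elsewhere.
        if time.monotonic() - start >= target_duration_s:
            return all_stdout, "target_duration_reached"
    return all_stdout, "max_loops_reached"
```

With a `run_once` that always returns a non-zero exit code together with at least one marker, the sketch keeps looping until `max_loops` or the target duration is hit, which is the behavior the commit message describes.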

1 file changed

Lines changed: 25 additions & 5 deletions

codeflash/languages/java/test_runner.py

@@ -640,9 +640,20 @@ def _run_benchmarking_tests_maven(
                 )
                 break

+        # Check if we have timing markers even if some tests failed
+        # We should continue looping if we're getting valid timing data
         if result.returncode != 0:
-            logger.warning("Tests failed in Maven loop %d, stopping", loop_idx)
-            break
+            import re
+            timing_pattern = re.compile(r"!######[^:]*:[^:]*:[^:]*:[^:]*:[^:]+:[^:]+######!")
+            has_timing_markers = bool(timing_pattern.search(result.stdout or ""))
+            if not has_timing_markers:
+                logger.warning("Tests failed in Maven loop %d with no timing markers, stopping", loop_idx)
+                break
+            else:
+                logger.debug(
+                    "Some tests failed in Maven loop %d but timing markers present, continuing",
+                    loop_idx,
+                )

     combined_stdout = "\n".join(all_stdout)
     combined_stderr = "\n".join(all_stderr)
@@ -840,10 +851,19 @@ def run_benchmarking_tests(
                 )
                 break

-        # Check if tests failed - don't continue looping
+        # Check if tests failed - continue looping if we have timing markers
         if result.returncode != 0:
-            logger.warning("Tests failed in loop %d, stopping benchmark", loop_idx)
-            break
+            import re
+            timing_pattern = re.compile(r"!######[^:]*:[^:]*:[^:]*:[^:]*:[^:]+:[^:]+######!")
+            has_timing_markers = bool(timing_pattern.search(result.stdout or ""))
+            if not has_timing_markers:
+                logger.warning("Tests failed in loop %d with no timing markers, stopping benchmark", loop_idx)
+                break
+            else:
+                logger.debug(
+                    "Some tests failed in loop %d but timing markers present, continuing",
+                    loop_idx,
+                )

     # Create a combined result with all stdout
     combined_stdout = "\n".join(all_stdout)
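
Both hunks use the same pattern, so it may help to see what it accepts. As written, it requires six colon-separated fields between `!######` and `######!`, where the first four may be empty and the last two may not. The sketch below tests the pattern on a few made-up strings. The sample field values are hypothetical, since the diff does not say what each field holds, and the `has_timing_markers` helper is only an illustration of the inlined check.

```python
import re

# Pattern copied verbatim from the diff.
TIMING_PATTERN = re.compile(r"!######[^:]*:[^:]*:[^:]*:[^:]*:[^:]+:[^:]+######!")


def has_timing_markers(stdout):
    """Mirror the inlined check: None or empty stdout counts as 'no markers'."""
    return bool(TIMING_PATTERN.search(stdout or ""))


# Hypothetical sample outputs; the field contents are invented for illustration.
SAMPLES = {
    "all six fields present": "[INFO] !######mod:Cls:test:func:1:12345######!",
    "first four fields empty": "!######::::1:12345######!",
    "fifth field empty": "!######mod:Cls:test:func::12345######!",
    "only five fields": "!######mod:Cls:test:func:12345######!",
    "plain failure output": "[ERROR] BUILD FAILURE",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label}: {has_timing_markers(text)}")
```

One design note: `[^:]` also matches newlines and `#`, so the pattern is fairly permissive. A single marker anywhere in `stdout` is enough for the loop to continue.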
