⚡️ Speed up function _find_java_executable by 15,704% in PR #1496 (fix/java/e2e/test) #1512

Closed
codeflash-ai[bot] wants to merge 1 commit into fix/java/e2e/test from codeflash/optimize-pr1496-2026-02-18T00.45.28

@codeflash-ai codeflash-ai Bot commented Feb 18, 2026

⚡️ This pull request contains optimizations for PR #1496

If you approve this dependent PR, these changes will be merged into the original PR branch fix/java/e2e/test.

This PR will be automatically closed if the original PR is merged.


📄 15,704% (157.04x) speedup for _find_java_executable in codeflash/languages/java/comparator.py

⏱️ Runtime : 790 milliseconds → 5.00 milliseconds (best of 60 runs)
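
The headline percentage can be sanity-checked from the two rounded runtimes. The sketch below assumes the speedup is defined as (original - optimized) / optimized; that definition and the helper name `speedup` are assumptions for illustration, not taken from Codeflash's documentation.

```python
def speedup(original_ms: float, optimized_ms: float) -> float:
    """Relative speedup, assumed here to be (original - optimized) / optimized."""
    return (original_ms - optimized_ms) / optimized_ms


# Rounded runtimes as reported above.
ratio = speedup(790.0, 5.0)
print(f"{ratio:.2f}x, {ratio * 100:,.0f}%")  # 157.00x, 15,700%
```

The rounded inputs give 157.00x, close to the reported 157.04x; the small gap is presumably due to the report using unrounded timings.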

📝 Explanation and details

The optimization achieves a 157x speedup (from 790ms to 5ms) by adding @lru_cache(maxsize=1) to memoize the Java executable lookup result.

What changed:

  • Added the import: from functools import lru_cache
  • Decorated _find_java_executable() with @lru_cache(maxsize=1)
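
The shape of the change can be sketched as follows. The function name `find_java_executable_sketch`, its placeholder body, and the returned path are hypothetical; only the import and the decorator mirror what this PR describes.

```python
from functools import lru_cache
from typing import Optional

body_runs = 0  # counts how often the (expensive) body actually executes


@lru_cache(maxsize=1)
def find_java_executable_sketch() -> Optional[str]:
    """Hypothetical stand-in for _find_java_executable()."""
    global body_runs
    body_runs += 1
    # Placeholder for the real lookup (JAVA_HOME, PATH, `java --version`).
    return "/usr/bin/java"  # assumed example path


first = find_java_executable_sketch()
second = find_java_executable_sketch()
print(body_runs)  # 1
```

Because the function takes no arguments, there is only one possible cache key, so `maxsize=1` is enough. Note that `lru_cache` stores whatever the first call returns, including `None`.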

Why this is faster:
The original implementation performs expensive operations on every call:

  1. Subprocess calls (93.3% of runtime): subprocess.run([java_path, "--version"]) to verify Java works
  2. File system checks: Multiple Path.exists() calls across JAVA_HOME and Homebrew locations
  3. Environment lookups: platform.system(), os.environ.get(), shutil.which()
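
To make the per-call cost concrete, here is a rough sketch of an uncached lookup that performs the three kinds of work listed above. It is a reconstruction under assumptions, not the code in comparator.py: the name `find_java_uncached`, the Homebrew path, the order of the checks, and the timeout are all hypothetical.

```python
import os
import platform
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Assumed example location; the real candidate list is not shown in this PR.
HOMEBREW_CANDIDATES = ["/opt/homebrew/opt/openjdk/bin/java"]


def find_java_uncached() -> Optional[str]:
    # 1. JAVA_HOME: accept bin/java as soon as the file exists.
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = Path(java_home) / "bin" / "java"
        if candidate.exists():
            return str(candidate)

    # 2. Homebrew locations (macOS only): more filesystem checks.
    if platform.system() == "Darwin":
        for path in HOMEBREW_CANDIDATES:
            if Path(path).exists():
                return path

    # 3. PATH: spawn a subprocess to verify that the binary works.
    java_path = shutil.which("java")
    if java_path:
        try:
            result = subprocess.run(
                [java_path, "--version"],
                capture_output=True,
                timeout=10,
                check=False,
            )
        except (OSError, subprocess.SubprocessError):
            return None
        if result.returncode == 0:
            return java_path
    return None


if __name__ == "__main__":
    print(find_java_uncached())
```

Every call repeats all of this work, and the subprocess spawn in step 3 is the part that dominates the profile.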

The line profiler attributes 890ms of the 953ms total profiled time to the subprocess verification. Because Java's location normally stays the same for the lifetime of a process (environment variables and the filesystem rarely change), repeating these checks after the first call is redundant.

With lru_cache, the function executes its expensive logic only once. Subsequent calls return the cached result in microseconds, bypassing all subprocess calls and filesystem operations.

Test case performance:
The test_repeated_calls_with_path_java_are_consistent_and_scale test demonstrates the optimization's impact most clearly: it calls the function 1000 times. Without caching, every one of those calls spawns a subprocess to verify Java, so the cost grows linearly with the number of calls. With caching, only the first call is expensive; the remaining 999 return the cached value.
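
The hit/miss split for 1000 calls can be checked with `cache_info()`, which every `lru_cache`-wrapped function exposes. The function `expensive_lookup` below is a hypothetical stand-in for the subprocess-backed verification.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def expensive_lookup() -> str:
    # Hypothetical stand-in for the subprocess-backed Java verification.
    return "java"


for _ in range(1000):
    expensive_lookup()

info = expensive_lookup.cache_info()
print(info.hits, info.misses)  # 999 1
```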

Impact on workloads:
This optimization is particularly beneficial when:

  • Java executable lookup happens multiple times during the application's lifecycle
  • The function is called in initialization code that runs repeatedly (e.g., once per compilation unit in a build system)
  • Java detection is part of validation logic executed in loops or across multiple operations

The trade-off is that environment changes during process execution won't be detected, but this is acceptable since JAVA_HOME and system paths rarely change at runtime.
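
A minimal sketch of that trade-off, and of how a caller could opt out of it with `cache_clear()`. The function `java_home_sketch` and the JDK paths are hypothetical; it only reads JAVA_HOME so the effect is easy to see.

```python
import os
from functools import lru_cache
from typing import Optional
from unittest import mock


@lru_cache(maxsize=1)
def java_home_sketch() -> Optional[str]:
    """Hypothetical stand-in that only reads JAVA_HOME."""
    return os.environ.get("JAVA_HOME")


# patch.dict restores the original environment when the block exits.
with mock.patch.dict(os.environ, {"JAVA_HOME": "/opt/jdk-17"}):
    first = java_home_sketch()        # computed and cached
    os.environ["JAVA_HOME"] = "/opt/jdk-21"
    stale = java_home_sketch()        # still the cached value
    java_home_sketch.cache_clear()    # drop the cached entry
    fresh = java_home_sketch()        # recomputed from the current environment

print(first, stale, fresh)  # /opt/jdk-17 /opt/jdk-17 /opt/jdk-21
```

The same applies to tests that change JAVA_HOME or PATH between calls within one process: they may see the first cached result unless the cache is cleared between tests.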

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1008 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 93.3% |

🌀 Generated Regression Tests
import os
import stat
import subprocess
from pathlib import Path

import pytest  # used for our unit tests
from codeflash.languages.java.comparator import _find_java_executable

# Helper to create a simple shell "java" or "mvn" executable script in a directory.
def _write_executable(path: Path, script_text: str):
    """
    Write a file at `path` containing `script_text` and make it executable.
    Uses a POSIX shell script shebang so subprocess.run(...) will execute it.
    This helper only uses real filesystem operations (no mocks).
    """
    path.write_text(script_text)
    # Make file executable by owner/group/others
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

def test_returns_none_when_no_java_present(tmp_path):
    # Save and restore relevant environment variables to avoid cross-test pollution
    old_path = os.environ.get("PATH")
    old_java_home = os.environ.pop("JAVA_HOME", None)

    try:
        # Ensure PATH contains only an empty temporary directory so there's no "java"
        os.environ["PATH"] = str(tmp_path)

        # No JAVA_HOME and no java in PATH -> function should return None
        codeflash_output = _find_java_executable(); result = codeflash_output
    finally:
        # Restore environment
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is not None:
            os.environ["JAVA_HOME"] = old_java_home

def test_uses_java_home_when_present_even_if_not_executable(tmp_path):
    # Create a fake JAVA_HOME directory with bin/java file (exists but not executable)
    java_home = tmp_path / "fake-java-home"
    bin_dir = java_home / "bin"
    bin_dir.mkdir(parents=True)
    java_file = bin_dir / "java"
    java_file.write_text("# fake java binary placeholder\n")  # do not make executable

    # Set env to this JAVA_HOME and restrict PATH so we won't accidentally pick up a system java
    old_path = os.environ.get("PATH")
    old_java_home = os.environ.get("JAVA_HOME")
    try:
        os.environ["JAVA_HOME"] = str(java_home)
        os.environ["PATH"] = str(tmp_path / "empty")  # nonexistent directory, so no java is found on PATH

        # The function checks JAVA_HOME first and uses Path.exists() -> should return file path
        codeflash_output = _find_java_executable(); result = codeflash_output
        expected = str(java_file)
    finally:
        # Restore environment
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is None:
            os.environ.pop("JAVA_HOME", None)
        else:
            os.environ["JAVA_HOME"] = old_java_home

def test_java_home_with_spaces_in_path(tmp_path):
    # Create a JAVA_HOME path with spaces and ensure it's handled
    java_home = tmp_path / "path with spaces"
    bin_dir = java_home / "bin"
    bin_dir.mkdir(parents=True)
    (bin_dir / "java").write_text("placeholder\n")  # existence is enough

    old_path = os.environ.get("PATH")
    old_java_home = os.environ.get("JAVA_HOME")
    try:
        os.environ["JAVA_HOME"] = str(java_home)
        os.environ["PATH"] = ""  # prevent fallback to system java

        # Should return the path containing the space character correctly
        codeflash_output = _find_java_executable(); result = codeflash_output
    finally:
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is None:
            os.environ.pop("JAVA_HOME", None)
        else:
            os.environ["JAVA_HOME"] = old_java_home

def test_prefers_java_home_over_path_java(tmp_path):
    # Create a valid JAVA_HOME/bin/java (exists)
    java_home = tmp_path / "jh"
    (java_home / "bin").mkdir(parents=True)
    (java_home / "bin" / "java").write_text("jh-binary\n")

    # Also create a PATH java executable which would be found if JAVA_HOME missing
    path_dir = tmp_path / "p"
    path_dir.mkdir()
    path_java = path_dir / "java"
    _write_executable(path_java, "#!/bin/sh\n[ \"$1\" = \"--version\" ] && echo ok && exit 0\nexit 0\n")

    old_path = os.environ.get("PATH")
    old_java_home = os.environ.get("JAVA_HOME")
    try:
        os.environ["JAVA_HOME"] = str(java_home)
        # Put path_dir in PATH too, but JAVA_HOME should take precedence
        os.environ["PATH"] = str(path_dir)

        codeflash_output = _find_java_executable(); result = codeflash_output
    finally:
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is None:
            os.environ.pop("JAVA_HOME", None)
        else:
            os.environ["JAVA_HOME"] = old_java_home

def test_path_java_returned_when_executable_and_returns_zero(tmp_path):
    # No JAVA_HOME set; create an executable "java" in a temp PATH that returns exit code 0 for --version
    path_dir = tmp_path / "bin"
    path_dir.mkdir()
    java_exec = path_dir / "java"

    # Script prints version and exits 0 when invoked with --version
    script = (
        "#!/bin/sh\n"
        "if [ \"$1\" = \"--version\" ]; then\n"
        "  echo 'openjdk 17'\n"
        "  exit 0\n"
        "fi\n"
        "exit 0\n"
    )
    _write_executable(java_exec, script)

    old_path = os.environ.get("PATH")
    old_java_home = os.environ.pop("JAVA_HOME", None)
    try:
        # Ensure only our directory is searched by which
        os.environ["PATH"] = str(path_dir)

        codeflash_output = _find_java_executable(); result = codeflash_output
    finally:
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is not None:
            os.environ["JAVA_HOME"] = old_java_home

def test_ignores_path_java_when_nonzero_exit(tmp_path):
    # No JAVA_HOME set; create an executable "java" that returns non-zero for --version
    path_dir = tmp_path / "bin_nonzero"
    path_dir.mkdir()
    java_exec = path_dir / "java"

    # Script returns non-zero when asked for --version -> the function should ignore it
    script = (
        "#!/bin/sh\n"
        "if [ \"$1\" = \"--version\" ]; then\n"
        "  echo 'not java'\n"
        "  exit 1\n"
        "fi\n"
        "exit 0\n"
    )
    _write_executable(java_exec, script)

    old_path = os.environ.get("PATH")
    old_java_home = os.environ.pop("JAVA_HOME", None)
    try:
        os.environ["PATH"] = str(path_dir)

        codeflash_output = _find_java_executable(); result = codeflash_output
    finally:
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is not None:
            os.environ["JAVA_HOME"] = old_java_home

def test_java_home_missing_but_path_used(tmp_path):
    # JAVA_HOME points to a nonexistent folder -> should fall back to PATH java
    fake_java_home = tmp_path / "nonexistent-home"  # do not create it

    # Create PATH java that returns 0
    path_dir = tmp_path / "p2"
    path_dir.mkdir()
    java_exec = path_dir / "java"
    _write_executable(java_exec, "#!/bin/sh\n[ \"$1\" = \"--version\" ] && echo ok && exit 0\nexit 0\n")

    old_path = os.environ.get("PATH")
    old_java_home = os.environ.get("JAVA_HOME")
    try:
        os.environ["JAVA_HOME"] = str(fake_java_home)
        os.environ["PATH"] = str(path_dir)

        # Should ignore non-existent JAVA_HOME and return PATH java
        codeflash_output = _find_java_executable(); result = codeflash_output
    finally:
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is None:
            os.environ.pop("JAVA_HOME", None)
        else:
            os.environ["JAVA_HOME"] = old_java_home

def test_repeated_calls_with_path_java_are_consistent_and_scale(tmp_path):
    # Create a PATH java executable that always returns 0 for --version.
    path_dir = tmp_path / "consistent"
    path_dir.mkdir()
    java_exec = path_dir / "java"
    _write_executable(
        java_exec,
        "#!/bin/sh\nif [ \"$1\" = \"--version\" ]; then echo 'openjdk 17'; exit 0; fi\nexit 0\n",
    )

    old_path = os.environ.get("PATH")
    old_java_home = os.environ.pop("JAVA_HOME", None)
    try:
        os.environ["PATH"] = str(path_dir)

        # Call the function many times (up to 1000) to exercise repeated subprocess invocations.
        for i in range(1000):
            codeflash_output = _find_java_executable(); res = codeflash_output
    finally:
        if old_path is None:
            os.environ.pop("PATH", None)
        else:
            os.environ["PATH"] = old_path
        if old_java_home is not None:
            os.environ["JAVA_HOME"] = old_java_home
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from unittest import mock

import pytest
from codeflash.languages.java.comparator import _find_java_executable

def test_returns_consistent_result_on_multiple_calls():
    """Test that multiple calls to the function return consistent results."""
    with tempfile.TemporaryDirectory() as tmpdir:
        java_home = Path(tmpdir) / "java"
        bin_dir = java_home / "bin"
        bin_dir.mkdir(parents=True, exist_ok=True)
        java_exe = bin_dir / "java"
        java_exe.touch()
        
        with mock.patch.dict(os.environ, {"JAVA_HOME": str(java_home)}, clear=True):
            results = [_find_java_executable() for _ in range(100)]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1496-2026-02-18T00.45.28, make your changes, and push.

@codeflash-ai codeflash-ai Bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Feb 18, 2026
@codeflash-ai codeflash-ai Bot mentioned this pull request Feb 18, 2026
@mohammedahmed18

@HeshamHM28 sounds good to me

@codeflash-ai codeflash-ai Bot closed this Feb 18, 2026
codeflash-ai Bot commented Feb 18, 2026

This PR has been automatically closed because the original PR #1496 by HeshamHM28 was closed.

@codeflash-ai codeflash-ai Bot deleted the codeflash/optimize-pr1496-2026-02-18T00.45.28 branch February 18, 2026 03:53