[Chore] get pr number from gh action event json file, fallback to old behavior #354
    event_path = os.getenv("GITHUB_EVENT_PATH")
    if not event_path:
        return {}
    with Path(event_path).open() as f:
⚡️Codeflash found 32% (0.32x) speedup for get_cached_gh_event_data in codeflash/code_utils/env_utils.py
⏱️ Runtime : 2.11 milliseconds → 1.59 milliseconds (best of 107 runs)
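The headline percentage appears to be measured against the optimized runtime rather than the original one. That reading is inferred from the numbers on this page, not stated by the report, and the helper name `speedup` is hypothetical. A minimal sketch of the arithmetic:

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup, measured against the optimized runtime (assumed formula)."""
    return (original - optimized) / optimized


# Runtimes in milliseconds, taken from the runtime line above.
headline = speedup(2.11, 1.59)
print(f"{headline:.3f}")  # roughly 0.327, which the report shows as 32% (0.32x)
```

The per-test timings further down are consistent with the same formula, for example 3.48μs -> 3.31μs gives about 5.14%.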
📝 Explanation and details
Here’s an optimized rewrite of your code. The main bottleneck in this short program is I/O (reading from disk), and possibly calling `os.getenv` and creating a `Path` object. However, there are some small speedups possible.
- Use `open()` directly for a string path; using `Path.open()` adds an unnecessary object creation step.
- Avoid returning an empty dictionary with a different key in the cache for different environments. Instead, cache only successful loads.
- Use `os.environ.get` for slightly faster environment access.
- Specify the encoding in `open` for potential future-proofing and speed.

Here’s the improved version.

**Changes made:**
- Replaced `os.getenv` with slightly faster `os.environ.get`.
- Used the built-in `open` instead of `Path(event_path).open()` (avoids `Path` object creation).
- Explicit UTF-8 encoding for speed and consistency.
- Eliminated unused `Path` import.

Beyond these changes, this function is already about as fast as possible given its necessary I/O and JSON parsing. Real-world bottlenecks for this function are dominated by disk and JSON decode times. If repeated calls with changed environment are required, removing `lru_cache` can improve correctness at a slight cost to speed. If speed is *critical* and the file is excessively large, consider a faster JSON parser (like `orjson`), but this is typically overkill for GitHub event data.
Need more aggressive optimization or C extensions? Let me know!
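For reference, the changes described above could look roughly like the sketch below. Only the four lines from the suggested diff are taken from this page; the `@lru_cache(maxsize=1)` decorator is inferred from the comments and tests, and the `json.load` return and type annotation are assumptions about the rest of the function.

```python
from __future__ import annotations

import json
import os
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=1)  # assumed decorator; the tests call cache_clear() on this function
def get_cached_gh_event_data() -> dict[str, Any]:
    """Load the GitHub Actions event payload once and cache the result."""
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if not event_path:
        # Variable unset or empty, e.g. when running outside GitHub Actions.
        return {}
    # Built-in open() on the string path, so no Path object is created.
    with open(event_path, encoding="utf-8") as f:
        return json.load(f)  # assumed body; not shown in the diff
```

Note that with `lru_cache` the empty-dict result is cached as well, so a later change to `GITHUB_EVENT_PATH` is only picked up after `get_cached_gh_event_data.cache_clear()`.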
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 48 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import json
import os
import tempfile
from functools import lru_cache
from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.env_utils import get_cached_gh_event_data


def write_json_file(path: Path, data: dict):
    """Helper to write JSON data to a file."""
    with path.open('w', encoding='utf-8') as f:
        json.dump(data, f)

def test_no_env_var(monkeypatch):
    """Test when GITHUB_EVENT_PATH is not set."""
    monkeypatch.delenv("GITHUB_EVENT_PATH", raising=False)
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 3.48μs -> 3.31μs (5.14% faster)

def test_env_var_points_to_nonexistent_file(monkeypatch, tmp_path):
    """Test when GITHUB_EVENT_PATH points to a file that does not exist."""
    fake_path = tmp_path / "doesnotexist.json"
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(fake_path))
    with pytest.raises(FileNotFoundError):
        get_cached_gh_event_data()

def test_env_var_points_to_invalid_json(monkeypatch, tmp_path):
    """Test when GITHUB_EVENT_PATH points to a file with invalid JSON."""
    invalid_json_file = tmp_path / "bad.json"
    invalid_json_file.write_text('{"not": "valid",}', encoding="utf-8")  # Trailing comma is invalid
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(invalid_json_file))
    with pytest.raises(json.JSONDecodeError):
        get_cached_gh_event_data()

def test_env_var_points_to_empty_file(monkeypatch, tmp_path):
    """Test when GITHUB_EVENT_PATH points to an empty file."""
    empty_file = tmp_path / "empty.json"
    empty_file.write_text("", encoding="utf-8")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(empty_file))
    with pytest.raises(json.JSONDecodeError):
        get_cached_gh_event_data()

def test_env_var_points_to_valid_json(monkeypatch, tmp_path):
    """Test when GITHUB_EVENT_PATH points to a valid JSON file."""
    data = {"action": "opened", "number": 42}
    json_file = tmp_path / "event.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 45.5μs -> 32.1μs (41.8% faster)

def test_env_var_points_to_json_with_non_ascii(monkeypatch, tmp_path):
    """Test when JSON contains non-ASCII characters."""
    data = {"message": "café", "emoji": "😀"}
    json_file = tmp_path / "unicode.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 46.3μs -> 32.4μs (42.6% faster)

def test_env_var_points_to_json_with_nested_data(monkeypatch, tmp_path):
    """Test when JSON contains nested structures."""
    data = {"outer": {"inner": {"value": [1, 2, 3]}}}
    json_file = tmp_path / "nested.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 46.6μs -> 32.6μs (43.0% faster)

def test_env_var_points_to_json_with_empty_dict(monkeypatch, tmp_path):
    """Test when JSON file contains an empty dict."""
    data = {}
    json_file = tmp_path / "emptydict.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 44.3μs -> 30.3μs (46.2% faster)

def test_env_var_points_to_json_with_empty_list(monkeypatch, tmp_path):
    """Test when JSON file contains an empty list (should return a list, not dict)."""
    data = []
    json_file = tmp_path / "emptylist.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.7μs -> 30.3μs (44.2% faster)

def test_env_var_points_to_json_with_non_dict(monkeypatch, tmp_path):
    """Test when JSON file contains a non-dict, non-list value (e.g., int, str, bool)."""
    for val in [123, "hello", True, None]:
        json_file = tmp_path / f"val_{str(val)}.json"
        write_json_file(json_file, val)
        monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
        codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 360ns -> 340ns (5.88% faster)

def test_lru_cache_behavior(monkeypatch, tmp_path):
    """Test that lru_cache prevents re-reading the file after first call."""
    data1 = {"foo": 1}
    data2 = {"bar": 2}
    json_file = tmp_path / "event.json"
    write_json_file(json_file, data1)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    # First call caches data1
    codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 521ns -> 471ns (10.6% faster)
    # Overwrite file with data2
    write_json_file(json_file, data2)
    # Second call should still return data1 due to cache
    codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 521ns -> 471ns (10.6% faster)

def test_cache_cleared_reads_new_data(monkeypatch, tmp_path):
    """Test that clearing the cache causes the function to re-read the file."""
    data1 = {"foo": 1}
    data2 = {"bar": 2}
    json_file = tmp_path / "event.json"
    write_json_file(json_file, data1)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 44.4μs -> 30.2μs (47.3% faster)
    write_json_file(json_file, data2)
    get_cached_gh_event_data.cache_clear()
    codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 44.4μs -> 30.2μs (47.3% faster)

def test_large_json(monkeypatch, tmp_path):
    """Test with a large JSON object (scalability/performance)."""
    data = {f"key_{i}": i for i in range(1000)}
    json_file = tmp_path / "large.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 195μs -> 178μs (9.89% faster)

def test_large_nested_json(monkeypatch, tmp_path):
    """Test with a large, deeply nested JSON structure."""
    data = current = {}
    for i in range(100):
        current[f"level_{i}"] = {}
        current = current[f"level_{i}"]
    json_file = tmp_path / "deep.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 65.5μs -> 47.6μs (37.4% faster)
    # Walk down the nested structure to check depth
    current = result
    for i in range(100):
        current = current[f"level_{i}"]

def test_large_list_json(monkeypatch, tmp_path):
    """Test with a large list as the root JSON object."""
    data = [i for i in range(1000)]
    json_file = tmp_path / "biglist.json"
    write_json_file(json_file, data)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(json_file))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 96.5μs -> 82.9μs (16.5% faster)

def test_env_var_points_to_file_with_whitespace(monkeypatch, tmp_path):
    """Test when JSON file contains only whitespace."""
    whitespace_file = tmp_path / "whitespace.json"
    whitespace_file.write_text(" \n\t ", encoding="utf-8")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(whitespace_file))
    with pytest.raises(json.JSONDecodeError):
        get_cached_gh_event_data()

def test_env_var_points_to_file_with_comments(monkeypatch, tmp_path):
    """Test when JSON file contains comments (which are invalid in JSON)."""
    comment_file = tmp_path / "comment.json"
    comment_file.write_text('{"foo": 1} // comment', encoding="utf-8")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(comment_file))
    with pytest.raises(json.JSONDecodeError):
        get_cached_gh_event_data()

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

import json
import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.env_utils import get_cached_gh_event_data

# --- Basic Test Cases ---

def test_no_env_var_returns_empty_dict(monkeypatch):
    # GITHUB_EVENT_PATH is not set
    monkeypatch.delenv("GITHUB_EVENT_PATH", raising=False)
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 3.43μs -> 3.41μs (0.558% faster)

def test_env_var_empty_returns_empty_dict(monkeypatch):
    # GITHUB_EVENT_PATH is set to empty string
    monkeypatch.setenv("GITHUB_EVENT_PATH", "")
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 1.66μs -> 1.59μs (4.39% faster)

def test_valid_json_file(monkeypatch, tmp_path):
    # Create a valid JSON file
    data = {"action": "opened", "number": 42}
    file_path = tmp_path / "event.json"
    file_path.write_text(json.dumps(data))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 45.6μs -> 32.4μs (40.8% faster)

def test_valid_json_file_non_ascii(monkeypatch, tmp_path):
    # JSON with non-ASCII characters
    data = {"message": "こんにちは", "user": "测试"}
    file_path = tmp_path / "event.json"
    file_path.write_text(json.dumps(data), encoding="utf-8")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 46.2μs -> 32.7μs (41.3% faster)

def test_valid_json_file_empty_dict(monkeypatch, tmp_path):
    # JSON file with empty dict
    data = {}
    file_path = tmp_path / "event.json"
    file_path.write_text(json.dumps(data))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.8μs -> 30.8μs (42.5% faster)

# --- Edge Test Cases ---

def test_env_var_points_to_nonexistent_file(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file that does not exist
    file_path = tmp_path / "does_not_exist.json"
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    with pytest.raises(FileNotFoundError):
        get_cached_gh_event_data()

def test_env_var_points_to_directory(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a directory, not a file
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(tmp_path))
    with pytest.raises(IsADirectoryError):
        get_cached_gh_event_data()

def test_env_var_points_to_invalid_json(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file with invalid JSON
    file_path = tmp_path / "bad.json"
    file_path.write_text("{not: valid json}")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    with pytest.raises(json.JSONDecodeError):
        get_cached_gh_event_data()

def test_env_var_points_to_json_array(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file with a JSON array, not a dict
    file_path = tmp_path / "array.json"
    file_path.write_text(json.dumps([1, 2, 3]))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 45.4μs -> 31.6μs (43.6% faster)

def test_env_var_points_to_json_null(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file with JSON null
    file_path = tmp_path / "null.json"
    file_path.write_text("null")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.6μs -> 30.4μs (43.4% faster)

def test_env_var_points_to_json_number(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file with a JSON number
    file_path = tmp_path / "num.json"
    file_path.write_text("123")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 44.0μs -> 31.1μs (41.7% faster)

def test_env_var_points_to_json_string(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file with a JSON string
    file_path = tmp_path / "str.json"
    file_path.write_text('"hello"')
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 43.9μs -> 30.8μs (42.9% faster)

def test_env_var_points_to_empty_file(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to an empty file
    file_path = tmp_path / "empty.json"
    file_path.write_text("")
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    with pytest.raises(json.JSONDecodeError):
        get_cached_gh_event_data()

def test_file_permission_denied(monkeypatch, tmp_path):
    # GITHUB_EVENT_PATH points to a file with no read permissions
    file_path = tmp_path / "event.json"
    file_path.write_text('{"foo": "bar"}')
    file_path.chmod(0o000)  # Remove all permissions
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    try:
        with pytest.raises(PermissionError):
            get_cached_gh_event_data()
    finally:
        # Restore permissions so tmp_path can be cleaned up
        file_path.chmod(0o644)

def test_cache_behavior(monkeypatch, tmp_path):
    # Ensure lru_cache is working: changing file content after first call has no effect
    file_path = tmp_path / "event.json"
    file_path.write_text(json.dumps({"a": 1}))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 511ns -> 491ns (4.07% faster)
    # Change file content
    file_path.write_text(json.dumps({"a": 2}))
    codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 511ns -> 491ns (4.07% faster)

def test_cache_cleared(monkeypatch, tmp_path):
    # After cache_clear, new file content is read
    file_path = tmp_path / "event.json"
    file_path.write_text(json.dumps({"a": 1}))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 43.4μs -> 30.0μs (44.5% faster)
    # Change file content
    file_path.write_text(json.dumps({"a": 2}))
    get_cached_gh_event_data.cache_clear()
    codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 43.4μs -> 30.0μs (44.5% faster)

# --- Large Scale Test Cases ---

def test_large_json_file(monkeypatch, tmp_path):
    # Test with a large JSON object (under 1000 keys)
    data = {f"key_{i}": i for i in range(1000)}
    file_path = tmp_path / "large.json"
    file_path.write_text(json.dumps(data))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 193μs -> 174μs (10.4% faster)

def test_large_json_array(monkeypatch, tmp_path):
    # Test with a large JSON array (under 1000 elements)
    data = [i for i in range(1000)]
    file_path = tmp_path / "large_array.json"
    file_path.write_text(json.dumps(data))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 96.0μs -> 82.7μs (16.1% faster)

def test_deeply_nested_json(monkeypatch, tmp_path):
    # Test with deeply nested JSON (depth ~100)
    data = curr = {}
    for i in range(100):
        curr["nested"] = {}
        curr = curr["nested"]
    file_path = tmp_path / "deep.json"
    file_path.write_text(json.dumps(data))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result = codeflash_output # 56.8μs -> 43.4μs (31.1% faster)
    # Walk down the nesting to verify structure
    curr = result
    for _ in range(100):
        curr = curr["nested"]

def test_multiple_calls_same_result(monkeypatch, tmp_path):
    # Multiple calls return the same object (due to lru_cache)
    data = {"foo": "bar"}
    file_path = tmp_path / "event.json"
    file_path.write_text(json.dumps(data))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file_path))
    codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 261ns -> 250ns (4.40% faster)
    codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 261ns -> 250ns (4.40% faster)

def test_multiple_env_paths(monkeypatch, tmp_path):
    # Changing GITHUB_EVENT_PATH does not change result due to lru_cache
    file1 = tmp_path / "event1.json"
    file2 = tmp_path / "event2.json"
    file1.write_text(json.dumps({"a": 1}))
    file2.write_text(json.dumps({"a": 2}))
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file1))
    codeflash_output = get_cached_gh_event_data(); result1 = codeflash_output # 261ns -> 261ns (0.000% faster)
    monkeypatch.setenv("GITHUB_EVENT_PATH", str(file2))
    codeflash_output = get_cached_gh_event_data(); result2 = codeflash_output # 261ns -> 261ns (0.000% faster)
    # After cache_clear, new env var is respected
    get_cached_gh_event_data.cache_clear()
    codeflash_output = get_cached_gh_event_data(); result3 = codeflash_output # 261ns -> 261ns (0.000% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally: `git merge codeflash/optimize-pr354-2025-06-20T15.43.29`
-    event_path = os.getenv("GITHUB_EVENT_PATH")
-    if not event_path:
-        return {}
-    with Path(event_path).open() as f:
+    event_path = os.environ.get("GITHUB_EVENT_PATH")
+    if not event_path:
+        return {}
+    with open(event_path, encoding="utf-8") as f:
misrasaurabh1 left a comment:
thanks! this makes it easier to use codeflash :)
…et-pr-number-from-gh-action-event-file
…umber-from-gh-action-event-file`)
Here is an optimized version of your program. The main hotspots in your code are:
- Disk IO with reading/parsing the event file (unavoidable but can be slightly optimized).
- Using `Path(event_path).open()` is slower than using `open(event_path, ...)`.
- `@lru_cache` introduces a bit of function call and hash overhead each time since it wraps your function. Since your maxsize is 1, and the data is constant in a GitHub Actions run, you can instead use a simple module-level cache variable with a sentinel value to avoid that overhead.
- The use of lots of chained `.get` with nested dictionaries can be condensed slightly for speed.
Below is a rewritten version maintaining all external behavior (same function names and signatures, same return values).
**Summary of optimizations:**
- Replaced `@lru_cache` with a lightweight module-level cache for `get_cached_gh_event_data`. Since the event file will not change during a single GH Actions run, this is safe and removes function call/lookup overhead.
- Used plain `open()` instead of the slower `Path(event_path).open()`.
- Reduced nested `.get(..., {})` lookups to a single step for faster logic.
- Kept exception handling to prevent failure if the file is missing/corrupt.
- No external behavior was changed: all function names/signatures/return values are identical.
- Preserved all important comments as requested.
If you want even more performance and you **know** in your context that the event file always exists and is well-formed, you can strip out the try/except block. But the above version stays robust and is still much faster.
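The module-level cache described in the summary above could be sketched as follows. The rewritten code itself is not shown on this page, so the names `_UNSET` and `_event_data` are hypothetical, and returning `{}` when the file is missing or corrupt is an assumption based on the "kept exception handling" bullet.

```python
from __future__ import annotations

import json
import os
from typing import Any

# Sentinel that distinguishes "not loaded yet" from a legitimately cached {}.
_UNSET = object()
_event_data: Any = _UNSET


def get_cached_gh_event_data() -> dict[str, Any]:
    """Return the event payload, reading the file at most once per process."""
    global _event_data
    if _event_data is _UNSET:
        event_path = os.environ.get("GITHUB_EVENT_PATH")
        if not event_path:
            _event_data = {}
        else:
            try:
                with open(event_path, encoding="utf-8") as f:
                    _event_data = json.load(f)
            except (OSError, json.JSONDecodeError):
                # Assumed behavior: a missing or corrupt file counts as "no data".
                _event_data = {}
    return _event_data
```

One trade-off of this design is that the function no longer exposes `cache_clear()`, so tests that rely on resetting the cache would need to reset the module variable instead.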
⚡️ Codeflash found optimizations for this PR
📄 24% (0.24x) speedup for
User description
This loads the PR number from the event.json file, typically at `/home/runner/work/_temp/_github_workflow/event.json`, instead of using `$CODEFLASH_PR_NUMBER`.
How I tested this:
PR Type
Enhancement, Documentation
Description
Add GH event JSON fallback for PR number
Update PR number retrieval logic in env_utils.py
Improve error message when PR number missing
Remove manual PR number setting in workflows/docs
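A minimal sketch of the lookup order implied by the PR title: event payload first, then the older environment variable. The function name `resolve_pr_number` and its signature are hypothetical and do not come from `env_utils.py`. The sketch relies on `pull_request` event payloads carrying the number both at the top level (`number`) and under `pull_request.number`.

```python
from __future__ import annotations

import os
from typing import Any, Mapping, Optional


def resolve_pr_number(
    event: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Optional[int]:
    """Prefer the PR number from the event payload, then fall back to the env var."""
    pull_request = event.get("pull_request")
    number = pull_request.get("number") if isinstance(pull_request, Mapping) else None
    if number is None:
        number = event.get("number")
    if number is not None:
        return int(number)
    # Old behavior: the number is supplied manually through the environment.
    raw = env.get("CODEFLASH_PR_NUMBER")
    return int(raw) if raw else None
```

For example, `resolve_pr_number({"pull_request": {"number": 354}}, env={})` uses the payload, while `resolve_pr_number({}, env={"CODEFLASH_PR_NUMBER": "12"})` falls back to the variable.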
Changes walkthrough 📝
| File | Change | Path |
|---|---|---|
| env_utils.py | Add GH event JSON fallback in env utils | codeflash/code_utils/env_utils.py |
| codeflash-optimize.yaml | Remove manual PR number env var | .github/workflows/codeflash-optimize.yaml |
| codeflash-optimize.yaml | Remove manual PR number env var in CLI workflow | codeflash/cli_cmds/workflows/codeflash-optimize.yaml |
| codeflash-github-actions.md | Remove manual PR number from docs | docs/docs/getting-started/codeflash-github-actions.md |