Commit 9bb9117

feat: resume reviews from cached codex state (#21)
1 parent: 9343fef

21 files changed

Lines changed: 1630 additions & 74 deletions

.github/workflows/codex-review.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -30,7 +30,7 @@ jobs:
         with:
           mode: review
           openai_api_key: ${{ secrets.OPENAI_API_KEY }}
-          model: gpt-5.1-codex-max
+          model: gpt-5.4
           reasoning_effort: medium
           debug_level: 1
 
```

README.md

Lines changed: 10 additions & 1 deletion
```diff
@@ -117,13 +117,22 @@ jobs:
 - **PR-level summary** as an issue comment on each run (refreshed on re-runs; prior summaries are deleted).
 - **Multi-line suggestions** only when contiguous and short; otherwise a single-line comment.
 
+## Review Continuation
+
+On repeated `pull_request` review runs, the action now tries to continue the prior Codex review instead of restarting from scratch.
+
+1. The PR summary stores the previously reviewed head SHA in hidden metadata.
+2. Review mode caches an isolated Codex home keyed by repository, PR number, model, and reviewed SHA.
+3. On the next push, the action restores that cache, resumes the latest stored review thread, and scopes the prompt to the delta since the previously reviewed SHA.
+4. If the prior SHA is no longer an ancestor, the cache is missing, or no thread can be restored, the action falls back to a fresh full review.
+
 ## Deduplication on Repeated Runs
 
 When a prior Codex review exists on the PR, reruns only reuse **unresolved Codex-authored review threads** as context.
 
 1. **Inline semantic dedup** — prior unresolved Codex comments are passed to the model's structured-output turn so it can avoid reposting the same issue as a new finding.
 2. **Re-adjudicated carry-forward** — the model separately marks which of those prior unresolved Codex comments are still relevant now. Only those count toward the PR summary.
-3. **Separated counts** — the summary reports new findings from the current run separately from prior Codex findings that still appear relevant.
+3. **Separated counts** — the summary reports new findings and still-relevant prior findings separately.
 
 ## Security & Permissions
```
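The four continuation steps reduce to one decision: resume only when every precondition holds, otherwise run a full review. The sketch below restates that decision under stated assumptions; `ResumePlan`, `plan_review`, and the injected `is_ancestor` callable are hypothetical names, not the action's own helpers.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ResumePlan:
    """Hypothetical outcome: which thread to resume and which range to review."""

    resume_thread_id: Optional[str]  # None means start a fresh Codex thread
    revision_range: Optional[str]    # None means review the full PR diff


FULL_REVIEW = ResumePlan(resume_thread_id=None, revision_range=None)


def plan_review(
    previous_sha: Optional[str],
    head_sha: str,
    cache_hit: bool,
    stored_thread_id: Optional[str],
    is_ancestor: Callable[[str, str], bool],
) -> ResumePlan:
    """Resume only when every precondition from the README list holds."""
    if not previous_sha:
        return FULL_REVIEW  # no reviewed SHA recorded in the summary metadata yet
    if not cache_hit or not stored_thread_id:
        return FULL_REVIEW  # cache missing, or no thread could be restored from it
    if not is_ancestor(previous_sha, head_sha):
        return FULL_REVIEW  # history rewritten (e.g. force-push or rebase)
    return ResumePlan(stored_thread_id, f"{previous_sha}..{head_sha}")


# Usage with a stubbed ancestry check:
print(plan_review("abc123", "def456", True, "thread-1", lambda old, new: True))
# ResumePlan(resume_thread_id='thread-1', revision_range='abc123..def456')
print(plan_review("abc123", "def456", True, "thread-1", lambda old, new: False))
# ResumePlan(resume_thread_id=None, revision_range=None)
```

Ordering the cheap checks first means the git ancestry probe only runs when a resume is actually possible.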

action.yml

Lines changed: 33 additions & 0 deletions
```diff
@@ -74,6 +74,29 @@ runs:
         if ! contains_word "$valid_search_modes" "${{ inputs.web_search_mode }}"; then
           echo "::error::Invalid web_search_mode: ${{ inputs.web_search_mode }} (allowed: disabled|cached|live)"; exit 2;
         fi
+    - name: Prepare review resume state
+      if: ${{ inputs.mode == 'review' }}
+      id: review_resume_state
+      shell: bash
+      env:
+        GITHUB_TOKEN: ${{ github.token }}
+        GITHUB_API_URL: ${{ github.api_url }}
+        GITHUB_EVENT_PATH: ${{ github.event_path }}
+        GITHUB_REPOSITORY: ${{ github.repository }}
+        RUNNER_TEMP: ${{ runner.temp }}
+        CODEX_MODEL_INPUT: ${{ inputs.model }}
+        GITHUB_ACTION_PATH: ${{ github.action_path }}
+      run: |
+        set -euo pipefail
+        PYTHONPATH="${GITHUB_ACTION_PATH}:${PYTHONPATH:-}" \
+          python3 -m cli.review.prepare_resume_state
+    - name: Restore review Codex cache
+      if: ${{ inputs.mode == 'review' && steps.review_resume_state.outputs.restore_key != '' }}
+      id: review_codex_cache
+      uses: actions/cache/restore@v4
+      with:
+        path: ${{ steps.review_resume_state.outputs.codex_home }}
+        key: ${{ steps.review_resume_state.outputs.restore_key }}
     - name: Install Python dependencies
       shell: bash
       run: |
@@ -82,6 +105,7 @@ runs:
         python3 -m pip install ${{ inputs.extra_pip_args }} -r "${{ github.action_path }}/requirements.txt"
 
     - name: Run Codex autonomous review (CLI)
+      id: run_codex_cli
       shell: bash
       env:
         # Tokens
@@ -98,8 +122,17 @@ runs:
         DEBUG_CODEREVIEW: ${{ inputs.debug_level }}
         DRY_RUN: ${{ inputs.dry_run }}
         STREAM_AGENT_MESSAGES: ${{ inputs.stream_agent_messages }}
+        CODEX_HOME: ${{ steps.review_resume_state.outputs.codex_home }}
+        CODEX_REVIEW_PREVIOUS_HEAD_SHA: ${{ steps.review_resume_state.outputs.previous_reviewed_sha }}
+        CODEX_REVIEW_CACHE_HIT: ${{ steps.review_codex_cache.outputs.cache-hit }}
       run: |
         set -euo pipefail
         # Execute the CLI directly; it detects GitHub Actions via env
         PYTHONPATH="${{ github.action_path }}:${PYTHONPATH:-}" \
           python3 -m cli.main
+    - name: Save review Codex cache
+      if: ${{ inputs.mode == 'review' && steps.run_codex_cli.outcome == 'success' && steps.review_resume_state.outputs.current_cache_key != '' && !(steps.review_codex_cache.outputs.cache-hit == 'true' && steps.review_resume_state.outputs.restore_key == steps.review_resume_state.outputs.current_cache_key) }}
+      uses: actions/cache/save@v4
+      with:
+        path: ${{ steps.review_resume_state.outputs.codex_home }}
+        key: ${{ steps.review_resume_state.outputs.current_cache_key }}
```
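The Restore and Save steps hinge on two outputs of `cli.review.prepare_resume_state`: `restore_key`, presumably derived from the previously reviewed SHA, and `current_cache_key` for the commit under review. The sketch below illustrates that pairing under stated assumptions: the key layout in the hypothetical `review_cache_key` is invented (the diff only says keys combine repository, PR number, model, and reviewed SHA), and `should_save_cache` is a Python restatement of the Save step's `if:` expression. Skipping the save on an exact hit matters because a GitHub Actions cache entry cannot be overwritten once its key exists.

```python
from typing import Optional


def review_cache_key(repository: str, pr_number: int, model: str, reviewed_sha: str) -> str:
    """Hypothetical key layout; the real format is computed in prepare_resume_state."""
    return f"codex-review-{repository.replace('/', '-')}-pr{pr_number}-{model}-{reviewed_sha}"


def restore_key_for(repository: str, pr_number: int, model: str, previous_sha: Optional[str]) -> str:
    # An empty key makes the Restore step's `if:` false, so nothing is restored.
    if not previous_sha:
        return ""
    return review_cache_key(repository, pr_number, model, previous_sha)


def should_save_cache(
    mode: str, run_outcome: str, cache_hit: str, restore_key: str, current_key: str
) -> bool:
    """Mirror of the Save step's `if:` expression; step outputs arrive as strings."""
    if mode != "review" or run_outcome != "success" or current_key == "":
        return False
    exact_hit = cache_hit == "true" and restore_key == current_key
    return not exact_hit


# Hypothetical repository and SHAs for illustration.
repo, pr, model = "octo/widgets", 21, "gpt-5.4"
restore = restore_key_for(repo, pr, model, "abc123")
current = review_cache_key(repo, pr, model, "def456")
print(should_save_cache("review", "success", "true", restore, current))  # True: new SHA, new key
print(should_save_cache("review", "success", "true", current, current))  # False: exact hit, key exists
```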

cli/README.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -138,6 +138,15 @@ pytest tests/ -v
 - **Codex-thread attribution**: only unresolved review threads whose root author matches a prior Codex summary author are reused as rerun context.
 - **Inline semantic dedup**: the structured-output turn uses those prior Codex comments to decide which issues are new vs already covered.
 - **Re-adjudicated summary carry-forward**: the model returns prior comment IDs that still seem relevant, and the summary reports those separately from new findings.
+- **Auto-resolution of fixed Codex threads**: the model can also mark prior unresolved Codex comments as fixed, and review mode resolves those GitHub review threads automatically.
+
+## Review Resume Between Pushes
+
+- Review mode can resume the previous Codex thread when a PR receives new commits.
+- The summary issue comment stores the last reviewed head SHA in hidden metadata.
+- GitHub Actions review runs restore an isolated review-only `CODEX_HOME` cache keyed by repository, PR number, model, and reviewed SHA.
+- When the prior reviewed SHA is still an ancestor of the current head and the cached session index contains a thread, the workflow resumes that thread and narrows the prompt to `previous_reviewed_sha..HEAD`.
+- Small incremental diffs are embedded directly in the prompt; larger deltas are referenced by commit range and inspected with git during the review turn.
 
 ### Customizing the Review Prompt
 
```
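The last bullet describes a size-based switch between inlining the incremental diff and pointing the model at a commit range. A minimal sketch, assuming a character-count threshold: `MAX_EMBEDDED_DIFF_CHARS`, its value, and `incremental_prompt_section` are hypothetical, and the diff text comes from an injected callable so the example runs without a repository (in the CLI, `git_diff_text` would be a plausible source).

```python
from typing import Callable

MAX_EMBEDDED_DIFF_CHARS = 20_000  # hypothetical cutoff; the real threshold is not shown


def incremental_prompt_section(
    previous_sha: str,
    diff_text_for: Callable[[str], str],
    max_chars: int = MAX_EMBEDDED_DIFF_CHARS,
) -> str:
    """Inline a small delta verbatim; otherwise reference the range for git inspection."""
    revision_range = f"{previous_sha}..HEAD"
    diff = diff_text_for(revision_range)
    if len(diff) <= max_chars:
        return f"Review only the changes since the last review ({revision_range}):\n{diff}"
    return (
        "The changes since the last review are too large to inline. "
        f"Inspect them with `git log {revision_range}` and `git diff {revision_range}`."
    )


small = incremental_prompt_section("abc123", lambda rng: "+print('hi')\n")
large = incremental_prompt_section("abc123", lambda rng: "+x\n" * 10_000)
print(small.splitlines()[0])  # Review only the changes since the last review (abc123..HEAD):
print(large)
```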

cli/clients/codex_client.py

Lines changed: 58 additions & 7 deletions
```diff
@@ -1,11 +1,12 @@
 from __future__ import annotations
 
+import os
 import sys
 from collections.abc import Callable
 from dataclasses import dataclass, field
 from typing import Any, Literal, cast
 
-from codex import Codex, CodexOptions, ThreadStartOptions, TurnOptions
+from codex import Codex, CodexOptions, ThreadResumeOptions, ThreadStartOptions, TurnOptions
 from codex.errors import CodexParseError, ThreadRunError
 from codex.protocol import types as protocol
 from codex.thread import CodexTurnStream, Thread
@@ -92,13 +93,15 @@ def execute_text(
         reasoning_effort: str | None = None,
         suppress_stream: bool = False,
         sandbox_mode: str = "read-only",
+        resume_thread_id: str | None = None,
     ) -> str:
         """Run a single text turn and return the final agent text."""
         return self._run_session(
             model_name=model_name,
             reasoning_effort=reasoning_effort,
             suppress_stream=suppress_stream,
             sandbox_mode=sandbox_mode,
+            resume_thread_id=resume_thread_id,
             session_runner=lambda thread, effort, stream_enabled: self._run_text_session(
                 thread=thread,
                 prompt=prompt,
@@ -117,13 +120,15 @@ def execute_structured(
         reasoning_effort: str | None = None,
         suppress_stream: bool = False,
         sandbox_mode: str = "read-only",
+        resume_thread_id: str | None = None,
     ) -> str:
         """Run an agentic turn followed by a schema-enforced output turn."""
         return self._run_session(
             model_name=model_name,
             reasoning_effort=reasoning_effort,
             suppress_stream=suppress_stream,
             sandbox_mode=sandbox_mode,
+            resume_thread_id=resume_thread_id,
             session_runner=lambda thread, effort, stream_enabled: self._run_structured_session(
                 thread=thread,
                 prompt=prompt,
@@ -141,15 +146,17 @@ def _run_session(
         reasoning_effort: str | None,
         suppress_stream: bool,
         sandbox_mode: str,
+        resume_thread_id: str | None,
         session_runner: Callable[[Thread, str, bool], str],
     ) -> str:
         effort = self._resolve_effort(reasoning_effort)
         stream_enabled = self._should_stream(suppress_stream)
 
         try:
-            thread = self._create_thread(
+            thread = self._start_or_resume_thread(
                 model_name=model_name,
                 sandbox_mode=sandbox_mode,
+                resume_thread_id=resume_thread_id,
             )
             return session_runner(thread, effort, stream_enabled)
         except ThreadRunError as run_err:
@@ -419,18 +426,62 @@ def _codex_web_search_mode(self) -> Literal["disabled", "cached", "live"]:
         self._debug(1, f"Invalid web search mode '{mode}', falling back to 'live'")
         return "live"
 
-    def _create_thread(self, *, model_name: str | None, sandbox_mode: str) -> Thread:
-        resolved_sandbox_mode = self._normalize_sandbox_mode(sandbox_mode, "read-only")
+    def _codex_process_env(self) -> dict[str, str] | None:
+        codex_home = os.environ.get("CODEX_HOME")
+        if not isinstance(codex_home, str):
+            return None
+        normalized = codex_home.strip()
+        if not normalized:
+            return None
+        return {"CODEX_HOME": normalized}
+
+    def _resolved_model_name(self, model_name: str | None) -> str:
+        resolved_model_name = model_name if model_name is not None else self.config.model_name
+        return resolved_model_name.strip()
+
+    def _make_codex_client(self) -> Codex:
         return Codex(
             options=CodexOptions(
                 config=cast(Any, {"show_raw_agent_reasoning": self.config.debug_level >= 2}),
                 api_key=self._resolve_api_key(),
+                env=self._codex_process_env(),
             )
-        ).start_thread(
+        )
+
+    def _thread_config(self) -> dict[str, Literal["disabled", "cached", "live"]]:
+        return {"web_search": self._codex_web_search_mode()}
+
+    def _start_or_resume_thread(
+        self,
+        *,
+        model_name: str | None,
+        sandbox_mode: str,
+        resume_thread_id: str | None,
+    ) -> Thread:
+        resolved_sandbox_mode = self._normalize_sandbox_mode(sandbox_mode, "read-only")
+        resolved_model_name = self._resolved_model_name(model_name)
+        codex_client = self._make_codex_client()
+        if resume_thread_id:
+            try:
+                self._debug(1, f"Attempting to resume Codex thread {resume_thread_id}")
+                return codex_client.resume_thread(
+                    resume_thread_id,
+                    ThreadResumeOptions(
+                        model=resolved_model_name,
+                        sandbox=cast(Any, resolved_sandbox_mode),
+                        config=cast(Any, self._thread_config()),
+                    ),
+                )
+            except Exception as exc:
+                self._debug(
+                    1,
+                    f"Failed to resume Codex thread {resume_thread_id}: {exc}; starting fresh",
+                )
+        return codex_client.start_thread(
             ThreadStartOptions(
-                model=(model_name or self.config.model_name).strip(),
+                model=resolved_model_name,
                 sandbox=cast(Any, resolved_sandbox_mode),
-                config=cast(Any, {"web_search": self._codex_web_search_mode()}),
+                config=cast(Any, self._thread_config()),
             )
         )
```
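`_start_or_resume_thread` treats a failed resume as non-fatal: any exception from `resume_thread` is logged and control falls through to `start_thread`. The sketch below isolates that control flow with a stub client so it runs without the Codex SDK; `ThreadClient`, `FakeClient`, and `start_or_resume` are hypothetical stand-ins, and the option objects (`ThreadResumeOptions`, `ThreadStartOptions`) that the real calls pass are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class ThreadClient(Protocol):
    """Hypothetical subset of a Codex client: only the two thread entry points."""

    def resume_thread(self, thread_id: str) -> str: ...
    def start_thread(self) -> str: ...


def start_or_resume(client: ThreadClient, resume_thread_id: Optional[str], log: list[str]) -> str:
    if resume_thread_id:
        try:
            return client.resume_thread(resume_thread_id)
        except Exception as exc:  # any resume failure degrades to a fresh thread
            log.append(f"Failed to resume Codex thread {resume_thread_id}: {exc}; starting fresh")
    return client.start_thread()


@dataclass
class FakeClient:
    """Stub that can only resume thread IDs it already knows about."""

    known_threads: set[str] = field(default_factory=set)

    def resume_thread(self, thread_id: str) -> str:
        if thread_id not in self.known_threads:
            raise LookupError(f"unknown thread {thread_id}")
        return f"resumed:{thread_id}"

    def start_thread(self) -> str:
        return "fresh"


log: list[str] = []
client = FakeClient(known_threads={"t-1"})
print(start_or_resume(client, "t-1", log))  # resumed:t-1
print(start_or_resume(client, "t-9", log))  # fresh (after logging the failure)
print(start_or_resume(client, None, log))   # fresh
```

The broad `except Exception` keeps a stale or corrupted cache from failing the review outright; the trade-off is that unexpected errors surface only through debug output.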

cli/clients/git_ops.py

Lines changed: 30 additions & 0 deletions
```diff
@@ -191,6 +191,36 @@ def git_is_ancestor(older_sha: str, newer_sha: str) -> bool:
     )
 
 
+def git_diff_text(revision_range: str, *, unified: int = 3) -> str:
+    """Return the git diff for ``revision_range``.
+
+    Raises:
+        subprocess.CalledProcessError: Git diff failed.
+    """
+    result = _run_git(
+        ["diff", f"--unified={unified}", "--no-color", revision_range],
+        capture_output=True,
+    )
+    if result.returncode != 0:
+        _raise_git_result_error(result)
+    return result.stdout
+
+
+def git_commit_shas(revision_range: str) -> list[str]:
+    """Return commit SHAs in ``revision_range`` from oldest to newest.
+
+    Raises:
+        subprocess.CalledProcessError: Git log probe failed.
+    """
+    result = _run_git(
+        ["rev-list", "--reverse", revision_range],
+        capture_output=True,
+    )
+    if result.returncode != 0:
+        _raise_git_result_error(result)
+    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
+
+
 def git_rebase_in_progress() -> bool:
     """Return whether repository is in an active rebase state.
 
```
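Only the closing line of `git_is_ancestor` appears in this hunk, so its body is not shown. The sketch below assumes an ancestry check built on `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second, 1 when it is not, and with another status on error. `is_ancestor`, `commit_shas`, `canned`, and the injected `run` callable are hypothetical, chosen so the logic can be exercised with canned `subprocess.CompletedProcess` results instead of a real repository; `commit_shas` parses output the same way `git_commit_shas` does.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner signature: takes argv, returns a CompletedProcess with text output.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def default_runner(argv: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(argv), capture_output=True, text=True, check=False)


def is_ancestor(older_sha: str, newer_sha: str, run: Runner = default_runner) -> bool:
    result = run(["git", "merge-base", "--is-ancestor", older_sha, newer_sha])
    if result.returncode in (0, 1):
        return result.returncode == 0
    # Any other status is an error, e.g. an unknown SHA in a shallow checkout.
    raise subprocess.CalledProcessError(result.returncode, result.args, result.stdout, result.stderr)


def commit_shas(revision_range: str, run: Runner = default_runner) -> list[str]:
    """Oldest-to-newest SHAs in revision_range."""
    result = run(["git", "rev-list", "--reverse", revision_range])
    if result.returncode != 0:
        raise subprocess.CalledProcessError(result.returncode, result.args, result.stdout, result.stderr)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def canned(returncode: int, stdout: str = "") -> Runner:
    return lambda argv: subprocess.CompletedProcess(list(argv), returncode, stdout, "")


print(is_ancestor("abc", "def", run=canned(0)))              # True
print(is_ancestor("abc", "def", run=canned(1)))              # False
print(commit_shas("abc..def", run=canned(0, "c1\nc2\n\n")))  # ['c1', 'c2']
```

Distinguishing exit status 1 from other failures is what lets a caller treat "not an ancestor" as a normal fallback to a full review while still surfacing genuine git errors.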

cli/core/exceptions.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -31,3 +31,7 @@ class PromptError(CodexReviewError):
 
 class ReviewContractError(CodexReviewError):
     """Structured review payload or metadata contract violations."""
+
+
+class ReviewResumeError(CodexReviewError):
+    """Review resume invariant or infrastructure failures."""
```

cli/core/models.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -165,14 +165,16 @@ def as_dict(self) -> dict[str, Any]:
 
 
 @dataclass(frozen=True)
-class ExistingReviewComment:
-    """Structured inline review comment used for local dedupe."""
+class PriorCodexReviewComment:
+    """Unresolved Codex-authored review thread comment reused on reruns."""
 
     id: str
+    thread_id: str
     path: str
     line: int
     body: str
     current_code: str
+    is_currently_applicable: bool
 
 
 @dataclass(frozen=True)
```
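The renamed `PriorCodexReviewComment` now carries `thread_id`, which is what a rerun needs in order to resolve a GitHub review thread once the model reports the comment as fixed (per the cli/README.md bullets). A minimal sketch under stated assumptions: the dataclass mirrors the fields in this hunk, `partition_prior_comments` and `make` are hypothetical, a "fixed" verdict overriding a "still relevant" one is an assumed tie-break, and `is_currently_applicable` is left unused because its semantics are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorCodexReviewComment:
    """Mirror of the fields shown in the cli/core/models.py hunk."""

    id: str
    thread_id: str
    path: str
    line: int
    body: str
    current_code: str
    is_currently_applicable: bool


def partition_prior_comments(
    prior: list[PriorCodexReviewComment],
    still_relevant_ids: set[str],
    fixed_ids: set[str],
) -> tuple[list[PriorCodexReviewComment], list[str]]:
    """Split prior comments into carried-forward findings and thread IDs to resolve."""
    # Assumed tie-break: a comment marked fixed is never carried forward.
    carried_forward = [c for c in prior if c.id in still_relevant_ids and c.id not in fixed_ids]
    threads_to_resolve = [c.thread_id for c in prior if c.id in fixed_ids]
    return carried_forward, threads_to_resolve


def make(cid: str, tid: str) -> PriorCodexReviewComment:
    return PriorCodexReviewComment(cid, tid, "cli/main.py", 10, "nit", "x = 1", True)


prior = [make("c1", "T1"), make("c2", "T2"), make("c3", "T3")]
carried, resolve = partition_prior_comments(prior, {"c1", "c2"}, {"c2"})
print([c.id for c in carried], resolve)  # ['c1'] ['T2']
```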

cli/main.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -299,11 +299,14 @@ def _run_mode_workflow(config: ReviewConfig) -> int:
 
     summary = result.summary
     if summary.carried_forward_count > 0:
+        extra_parts: list[str] = []
+        if summary.carried_forward_count > 0:
+            extra_parts.append(f"{summary.carried_forward_count} prior findings still relevant")
         print(
             "\nReview completed: "
             f"{summary.overall_correctness}, "
             f"{summary.current_findings_count} new findings, "
-            f"{summary.carried_forward_count} prior findings still relevant "
+            f"{', '.join(extra_parts)} "
             f"({summary.active_findings_count} active total)"
         )
     else:
```
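In the carried-forward branch, the suffix now comes from `extra_parts` rather than a hard-coded string. The inner `if summary.carried_forward_count > 0:` repeats the enclosing condition, so inside this branch it always holds; presumably the list leaves room for further suffix parts. A condensed restatement of the printed line follows; `completion_message` is hypothetical, and because the `else:` body is not shown, the wording it produces when nothing carries forward is an assumption.

```python
def completion_message(
    overall_correctness: str,
    new_count: int,
    carried_forward_count: int,
    active_count: int,
) -> str:
    """Rebuild the 'Review completed' line printed by _run_mode_workflow (sketch)."""
    parts = [overall_correctness, f"{new_count} new findings"]
    if carried_forward_count > 0:
        parts.append(f"{carried_forward_count} prior findings still relevant")
    # Assumed wording when nothing carries forward; the real else-branch is not shown.
    return f"Review completed: {', '.join(parts)} ({active_count} active total)"


print(completion_message("patch is incorrect", 2, 1, 3))
# Review completed: patch is incorrect, 2 new findings, 1 prior findings still relevant (3 active total)
```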
