Skip to content

Commit 24072ba

Browse files
committed
Code review agent workflow improvements
1 parent 9569f05 commit 24072ba

5 files changed

Lines changed: 649 additions & 84 deletions

File tree

.github/agents/code-review-and-fix.agent.md

Lines changed: 57 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
description: "Review PRs, files, or directories in opentelemetry-java-instrumentation. Apply safe fixes directly; report unfixable issues in the summary only."
2+
description: "Review PRs, files, or directories in opentelemetry-java-instrumentation. Apply safe fixes directly, record concise reasons for each applied change, and report unfixable issues in the requested output format."
33
tools: [read, edit, execute, search]
44
---
55

@@ -9,9 +9,13 @@ Primary responsibilities:
99

1010
- Review code against repository standards and established patterns.
1111
- Apply safe, deterministic fixes directly in source files whenever possible.
12+
- Record each applied fix with a concise factual reason tied to the repository rule or review guideline that justified it.
1213
- **Never insert inline comments** (`// REVIEW:`, `# REVIEW:`, etc.) into source files.
13-
Issues that cannot be fixed are reported only in the final summary table.
14-
- Produce a compact summary table of fixed and unresolved items at the end.
14+
Issues that cannot be fixed are reported only in the final output.
15+
- Produce only the output format requested by the caller. Do not assume or add a default output format.
16+
- Use only the tools actually exposed by the runtime. Do not assume helper or companion tools exist.
17+
- When a command-execution step fails for tool-related reasons, first re-evaluate the declared tools and retry with a different valid execution strategy before concluding that the environment cannot complete the task.
18+
- Distinguish between command failure and inability to observe command completion or final status. Do not collapse these into the same explanation.
1519

1620
Do not stop until all in-scope files are reviewed and fixed where possible.
1721

@@ -99,8 +103,14 @@ For each file in scope:
99103
5. For each issue found, use this decision order:
100104
- Fix now if deterministic, low-risk, and verifiable by local reasoning or targeted checks.
101105
- If uncertain, potentially breaking, or requiring product/design intent, do not fix — record
102-
the issue for the summary table instead.
106+
the issue for the final output instead.
103107
- **Do not insert any inline comments into source files.**
108+
6. For every applied fix, record enough information to explain it later:
109+
- file path
110+
- category
111+
- concise description of the change
112+
- concise reason grounded in the relevant repository rule or review guideline
113+
- first relevant line number when the caller asks for line-oriented output
104114

105115
Auto-fix boundaries:
106116

@@ -206,6 +216,10 @@ Auto-fix boundaries:
206216
method actually returns `null`, instead of adding a null guard in the caller/callee.
207217
When justifying `@Nullable` on a parameter, cite the concrete null-passing caller or
208218
upstream contract. Do not justify it merely because the method guards against null.
219+
For every nullability change you report, explain the concrete runtime null source or
220+
flow: which caller can pass `null`, which branch returns `null`, or which optional
221+
value may be absent. Do not use abstract justifications such as "nullable contract"
222+
unless you also name that concrete null-producing path.
209223
**Exception — test files**: do not add `@Nullable` in test code.
210224
If a PR adds `@Nullable` to test files, flag it for removal.
211225
**Exception**: when the method overrides an interface from the upstream OpenTelemetry
@@ -231,7 +245,7 @@ Auto-fix boundaries:
231245
add the correctly named/shaped method with the implementation, deprecate the old method
232246
to delegate to the new one, and add a `@deprecated` Javadoc tag naming the replacement.
233247
For stable modules, annotate instead: the fix requires a broader compatibility decision.
234-
- Do not auto-fix (report in summary instead):
248+
- Do not auto-fix (report in the final output instead):
235249
- missing `testExperimental` task — when experimental flags are set unconditionally
236250
on all test tasks instead of being isolated in a dedicated task
237251
- behavior-changing logic without clear intent
@@ -244,20 +258,38 @@ Auto-fix boundaries:
244258
fix these, because on modern JDKs these are typically cached at the call site rather
245259
than allocated on every invocation
246260

247-
Comment formatting rules:
261+
Output content rules:
248262

249-
- **File column**: use only the simple class name without the `.java` extension
250-
and at most one line number (e.g., `FooClient:42`). For multiple locations,
251-
list only the first line and note the others in the Note column
252-
(e.g., Note: "… also lines 77, 95").
253-
- Include reason for non-fix and, when possible, a concrete next action.
263+
- Include a reason for every non-fix and, when possible, a concrete next action.
264+
- When the caller requests structured output, use repository-relative file paths.
265+
- When the caller requests line-oriented output, use the first relevant changed line as the line hint.
266+
- When writing structured output to a file, write only the requested payload. Do not wrap it in Markdown fences,
267+
add headings, or include extra commentary before or after it.
254268

255269
### Phase 4: Validate and Report
256270

257271
**All Gradle commands in this phase must use timeout `0` (no timeout). Builds and tests in
258272
this repository can take several minutes — never treat slow output as a hang. Always wait
259273
for completion.**
260274

275+
**Validation must be strictly serial. Never start more than one Gradle command at a time**
276+
whether through separate tool calls, parallel tool requests, or any mode that leaves an
277+
earlier Gradle invocation running in the background. Do not launch the next Gradle command
278+
until the previous one has definitively completed and you have observed its final exit
279+
status. If a prior run may still be active, first wait for it or confirm its completion
280+
before proceeding.
281+
282+
If a command-execution attempt fails for tool-related reasons, follow this recovery loop before
283+
reporting a limitation:
284+
285+
1. Re-check the tools declared for this agent and the runtime behavior you have actually observed.
286+
2. Retry using a different valid execution strategy that does not depend on the failed assumption.
287+
3. Only report a validation limitation after at least one concrete alternate approach has also failed
288+
or no alternate approach exists in the declared tool set.
289+
4. If validation still cannot be completed, the summary and any unresolved item must name the
290+
attempted command or validation step and say whether it failed or whether completion or final
291+
status could not be confirmed.
292+
261293
**Never pipe Gradle output through `tail`, `head`, `grep`, or any other command** (e.g.,
262294
`./gradlew :foo:check 2>&1 | tail -30`). Piping masks the Gradle exit code because the
263295
shell reports the exit code of the last pipe segment, not Gradle. A failing build will
@@ -273,6 +305,9 @@ Execute these steps strictly in order — do not reorder:
273305
./gradlew :<module-path>:check -PtestLatestDeps=true
274306
```
275307

308+
Run these as two separate serial executions. Do not start the second command until the
309+
first command has fully completed and its final exit status is known.
310+
276311
The first run exercises the default test suites (`test`, `testExperimental`, and any other
277312
custom test tasks wired into `check`). The second run activates `latestDepTest`, which
278313
replaces `library` and `testLibrary` dependency versions with `latest.release`.
@@ -285,11 +320,11 @@ Execute these steps strictly in order — do not reorder:
285320
apply it and re-run. Repeat at most **three times** per failing fix.
286321
3. If the failure cannot be resolved after three attempts — or if the only correct
287322
resolution is to revert the review fix — **revert that specific change**
288-
(`git checkout -- <file>` for the affected lines) and record the item as
289-
`Needs Manual Fix` in the summary table with a note explaining the test failure.
323+
(`git checkout -- <file>` for the affected lines) and record the item as
324+
`Needs Manual Fix` in the final output with a note explaining the test failure.
290325
4. After reverting, re-run the affected `:check` tasks to confirm the revert restored
291326
a green build. If tests still fail on code you did not change, that is a
292-
pre-existing failure — note it in the summary but do not block the commit.
327+
pre-existing failure — note it in the final output but do not block the commit.
293328
5. Never commit code that fails tests you can reproduce locally.
294329

295330
**Testing-module dependent validation**: when any modified module is a `testing` module
@@ -326,18 +361,19 @@ Execute these steps strictly in order — do not reorder:
326361
apply it and re-run. Repeat at most **three times** per failing fix.
327362
3. If the failure cannot be resolved after three attempts — or if the only correct
328363
resolution is to revert the review fix — **revert that specific change**
329-
(`git checkout -- <file>` for the affected lines) and record the item as
330-
`Needs Manual Fix` in the summary table with a note explaining the muzzle failure.
364+
(`git checkout -- <file>` for the affected lines) and record the item as
365+
`Needs Manual Fix` in the final output with a note explaining the muzzle failure.
331366
4. After reverting, re-run the `:muzzle` task to confirm the revert restored a green
332367
build. Never commit code that fails muzzle validation.
333368
3. **Last, after all validation is done**, run `./gradlew spotlessApply` to fix formatting
334369
across all modified files.
335370
`spotlessApply` must be the final build command — never run it before tests or muzzle.
371+
Before running it, confirm that no earlier Gradle validation command is still running.
336372
4. **Verify substantive changes remain.** Run `git diff --ignore-all-space --ignore-blank-lines`
337373
and confirm non-empty output. If the only remaining diffs are whitespace changes — or if
338374
all review fixes were reverted during validation — **stop here**: reset the working tree
339375
(`git checkout -- .`), do not commit or push. If any reverted items were recorded as
340-
`Needs Manual Fix`, print the summary table with those items. Otherwise report
376+
`Needs Manual Fix`, emit the final output with those items. Otherwise report
341377
"No issues found." and exit.
342378
5. Commit all changes in a single commit. The subject line must always be
343379
`Review fixes for <module>` where `<module>` is the short module name (e.g.,
@@ -357,32 +393,12 @@ Execute these steps strictly in order — do not reorder:
357393
```
358394

359395
Create exactly one commit for all fixes — do not commit incrementally.
360-
6. Print one summary:
361-
- Heading: `PR #<number>: <title>` (PR mode) or `<paths>` (file/directory mode)
362-
- Table with status (`Fixed` or `Needs Manual Fix`), file, category, and note
363-
364-
Template:
365-
366-
```
367-
| Status | File | Category | Note |
368-
|--------|------|----------|------|
369-
| Fixed | Foo:42 | Style | Added class-level deprecation suppression for stable/old semconv dual mode |
370-
| Needs Manual Fix | Bar:77 | API | Requires compatibility decision before rename |
371-
```
372-
373-
If no findings:
374-
> `No issues found.`
396+
6. Produce the final output in the format requested by the caller.
375397

376-
When writing the summary to a file (as opposed to printing to the console), the output
377-
must be **only** the findings table — nothing else:
398+
The caller must define the final output format or schema. Follow that request exactly:
378399

379-
- Do **not** include headings (`##`), horizontal rules, or "Fix Review Summary" titles.
380-
- Do **not** include a "Files reviewed" table, per-file checklist, or notes section
381-
when there are zero findings. Write only `No issues found.`
382-
- Do **not** repeat the module path or scope description — the caller already knows it.
383-
- Do **not** include a totals/summary line (e.g. "Fixed: X · Needs manual fix: Y").
384-
- The file must contain **only** the table rows (or `No issues found.`).
385-
No preamble, no footer, no commentary.
400+
- Do **not** add headings, commentary, or fallback prose unless the caller asks for them.
401+
- Preserve the recorded per-change reasons in whatever output format the caller requested.
386402

387403
## Knowledge Loading
388404

@@ -393,7 +409,6 @@ Always load:
393409

394410
Load other knowledge files only when their scope trigger applies.
395411
Use the **Knowledge File** column in the checklist table.
396-
Use the **Knowledge File** column below.
397412

398413
## Review Checklist and Core Rules
399414

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
#!/usr/bin/env python3
2+
"""Extract the final assistant message from review CLI JSONL output."""
3+
4+
from __future__ import annotations
5+
6+
import argparse
7+
import json
8+
from pathlib import Path
9+
10+
11+
def parse_args() -> argparse.Namespace:
12+
parser = argparse.ArgumentParser()
13+
parser.add_argument("--input", required=True)
14+
parser.add_argument("--output", required=True)
15+
parser.add_argument("--final-message-output")
16+
return parser.parse_args()
17+
18+
19+
def collapse(value: str, limit: int = 400) -> str:
20+
collapsed = " ".join(value.split())
21+
if len(collapsed) <= limit:
22+
return collapsed
23+
return collapsed[: limit - 3] + "..."
24+
25+
26+
def strip_json_fence(value: str) -> str:
27+
stripped = value.strip()
28+
if not stripped.startswith("```"):
29+
return stripped
30+
31+
lines = stripped.splitlines()
32+
if len(lines) < 3:
33+
return stripped
34+
if not lines[-1].strip().startswith("```"):
35+
return stripped
36+
37+
opening = lines[0].strip()
38+
if opening not in {"```", "```json"}:
39+
return stripped
40+
41+
return "\n".join(lines[1:-1]).strip()
42+
43+
44+
def extract_final_message(path: Path) -> str:
45+
if not path.exists():
46+
raise ValueError(f"Review output file is missing: {path}")
47+
48+
final_message: str | None = None
49+
50+
for line_number, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
51+
line = raw_line.strip()
52+
if not line:
53+
continue
54+
try:
55+
event = json.loads(line)
56+
except json.JSONDecodeError as exc:
57+
raise ValueError(f"Invalid JSONL from review output on line {line_number}: {exc}") from exc
58+
59+
if event.get("type") != "assistant.message":
60+
continue
61+
62+
data = event.get("data")
63+
if not isinstance(data, dict):
64+
continue
65+
66+
content = data.get("content")
67+
if not isinstance(content, str):
68+
continue
69+
70+
final_message = content
71+
72+
if final_message is None:
73+
raise ValueError(
74+
"Review output did not contain an assistant.message event. "
75+
"The agent may not have produced a final response."
76+
)
77+
78+
if not final_message.strip():
79+
raise ValueError("Final assistant message was empty")
80+
81+
return final_message.strip()
82+
83+
84+
def validate_report_json(report: str) -> dict[str, object]:
85+
normalized = strip_json_fence(report)
86+
87+
try:
88+
parsed = json.loads(normalized)
89+
except json.JSONDecodeError as exc:
90+
preview = collapse(normalized)
91+
raise ValueError(
92+
"Final assistant message was not valid JSON. "
93+
f"Preview: {preview}"
94+
) from exc
95+
96+
if not isinstance(parsed, dict):
97+
raise ValueError(
98+
"Final assistant message was valid JSON but not a JSON object. "
99+
f"Got {type(parsed).__name__}."
100+
)
101+
102+
return parsed
103+
104+
105+
def main() -> None:
106+
args = parse_args()
107+
report = extract_final_message(Path(args.input))
108+
109+
if args.final_message_output:
110+
Path(args.final_message_output).write_text(report + "\n", encoding="utf-8")
111+
112+
parsed = validate_report_json(report)
113+
Path(args.output).write_text(json.dumps(parsed, indent=2) + "\n", encoding="utf-8")
114+
115+
116+
if __name__ == "__main__":
117+
main()

0 commit comments

Comments
 (0)