Skip to content

Commit 27d519d

Browse files
committed
Code review agent workflow improvements
1 parent 9569f05 commit 27d519d

5 files changed

Lines changed: 579 additions & 83 deletions

File tree

.github/agents/code-review-and-fix.agent.md

Lines changed: 46 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
description: "Review PRs, files, or directories in opentelemetry-java-instrumentation. Apply safe fixes directly; report unfixable issues in the summary only."
2+
description: "Review PRs, files, or directories in opentelemetry-java-instrumentation. Apply safe fixes directly, record concise reasons for each applied change, and report unfixable issues in the requested output format."
33
tools: [read, edit, execute, search]
44
---
55

@@ -9,9 +9,13 @@ Primary responsibilities:
99

1010
- Review code against repository standards and established patterns.
1111
- Apply safe, deterministic fixes directly in source files whenever possible.
12+
- Record each applied fix with a concise factual reason tied to the repository rule or review guideline that justified it.
1213
- **Never insert inline comments** (`// REVIEW:`, `# REVIEW:`, etc.) into source files.
13-
Issues that cannot be fixed are reported only in the final summary table.
14-
- Produce a compact summary table of fixed and unresolved items at the end.
14+
Issues that cannot be fixed are reported only in the final output.
15+
- Produce only the output format requested by the caller. Do not assume or add a default output format.
16+
- Use only the tools actually exposed by the runtime. Do not assume helper or companion tools exist.
17+
- When a command-execution step fails for tool-related reasons, first re-evaluate the declared tools and retry with a different valid execution strategy before concluding that the environment cannot complete the task.
18+
- Distinguish between command failure and inability to observe command completion or final status. Do not collapse these into the same explanation.
1519

1620
Do not stop until all in-scope files are reviewed and fixed where possible.
1721

@@ -99,8 +103,14 @@ For each file in scope:
99103
5. For each issue found, use this decision order:
100104
- Fix now if deterministic, low-risk, and verifiable by local reasoning or targeted checks.
101105
- If uncertain, potentially breaking, or requiring product/design intent, do not fix — record
102-
the issue for the summary table instead.
106+
the issue for the final output instead.
103107
- **Do not insert any inline comments into source files.**
108+
6. For every applied fix, record enough information to explain it later:
109+
- file path
110+
- category
111+
- concise description of the change
112+
- concise reason grounded in the relevant repository rule or review guideline
113+
- first relevant line number when the caller asks for line-oriented output
104114

105115
Auto-fix boundaries:
106116

@@ -206,6 +216,10 @@ Auto-fix boundaries:
206216
method actually returns `null`, instead of adding a null guard in the caller/callee.
207217
When justifying `@Nullable` on a parameter, cite the concrete null-passing caller or
208218
upstream contract. Do not justify it merely because the method guards against null.
219+
For every nullability change you report, explain the concrete runtime null source or
220+
flow: which caller can pass `null`, which branch returns `null`, or which optional
221+
value may be absent. Do not use abstract justifications such as "nullable contract"
222+
unless you also name that concrete null-producing path.
209223
**Exception — test files**: do not add `@Nullable` in test code.
210224
If a PR adds `@Nullable` to test files, flag it for removal.
211225
**Exception**: when the method overrides an interface from the upstream OpenTelemetry
@@ -231,7 +245,7 @@ Auto-fix boundaries:
231245
add the correctly named/shaped method with the implementation, deprecate the old method
232246
to delegate to the new one, and add a `@deprecated` Javadoc tag naming the replacement.
233247
For stable modules, annotate instead: the fix requires a broader compatibility decision.
234-
- Do not auto-fix (report in summary instead):
248+
- Do not auto-fix (report in the final output instead):
235249
- missing `testExperimental` task — when experimental flags are set unconditionally
236250
on all test tasks instead of being isolated in a dedicated task
237251
- behavior-changing logic without clear intent
@@ -244,20 +258,31 @@ Auto-fix boundaries:
244258
fix these, because on modern JDKs these are typically cached at the call site rather
245259
than allocated on every invocation
246260

247-
Comment formatting rules:
261+
Output content rules:
248262

249-
- **File column**: use only the simple class name without the `.java` extension
250-
and at most one line number (e.g., `FooClient:42`). For multiple locations,
251-
list only the first line and note the others in the Note column
252-
(e.g., Note: "… also lines 77, 95").
253-
- Include reason for non-fix and, when possible, a concrete next action.
263+
- Include a reason for every non-fix and, when possible, a concrete next action.
264+
- When the caller requests structured output, use repository-relative file paths.
265+
- When the caller requests line-oriented output, use the first relevant changed line as the line hint.
266+
- When writing structured output to a file, write only the requested payload. Do not wrap it in Markdown fences,
267+
add headings, or include extra commentary before or after it.
254268

255269
### Phase 4: Validate and Report
256270

257271
**All Gradle commands in this phase must use timeout `0` (no timeout). Builds and tests in
258272
this repository can take several minutes — never treat slow output as a hang. Always wait
259273
for completion.**
260274

275+
If a command-execution attempt fails for tool-related reasons, follow this recovery loop before
276+
reporting a limitation:
277+
278+
1. Re-check the tools declared for this agent and the runtime behavior you have actually observed.
279+
2. Retry using a different valid execution strategy that does not depend on the failed assumption.
280+
3. Only report a validation limitation after at least one concrete alternate approach has also failed
281+
or no alternate approach exists in the declared tool set.
282+
4. If validation still cannot be completed, the summary and any unresolved item must name the
283+
attempted command or validation step and say whether it failed or whether completion or final
284+
status could not be confirmed.
285+
261286
**Never pipe Gradle output through `tail`, `head`, `grep`, or any other command** (e.g.,
262287
`./gradlew :foo:check 2>&1 | tail -30`). Piping masks the Gradle exit code because the
263288
shell reports the exit code of the last pipe segment, not Gradle. A failing build will
@@ -285,11 +310,11 @@ Execute these steps strictly in order — do not reorder:
285310
apply it and re-run. Repeat at most **three times** per failing fix.
286311
3. If the failure cannot be resolved after three attempts — or if the only correct
287312
resolution is to revert the review fix — **revert that specific change**
288-
(`git checkout -- <file>` for the affected lines) and record the item as
289-
`Needs Manual Fix` in the summary table with a note explaining the test failure.
313+
(`git checkout -- <file>` for the affected lines) and record the item as
314+
`Needs Manual Fix` in the final output with a note explaining the test failure.
290315
4. After reverting, re-run the affected `:check` tasks to confirm the revert restored
291316
a green build. If tests still fail on code you did not change, that is a
292-
pre-existing failure — note it in the summary but do not block the commit.
317+
pre-existing failure — note it in the final output but do not block the commit.
293318
5. Never commit code that fails tests you can reproduce locally.
294319

295320
**Testing-module dependent validation**: when any modified module is a `testing` module
@@ -326,8 +351,8 @@ Execute these steps strictly in order — do not reorder:
326351
apply it and re-run. Repeat at most **three times** per failing fix.
327352
3. If the failure cannot be resolved after three attempts — or if the only correct
328353
resolution is to revert the review fix — **revert that specific change**
329-
(`git checkout -- <file>` for the affected lines) and record the item as
330-
`Needs Manual Fix` in the summary table with a note explaining the muzzle failure.
354+
(`git checkout -- <file>` for the affected lines) and record the item as
355+
`Needs Manual Fix` in the final output with a note explaining the muzzle failure.
331356
4. After reverting, re-run the `:muzzle` task to confirm the revert restored a green
332357
build. Never commit code that fails muzzle validation.
333358
3. **Last, after all validation is done**, run `./gradlew spotlessApply` to fix formatting
@@ -337,7 +362,7 @@ Execute these steps strictly in order — do not reorder:
337362
and confirm non-empty output. If the only remaining diffs are whitespace changes — or if
338363
all review fixes were reverted during validation — **stop here**: reset the working tree
339364
(`git checkout -- .`), do not commit or push. If any reverted items were recorded as
340-
`Needs Manual Fix`, print the summary table with those items. Otherwise report
365+
`Needs Manual Fix`, emit the final output with those items. Otherwise report
341366
"No issues found." and exit.
342367
5. Commit all changes in a single commit. The subject line must always be
343368
`Review fixes for <module>` where `<module>` is the short module name (e.g.,
@@ -357,32 +382,12 @@ Execute these steps strictly in order — do not reorder:
357382
```
358383

359384
Create exactly one commit for all fixes — do not commit incrementally.
360-
6. Print one summary:
361-
- Heading: `PR #<number>: <title>` (PR mode) or `<paths>` (file/directory mode)
362-
- Table with status (`Fixed` or `Needs Manual Fix`), file, category, and note
363-
364-
Template:
365-
366-
```
367-
| Status | File | Category | Note |
368-
|--------|------|----------|------|
369-
| Fixed | Foo:42 | Style | Added class-level deprecation suppression for stable/old semconv dual mode |
370-
| Needs Manual Fix | Bar:77 | API | Requires compatibility decision before rename |
371-
```
372-
373-
If no findings:
374-
> `No issues found.`
385+
6. Produce the final output in the format requested by the caller.
375386

376-
When writing the summary to a file (as opposed to printing to the console), the output
377-
must be **only** the findings table — nothing else:
387+
The caller must define the final output format or schema. Follow that request exactly:
378388

379-
- Do **not** include headings (`##`), horizontal rules, or "Fix Review Summary" titles.
380-
- Do **not** include a "Files reviewed" table, per-file checklist, or notes section
381-
when there are zero findings. Write only `No issues found.`
382-
- Do **not** repeat the module path or scope description — the caller already knows it.
383-
- Do **not** include a totals/summary line (e.g. "Fixed: X · Needs manual fix: Y").
384-
- The file must contain **only** the table rows (or `No issues found.`).
385-
No preamble, no footer, no commentary.
389+
- Do **not** add headings, commentary, or fallback prose unless the caller asks for them.
390+
- Preserve the recorded per-change reasons in whatever output format the caller requested.
386391

387392
## Knowledge Loading
388393

@@ -393,7 +398,6 @@ Always load:
393398

394399
Load other knowledge files only when their scope trigger applies.
395400
Use the **Knowledge File** column in the checklist table.
396-
Use the **Knowledge File** column below.
397401

398402
## Review Checklist and Core Rules
399403

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
#!/usr/bin/env python3
2+
"""Extract the final assistant message from review CLI JSONL output."""
3+
4+
from __future__ import annotations
5+
6+
import argparse
7+
import json
8+
from pathlib import Path
9+
10+
11+
def parse_args() -> argparse.Namespace:
12+
parser = argparse.ArgumentParser()
13+
parser.add_argument("--input", required=True)
14+
parser.add_argument("--output", required=True)
15+
return parser.parse_args()
16+
17+
18+
def extract_final_message(path: Path) -> str:
19+
if not path.exists():
20+
raise ValueError(f"Review output file is missing: {path}")
21+
22+
final_message: str | None = None
23+
24+
for line_number, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
25+
line = raw_line.strip()
26+
if not line:
27+
continue
28+
try:
29+
event = json.loads(line)
30+
except json.JSONDecodeError as exc:
31+
raise ValueError(f"Invalid JSONL from review output on line {line_number}: {exc}") from exc
32+
33+
if event.get("type") != "assistant.message":
34+
continue
35+
36+
data = event.get("data")
37+
if not isinstance(data, dict):
38+
continue
39+
40+
content = data.get("content")
41+
if not isinstance(content, str):
42+
continue
43+
44+
final_message = content
45+
46+
if final_message is None:
47+
raise ValueError(
48+
"Review output did not contain an assistant.message event. "
49+
"The agent may not have produced a final response."
50+
)
51+
52+
if not final_message.strip():
53+
raise ValueError("Final assistant message was empty")
54+
55+
return final_message.strip()
56+
57+
58+
def main() -> None:
59+
args = parse_args()
60+
report = extract_final_message(Path(args.input))
61+
Path(args.output).write_text(report + "\n", encoding="utf-8")
62+
63+
64+
if __name__ == "__main__":
65+
main()
Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,128 @@
1+
#!/usr/bin/env python3
2+
"""Print concise diagnostics for Copilot review CLI JSONL output."""
3+
4+
from __future__ import annotations
5+
6+
import argparse
7+
import json
8+
from collections import Counter
9+
from pathlib import Path
10+
11+
12+
def parse_args() -> argparse.Namespace:
13+
parser = argparse.ArgumentParser()
14+
parser.add_argument("--input", required=True)
15+
parser.add_argument("--tail", type=int, default=15)
16+
return parser.parse_args()
17+
18+
19+
def collapse(value: str, limit: int = 240) -> str:
20+
collapsed = " ".join(value.split())
21+
if len(collapsed) <= limit:
22+
return collapsed
23+
return collapsed[: limit - 3] + "..."
24+
25+
26+
def summarize_field(value: object) -> str:
27+
if isinstance(value, str):
28+
return collapse(value)
29+
if isinstance(value, bool):
30+
return str(value).lower()
31+
if isinstance(value, (int, float)):
32+
return str(value)
33+
if value is None:
34+
return "null"
35+
if isinstance(value, list):
36+
return f"list[{len(value)}]"
37+
if isinstance(value, dict):
38+
return f"object({', '.join(sorted(value.keys())[:6])})"
39+
return type(value).__name__
40+
41+
42+
def summarize_event(event: dict[str, object]) -> str:
43+
parts: list[str] = []
44+
data = event.get("data")
45+
46+
for key in ("type", "id"):
47+
if key in event:
48+
parts.append(f"{key}={summarize_field(event[key])}")
49+
50+
if isinstance(data, dict):
51+
for key in (
52+
"role",
53+
"name",
54+
"tool",
55+
"toolName",
56+
"command",
57+
"status",
58+
"exitCode",
59+
"exit_code",
60+
"content",
61+
"error",
62+
"message",
63+
):
64+
if key in data:
65+
parts.append(f"{key}={summarize_field(data[key])}")
66+
if len(parts) <= 2:
67+
parts.append(f"dataKeys={','.join(sorted(data.keys())[:8])}")
68+
69+
return "; ".join(parts)
70+
71+
72+
def main() -> None:
73+
args = parse_args()
74+
path = Path(args.input)
75+
76+
if not path.exists():
77+
print(f"Diagnostics input missing: {path}")
78+
return
79+
80+
raw_lines = path.read_text(encoding="utf-8").splitlines()
81+
print(f"Diagnostics for {path}")
82+
print(f"Line count: {len(raw_lines)}")
83+
84+
event_types: Counter[str] = Counter()
85+
assistant_messages: list[str] = []
86+
tail_events: list[tuple[int, str]] = []
87+
88+
for line_number, raw_line in enumerate(raw_lines, start=1):
89+
line = raw_line.strip()
90+
if not line:
91+
continue
92+
try:
93+
event = json.loads(line)
94+
except json.JSONDecodeError as exc:
95+
print(f"Invalid JSON on line {line_number}: {exc}")
96+
continue
97+
98+
event_type = str(event.get("type", "<missing>"))
99+
event_types[event_type] += 1
100+
101+
data = event.get("data")
102+
if event_type == "assistant.message" and isinstance(data, dict):
103+
content = data.get("content")
104+
if isinstance(content, str) and content.strip():
105+
assistant_messages.append(collapse(content, limit=500))
106+
107+
tail_events.append((line_number, summarize_event(event)))
108+
if len(tail_events) > args.tail:
109+
tail_events.pop(0)
110+
111+
print("Event types:")
112+
for event_type, count in event_types.most_common():
113+
print(f" {event_type}: {count}")
114+
115+
if assistant_messages:
116+
print("Last assistant message:")
117+
print(f" {assistant_messages[-1]}")
118+
else:
119+
print("Last assistant message:")
120+
print(" <none>")
121+
122+
print(f"Last {len(tail_events)} events:")
123+
for line_number, summary in tail_events:
124+
print(f" line {line_number}: {summary}")
125+
126+
127+
if __name__ == "__main__":
128+
main()

0 commit comments

Comments
 (0)