trask
diff --git a/.github/agents/code-review-and-fix.agent.md b/.github/agents/code-review-and-fix.agent.md
Lines changed: 57 additions & 42 deletions
diff --git a/.github/scripts/code-review-extract-report.py b/.github/scripts/code-review-extract-report.py
Lines changed: 117 additions & 0 deletions
@@ -1,5 +1,5 @@
 ---
-description: "Review PRs, files, or directories in opentelemetry-java-instrumentation. Apply safe fixes directly; report unfixable issues in the summary only."
+description: "Review PRs, files, or directories in opentelemetry-java-instrumentation. Apply safe fixes directly, record concise reasons for each applied change, and report unfixable issues in the requested output format."
 tools: [read, edit, execute, search]
 ---
 
@@ -9,9 +9,13 @@ Primary responsibilities:
 
 - Review code against repository standards and established patterns.
 - Apply safe, deterministic fixes directly in source files whenever possible.
+- Record each applied fix with a concise factual reason tied to the repository rule or review guideline that justified it.
 - **Never insert inline comments** (`// REVIEW:`, `# REVIEW:`, etc.) into source files.
-  Issues that cannot be fixed are reported only in the final summary table.
-- Produce a compact summary table of fixed and unresolved items at the end.
+  Issues that cannot be fixed are reported only in the final output.
+- Produce only the output format requested by the caller. Do not assume or add a default output format.
+- Use only the tools actually exposed by the runtime. Do not assume helper or companion tools exist.
+- When a command-execution step fails for tool-related reasons, first re-evaluate the declared tools and retry with a different valid execution strategy before concluding that the environment cannot complete the task.
+- Distinguish between command failure and inability to observe command completion or final status. Do not collapse these into the same explanation.
 
 Do not stop until all in-scope files are reviewed and fixed where possible.
 
@@ -99,8 +103,14 @@ For each file in scope:
 5. For each issue found, use this decision order:
    - Fix now if deterministic, low-risk, and verifiable by local reasoning or targeted checks.
    - If uncertain, potentially breaking, or requiring product/design intent, do not fix — record
-     the issue for the summary table instead.
+     the issue for the final output instead.
    - **Do not insert any inline comments into source files.**
+6. For every applied fix, record enough information to explain it later:
+  - file path
+  - category
+  - concise description of the change
+  - concise reason grounded in the relevant repository rule or review guideline
+  - first relevant line number when the caller asks for line-oriented output
 
 Auto-fix boundaries:
 
@@ -206,6 +216,10 @@ Auto-fix boundaries:
     method actually returns `null`, instead of adding a null guard in the caller/callee.
     When justifying `@Nullable` on a parameter, cite the concrete null-passing caller or
     upstream contract. Do not justify it merely because the method guards against null.
+    For every nullability change you report, explain the concrete runtime null source or
+    flow: which caller can pass `null`, which branch returns `null`, or which optional
+    value may be absent. Do not use abstract justifications such as "nullable contract"
+    unless you also name that concrete null-producing path.
     **Exception — test files**: do not add `@Nullable` in test code.
     If a PR adds `@Nullable` to test files, flag it for removal.
     **Exception**: when the method overrides an interface from the upstream OpenTelemetry
@@ -231,7 +245,7 @@ Auto-fix boundaries:
     add the correctly named/shaped method with the implementation, deprecate the old method
     to delegate to the new one, and add a `@deprecated` Javadoc tag naming the replacement.
     For stable modules, annotate instead: the fix requires a broader compatibility decision.
-- Do not auto-fix (report in summary instead):
+- Do not auto-fix (report in the final output instead):
   - missing `testExperimental` task — when experimental flags are set unconditionally
     on all test tasks instead of being isolated in a dedicated task
   - behavior-changing logic without clear intent
@@ -244,20 +258,38 @@ Auto-fix boundaries:
     fix these, because on modern JDKs these are typically cached at the call site rather
     than allocated on every invocation
 
-Comment formatting rules:
+Output content rules:
 
-- **File column**: use only the simple class name without the `.java` extension
-  and at most one line number (e.g., `FooClient:42`). For multiple locations,
-  list only the first line and note the others in the Note column
-  (e.g., Note: "… also lines 77, 95").
-- Include reason for non-fix and, when possible, a concrete next action.
+- Include a reason for every non-fix and, when possible, a concrete next action.
+- When the caller requests structured output, use repository-relative file paths.
+- When the caller requests line-oriented output, use the first relevant changed line as the line hint.
+- When writing structured output to a file, write only the requested payload. Do not wrap it in Markdown fences,
+  add headings, or include extra commentary before or after it.
 
 ### Phase 4: Validate and Report
 
 **All Gradle commands in this phase must use timeout `0` (no timeout). Builds and tests in
 this repository can take several minutes — never treat slow output as a hang. Always wait
 for completion.**
 
+**Validation must be strictly serial. Never start more than one Gradle command at a time,**
+whether through separate tool calls, parallel tool requests, or any mode that leaves an
+earlier Gradle invocation running in the background. Do not launch the next Gradle command
+until the previous one has definitively completed and you have observed its final exit
+status. If a prior run may still be active, first wait for it or confirm its completion
+before proceeding.
+
+If a command-execution attempt fails for tool-related reasons, follow this recovery loop before
+reporting a limitation:
+
+1. Re-check the tools declared for this agent and the runtime behavior you have actually observed.
+2. Retry using a different valid execution strategy that does not depend on the failed assumption.
+3. Only report a validation limitation after at least one concrete alternate approach has also failed
+  or no alternate approach exists in the declared tool set.
+4. If validation still cannot be completed, the summary and any unresolved item must name the
+  attempted command or validation step and say whether it failed or whether completion or final
+  status could not be confirmed.
+
 **Never pipe Gradle output through `tail`, `head`, `grep`, or any other command** (e.g.,
 `./gradlew :foo:check 2>&1 | tail -30`). Piping masks the Gradle exit code because the
 shell reports the exit code of the last pipe segment, not Gradle. A failing build will
@@ -273,6 +305,9 @@ Execute these steps strictly in order — do not reorder:
    ./gradlew :<module-path>:check -PtestLatestDeps=true
    ```
 
+    Run these as two separate serial executions. Do not start the second command until the
+    first command has fully completed and its final exit status is known.
+
    The first run exercises the default test suites (`test`, `testExperimental`, and any other
    custom test tasks wired into `check`). The second run activates `latestDepTest`, which
    replaces `library` and `testLibrary` dependency versions with `latest.release`.
@@ -285,11 +320,11 @@ Execute these steps strictly in order — do not reorder:
       apply it and re-run. Repeat at most **three times** per failing fix.
    3. If the failure cannot be resolved after three attempts — or if the only correct
       resolution is to revert the review fix — **revert that specific change**
-      (`git checkout -- <file>` for the affected lines) and record the item as
-      `Needs Manual Fix` in the summary table with a note explaining the test failure.
+    (`git checkout -- <file>` for the affected lines) and record the item as
+    `Needs Manual Fix` in the final output with a note explaining the test failure.
    4. After reverting, re-run the affected `:check` tasks to confirm the revert restored
       a green build. If tests still fail on code you did not change, that is a
-      pre-existing failure — note it in the summary but do not block the commit.
+    pre-existing failure — note it in the final output but do not block the commit.
    5. Never commit code that fails tests you can reproduce locally.
 
    **Testing-module dependent validation**: when any modified module is a `testing` module
@@ -326,18 +361,19 @@ Execute these steps strictly in order — do not reorder:
       apply it and re-run. Repeat at most **three times** per failing fix.
    3. If the failure cannot be resolved after three attempts — or if the only correct
       resolution is to revert the review fix — **revert that specific change**
-      (`git checkout -- <file>` for the affected lines) and record the item as
-      `Needs Manual Fix` in the summary table with a note explaining the muzzle failure.
+    (`git checkout -- <file>` for the affected lines) and record the item as
+    `Needs Manual Fix` in the final output with a note explaining the muzzle failure.
    4. After reverting, re-run the `:muzzle` task to confirm the revert restored a green
       build. Never commit code that fails muzzle validation.
 3. **Last, after all validation is done**, run `./gradlew spotlessApply` to fix formatting
    across all modified files.
    `spotlessApply` must be the final build command — never run it before tests or muzzle.
+   Before running it, confirm that no earlier Gradle validation command is still running.
 4. **Verify substantive changes remain.** Run `git diff --ignore-all-space --ignore-blank-lines`
    and confirm non-empty output. If the only remaining diffs are whitespace changes — or if
    all review fixes were reverted during validation — **stop here**: reset the working tree
    (`git checkout -- .`), do not commit or push. If any reverted items were recorded as
-   `Needs Manual Fix`, print the summary table with those items. Otherwise report
+  `Needs Manual Fix`, emit the final output with those items. Otherwise report
    "No issues found." and exit.
 5. Commit all changes in a single commit. The subject line must always be
    `Review fixes for <module>` where `<module>` is the short module name (e.g.,
@@ -357,32 +393,12 @@ Execute these steps strictly in order — do not reorder:
    ```
 
    Create exactly one commit for all fixes — do not commit incrementally.
-6. Print one summary:
-   - Heading: `PR #<number>: <title>` (PR mode) or `<paths>` (file/directory mode)
-   - Table with status (`Fixed` or `Needs Manual Fix`), file, category, and note
-
-Template:
-
-```
-| Status | File | Category | Note |
-|--------|------|----------|------|
-| Fixed | Foo:42 | Style | Added class-level deprecation suppression for stable/old semconv dual mode |
-| Needs Manual Fix | Bar:77 | API | Requires compatibility decision before rename |
-```
-
-If no findings:
-> `No issues found.`
+6. Produce the final output in the format requested by the caller.
 
-When writing the summary to a file (as opposed to printing to the console), the output
-must be **only** the findings table — nothing else:
+The caller must define the final output format or schema. Follow that request exactly:
 
-- Do **not** include headings (`##`), horizontal rules, or "Fix Review Summary" titles.
-- Do **not** include a "Files reviewed" table, per-file checklist, or notes section
-  when there are zero findings. Write only `No issues found.`
-- Do **not** repeat the module path or scope description — the caller already knows it.
-- Do **not** include a totals/summary line (e.g. "Fixed: X · Needs manual fix: Y").
-- The file must contain **only** the table rows (or `No issues found.`).
-  No preamble, no footer, no commentary.
+- Do **not** add headings, commentary, or fallback prose unless the caller asks for them.
+- Preserve the recorded per-change reasons in whatever output format the caller requested.
 
 ## Knowledge Loading
 
@@ -393,7 +409,6 @@ Always load:
 
 Load other knowledge files only when their scope trigger applies.
 Use the **Knowledge File** column in the checklist table.
-Use the **Knowledge File** column below.
 
 ## Review Checklist and Core Rules
 
 
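The serial-validation and no-piping rules in the agent file above can be illustrated with a small Python sketch. It is not part of this change: `run_serially` is a hypothetical helper, and the demo uses stand-in Python commands rather than the real `./gradlew` invocations. Stopping at the first non-zero exit code is an assumption made here for illustration. Each command is run without a shell or pipe, so the recorded exit code belongs to the command itself, and the next command starts only after the previous one has returned.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Sequence


def run_serially(commands: Sequence[Sequence[str]]) -> list[tuple[list[str], int]]:
    """Run each command to completion, one at a time, and record its exit code."""
    results: list[tuple[list[str], int]] = []
    for command in commands:
        # No shell and no pipe: returncode is the command's own exit status.
        # timeout=None waits indefinitely, so slow output is never treated as a hang.
        completed = subprocess.run(list(command), timeout=None, check=False)
        results.append((list(command), completed.returncode))
        if completed.returncode != 0:
            # Assumption for this sketch: stop so the failure can be diagnosed
            # before any further command is started.
            break
    return results


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; in the repository these
    # would be the `./gradlew :<module-path>:check` invocations.
    demo = [
        [sys.executable, "-c", "import sys; sys.exit(0)"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        [sys.executable, "-c", "import sys; sys.exit(0)"],
    ]
    for command, code in run_serially(demo):
        print(code, command[-1])
```

Because `subprocess.run` blocks until the child process exits, the loop cannot overlap two invocations, which is the property the serial-validation paragraph asks for.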
@@ -0,0 +1,117 @@
+#!/usr/bin/env python3
+"""Extract the final assistant message from review CLI JSONL output."""
+
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--input", required=True)
+    parser.add_argument("--output", required=True)
+    parser.add_argument("--final-message-output")
+    return parser.parse_args()
+
+
+def collapse(value: str, limit: int = 400) -> str:
+    collapsed = " ".join(value.split())
+    if len(collapsed) <= limit:
+        return collapsed
+    return collapsed[: limit - 3] + "..."
+
+
+def strip_json_fence(value: str) -> str:
+    stripped = value.strip()
+    if not stripped.startswith("```"):
+        return stripped
+
+    lines = stripped.splitlines()
+    if len(lines) < 3:
+        return stripped
+    if not lines[-1].strip().startswith("```"):
+        return stripped
+
+    opening = lines[0].strip()
+    if opening not in {"```", "```json"}:
+        return stripped
+
+    return "\n".join(lines[1:-1]).strip()
+
+
+def extract_final_message(path: Path) -> str:
+    if not path.exists():
+        raise ValueError(f"Review output file is missing: {path}")
+
+    final_message: str | None = None
+
+    for line_number, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
+        line = raw_line.strip()
+        if not line:
+            continue
+        try:
+            event = json.loads(line)
+        except json.JSONDecodeError as exc:
+            raise ValueError(f"Invalid JSONL from review output on line {line_number}: {exc}") from exc
+
+        if not isinstance(event, dict) or event.get("type") != "assistant.message":
+            continue
+
+        data = event.get("data")
+        if not isinstance(data, dict):
+            continue
+
+        content = data.get("content")
+        if not isinstance(content, str):
+            continue
+
+        final_message = content
+
+    if final_message is None:
+        raise ValueError(
+            "Review output did not contain an assistant.message event. "
+            "The agent may not have produced a final response."
+        )
+
+    if not final_message.strip():
+        raise ValueError("Final assistant message was empty")
+
+    return final_message.strip()
+
+
+def validate_report_json(report: str) -> dict[str, object]:
+    normalized = strip_json_fence(report)
+
+    try:
+        parsed = json.loads(normalized)
+    except json.JSONDecodeError as exc:
+        preview = collapse(normalized)
+        raise ValueError(
+            "Final assistant message was not valid JSON. "
+            f"Preview: {preview}"
+        ) from exc
+
+    if not isinstance(parsed, dict):
+        raise ValueError(
+            "Final assistant message was valid JSON but not a JSON object. "
+            f"Got {type(parsed).__name__}."
+        )
+
+    return parsed
+
+
+def main() -> None:
+    args = parse_args()
+    report = extract_final_message(Path(args.input))
+
+    if args.final_message_output:
+        Path(args.final_message_output).write_text(report + "\n", encoding="utf-8")
+
+    parsed = validate_report_json(report)
+    Path(args.output).write_text(json.dumps(parsed, indent=2) + "\n", encoding="utf-8")
+
+
+if __name__ == "__main__":
+    main()
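
As a rough illustration of the input the script above expects, the following sketch builds a small JSONL stream and applies a condensed version of the same selection logic: the last `assistant.message` event with string content wins. The sample events, the `tool.call` event type, and the `findings` field are invented for this example, and the fence removal is a simplified stand-in for `strip_json_fence`, without its guards.

```python
from __future__ import annotations

import json


def last_assistant_message(jsonl_text: str) -> str | None:
    """Return the content of the last assistant.message event, or None."""
    final = None
    for raw_line in jsonl_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        event = json.loads(line)
        if not isinstance(event, dict) or event.get("type") != "assistant.message":
            continue
        data = event.get("data")
        if isinstance(data, dict) and isinstance(data.get("content"), str):
            final = data["content"]
    return final


# Build the fence programmatically to keep this example easy to embed.
fence = "`" * 3
fenced_report = fence + 'json\n{"findings": []}\n' + fence

# Hypothetical event stream: only the "assistant.message" shape comes from the script.
sample = "\n".join(
    json.dumps(event)
    for event in [
        {"type": "assistant.message", "data": {"content": "Reviewing files"}},
        {"type": "tool.call", "data": {"name": "read"}},
        {"type": "assistant.message", "data": {"content": fenced_report}},
    ]
)

message = last_assistant_message(sample)

# Simplified fence removal: drop the first and last lines of the fenced payload.
body = "\n".join(message.strip().splitlines()[1:-1])
report = json.loads(body)
print(report)
```

Earlier assistant messages are ignored, so intermediate progress text does not need to be valid JSON. Only the final message has to parse as a JSON object.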