feat: write validate skill output directly to files instead of terminal

Bharatram-altimate-ai · Bharatram-altimate-ai · commit 4c0538ca1f80 · 2026-03-11T12:29:10.000+05:30
- Step 4 now writes per-trace results directly to `trace_N_<id>.md` via Write tool instead of printing to terminal first
- Step 5 now writes summary directly to `SUMMARY.md` via Write tool instead of printing to terminal first
- After each file is written, Claude notifies the user with the file path
- Removed Step 6 (log summary append) as it is no longer needed
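
For reviewers, a minimal sketch of the file naming that Steps 4 and 5 describe (1-based trace index, first 12 characters of the trace ID, and a fixed `SUMMARY.md`). The skill itself writes these files through the Write tool, not through Python; the helper names `trace_report_path` and `summary_path` are hypothetical and exist only to illustrate the convention.

```python
from pathlib import Path


def trace_report_path(report_dir: str, index: int, trace_id: str) -> Path:
    """Return <report_dir>/trace_<N>_<first_12_chars_of_id>.md.

    `index` is the 1-based position of the trace in the batch.
    Hypothetical helper: illustrates the naming rule only.
    """
    if index < 1:
        raise ValueError("trace index is 1-based")
    # Slicing is safe even when the ID is shorter than 12 characters.
    return Path(report_dir) / f"trace_{index}_{trace_id[:12]}.md"


def summary_path(report_dir: str) -> Path:
    """Return <report_dir>/SUMMARY.md (hypothetical helper)."""
    return Path(report_dir) / "SUMMARY.md"
```

With a made-up trace ID `abcdef0123456789` as the first trace in a directory named `reports`, this yields `reports/trace_1_abcdef012345.md`.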
diff --git a/packages/opencode/src/skill/validate/SKILL.md b/packages/opencode/src/skill/validate/SKILL.md
@@ -130,15 +130,16 @@ When doing this task, first generate a sequence of steps as a plan and execute s
 
 ---
 
-### Step 4: Present Per-Trace Results
+### Step 4: Write Per-Trace Results to File
 
-For EACH trace, present the results in the following format:
+For EACH trace, write the results **directly to a markdown file** inside the report directory. Do NOT print the full trace details to the terminal. Read `report_dir` from the batch_validate.py JSON output. Use the trace index (1-based) and first 12 characters of the trace ID for the filename.
 
----
+The file content must follow this format:
 
+```
 ## Trace: `<trace_id>`
 
-### Criteria Summary Table in markdown table
+### Criteria Summary Table
 
 | Criteria | Status | Score |
 |---|---|---|
@@ -150,7 +151,7 @@ For EACH trace, present the results in the following format:
 
 P.S. **Consider 'RIGHT NODE' as 'SUCCESS' and 'WRONG NODE' as 'FAILURE' IF PRESENT.**
 
-### Per-Criteria Node Results in markdown table
+### Per-Criteria Node Results
 
 For **Validity**, **Coherence**, and **Utility**, show a node-level breakdown table:
 
@@ -162,57 +163,60 @@ For **Validity**, **Coherence**, and **Utility**, show a node-level breakdown ta
 
 #### Groundedness
 
-Generate a summary of the generated groundedness response detailing strengths and weaknesses.
+<summary of groundedness response detailing strengths and weaknesses>
 
-Now display **ALL the claims in the **markdown table format** with these columns**:
+ALL claims table:
 
-| # | Source Tool | Source Data| Input Data                                | Claim Text                               | Claimed                      | Input | Conversion Statement | Calculated | Error | Status | Reason |
-|---|---|-----------------------|-------------------------------------------|------------------------------------------|------------------------------|---|---|---|---|---|---|
-| <claim_id> | <source tool id> | <claim_text>          | <source_data>| <input_data>| <claimed_value> <claim_unit> | <input data> | <input to claim conversion statement> | <Calculated claim> <claim_unit> | <Error in claim as %> | SUCCESS/FAILURE | <reason> |
+| # | Source Tool | Source Data | Input Data | Claim Text | Claimed | Input | Conversion Statement | Calculated | Error | Status | Reason |
+|---|---|---|---|---|---|---|---|---|---|---|---|
+| <claim_id> | <source tool id> | <claim_text> | <source_data> | <input_data> | <claimed_value> <claim_unit> | <input data> | <input to claim conversion statement> | <Calculated claim> <claim_unit> | <Error in claim as %> | SUCCESS/FAILURE | <reason> |
 
-Then show a separate **Failed Claims Summary in markdown table format** with only the failed claims:
+Failed Claims Summary (only failed claims):
 
-| # | Claim | Claimed | Source Tool ID   | Actual Text   | Actual Data  | Error | Root Cause  |
-|---|---|---|------------------|---------------|--------------|---|-------------|
+| # | Claim | Claimed | Source Tool ID | Actual Text | Actual Data | Error | Root Cause |
+|---|---|---|---|---|---|---|---|
 | <claim_id> | <claim_text> | <claimed_value> | <source_tool_id> | <source_data> | <Input data> | <error %> | <reasoning> |
 
 REMEMBER to generate each value COMPLETELY. DO NOT TRUNCATE.
 
 #### Validity
-Generate a summary of the generated validity response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
 #### Coherence
-Generate a summary of the generated coherence response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
 #### Utility
-Generate a summary of the generated utility response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
 #### Tool Validation
-Generate a summary of the generated tool validation response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
-Now display all the tool details in markdown table format:
+All tool details:
 
 | # | Tool Name | Tool Status |
 |---|---|---|
 | <id> | <tool name> | <tool status> |
+```
 
-REMEMBER to generate each value completely. NO TRUNCATION.
-
-After presenting each trace result, write it to a markdown file inside the report directory. Read `report_dir` from the batch_validate.py JSON output. Use the trace index (1-based) and first 12 characters of the trace ID for the filename:
+Write the content using the Write tool to `<report_dir>/trace_<N>_<first_12_chars_of_id>.md`.
 
-```bash
-cat > "<report_dir>/trace_<N>_<first_12_chars_of_id>.md" <<'TRACE_EOF'
-<full per-trace result output from above>
-TRACE_EOF
-```
+After writing each file, tell the user:
+> Trace `<trace_id>` result written to `<report_dir>/trace_<N>_<first_12_chars_of_id>.md`
 
 ---
 
-### Step 5: Cross-Trace Comprehensive Summary (for all evaluations)
+### Step 5: Write Cross-Trace Comprehensive Summary to File
 
-After presenting all individual trace results, generate a comprehensive summary:
+After processing all individual traces, write a comprehensive summary **directly to `<report_dir>/SUMMARY.md`** using the Write tool. Do NOT print the full summary to the terminal.
 
-#### Overall Score Summary in markdown table format
+The file content must follow this format:
+
+```
+## Validation Summary
+
+Use the scores AFTER semantic matching corrections from Step 2, and reasons AFTER semantic reason generation from Step 3.
+
+### Overall Score Summary
 
 | Criteria | Average Score | Min | Max | Traces Evaluated |
 |---|---|---|---|---|
@@ -222,41 +226,19 @@ After presenting all individual trace results, generate a comprehensive summary:
 | **Utility** | <avg>/5 | <min>/5 | <max>/5 | <count> |
 | **Tool Validation** | <avg>/5 | <min>/5 | <max>/5 | <count> |
 
-Use the scores AFTER semantic matching corrections from Step 2, and reasons AFTER semantic reason generation from Step 3.
-
-#### Per-Trace Score Breakdown in markdown table format
+### Per-Trace Score Breakdown
 
 | Trace ID | Groundedness | Validity | Coherence | Utility | Tool Validation |
 |---|---|---|---|---|---|
 | <id> | <score>/5 | <score>/5 | <score>/5 | <score>/5 | <score>/5 |
 
-#### Category-Wise Analysis
+### Category-Wise Analysis
 
-For EACH category, provide:
+For EACH category:
 - **Common Strengths**: Patterns of success observed across traces
 - **Common Weaknesses**: Recurring issues found across traces
 - **Recommendations**: Actionable improvements based on the analysis
-
-After generating the overall summary, write it to `SUMMARY.md` inside the report directory:
-
-```bash
-cat > "<report_dir>/SUMMARY.md" <<'SUMMARY_MD_EOF'
-<full cross-trace summary output from above>
-SUMMARY_MD_EOF
-```
-
----
-
-### Step 6: Log Summary to File
-
-Append the comprehensive summary (with semantic matching corrections and semantic reasons) to the log file. Read the `log_file` path from the batch_validate.py output and append:
-
-```bash
-cat >> <log_file_path> <<'SUMMARY_EOF'
-
-=== COMPREHENSIVE SUMMARY (with semantic matching corrections and semantic reasons) ===
-<paste the cross-trace summary and per-trace corrected scores here>
-SUMMARY_EOF
 ```
 
-This ensures the log file contains both the raw API results and the post-processed summary with semantic matching corrections and semantic reasons.
+After writing the file, tell the user:
+> Summary written to `<report_dir>/SUMMARY.md`