Commit 4c0538c

feat: write validate skill output directly to files instead of terminal
- Step 4 now writes per-trace results directly to `trace_N_<id>.md` via the Write tool instead of printing them to the terminal first
- Step 5 now writes the summary directly to `SUMMARY.md` via the Write tool instead of printing it to the terminal first
- After each file is written, Claude notifies the user with the file path
- Removed Step 6 (log summary append) as it is no longer needed
1 parent a3ab5b2 commit 4c0538c

1 file changed

Lines changed: 38 additions & 56 deletions

packages/opencode/src/skill/validate/SKILL.md

@@ -130,15 +130,16 @@ When doing this task, first generate a sequence of steps as a plan and execute s
 
 ---
 
-### Step 4: Present Per-Trace Results
+### Step 4: Write Per-Trace Results to File
 
-For EACH trace, present the results in the following format:
+For EACH trace, write the results **directly to a markdown file** inside the report directory. Do NOT print the full trace details to the terminal. Read `report_dir` from the batch_validate.py JSON output. Use the trace index (1-based) and first 12 characters of the trace ID for the filename.
 
----
+The file content must follow this format:
 
+```
 ## Trace: `<trace_id>`
 
-### Criteria Summary Table in markdown table
+### Criteria Summary Table
 
 | Criteria | Status | Score |
 |---|---|---|
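
The new Step 4 naming rule (1-based trace index plus the first 12 characters of the trace ID, under the `report_dir` read from the batch_validate.py JSON output) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not code from this commit: the `traces`/`trace_id` keys of the JSON output and the helper names `trace_report_path` and `write_trace_reports` are hypothetical, and `Path.write_text` only stands in for the agent's Write tool.

```python
import json
from pathlib import Path


def trace_report_path(report_dir: str, index: int, trace_id: str) -> Path:
    """Return <report_dir>/trace_<N>_<first 12 chars of trace_id>.md."""
    if index < 1:
        raise ValueError("trace index must be 1-based")
    return Path(report_dir) / f"trace_{index}_{trace_id[:12]}.md"


def write_trace_reports(batch_output: str, rendered: dict[str, str]) -> list[str]:
    """Write one markdown file per trace and return the notices for the user.

    Hypothetical shape of batch_validate.py's JSON output:
    {"report_dir": "...", "traces": [{"trace_id": "..."}, ...]}
    `rendered` maps each trace_id to its finished per-trace markdown.
    """
    data = json.loads(batch_output)
    report_dir = data["report_dir"]
    Path(report_dir).mkdir(parents=True, exist_ok=True)
    notices = []
    for index, trace in enumerate(data["traces"], start=1):  # 1-based index
        trace_id = trace["trace_id"]
        path = trace_report_path(report_dir, index, trace_id)
        # Stand-in for the agent's Write tool.
        path.write_text(rendered[trace_id], encoding="utf-8")
        notices.append(f"Trace `{trace_id}` result written to `{path}`")
    return notices
```

For a trace ID `abcdef0123456789` at index 1, `trace_report_path` yields `<report_dir>/trace_1_abcdef012345.md`.
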
@@ -150,7 +151,7 @@ For EACH trace, present the results in the following format:
 
 P.S. **Consider 'RIGHT NODE' as 'SUCCESS' and 'WRONG NODE' as 'FAILURE' IF PRESENT.**
 
-### Per-Criteria Node Results in markdown table
+### Per-Criteria Node Results
 
 For **Validity**, **Coherence**, and **Utility**, show a node-level breakdown table:
 
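
The P.S. rule in this hunk amounts to a status normalization applied before the node-level tables are filled in. A minimal Python sketch, assuming node statuses arrive as plain strings; `normalize_status` and `node_breakdown` are hypothetical helper names, and the case-insensitive matching is an assumption rather than something the skill specifies.

```python
from collections import Counter

# Aliases named in the skill's P.S. rule.
NODE_STATUS_ALIASES = {"RIGHT NODE": "SUCCESS", "WRONG NODE": "FAILURE"}


def normalize_status(raw: str) -> str:
    """Map RIGHT NODE/WRONG NODE to SUCCESS/FAILURE; pass other values through."""
    cleaned = raw.strip()
    return NODE_STATUS_ALIASES.get(cleaned.upper(), cleaned)


def node_breakdown(raw_statuses: list[str]) -> Counter:
    """Count normalized statuses for one criterion's node-level table."""
    return Counter(normalize_status(s) for s in raw_statuses)
```

Values other than the two aliases (for example an existing `SUCCESS`) are returned unchanged, which matches the "IF PRESENT" wording.
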
@@ -162,57 +163,60 @@ For **Validity**, **Coherence**, and **Utility**, show a node-level breakdown ta
 
 #### Groundedness
 
-Generate a summary of the generated groundedness response detailing strengths and weaknesses.
+<summary of groundedness response detailing strengths and weaknesses>
 
-Now display **ALL the claims in the **markdown table format** with these columns**:
+ALL claims table:
 
-| # | Source Tool | Source Data| Input Data | Claim Text | Claimed | Input | Conversion Statement | Calculated | Error | Status | Reason |
-|---|---|-----------------------|-------------------------------------------|------------------------------------------|------------------------------|---|---|---|---|---|---|
-| <claim_id> | <source tool id> | <claim_text> | <source_data>| <input_data>| <claimed_value> <claim_unit> | <input data> | <input to claim conversion statement> | <Calculated claim> <claim_unit> | <Error in claim as %> | SUCCESS/FAILURE | <reason> |
+| # | Source Tool | Source Data | Input Data | Claim Text | Claimed | Input | Conversion Statement | Calculated | Error | Status | Reason |
+|---|---|---|---|---|---|---|---|---|---|---|---|
+| <claim_id> | <source tool id> | <claim_text> | <source_data> | <input_data> | <claimed_value> <claim_unit> | <input data> | <input to claim conversion statement> | <Calculated claim> <claim_unit> | <Error in claim as %> | SUCCESS/FAILURE | <reason> |
 
-Then show a separate **Failed Claims Summary in markdown table format** with only the failed claims:
+Failed Claims Summary (only failed claims):
 
-| # | Claim | Claimed | Source Tool ID | Actual Text | Actual Data | Error | Root Cause |
-|---|---|---|------------------|---------------|--------------|---|-------------|
+| # | Claim | Claimed | Source Tool ID | Actual Text | Actual Data | Error | Root Cause |
+|---|---|---|---|---|---|---|---|
 | <claim_id> | <claim_text> | <claimed_value> | <source_tool_id> | <source_data> | <Input data> | <error %> | <reasoning> |
 
 REMEMBER to generate each value COMPLETELY. DO NOT TRUNCATE.
 
 #### Validity
-Generate a summary of the generated validity response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
 #### Coherence
-Generate a summary of the generated coherence response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
 #### Utility
-Generate a summary of the generated utility response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
 #### Tool Validation
-Generate a summary of the generated tool validation response detailing strengths and weaknesses.
+<summary detailing strengths and weaknesses>
 
-Now display all the tool details in markdown table format:
+All tool details:
 
 | # | Tool Name | Tool Status |
 |---|---|---|
 | <id> | <tool name> | <tool status> |
+```
 
-REMEMBER to generate each value completely. NO TRUNCATION.
-
-After presenting each trace result, write it to a markdown file inside the report directory. Read `report_dir` from the batch_validate.py JSON output. Use the trace index (1-based) and first 12 characters of the trace ID for the filename:
+Write the content using the Write tool to `<report_dir>/trace_<N>_<first_12_chars_of_id>.md`.
 
-```bash
-cat > "<report_dir>/trace_<N>_<first_12_chars_of_id>.md" <<'TRACE_EOF'
-<full per-trace result output from above>
-TRACE_EOF
-```
+After writing each file, tell the user:
+> Trace `<trace_id>` result written to `<report_dir>/trace_<N>_<first_12_chars_of_id>.md`
 
 ---
 
-### Step 5: Cross-Trace Comprehensive Summary (for all evaluations)
+### Step 5: Write Cross-Trace Comprehensive Summary to File
 
-After presenting all individual trace results, generate a comprehensive summary:
+After processing all individual traces, write a comprehensive summary **directly to `<report_dir>/SUMMARY.md`** using the Write tool. Do NOT print the full summary to the terminal.
 
-#### Overall Score Summary in markdown table format
+The file content must follow this format:
+
+```
+## Validation Summary
+
+Use the scores AFTER semantic matching corrections from Step 2, and reasons AFTER semantic reason generation from Step 3.
+
+### Overall Score Summary
 
 | Criteria | Average Score | Min | Max | Traces Evaluated |
 |---|---|---|---|---|
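
The skill does not say how the Error column is computed; it only asks for the "Error in claim as %" and a Failed Claims Summary limited to failed claims. The Python sketch below assumes a relative error, abs(claimed - calculated) / abs(calculated) * 100, and trusts the SUCCESS/FAILURE status reported by the evaluator. The `Claim` record and both helper names are hypothetical; rows follow the Failed Claims Summary column order.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Hypothetical record holding the columns used by the claims tables."""
    claim_id: str
    source_tool_id: str
    claim_text: str
    source_data: str
    input_data: str
    claimed: float
    calculated: float
    status: str  # "SUCCESS" or "FAILURE", as reported by the evaluator
    reason: str


def error_percent(claimed: float, calculated: float) -> float:
    """Assumed formula: abs(claimed - calculated) / abs(calculated), as a percentage."""
    if calculated == 0:
        return 0.0 if claimed == 0 else float("inf")
    return abs(claimed - calculated) * 100 / abs(calculated)


def failed_claim_rows(claims: list[Claim]) -> list[str]:
    """Render only FAILURE claims in the Failed Claims Summary column order."""
    rows = []
    for c in claims:
        if c.status != "FAILURE":
            continue
        err = error_percent(c.claimed, c.calculated)
        rows.append(
            f"| {c.claim_id} | {c.claim_text} | {c.claimed} | "
            f"{c.source_tool_id} | {c.source_data} | {c.input_data} | "
            f"{err:.2f}% | {c.reason} |"
        )
    return rows
```

A claimed value of 110 against a recalculated 100 gives an error of 10.00%.
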
@@ -222,41 +226,19 @@ After presenting all individual trace results, generate a comprehensive summary:
 | **Utility** | <avg>/5 | <min>/5 | <max>/5 | <count> |
 | **Tool Validation** | <avg>/5 | <min>/5 | <max>/5 | <count> |
 
-Use the scores AFTER semantic matching corrections from Step 2, and reasons AFTER semantic reason generation from Step 3.
-
-#### Per-Trace Score Breakdown in markdown table format
+### Per-Trace Score Breakdown
 
 | Trace ID | Groundedness | Validity | Coherence | Utility | Tool Validation |
 |---|---|---|---|---|---|
 | <id> | <score>/5 | <score>/5 | <score>/5 | <score>/5 | <score>/5 |
 
-#### Category-Wise Analysis
+### Category-Wise Analysis
 
-For EACH category, provide:
+For EACH category:
 - **Common Strengths**: Patterns of success observed across traces
 - **Common Weaknesses**: Recurring issues found across traces
 - **Recommendations**: Actionable improvements based on the analysis
-
-After generating the overall summary, write it to `SUMMARY.md` inside the report directory:
-
-```bash
-cat > "<report_dir>/SUMMARY.md" <<'SUMMARY_MD_EOF'
-<full cross-trace summary output from above>
-SUMMARY_MD_EOF
-```
-
----
-
-### Step 6: Log Summary to File
-
-Append the comprehensive summary (with semantic matching corrections and semantic reasons) to the log file. Read the `log_file` path from the batch_validate.py output and append:
-
-```bash
-cat >> <log_file_path> <<'SUMMARY_EOF'
-
-=== COMPREHENSIVE SUMMARY (with semantic matching corrections and semantic reasons) ===
-<paste the cross-trace summary and per-trace corrected scores here>
-SUMMARY_EOF
 ```
 
-This ensures the log file contains both the raw API results and the post-processed summary with semantic matching corrections and semantic reasons.
+After writing the file, tell the user:
+> Summary written to `<report_dir>/SUMMARY.md`
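
The Overall Score Summary in the new `SUMMARY.md` template reduces each criterion to an average, min, max, and number of traces evaluated. A minimal Python sketch, assuming the per-trace scores (already corrected by Step 2) are available as a dict of dicts; the input shape and the helper names `overall_score_rows` and `write_summary` are hypothetical, and `Path.write_text` again only stands in for the Write tool.

```python
from pathlib import Path
from statistics import mean

# Criteria in the order used by the summary tables.
CRITERIA = ["Groundedness", "Validity", "Coherence", "Utility", "Tool Validation"]


def overall_score_rows(per_trace: dict[str, dict[str, float]]) -> list[str]:
    """Build Overall Score Summary rows (scores out of 5, after Step 2 corrections).

    Hypothetical input shape: {trace_id: {criterion: score}}. A trace with no
    score for a criterion is not counted for that criterion.
    """
    rows = []
    for criterion in CRITERIA:
        scores = [s[criterion] for s in per_trace.values() if criterion in s]
        if not scores:
            continue  # nothing evaluated for this criterion
        rows.append(
            f"| **{criterion}** | {mean(scores):.2f}/5 | {min(scores)}/5 | "
            f"{max(scores)}/5 | {len(scores)} |"
        )
    return rows


def write_summary(report_dir: str, markdown: str) -> str:
    """Write <report_dir>/SUMMARY.md and return the notice shown to the user."""
    path = Path(report_dir) / "SUMMARY.md"
    path.write_text(markdown, encoding="utf-8")  # stand-in for the Write tool
    return f"Summary written to `{path}`"
```

Deriving both the Overall Score Summary and the Per-Trace Score Breakdown from the same corrected scores keeps the two tables consistent with each other.
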
