Skip to content

Commit 9c49a32

Browse files
author
catlog22
committed
feat: enhance workflow-tune skill with test requirements and analysis improvements
1 parent d843112 commit 9c49a32

3 files changed

Lines changed: 102 additions & 27 deletions

File tree

.claude/skills/workflow-tune/SKILL.md

Lines changed: 16 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -35,12 +35,13 @@ Phase 1 Detail:
3535

3636
## Key Design Principles
3737

38-
1. **Step-by-Step Execution**: Each workflow step executes independently, artifacts inspected before proceeding
39-
2. **Resume-Based Analysis**: Uses ccw cli `--resume` to maintain analysis context across steps
40-
3. **Process Documentation**: Running `process-log.md` accumulates observations per step
41-
4. **Two-Tool Pipeline**: Claude/target tool (execute) + Gemini (analyze) = complementary perspectives
42-
5. **Pure Orchestrator**: SKILL.md coordinates only — execution detail in phase files
43-
6. **Progressive Phase Loading**: Phase docs read only when that phase executes
38+
1. **Test-First Evaluation**: Auto-generate per-step acceptance criteria before execution — judge results against concrete requirements, not vague quality
39+
2. **Step-by-Step Execution**: Each workflow step executes independently, artifacts inspected before proceeding
40+
3. **Resume-Based Analysis**: Uses ccw cli `--resume` to maintain analysis context across steps
41+
4. **Process Documentation**: Running `process-log.md` accumulates observations per step
42+
5. **Two-Tool Pipeline**: Claude/target tool (execute) + Gemini (analyze) = complementary perspectives
43+
6. **Pure Orchestrator**: SKILL.md coordinates only — execution detail in phase files
44+
7. **Progressive Phase Loading**: Phase docs read only when that phase executes
4445

4546
## Interactive Preference Collection
4647

@@ -280,11 +281,12 @@ Read and execute: `Ref: phases/01-setup.md`
280281
- Parse workflow steps from input (Format 1-3: direct parse, Format 4: semantic decomposition)
281282
- Generate Command Document (formatted execution plan)
282283
- **User Confirmation**: Display plan, wait for confirm/edit/cancel
284+
- **Generate Test Requirements**: Auto-create per-step acceptance criteria via Gemini (expected outputs, content signals, quality thresholds, pass/fail criteria, handoff contracts)
283285
- Create workspace at `.workflow/.scratchpad/workflow-tune-{ts}/`
284-
- Initialize workflow-state.json
286+
- Initialize workflow-state.json (with test_requirements per step)
285287
- Create process-log.md template
286288

287-
Output: `workDir`, `steps[]`, `workflowContext`, `commandDoc`, initialized state
289+
Output: `workDir`, `steps[]` (with test_requirements), `workflowContext`, `commandDoc`, initialized state
288290

289291
### Step Loop (Phase 2 + Phase 3, per step)
290292

@@ -331,9 +333,10 @@ Read and execute: `Ref: phases/02-step-execute.md`
331333
Read and execute: `Ref: phases/03-step-analyze.md`
332334

333335
- Inspect step artifacts (file list, content summary, quality signals)
334-
- Build analysis prompt with step context + previous steps' process log
336+
- **Compare actual output against test requirements** (pass_criteria, content_signals, fail_signals, handoff_contract)
337+
- Build analysis prompt with step context + test requirements + previous process log
335338
- Execute: `ccw cli --tool gemini --mode analysis [--resume sessionId]`
336-
- Parse analysis → write step-{N}-analysis.md
339+
- Parse analysis → write step-{N}-analysis.md (with requirement match: PASS/FAIL)
337340
- Append findings to process-log.md
338341
- Return analysis session ID for resume chain
339342

@@ -380,7 +383,9 @@ Phase 1: Setup
380383
381384
User Confirmation (Execute / Edit / Cancel)
382385
↓ (Execute confirmed)
383-
↓ workDir, steps[], workflow-state.json, process-log.md
386+
387+
Generate Test Requirements (Gemini) → per-step acceptance criteria
388+
↓ workDir, steps[] (with test_requirements), workflow-state.json, process-log.md
384389
385390
┌─→ Phase 2: Execute Step N (ccw cli / Skill)
386391
│ ↓ step-N/ artifacts

.claude/skills/workflow-tune/phases/03-step-analyze.md

Lines changed: 78 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -103,7 +103,30 @@ const depthInstructions = {
103103
deep: 'Provide exhaustive assessment. Cover: execution quality, output completeness and correctness, artifact quality and structure, step-to-step handoff integrity, error handling, performance signals, architecture implications, edge cases.'
104104
};
105105
106-
const analysisPrompt = `PURPOSE: Analyze the output of workflow step "${step.name}" (step ${stepIdx + 1}/${state.steps.length}) to assess quality, identify issues, and evaluate handoff readiness for the next step.
106+
// ★ Build test requirements section (the evaluation baseline)
107+
const testReqs = step.test_requirements;
108+
let testReqSection = '';
109+
if (testReqs) {
110+
testReqSection = `
111+
TEST REQUIREMENTS (Acceptance Criteria — use these as the PRIMARY evaluation baseline):
112+
Pass Criteria: ${testReqs.pass_criteria}
113+
Expected Outputs: ${(testReqs.expected_outputs || []).join(', ') || 'not specified'}
114+
Content Signals (patterns that indicate success): ${(testReqs.content_signals || []).join(', ') || 'not specified'}
115+
Quality Thresholds: ${(testReqs.quality_thresholds || []).join(', ') || 'not specified'}
116+
Fail Signals (patterns that indicate failure): ${(testReqs.fail_signals || []).join(', ') || 'not specified'}
117+
Handoff Contract (what next step needs): ${testReqs.handoff_contract || 'not specified'}
118+
119+
IMPORTANT: Score quality_score based on how well the actual output matches these test requirements.
120+
- 90-100: All pass_criteria met, all expected_outputs present, content_signals found, no fail_signals
121+
- 70-89: Most criteria met, minor gaps
122+
- 50-69: Partial match, significant gaps
123+
- 0-49: Fail — fail_signals present or pass_criteria not met`;
124+
} else {
125+
testReqSection = `
126+
NOTE: No pre-generated test requirements for this step. Evaluate based on general quality signals and workflow context.`;
127+
}
128+
129+
const analysisPrompt = `PURPOSE: Evaluate workflow step "${step.name}" (step ${stepIdx + 1}/${state.steps.length}) against its acceptance criteria. Judge whether the command execution met the pre-defined test requirements.
107130
108131
WORKFLOW CONTEXT:
109132
Name: ${state.workflow_name}
@@ -116,6 +139,7 @@ Name: ${step.name}
116139
Type: ${step.type}
117140
Command: ${step.command}
118141
${step.success_criteria ? `Success Criteria: ${step.success_criteria}` : ''}
142+
${testReqSection}
119143
120144
EXECUTION RESULT:
121145
${execSummary}
@@ -129,15 +153,25 @@ ANALYSIS DEPTH: ${state.analysis_depth}
129153
${depthInstructions[state.analysis_depth]}
130154
131155
TASK:
132-
1. Assess step execution quality (did it succeed? complete output?)
133-
2. Evaluate artifact quality (content correctness, completeness, format)
134-
3. Check handoff readiness (can the next step consume this output?)
135-
4. Identify issues, risks, or optimization opportunities
136-
5. Rate overall step quality 0-100
156+
1. **Requirement Matching**: Compare actual output against test requirements (pass_criteria, expected_outputs, content_signals)
157+
2. **Fail Signal Detection**: Check for any fail_signals in the output
158+
3. **Handoff Contract Verification**: Does the output satisfy handoff_contract for the next step?
159+
4. **Gap Analysis**: What's missing between actual output and requirements?
160+
5. **Quality Score**: Rate 0-100 based on requirement fulfillment (NOT general quality)
137161
138162
EXPECTED OUTPUT (strict JSON, no markdown):
139163
{
140164
"quality_score": <0-100>,
165+
"requirement_match": {
166+
"pass": <true|false>,
167+
"criteria_met": ["<which pass_criteria were satisfied>"],
168+
"criteria_missed": ["<which pass_criteria were NOT satisfied>"],
169+
"expected_outputs_found": ["<expected files that exist>"],
170+
"expected_outputs_missing": ["<expected files that are absent>"],
171+
"content_signals_found": ["<success patterns detected in output>"],
172+
"content_signals_missing": ["<success patterns NOT found>"],
173+
"fail_signals_detected": ["<failure patterns found, if any>"]
174+
},
141175
"execution_assessment": {
142176
"success": <true|false>,
143177
"completeness": "<complete|partial|failed>",
@@ -151,6 +185,7 @@ EXPECTED OUTPUT (strict JSON, no markdown):
151185
},
152186
"handoff_assessment": {
153187
"ready": <true|false>,
188+
"contract_satisfied": <true|false|null>,
154189
"next_step_compatible": <true|false|null>,
155190
"handoff_notes": "<what next step should know>"
156191
},
@@ -163,7 +198,7 @@ EXPECTED OUTPUT (strict JSON, no markdown):
163198
"step_summary": "<1-2 sentence summary for process log>"
164199
}
165200
166-
CONSTRAINTS: Be specific, reference artifact content where possible, output ONLY JSON`;
201+
CONSTRAINTS: Be specific, reference artifact content where possible, score against requirements not general quality, output ONLY JSON`;
167202
```
168203
169204
### Step 3.4: Execute via ccw cli Gemini with Resume
@@ -231,11 +266,36 @@ if (jsonMatch) {
231266
}
232267
233268
// Write step analysis file
269+
const reqMatch = analysis.requirement_match;
270+
const reqMatchSection = reqMatch ? `
271+
## Requirement Match — ${reqMatch.pass ? 'PASS ✓' : 'FAIL ✗'}
272+
273+
### Criteria Met
274+
${(reqMatch.criteria_met || []).map(c => `- ✓ ${c}`).join('\n') || '- None'}
275+
276+
### Criteria Missed
277+
${(reqMatch.criteria_missed || []).map(c => `- ✗ ${c}`).join('\n') || '- None'}
278+
279+
### Expected Outputs
280+
- Found: ${(reqMatch.expected_outputs_found || []).join(', ') || 'None'}
281+
- Missing: ${(reqMatch.expected_outputs_missing || []).join(', ') || 'None'}
282+
283+
### Content Signals
284+
- Detected: ${(reqMatch.content_signals_found || []).join(', ') || 'None'}
285+
- Missing: ${(reqMatch.content_signals_missing || []).join(', ') || 'None'}
286+
287+
### Fail Signals
288+
${(reqMatch.fail_signals_detected || []).length > 0
289+
? (reqMatch.fail_signals_detected || []).map(f => `- ⚠ ${f}`).join('\n')
290+
: '- None detected'}
291+
` : '';
292+
234293
const stepAnalysisReport = `# Step ${stepIdx + 1} Analysis: ${step.name}
235294
236295
**Quality Score**: ${analysis.quality_score}/100
296+
**Requirement Match**: ${reqMatch ? (reqMatch.pass ? 'PASS' : 'FAIL') : 'N/A (no test requirements)'}
237297
**Date**: ${new Date().toISOString()}
238-
298+
${reqMatchSection}
239299
## Execution
240300
- Success: ${analysis.execution_assessment?.success}
241301
- Completeness: ${analysis.execution_assessment?.completeness}
@@ -249,6 +309,7 @@ const stepAnalysisReport = `# Step ${stepIdx + 1} Analysis: ${step.name}
249309
250310
## Handoff Readiness
251311
- Ready: ${analysis.handoff_assessment?.ready}
312+
- Contract Satisfied: ${analysis.handoff_assessment?.contract_satisfied}
252313
- Next Step Compatible: ${analysis.handoff_assessment?.next_step_compatible}
253314
- Notes: ${analysis.handoff_assessment?.handoff_notes}
254315
@@ -262,14 +323,16 @@ ${(analysis.optimization_opportunities || []).map(o => `- [${o.impact}] ${o.area
262323
Write(`${stepDir}/step-${stepIdx + 1}-analysis.md`, stepAnalysisReport);
263324
264325
// Append to process log
326+
const reqPassStr = reqMatch ? (reqMatch.pass ? 'PASS' : 'FAIL') : 'N/A';
265327
const processLogEntry = `
266-
## Step ${stepIdx + 1}: ${step.name} — Score: ${analysis.quality_score}/100
328+
## Step ${stepIdx + 1}: ${step.name} — Score: ${analysis.quality_score}/100 | Req: ${reqPassStr}
267329
268330
**Command**: \`${step.command}\`
269331
**Result**: ${analysis.execution_assessment?.completeness || 'unknown'} | ${analysis.artifact_assessment?.count || 0} artifacts
332+
**Requirement Match**: ${reqPassStr}${reqMatch ? ` — Met: ${(reqMatch.criteria_met || []).length}, Missed: ${(reqMatch.criteria_missed || []).length}, Fail Signals: ${(reqMatch.fail_signals_detected || []).length}` : ''}
270333
**Summary**: ${analysis.step_summary || 'No summary'}
271334
**Issues**: ${(analysis.issues || []).filter(i => i.severity === 'high').map(i => i.description).join('; ') || 'None critical'}
272-
**Handoff**: ${analysis.handoff_assessment?.handoff_notes || 'Ready'}
335+
**Handoff**: ${analysis.handoff_assessment?.contract_satisfied ? 'Contract satisfied' : analysis.handoff_assessment?.handoff_notes || 'Ready'}
273336
274337
---
275338
`;
@@ -281,8 +344,13 @@ Write(`${state.work_dir}/process-log.md`, currentLog + processLogEntry);
281344
// Update state
282345
state.steps[stepIdx].analysis = {
283346
quality_score: analysis.quality_score,
347+
requirement_pass: reqMatch?.pass ?? null,
348+
criteria_met_count: (reqMatch?.criteria_met || []).length,
349+
criteria_missed_count: (reqMatch?.criteria_missed || []).length,
350+
fail_signals_count: (reqMatch?.fail_signals_detected || []).length,
284351
key_outputs: analysis.artifact_assessment?.key_outputs || [],
285352
handoff_notes: analysis.handoff_assessment?.handoff_notes || '',
353+
contract_satisfied: analysis.handoff_assessment?.contract_satisfied ?? null,
286354
issue_count: (analysis.issues || []).length,
287355
high_issues: (analysis.issues || []).filter(i => i.severity === 'high').length,
288356
optimization_count: (analysis.optimization_opportunities || []).length,

.claude/skills/workflow-tune/phases/05-optimize-report.md

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -40,10 +40,12 @@ const minStep = state.steps.reduce((min, s) =>
4040
const totalIssues = state.steps.reduce((sum, s) => sum + (s.analysis?.issue_count || 0), 0);
4141
const totalHighIssues = state.steps.reduce((sum, s) => sum + (s.analysis?.high_issues || 0), 0);
4242

43-
// Step quality table
44-
const stepTable = state.steps.map((s, i) =>
45-
`| ${i + 1} | ${s.name} | ${s.type} | ${s.execution?.success ? 'OK' : 'FAIL'} | ${s.analysis?.quality_score || '-'} | ${s.analysis?.issue_count || 0} | ${s.analysis?.high_issues || 0} |`
46-
).join('\n');
43+
// Step quality table (with requirement match)
44+
const stepTable = state.steps.map((s, i) => {
45+
const reqPass = s.analysis?.requirement_pass;
46+
const reqStr = reqPass === true ? 'PASS' : reqPass === false ? 'FAIL' : '-';
47+
return `| ${i + 1} | ${s.name} | ${s.type} | ${s.execution?.success ? 'OK' : 'FAIL'} | ${reqStr} | ${s.analysis?.quality_score || '-'} | ${s.analysis?.issue_count || 0} | ${s.analysis?.high_issues || 0} |`;
48+
}).join('\n');
4749
4850
// Collect all improvements (workflow-level + per-step)
4951
const allImprovements = [];
@@ -97,8 +99,8 @@ const report = `# Workflow Tune — Final Report
9799
98100
## Step Quality Matrix
99101
100-
| # | Step | Type | Exec | Quality | Issues | High |
101-
|---|------|------|------|---------|--------|------|
102+
| # | Step | Type | Exec | Req Match | Quality | Issues | High |
103+
|---|------|------|------|-----------|---------|--------|------|
102104
${stepTable}
103105
104106
## Workflow Flow Assessment

0 commit comments

Comments
 (0)