Skip to content

Commit 0c75203

Browse files
committed
feat(skills): add validation and chain-of-thought to quality-scan
- Add structured output validation with bash checks - Add explicit <thinking> tag requirements to agent prompts - Improve reliability and agent reasoning quality - Follow Phase 2 enhancement patterns from socket-btm
1 parent eca914a commit 0c75203

2 files changed

Lines changed: 257 additions & 30 deletions

File tree

.claude/skills/quality-scan/SKILL.md

Lines changed: 132 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
---
22
name: quality-scan
3-
description: Performs comprehensive quality scans across codebase to identify critical bugs, logic errors, caching issues, and workflow problems. Spawns specialized agents for targeted analysis and generates prioritized improvement tasks. Use when improving code quality, before releases, or investigating issues.
3+
description: Cleans up junk files (SCREAMING_TEXT.md, temp files) and performs comprehensive quality scans across codebase to identify critical bugs, logic errors, caching issues, and workflow problems. Spawns specialized agents for targeted analysis and generates prioritized improvement tasks. Use when improving code quality, before releases, or investigating issues.
44
---
55

66
# quality-scan
77

88
<task>
9-
Your task is to perform comprehensive quality scans across the socket-btm codebase using specialized agents to identify critical bugs, logic errors, caching issues, and workflow problems. Generate a prioritized report with actionable improvement tasks.
9+
Your task is to perform comprehensive quality scans across the socket-btm codebase using specialized agents to identify critical bugs, logic errors, caching issues, and workflow problems. Before scanning, clean up junk files (SCREAMING_TEXT.md files, temporary test files, etc.) to ensure a clean and organized repository. Generate a prioritized report with actionable improvement tasks.
1010
</task>
1111

1212
<context>
@@ -35,6 +35,7 @@ This is Socket Security's binary tooling manager (BTM) that:
3535
- Improves code quality systematically
3636
- Provides actionable fixes with file:line references
3737
- Prioritizes issues by severity for efficient remediation
38+
- Cleans up junk files for a well-organized repository
3839

3940
**Agent Prompts:**
4041
All agent prompts are embedded in `reference.md` with structured <context>, <instructions>, <pattern>, and <output_format> tags following Claude best practices.
@@ -92,7 +93,82 @@ git status
9293

9394
---
9495

95-
### Phase 2: Determine Scan Scope
96+
### Phase 2: Repository Cleanup
97+
98+
<action>
99+
Clean up junk files and organize the repository before scanning:
100+
</action>
101+
102+
**Cleanup Tasks:**
103+
104+
1. **Remove SCREAMING_TEXT.md files** (all-caps .md files) that are NOT:
105+
- Inside `.claude/` directory
106+
- Inside `docs/` directory
107+
- Named `README.md`, `LICENSE`, or `SECURITY.md`
108+
109+
2. **Remove temporary test files** in wrong locations:
110+
- `.test.mjs` or `.test.mts` files outside `test/` or `__tests__/` directories
111+
- Temp files: `*.tmp`, `*.temp`, `.DS_Store`, `Thumbs.db`
112+
- Editor backups: `*~`, `*.swp`, `*.swo`, `*.bak`
113+
- Test artifacts: `*.log` files in root or package directories (not logs/)
114+
115+
```bash
116+
# Find SCREAMING_TEXT.md files (all caps with .md extension)
117+
find . -type f -name '*.md' \
118+
! -path './.claude/*' \
119+
! -path './docs/*' \
120+
! -name 'README.md' \
121+
! -name 'LICENSE' \
122+
! -name 'SECURITY.md' \
123+
| grep -E '/[A-Z_]+\.md$'
124+
125+
# Find test files in wrong locations
126+
find . -type f \( -name '*.test.mjs' -o -name '*.test.mts' \) \
127+
! -path '*/test/*' \
128+
! -path '*/__tests__/*' \
129+
! -path '*/node_modules/*'
130+
131+
# Find temp files
132+
find . -type f \( \
133+
-name '*.tmp' -o \
134+
-name '*.temp' -o \
135+
-name '.DS_Store' -o \
136+
-name 'Thumbs.db' -o \
137+
-name '*~' -o \
138+
-name '*.swp' -o \
139+
-name '*.swo' -o \
140+
-name '*.bak' \
141+
\) ! -path '*/node_modules/*'
142+
143+
# Find log files in wrong places (not in logs/ or build/ directories)
144+
find . -type f -name '*.log' \
145+
! -path '*/logs/*' \
146+
! -path '*/build/*' \
147+
! -path '*/node_modules/*' \
148+
! -path '*/.git/*'
149+
```
150+
151+
<validation>
152+
**For each file found:**
153+
1. Show the file path to user
154+
2. Explain why it's considered junk
155+
3. Ask user for confirmation before deleting (use AskUserQuestion)
156+
4. Delete confirmed files: `git rm` if tracked, `rm` if untracked
157+
5. Report files removed
158+
159+
**If no junk files found:**
160+
- Report: "✓ Repository is clean - no junk files found"
161+
162+
**Important:**
163+
- Always get user confirmation before deleting
164+
- Show file contents if user is unsure
165+
- Track deleted files for reporting
166+
167+
</validation>
168+
169+
---
170+
171+
### Phase 3: Determine Scan Scope
96172

97173
<action>
98174
Ask user which scans to run:
@@ -133,7 +209,7 @@ If user requests non-existent scan type, report error and suggest valid types.
133209

134210
---
135211

136-
### Phase 3: Execute Scans
212+
### Phase 4: Execute Scans
137213

138214
<action>
139215
For each enabled scan type, spawn a specialized agent using Task tool:
@@ -183,16 +259,56 @@ Scan systematically and report all findings. If no issues found, state that expl
183259
- Documentation scan: reference.md starting at line ~810
184260

185261
<validation>
186-
For each scan completion:
262+
**Structured Output Validation:**
263+
264+
After each agent returns, validate output structure before parsing:
265+
266+
```bash
267+
# 1. Verify agent completed successfully
268+
if [ -z "$AGENT_OUTPUT" ]; then
269+
echo "ERROR: Agent returned no output"
270+
exit 1
271+
fi
272+
273+
# 2. Check for findings or clean report
274+
if ! echo "$AGENT_OUTPUT" | grep -qE '(File:.*Issue:|No .* issues found|✓ Clean)'; then
275+
echo "WARNING: Agent output missing expected format"
276+
echo "Agent may have encountered an error or found no issues"
277+
fi
278+
279+
# 3. Verify severity levels if findings exist
280+
if echo "$AGENT_OUTPUT" | grep -q "File:"; then
281+
if ! echo "$AGENT_OUTPUT" | grep -qE 'Severity: (Critical|High|Medium|Low)'; then
282+
echo "WARNING: Findings missing severity classification"
283+
fi
284+
fi
285+
286+
# 4. Verify fix suggestions if findings exist
287+
if echo "$AGENT_OUTPUT" | grep -q "File:"; then
288+
if ! echo "$AGENT_OUTPUT" | grep -q "Fix:"; then
289+
echo "WARNING: Findings missing suggested fixes"
290+
fi
291+
fi
292+
```
293+
294+
**Manual Verification Checklist:**
295+
- [ ] Agent output includes findings OR explicit "No issues found" statement
296+
- [ ] All findings include file:line references
297+
- [ ] All findings include severity level (Critical/High/Medium/Low)
298+
- [ ] All findings include suggested fixes
299+
- [ ] Agent output is parseable and structured
300+
301+
**For each scan completion:**
187302
- Verify agent completed without errors
188-
- Extract findings from agent output
303+
- Extract findings from agent output (or confirm "No issues found")
189304
- Parse into structured format (file, issue, severity, fix)
190305
- Track scan coverage (files analyzed)
306+
- Log any validation warnings for debugging
191307
</validation>
192308

193309
---
194310

195-
### Phase 4: Aggregate Findings
311+
### Phase 5: Aggregate Findings
196312

197313
<action>
198314
Collect all findings from agents and aggregate:
@@ -231,7 +347,7 @@ interface Finding {
231347

232348
---
233349

234-
### Phase 5: Generate Report
350+
### Phase 6: Generate Report
235351

236352
<action>
237353
Create structured quality report with all findings:
@@ -304,7 +420,7 @@ Create structured quality report with all findings:
304420

305421
---
306422

307-
### Phase 6: Complete
423+
### Phase 7: Complete
308424

309425
<completion_signal>
310426
```xml
@@ -317,12 +433,19 @@ Report these final metrics to the user:
317433

318434
**Quality Scan Complete**
319435
========================
436+
✓ Repository cleanup: N junk files removed
320437
✓ Scans completed: [list of scan types]
321438
✓ Total findings: N (N critical, N high, N medium, N low)
322439
✓ Files scanned: N
323440
✓ Report generated: Yes
324441
✓ Scan duration: [calculated from start to end]
325442

443+
**Repository Cleanup Summary:**
444+
- SCREAMING_TEXT.md files removed: N
445+
- Temporary test files removed: N
446+
- Temp/backup files removed: N
447+
- Log files cleaned up: N
448+
326449
**Critical Issues Requiring Immediate Attention:**
327450
- N critical issues found
328451
- Review report above for details and fixes

.claude/skills/quality-scan/reference.md

Lines changed: 125 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -77,10 +77,31 @@ Scan all code files across the monorepo for these critical bug patterns:
7777
- Buffer operations without size checks
7878
</pattern>
7979
80-
For each bug found, think through:
81-
1. Can this actually crash in production?
82-
2. What input would trigger it?
83-
3. Is there existing safeguards I'm missing?
80+
<quality_guidelines>
81+
For each potential issue found, use explicit chain-of-thought reasoning with `<thinking>` tags:
82+
83+
<thinking>
84+
1. Can this actually crash/fail in production?
85+
- Code path analysis: [describe the execution flow]
86+
- Production scenarios: [real-world conditions]
87+
- Result: [yes/no with justification]
88+
89+
2. What input would trigger this issue?
90+
- Trigger conditions: [specific inputs/states]
91+
- Edge cases: [boundary conditions]
92+
- Likelihood: [HIGH/MEDIUM/LOW]
93+
94+
3. Are there existing safeguards I'm missing?
95+
- Defensive code: [try-catch, validation, guards]
96+
- Framework protections: [built-in safety]
97+
- Result: [SAFEGUARDED/VULNERABLE]
98+
99+
Overall assessment: [REPORT/SKIP]
100+
Decision: [If REPORT, include in findings. If SKIP, explain why it's a false positive]
101+
</thinking>
102+
103+
Only report issues that pass all three checks. Use `<thinking>` tags to show your reasoning explicitly.
104+
</quality_guidelines>
84105
</instructions>
85106
86107
<output_format>
@@ -208,10 +229,31 @@ Binary format handling errors:
208229
- Cross-platform: Windows vs Unix path separators, line endings
209230
</pattern>
210231
211-
Before reporting, think through:
212-
1. Does this logic error produce incorrect output?
213-
2. What specific input would trigger it?
214-
3. Is the error already handled elsewhere?
232+
<quality_guidelines>
233+
For each potential issue found, use explicit chain-of-thought reasoning with `<thinking>` tags:
234+
235+
<thinking>
236+
1. Can this actually crash/fail in production?
237+
- Code path analysis: [describe the execution flow]
238+
- Production scenarios: [real-world conditions]
239+
- Result: [yes/no with justification]
240+
241+
2. What input would trigger this issue?
242+
- Trigger conditions: [specific inputs/states]
243+
- Edge cases: [boundary conditions]
244+
- Likelihood: [HIGH/MEDIUM/LOW]
245+
246+
3. Are there existing safeguards I'm missing?
247+
- Defensive code: [try-catch, validation, guards]
248+
- Framework protections: [built-in safety]
249+
- Result: [SAFEGUARDED/VULNERABLE]
250+
251+
Overall assessment: [REPORT/SKIP]
252+
Decision: [If REPORT, include in findings. If SKIP, explain why it's a false positive]
253+
</thinking>
254+
255+
Only report issues that pass all three checks. Use `<thinking>` tags to show your reasoning explicitly.
256+
</quality_guidelines>
215257
</instructions>
216258
217259
<output_format>
@@ -342,10 +384,31 @@ Uncommon scenarios:
342384
- Permission changes during caching
343385
</pattern>
344386
345-
Think through each issue:
346-
1. Can this actually happen in production?
347-
2. What observable behavior results?
348-
3. How likely/severe is the impact?
387+
<quality_guidelines>
388+
For each potential issue found, use explicit chain-of-thought reasoning with `<thinking>` tags:
389+
390+
<thinking>
391+
1. Can this actually crash/fail in production?
392+
- Code path analysis: [describe the execution flow]
393+
- Production scenarios: [real-world conditions]
394+
- Result: [yes/no with justification]
395+
396+
2. What input would trigger this issue?
397+
- Trigger conditions: [specific inputs/states]
398+
- Edge cases: [boundary conditions]
399+
- Likelihood: [HIGH/MEDIUM/LOW]
400+
401+
3. Are there existing safeguards I'm missing?
402+
- Defensive code: [try-catch, validation, guards]
403+
- Framework protections: [built-in safety]
404+
- Result: [SAFEGUARDED/VULNERABLE]
405+
406+
Overall assessment: [REPORT/SKIP]
407+
Decision: [If REPORT, include in findings. If SKIP, explain why it's a false positive]
408+
</thinking>
409+
410+
Only report issues that pass all three checks. Use `<thinking>` tags to show your reasoning explicitly.
411+
</quality_guidelines>
349412
</instructions>
350413
351414
<output_format>
@@ -469,10 +532,31 @@ Documentation and setup:
469532
- First-time setup: Can a new contributor get started easily?
470533
</pattern>
471534
472-
For each issue, consider:
473-
1. Does this actually affect developers or CI?
474-
2. How often would this be encountered?
475-
3. Is there a simple fix?
535+
<quality_guidelines>
536+
For each potential issue found, use explicit chain-of-thought reasoning with `<thinking>` tags:
537+
538+
<thinking>
539+
1. Can this actually crash/fail in production?
540+
- Code path analysis: [describe the execution flow]
541+
- Production scenarios: [real-world conditions]
542+
- Result: [yes/no with justification]
543+
544+
2. What input would trigger this issue?
545+
- Trigger conditions: [specific inputs/states]
546+
- Edge cases: [boundary conditions]
547+
- Likelihood: [HIGH/MEDIUM/LOW]
548+
549+
3. Are there existing safeguards I'm missing?
550+
- Defensive code: [try-catch, validation, guards]
551+
- Framework protections: [built-in safety]
552+
- Result: [SAFEGUARDED/VULNERABLE]
553+
554+
Overall assessment: [REPORT/SKIP]
555+
Decision: [If REPORT, include in findings. If SKIP, explain why it's a false positive]
556+
</thinking>
557+
558+
Only report issues that pass all three checks. Use `<thinking>` tags to show your reasoning explicitly.
559+
</quality_guidelines>
476560
</instructions>
477561
478562
<output_format>
@@ -928,11 +1012,31 @@ Look for:
9281012
- Critical sections (75%+ of package) not mentioned
9291013
</pattern>
9301014

931-
For each issue found:
932-
1. Read the documented information
933-
2. Read the actual code/config to verify
934-
3. Determine the discrepancy
935-
4. Provide the correct information
1015+
<quality_guidelines>
1016+
For each potential issue found, use explicit chain-of-thought reasoning with `<thinking>` tags:
1017+
1018+
<thinking>
1019+
1. Can this actually crash/fail in production?
1020+
- Code path analysis: [describe the execution flow]
1021+
- Production scenarios: [real-world conditions]
1022+
- Result: [yes/no with justification]
1023+
1024+
2. What input would trigger this issue?
1025+
- Trigger conditions: [specific inputs/states]
1026+
- Edge cases: [boundary conditions]
1027+
- Likelihood: [HIGH/MEDIUM/LOW]
1028+
1029+
3. Are there existing safeguards I'm missing?
1030+
- Defensive code: [try-catch, validation, guards]
1031+
- Framework protections: [built-in safety]
1032+
- Result: [SAFEGUARDED/VULNERABLE]
1033+
1034+
Overall assessment: [REPORT/SKIP]
1035+
Decision: [If REPORT, include in findings. If SKIP, explain why it's a false positive]
1036+
</thinking>
1037+
1038+
Only report issues that pass all three checks. Use `<thinking>` tags to show your reasoning explicitly.
1039+
</quality_guidelines>
9361040
</instructions>
9371041

9381042
<output_format>

0 commit comments

Comments
 (0)