@@ -6,217 +6,40 @@ disable-model-invocation: false
66
77# Problem Validation Skill
88
9- This skill guides the validation of problem statement samples and sample files to ensure correctness before generating final test data.
9+ Use this skill to enforce sample correctness before final test generation.
1010
11- Hard rule:
12- - Validation failure is a release blocker. Do not proceed to final test generation or packaging until all sample validations pass.
11+ ## Trigger Conditions
1312
14- Output contract:
15- - `decision`: `go` / `no_go`
16- - `blocking_issues`: validation blockers that must be fixed
17- - `next_actions`: exact re-validation steps after fixes
18-
19- ## Overview
20-
21- The `problem_validate` tool verifies that:
22-
23- 1. **Statement Samples**: The expected outputs in the problem statement (README.md) match what the solution actually produces
24- 2. **Sample Files**: The sample files in `tests/` directory (e.g., `01.in`, `01.ans`) are consistent with the solution
25-
26- This catches common errors like:
27- - Wrong expected output in problem statement
28- - Sample files that don't match the problem description
29- - Solution bugs that would fail on the provided examples
30-
31- ## When to Use
32-
33- Use this skill after:
34- - `stress_test_run` has passed (solution is verified correct)
35- - Before `problem_generate_tests` (ensures samples are correct)
36-
37- ## Validation Types
38-
39- ### 1. Statement Samples (`statement_samples`)
40-
41- Validates samples embedded in the problem statement (README.md).
42-
43- **Supported formats (both localized labels and English labels are accepted):**
44-
45- ```markdown
46- **样例输入 1**
47- ```text
48- 5
49- 3 -5 2 -8 4
50- ```
51-
52- **样例输出 1**
53- ```text
54- 2
55- ```
56- ```
57-
58- Or standard English labels:
59-
60- ```markdown
61- **Sample Input 1**
62- ```text
63- 5
64- 3 -5 2 -8 4
65- ```
66-
67- **Sample Output 1**
68- ```text
69- 2
70- ```
71- ```
72-
73- ### 2. Sample Files (`sample_files`)
74-
75- Validates sample files in the `tests/` directory:
76- - `01.in`, `01.ans`
77- - `02.in`, `02.ans`
78- - etc.
79-
80- ## Usage
81-
82- ### Basic Usage
83-
84- ```json
85- {
86-   "problem_dir": "problems/emergency-escape"
87- }
88- ```
89-
90- Auto-extracts samples from README.md and validates all sample files.
13+ Use when:
9114
92- ### With Explicit Samples
15+ - `stress_test_run` has passed and the next step is sample verification;
16+ - the user updates statement samples or sample files;
17+ - the workflow is blocked by `problem_validate` or a sample mismatch.
9318
94- ```json
95- {
96-   "problem_dir": "problems/emergency-escape",
97-   "statement_samples": [
98-     {
99-       "input": "5\n3 -5 2 -8 4",
100-       "expected_output": "2"
101-     }
102-   ]
103- }
104- ```
19+ ## Core Instructions
10520
106- ### Validate Only Statement Samples
21+ 1. Run `problem_validate` against statement samples and sample files.
22+ 2. Treat any mismatch as a release blocker.
23+ 3. Classify failures by source: statement text, sample files, or solution behavior.
24+ 4. Re-run validation after fixes before allowing progression.
10725
108- ```json
109- {
110-   "problem_dir": "problems/emergency-escape",
111-   "validate_types": ["statement_samples"]
112- }
113- ```
26+ ## Output Contract
11427
115- ### Validate Only Sample Files
116-
117- ```json
118- {
119-   "problem_dir": "problems/emergency-escape",
120-   "validate_types": ["sample_files"]
121- }
122- ```
123-
124- ## Output Interpretation
125-
126- ### Success
127-
128- ```json
129- {
130-   "success": true,
131-   "data": {
132-     "statement_samples": {
133-       "validated": true,
134-       "passed": 2,
135-       "failed": 0,
136-       "total": 2
137-     },
138-     "sample_files": {
139-       "validated": true,
140-       "passed": 3,
141-       "failed": 0,
142-       "total": 3
143-     }
144-   }
145- }
146- ```
147-
148- ### Failure
149-
150- ```json
151- {
152-   "success": false,
153-   "error": "Validation failed",
154-   "data": {
155-     "statement_samples": {
156-       "validated": true,
157-       "passed": 1,
158-       "failed": 1,
159-       "total": 2,
160-       "details": [
161-         {
162-           "index": 1,
163-           "input": "5\n3 -5 2 -8 4",
164-           "expected": "5",
165-           "actual": "2",
166-           "passed": false
167-         }
168-       ]
169-     }
170-   }
171- }
172- ```
173-
174- ## Error Recovery
175-
176- ### If Statement Sample Fails
177-
178- 1. Check the `actual` output - this is what the solution produces
179- 2. Verify manually if `actual` is correct
180- 3. If correct, update README.md with the correct expected output
181- 4. If incorrect, the solution has a bug - fix and rebuild
182-
183- ### If Sample File Fails
184-
185- 1. Check if the `.ans` file exists
186- 2. Compare `actual` vs `expected` output
187- 3. Update `.ans` file if solution is correct
188- 4. Or fix solution if it's wrong
189-
190- ## Output Comparison Rules
191-
192- The tool uses multiple comparison methods:
193-
194- 1. **Exact match**: Direct string comparison
195- 2. **Whitespace-insensitive**: Compares after stripping trailing whitespace from each line
196- 3. **Token match**: Compares after splitting on whitespace
197- 4. **Floating-point tolerance**: Compares numeric values within tolerance (default 1e-9)
198-
199- This handles common formatting variations while catching actual errors.
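
The four comparison rules listed above can be read as a fallback chain, tried from strictest to most lenient. The following is only a sketch under that assumption, not the tool's actual implementation: the function name `outputs_match` is hypothetical, and treating the 1e-9 tolerance as an absolute (rather than relative) difference is a guess, since the text does not say which is used.

```python
import math


def outputs_match(expected: str, actual: str, tol: float = 1e-9) -> bool:
    """Hypothetical sketch of the comparison chain; not the real tool code."""
    # 1. Exact match: direct string comparison.
    if expected == actual:
        return True

    # 2. Whitespace-insensitive: strip trailing whitespace from each line.
    #    Dropping trailing blank lines as well is an assumption of this sketch.
    def normalize(text: str) -> list[str]:
        return [line.rstrip() for line in text.rstrip().splitlines()]

    if normalize(expected) == normalize(actual):
        return True

    # 3. Token match: compare after splitting on any whitespace.
    exp_tokens, act_tokens = expected.split(), actual.split()
    if exp_tokens == act_tokens:
        return True

    # 4. Floating-point tolerance: tokens must pair up one to one, and each
    #    differing pair must parse as numbers within `tol` of each other
    #    (absolute tolerance is assumed here).
    if len(exp_tokens) != len(act_tokens):
        return False
    for exp_tok, act_tok in zip(exp_tokens, act_tokens):
        if exp_tok == act_tok:
            continue
        try:
            exp_val, act_val = float(exp_tok), float(act_tok)
        except ValueError:
            return False
        if not math.isclose(exp_val, act_val, rel_tol=0.0, abs_tol=tol):
            return False
    return True
```

Under this reading, a trailing newline or a reflowed line does not cause a failure, while a genuinely different value such as `5` versus `2` still does.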
200-
201- ## Integration with Workflow
202-
203- The validation step is enforced by `workflow_guard.py`:
204-
205- ```
206- stress_test_run -> problem_validate -> problem_generate_tests
207- ```
208-
209- You cannot skip validation - it must pass before generating final tests.
28+ - `decision`: `go` / `no_go`
29+ - `blocking_issues`: validation blockers that must be fixed
30+ - `next_actions`: exact re-validation steps after fixes
21031
211- ## Best Practices
32+ ## Forbidden Behavior
21233
213- 1. **Write samples early**: Include samples in README.md during problem creation
214- 2. **Validate after stress test**: Solution is verified, so any mismatch is likely in the statement
215- 3. **Keep samples simple**: Use small, easy-to-verify examples
216- 4. **Include edge cases**: Add boundary samples that test limits
217- 5. **Match format**: Ensure sample format matches actual input/output format
34+ - Do not proceed to `problem_generate_tests` while validation is failing.
35+ - Do not approve based on file presence without validation evidence.
36+ - Do not rewrite expected outputs unless the corrected output is justified.
21837
21938## Decision Rules
22039
22140- `go`: all selected validation types pass without unresolved mismatches.
22241- `no_go`: any selected validation type fails, or validation evidence is incomplete.
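
As a rough illustration of how the decision rules could map a validation result onto the output contract, here is a minimal sketch. It assumes the result has the shape shown in the example output from the earlier version of this skill (`success`, and `data` with per-type `validated`, `passed`, `failed`, `total`). The helper name `decide` and the wording of the messages are hypothetical.

```python
from typing import Any


def decide(result: dict[str, Any], selected: list[str]) -> dict[str, Any]:
    """Hypothetical helper: derive the output contract from a validation result."""
    data = result.get("data") or {}
    blocking_issues: list[str] = []

    for vtype in selected:
        section = data.get(vtype)
        # Missing or unvalidated sections count as incomplete evidence.
        if not section or not section.get("validated"):
            blocking_issues.append(f"{vtype}: validation evidence is incomplete")
            continue
        failed = section.get("failed", 0)
        total = section.get("total", 0)
        # Any failure, or a pass count that does not cover the total, blocks.
        if failed > 0 or section.get("passed") != total:
            blocking_issues.append(f"{vtype}: {failed} of {total} samples failed")

    # A failed run with no per-type explanation is still a blocker.
    if not result.get("success", False) and not blocking_issues:
        blocking_issues.append("validation run reported failure without details")

    decision = "no_go" if blocking_issues else "go"
    next_actions = (
        [] if decision == "go"
        else ["fix the blocking issues, then re-run problem_validate"]
    )
    return {
        "decision": decision,
        "blocking_issues": blocking_issues,
        "next_actions": next_actions,
    }
```

Note that a selected validation type that is absent from the result is treated as `no_go`, which matches the rule that incomplete evidence cannot yield `go`.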
42+
43+ ## Additional Resources
44+
45+ For tool examples, detailed output schemas, error recovery playbooks, and comparison rules, see [reference.md](reference.md).