@@ -52,13 +52,19 @@ Based on the paper "AutoCode: LLMs as Problem Setters for Competitive Programmin
5252│ │ (for non-exact problems) │ │
5353│ └────────────────────┬────────────────────┘ │
5454│ │ │
55- │ Phase 7: Test Generation │
55+ │ Phase 7: Sample Validation │
56+ │ ┌────────────────────┴────────────────────┐ │
57+ │ │ problem_validate │ Validate statement samples │
58+ │ │ (statement_samples + sample_files) │ and sample files │
59+ │ └────────────────────┬────────────────────┘ │
60+ │ │ │
61+ │ Phase 8: Test Generation │
5662│ ┌────────────────────┴────────────────────┐ │
5763│ │ problem_generate_tests │ Generate final test data │
5864│ │ (dedup + validator filter + balance) │ │
5965│ └────────────────────┬────────────────────┘ │
6066│ │ │
61- │ Phase 8: Packaging │
67+ │ Phase 9: Packaging │
6268│ ┌────────────────────┴────────────────────┐ │
6369│ │ problem_pack_polygon │ Export for Codeforces/Polygon │
6470│ └─────────────────────────────────────────┘ │
@@ -198,9 +204,31 @@ Verify: Check accuracy >= 0.9
198204]
199205```
200206
201- ### Phase 7: Test Generation
207+ ### Phase 7: Sample Validation
202208
203- **Step 7.1: Generate Final Tests**
209+ **Step 7.1: Validate Statement Samples**
210+ ```
211+ Tool: problem_validate
212+ Required: problem_dir
213+ Optional: statement_samples (if not provided, auto-extract from README.md)
214+ Output: validation results for statement_samples and sample_files
215+ Verify: Check success=true, all samples passed
216+ CRITICAL: Must pass validation before generating final tests
217+ ```
218+
219+ **Validation Types:**
220+ - `statement_samples`: Validate samples in problem statement (README.md)
221+ - `sample_files`: Validate sample files in tests/ directory
222+
223+ **If validation fails:**
224+ 1. Check the failing sample's expected output
225+ 2. Run sol manually to verify correct output
226+ 3. Update README.md or sample files as needed
227+ 4. Re-run validation
228+
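Steps 1 and 2 above amount to comparing the expected sample output with what sol actually prints. Here is a minimal sketch of that comparison. It assumes that collapsing whitespace and ignoring trailing blank lines is acceptable for the problem, which may not match the rule `problem_validate` applies. The helper name `diff_outputs` is hypothetical and not part of the toolset.

```python
def diff_outputs(expected: str, actual: str) -> list[tuple[int, str, str]]:
    """Return (line_number, expected_line, actual_line) for each mismatching line.

    Assumption: runs of whitespace are collapsed and trailing blank lines are
    ignored before comparing. A problem with a custom checker needs more than this.
    """
    def normalize(text: str) -> list[str]:
        lines = [" ".join(line.split()) for line in text.splitlines()]
        while lines and lines[-1] == "":
            lines.pop()
        return lines

    exp, act = normalize(expected), normalize(actual)
    mismatches = []
    for i in range(max(len(exp), len(act))):
        e = exp[i] if i < len(exp) else "<missing>"
        a = act[i] if i < len(act) else "<missing>"
        if e != a:
            mismatches.append((i + 1, e, a))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample: expected output from README.md vs. output printed by sol.
    for line_no, e, a in diff_outputs("3\n7 \n", "3\n8\n"):
        print(f"line {line_no}: expected {e!r}, got {a!r}")
```

An empty result means the sample and the solution agree under these normalization rules. Any reported line shows where README.md or the sample file needs updating.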
229+ ### Phase 8: Test Generation
230+
231+ **Step 8.1: Generate Final Tests**
204232```
205233Tool: problem_generate_tests
206234Required: problem_dir
@@ -209,9 +237,9 @@ Output: tests/01.in ~ tests/50.in + corresponding .ans files
209237Verify: Check generated_tests count matches test_count
210238```
211239
212- ### Phase 8: Packaging
240+ ### Phase 9: Packaging
213241
214- **Step 8.1: Pack for Polygon**
242+ **Step 9.1: Pack for Polygon**
215243```
216244Tool: problem_pack_polygon
217245Required: problem_dir
@@ -254,16 +282,18 @@ Generate 3-5 mutant solutions with common bugs:
254282| 4 | `generator_build` | Step 3 | `success=true`, gen.exe exists |
255283| 5 | `stress_test_run` | Step 4 | `"All N rounds passed"` |
256284| 6 | `checker_build` (optional) | Step 5 | `accuracy >= 0.9` |
257- | 7 | `problem_generate_tests` | Step 5 or 6 | `generated_tests == test_count` |
258- | 8 | `problem_pack_polygon` | Step 7 | `success=true` |
285+ | 7 | `problem_validate` | Step 5 or 6 | `success=true`, all samples passed |
286+ | 8 | `problem_generate_tests` | Step 7 | `generated_tests == test_count` |
287+ | 9 | `problem_pack_polygon` | Step 8 | `success=true` |
259288
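The prerequisite column of the table above can be read as a small dependency map. The sketch below covers only the rows visible here (steps 4 to 9) and treats "Step 5 or 6" as "either one is enough". The names `PREREQUISITES` and `can_run` are hypothetical, and the check looks only at direct prerequisites, not the whole chain.

```python
# Prerequisites copied from the rows of the table above (steps 4-9 only).
# A step may run once ANY of its listed prerequisite steps has succeeded,
# which models the "Step 5 or 6" entry for problem_validate.
PREREQUISITES: dict[int, set[int]] = {
    4: {3},     # generator_build
    5: {4},     # stress_test_run
    6: {5},     # checker_build (optional)
    7: {5, 6},  # problem_validate
    8: {7},     # problem_generate_tests
    9: {8},     # problem_pack_polygon
}


def can_run(step: int, succeeded: set[int]) -> bool:
    """Return True if `step` may run, given the steps that returned success."""
    if step not in PREREQUISITES:
        raise ValueError(f"step {step} is not covered by this sketch")
    return bool(PREREQUISITES[step] & succeeded)


if __name__ == "__main__":
    done = {3, 4, 5}          # stress test passed, no checker was needed
    print(can_run(7, done))   # problem_validate is allowed
    print(can_run(8, done))   # problem_generate_tests must wait for step 7
```

A guard like this makes it easy to refuse a request such as "just generate tests" until validation has actually succeeded.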
260289### FORBIDDEN Actions
261290
262291 1. **NEVER** call `generator_build` before `validator_build`
263292 2. **NEVER** call `stress_test_run` before building BOTH sol AND brute
264- 3. **NEVER** call `problem_generate_tests` before stress test passes
265- 4. **NEVER** skip stress test verification
266- 5. **NEVER** proceed if any step returns `success=false`
293+ 3. **NEVER** call `problem_validate` before stress test passes
294+ 4. **NEVER** call `problem_generate_tests` before validation passes
295+ 5. **NEVER** skip stress test verification
296+ 6. **NEVER** proceed if any step returns `success=false`
267297
268298## Error Recovery
269299
@@ -284,6 +314,13 @@ Generate 3-5 mutant solutions with common bugs:
284314 3. Fix the buggy solution
285315 4. Rebuild and re-run stress test
286316
317+ ### Validation Failure
318+ 1. The result contains `statement_samples` or `sample_files` details
319+ 2. Check which sample failed (expected vs actual output)
320+ 3. Verify correct output by running sol manually
321+ 4. Update README.md or sample files with correct output
322+ 5. Re-run validation
323+
287324## Quality Checklist
288325
289326Before considering the problem complete:
@@ -295,6 +332,8 @@ Before considering the problem complete:
295332- [ ] Generator produces valid inputs
296333- [ ] Stress test passes 1000+ rounds
297334- [ ] (If applicable) Checker passes 90%+ scenarios
335+ - [ ] Statement samples validated (problem_validate passed)
336+ - [ ] Sample files validated (problem_validate passed)
298337- [ ] Final test data generated (50+ tests)
299338- [ ] Polygon package created
300339
@@ -329,11 +368,15 @@ assert result["completed_rounds"] == result["total_rounds"]
329368result = checker_build(problem_dir="problems/ab", code=checker_code, test_scenarios=checker_tests)
330369assert result["accuracy"] >= 0.9
331370
332- # Phase 7: Generate Tests
371+ # Phase 7: Validate Samples
372+ result = problem_validate(problem_dir="problems/ab")
373+ assert result["success"] == True
374+
375+ # Phase 8: Generate Tests
333376result = problem_generate_tests(problem_dir="problems/ab", test_count=50)
334377assert len(result["generated_tests"]) == 50
335378
336- # Phase 8: Package
379+ # Phase 9: Package
337380result = problem_pack_polygon(problem_dir="problems/ab", time_limit=1, memory_limit=256)
338381assert result["success"] == True
339382```
@@ -356,4 +399,5 @@ If the user asks to skip steps (e.g., "just generate tests"), you MUST:
356399| `checker_build` | Algorithm 3 | Build output verification |
357400| `interactor_build` | Algorithm 4 | Build interactive problem handler |
358401| `stress_test_run` | - | Verify solution correctness |
402+ | `problem_validate` | - | Validate statement samples and sample files |
359403| `problem_generate_tests` | - | Generate final test dataset |