Commit cdb7da4

docs: refactor skill governance and reduce workflow duplication

Restructure problem-validate into a lean skill with a separate reference file, align governance requirements with checklist enforcement, and reduce duplicated workflow details between agent and skill while keeping gate semantics unchanged.

Made-with: Cursor
1 parent 611c725 commit cdb7da4

8 files changed

Lines changed: 177 additions & 229 deletions

agents/autocode-workflow.md

Lines changed: 10 additions & 22 deletions
@@ -38,26 +38,16 @@ Also report:

 ## Canonical Workflow

-Use this sequence unless the user request is explicitly outside problem creation.
+Use the canonical sequence defined in `skills/autocode-workflow/SKILL.md`.
+This agent is responsible for orchestration and gate handling, not for duplicating the full workflow reference.

-Non-interactive:
+Orchestration requirements:

-1. `problem_create`
-2. `solution_build(solution_type="sol")`
-3. `solution_build(solution_type="brute")`
-4. `solution_analyze`, `solution_audit_std`, `solution_audit_brute`
-5. `validator_build(accuracy >= 0.9)`
-6. `generator_build`
-7. `stress_test_run`
-8. `checker_build` when non-exact output is required
-9. `problem_validate`
-10. `problem_generate_tests`
-11. `problem_verify_tests`
-12. `problem_pack_polygon`
-
-Interactive:
-
-- replace `validator_build` and `checker_build` with `interactor_build`.
+1. identify current workflow position;
+2. select the next valid MCP call;
+3. block or reroute when prerequisites are missing;
+4. delegate deep checks to dedicated auditors/skills;
+5. continue only after gate evidence is satisfied.

 ## Gate Discipline

@@ -69,10 +59,8 @@ Interactive:

 ## Test-Quality Requirements

-During `problem_generate_tests` and `problem_verify_tests`:
-
-- target at least 50% limit-oriented cases (`type=3` + `type=4`) when candidate availability allows;
-- require semantic difference between `type=3` and `type=4` (`type=4` is targeted worst-case/TLE, not only max-parameter scaling).
+Enforce test-quality gates using `testdata-quality` and `problem-validate` skills.
+Do not restate tool reference details here; use skill contracts as the source of truth.

 ## Long-Running Generation
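The orchestration requirements added in this file (identify the current position, select the next valid call, block when prerequisites are missing) can be pictured with a minimal sketch. The step order is copied from the non-interactive list that this commit removes from the agent file; the authoritative sequence is said to live in `skills/autocode-workflow/SKILL.md`, which this diff does not show. The names `next_step`, `is_next_valid_call`, and the `passed` set are hypothetical and only illustrate the idea.

```python
from typing import Optional

# Step order taken from the list removed in this diff (non-interactive flow).
# Assumption: the canonical copy in skills/autocode-workflow/SKILL.md matches it.
NON_INTERACTIVE_STEPS = [
    "problem_create",
    "solution_build:sol",
    "solution_build:brute",
    "solution_analyze",
    "solution_audit_std",
    "solution_audit_brute",
    "validator_build",
    "generator_build",
    "stress_test_run",
    "checker_build",  # only when non-exact output is required
    "problem_validate",
    "problem_generate_tests",
    "problem_verify_tests",
    "problem_pack_polygon",
]


def next_step(passed: set[str], needs_checker: bool = False) -> Optional[str]:
    """Return the first step without gate evidence, or None when all are done.

    `passed` is hypothetical bookkeeping: the set of steps whose gate
    evidence has already been satisfied.
    """
    for step in NON_INTERACTIVE_STEPS:
        if step == "checker_build" and not needs_checker:
            continue  # optional step, skipped for exact-output problems
        if step not in passed:
            return step
    return None


def is_next_valid_call(requested: str, passed: set[str],
                       needs_checker: bool = False) -> bool:
    """Allow a call only if it is exactly the next step in the sequence."""
    return next_step(passed, needs_checker) == requested
```

With this shape, a request for `problem_validate` before `stress_test_run` has passed is rejected, and the orchestrator can report the step returned by `next_step` as the place to reroute to.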

skills/agent-skill-governance/SKILL.md

Lines changed: 4 additions & 0 deletions
@@ -44,6 +44,7 @@ Must include:
 3. Step-by-step execution guidance.
 4. Required output format.
 5. Decision rules (`go` vs `no_go`).
+6. Forbidden behavior.

 ## Workflow Consistency Rules

@@ -66,8 +67,11 @@ Before finalizing edits:
 Use this checklist when updating any `agents/` or `skills/` file:

 - [ ] Role/scope is explicit.
+- [ ] Trigger conditions are explicit (skills).
 - [ ] Output contract is explicit.
 - [ ] Gate behavior is explicit.
 - [ ] Decision rules are explicit.
 - [ ] Forbidden behavior is explicit.
 - [ ] Terminology is consistent with project standards.
+
+Project-specific constraints such as maximum clarification count are allowed as long as they are explicit and non-contradictory.
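Part of the checklist above could be checked mechanically. The sketch below reports which expected sections are missing from a skill file. Mapping checklist items to the heading texts "Trigger Conditions", "Output Contract", "Decision Rules", and "Forbidden Behavior" is an assumption based on the headings used by the rewritten `skills/problem-validate/SKILL.md` in this commit; the governance skill itself does not prescribe heading names, and `missing_sections` is a hypothetical helper.

```python
import re

# Assumed heading names, borrowed from the rewritten problem-validate skill.
REQUIRED_SKILL_SECTIONS = [
    "Trigger Conditions",
    "Output Contract",
    "Decision Rules",
    "Forbidden Behavior",
]

# Matches ATX headings such as "## Output Contract" on a single line.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown_text: str,
                     required: list[str] = REQUIRED_SKILL_SECTIONS) -> list[str]:
    """Return the required section names that have no matching heading.

    Comparison is case-insensitive. Headings inside fenced code blocks are
    also counted, which is a known simplification of this sketch.
    """
    found = {m.group(1).lower() for m in HEADING_RE.finditer(markdown_text)}
    return [name for name in required if name.lower() not in found]
```

An empty result only shows that the sections exist; whether their content is explicit and non-contradictory still needs a human or auditor review.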

skills/problem-validate/SKILL.md

Lines changed: 23 additions & 200 deletions
@@ -6,217 +6,40 @@ disable-model-invocation: false

 # Problem Validation Skill

-This skill guides the validation of problem statement samples and sample files to ensure correctness before generating final test data.
+Use this skill to enforce sample correctness before final test generation.

-Hard rule:
-- Validation failure is a release blocker. Do not proceed to final test generation or packaging until all sample validations pass.
+## Trigger Conditions

-Output contract:
-- `decision`: `go` / `no_go`
-- `blocking_issues`: validation blockers that must be fixed
-- `next_actions`: exact re-validation steps after fixes
-
-## Overview
-
-The `problem_validate` tool verifies that:
-
-1. **Statement Samples**: The expected outputs in the problem statement (README.md) match what the solution actually produces
-2. **Sample Files**: The sample files in `tests/` directory (e.g., `01.in`, `01.ans`) are consistent with the solution
-
-This catches common errors like:
-- Wrong expected output in problem statement
-- Sample files that don't match the problem description
-- Solution bugs that would fail on the provided examples
-
-## When to Use
-
-Use this skill after:
-- `stress_test_run` has passed (solution is verified correct)
-- Before `problem_generate_tests` (ensures samples are correct)
-
-## Validation Types
-
-### 1. Statement Samples (`statement_samples`)
-
-Validates samples embedded in the problem statement (README.md).
-
-**Supported formats (both localized labels and English labels are accepted):**
-
-```markdown
-**样例输入 1**
-```text
-5
-3 -5 2 -8 4
-```
-
-**样例输出 1**
-```text
-2
-```
-```
-
-Or standard English labels:
-
-```markdown
-**Sample Input 1**
-```text
-5
-3 -5 2 -8 4
-```
-
-**Sample Output 1**
-```text
-2
-```
-```
-
-### 2. Sample Files (`sample_files`)
-
-Validates sample files in the `tests/` directory:
-- `01.in`, `01.ans`
-- `02.in`, `02.ans`
-- etc.
-
-## Usage
-
-### Basic Usage
-
-```json
-{
-  "problem_dir": "problems/emergency-escape"
-}
-```
-
-Auto-extracts samples from README.md and validates all sample files.
+Use when:

-### With Explicit Samples
+- `stress_test_run` has passed and the next step is sample verification;
+- the user updates statement samples or sample files;
+- workflow is blocked by `problem_validate` or sample mismatch.

-```json
-{
-  "problem_dir": "problems/emergency-escape",
-  "statement_samples": [
-    {
-      "input": "5\n3 -5 2 -8 4",
-      "expected_output": "2"
-    }
-  ]
-}
-```
+## Core Instructions

-### Validate Only Statement Samples
+1. Run `problem_validate` against statement samples and sample files.
+2. Treat any mismatch as a release blocker.
+3. Classify failures by source: statement text, sample files, or solution behavior.
+4. Re-run validation after fixes before allowing progression.

-```json
-{
-  "problem_dir": "problems/emergency-escape",
-  "validate_types": ["statement_samples"]
-}
-```
+## Output Contract

-### Validate Only Sample Files
-
-```json
-{
-  "problem_dir": "problems/emergency-escape",
-  "validate_types": ["sample_files"]
-}
-```
-
-## Output Interpretation
-
-### Success
-
-```json
-{
-  "success": true,
-  "data": {
-    "statement_samples": {
-      "validated": true,
-      "passed": 2,
-      "failed": 0,
-      "total": 2
-    },
-    "sample_files": {
-      "validated": true,
-      "passed": 3,
-      "failed": 0,
-      "total": 3
-    }
-  }
-}
-```
-
-### Failure
-
-```json
-{
-  "success": false,
-  "error": "Validation failed",
-  "data": {
-    "statement_samples": {
-      "validated": true,
-      "passed": 1,
-      "failed": 1,
-      "total": 2,
-      "details": [
-        {
-          "index": 1,
-          "input": "5\n3 -5 2 -8 4",
-          "expected": "5",
-          "actual": "2",
-          "passed": false
-        }
-      ]
-    }
-  }
-}
-```
-
-## Error Recovery
-
-### If Statement Sample Fails
-
-1. Check the `actual` output - this is what the solution produces
-2. Verify manually if `actual` is correct
-3. If correct, update README.md with the correct expected output
-4. If incorrect, the solution has a bug - fix and rebuild
-
-### If Sample File Fails
-
-1. Check if the `.ans` file exists
-2. Compare `actual` vs `expected` output
-3. Update `.ans` file if solution is correct
-4. Or fix solution if it's wrong
-
-## Output Comparison Rules
-
-The tool uses multiple comparison methods:
-
-1. **Exact match**: Direct string comparison
-2. **Whitespace-insensitive**: Compares after stripping trailing whitespace from each line
-3. **Token match**: Compares after splitting on whitespace
-4. **Floating-point tolerance**: Compares numeric values within tolerance (default 1e-9)
-
-This handles common formatting variations while catching actual errors.
-
-## Integration with Workflow
-
-The validation step is enforced by `workflow_guard.py`:
-
-```
-stress_test_run -> problem_validate -> problem_generate_tests
-```
-
-You cannot skip validation - it must pass before generating final tests.
+- `decision`: `go` / `no_go`
+- `blocking_issues`: validation blockers that must be fixed
+- `next_actions`: exact re-validation steps after fixes

-## Best Practices
+## Forbidden Behavior

-1. **Write samples early**: Include samples in README.md during problem creation
-2. **Validate after stress test**: Solution is verified, so any mismatch is likely in the statement
-3. **Keep samples simple**: Use small, easy-to-verify examples
-4. **Include edge cases**: Add boundary samples that test limits
-5. **Match format**: Ensure sample format matches actual input/output format
+- Do not proceed to `problem_generate_tests` while validation is failing.
+- Do not approve based on file presence without validation evidence.
+- Do not rewrite expected outputs unless the corrected output is justified.

 ## Decision Rules

 - `go`: all selected validation types pass without unresolved mismatches.
 - `no_go`: any selected validation type fails, or validation evidence is incomplete.
+
+## Additional Resources
+
+For tool examples, detailed output schemas, error recovery playbooks, and comparison rules, see [reference.md](reference.md).
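As a rough illustration of how the Output Contract and Decision Rules fit together, the sketch below turns a `problem_validate` result into the `decision` / `blocking_issues` / `next_actions` triple. The result shape (`success`, `data`, and per-type `validated`, `failed`, `total`) is assumed from the example payloads in the section this commit removes; the current schema is documented in `reference.md`, which is not part of this diff. The function name `decide` and the wording of the messages are hypothetical.

```python
from typing import Any


def decide(result: dict[str, Any],
           selected: tuple[str, ...] = ("statement_samples", "sample_files"),
           ) -> dict[str, Any]:
    """Map a problem_validate result to the skill's output contract.

    `go` requires every selected validation type to be validated with zero
    failures; a failure or missing evidence yields `no_go`.
    """
    data = result.get("data") or {}
    blocking: list[str] = []
    for kind in selected:
        section = data.get(kind)
        if not section or not section.get("validated"):
            # No evidence for a selected type counts as incomplete.
            blocking.append(f"{kind}: validation evidence is incomplete")
        elif section.get("failed", 0) > 0:
            total = section.get("total", "?")
            blocking.append(f"{kind}: {section['failed']} of {total} samples failed")
    if not blocking and not result.get("success", False):
        # The tool reported failure but gave no per-type detail to explain it.
        blocking.append("problem_validate reported failure without per-type details")
    return {
        "decision": "no_go" if blocking else "go",
        "blocking_issues": blocking,
        "next_actions": (
            ["fix the blocking issues, then re-run problem_validate"]
            if blocking else []
        ),
    }
```

Treating missing evidence as a blocker mirrors the rule that file presence alone is not enough to approve.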
