
Commit 0504da6

docs: add Mar 23-24 session learnings (mcp_suite metadata gap)
Reviewed 6 JSONL sessions from Mar 23-24. Found no new code bugs beyond those already captured by nightly #17. Documented that the mcp_suite field is missing from 139 tasks, a metadata gap distinct from verification_modes/use_case_category, which are missing from all 274.
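The metadata gap described above can be checked mechanically. This is a minimal sketch, assuming (hypothetically) that the task registry is JSON with a `total_tasks` header count and a `tasks` list of dicts; the real registry schema is not shown in this commit, so the key names are placeholders. It compares the claimed and actual task counts and tallies how many tasks lack each field.

```python
import json
from collections import Counter

# Field names from the commit; the registry layout ("total_tasks", "tasks")
# is a hypothetical stand-in for the real schema.
REQUIRED_FIELDS = ("verification_modes", "use_case_category", "mcp_suite")


def audit_registry(registry: dict) -> dict:
    """Compare the header's claimed task count with the actual task list and
    count how many tasks lack each required metadata field."""
    tasks = registry.get("tasks", [])
    missing = Counter()
    for task in tasks:
        for field in REQUIRED_FIELDS:
            # An absent key and an explicit null are both treated as missing.
            if task.get(field) is None:
                missing[field] += 1
    return {
        "claimed": registry.get("total_tasks"),
        "actual": len(tasks),
        "missing": {f: missing[f] for f in REQUIRED_FIELDS},
    }


if __name__ == "__main__":
    # Toy registry: the header claims 5 tasks but only 3 are present.
    sample = {
        "total_tasks": 5,
        "tasks": [
            {"id": "t1", "mcp_suite": "a"},
            {"id": "t2"},
            {"id": "t3", "mcp_suite": "b"},
        ],
    }
    print(json.dumps(audit_registry(sample)))
```

Running it against the real registry should reproduce the numbers in this commit (claimed 436, actual 274, with the per-field counts) if the assumed keys match.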
1 parent 432d66f commit 0504da6


3 files changed: +3 / -3 lines


AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ full operations manual.
 - `abc_audit.py`: 6+ functions defined twice (T5,R2,T10,OA,OB,OG); Python uses last. T5+R2: `pytest` 2 FAIL / 40 pass. Leaks/contamination pass audit silently.
 - `rerun_failed.py`: `shell=True` injection; wrong `sourcegraph_full→deepsearch`; deprecated model.
 - `ir_metrics.py:749`: `tt_all_r` set comparison bug. `--skip-completed`: check only result.json.
-- Task registry header: claims 436, actual 274. `verification_modes`/`use_case_category` missing from all tasks.
+- Task registry header: claims 436, actual 274. `verification_modes`/`use_case_category` missing from all 274 tasks; `mcp_suite` missing from 139.
 
 ### Validation / Scoring
 - `validators.py` duplicated in `ccb_build`; update all copies (`sha256sum`). Agent <2s = never ran. CSB dual-score: edits + `answer.json` independent. Fallback: promoted_verifier→oracle_checks→heuristic.
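The `abc_audit.py` note relies on the fact that each top-level `def` rebinds its name, so when a function is defined twice only the last definition is reachable. This is a minimal sketch for finding such duplicates with the standard-library `ast` module, assuming the duplicated checks are top-level functions (methods inside classes would need a walk over `ClassDef` bodies as well); `check_t5` is a hypothetical name, not one taken from `abc_audit.py`.

```python
import ast
from collections import defaultdict


def find_duplicate_defs(source: str) -> dict[str, list[int]]:
    """Return top-level function names defined more than once, mapped to the
    line number of every definition. Only the last one survives at import."""
    tree = ast.parse(source)
    lines_by_name: dict[str, list[int]] = defaultdict(list)
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines_by_name[node.name].append(node.lineno)
    return {name: lns for name, lns in lines_by_name.items() if len(lns) > 1}


# Hypothetical module source with a shadowed check function.
SAMPLE = '''
def check_t5():
    return "first"

def check_t5():
    return "second"
'''

if __name__ == "__main__":
    print(find_duplicate_defs(SAMPLE))  # {'check_t5': [2, 5]}
    namespace: dict = {}
    exec(SAMPLE, namespace)
    print(namespace["check_t5"]())  # second: the later def wins
```

Running `find_duplicate_defs` over the real file's text and getting a non-empty result would flag the same class of bug described for T5, R2, T10, OA, OB and OG.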

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ full operations manual.
 - `abc_audit.py`: 6+ functions defined twice (T5,R2,T10,OA,OB,OG); Python uses last. T5+R2: `pytest` 2 FAIL / 40 pass. Leaks/contamination pass audit silently.
 - `rerun_failed.py`: `shell=True` injection; wrong `sourcegraph_full→deepsearch`; deprecated model.
 - `ir_metrics.py:749`: `tt_all_r` set comparison bug. `--skip-completed`: check only result.json.
-- Task registry header: claims 436, actual 274. `verification_modes`/`use_case_category` missing from all tasks.
+- Task registry header: claims 436, actual 274. `verification_modes`/`use_case_category` missing from all 274 tasks; `mcp_suite` missing from 139.
 
 ### Validation / Scoring
 - `validators.py` duplicated in `ccb_build`; update all copies (`sha256sum`). Agent <2s = never ran. CSB dual-score: edits + `answer.json` independent. Fallback: promoted_verifier→oracle_checks→heuristic.

docs/ops/ROOT_AGENT_GUIDE.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ full operations manual.
 - `abc_audit.py`: 6+ functions defined twice (T5,R2,T10,OA,OB,OG); Python uses last. T5+R2: `pytest` 2 FAIL / 40 pass. Leaks/contamination pass audit silently.
 - `rerun_failed.py`: `shell=True` injection; wrong `sourcegraph_full→deepsearch`; deprecated model.
 - `ir_metrics.py:749`: `tt_all_r` set comparison bug. `--skip-completed`: check only result.json.
-- Task registry header: claims 436, actual 274. `verification_modes`/`use_case_category` missing from all tasks.
+- Task registry header: claims 436, actual 274. `verification_modes`/`use_case_category` missing from all 274 tasks; `mcp_suite` missing from 139.
 
 ### Validation / Scoring
 - `validators.py` duplicated in `ccb_build`; update all copies (`sha256sum`). Agent <2s = never ran. CSB dual-score: edits + `answer.json` independent. Fallback: promoted_verifier→oracle_checks→heuristic.
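For the `validators.py` note, the comparison that `sha256sum` performs can be scripted with `hashlib`. This sketch groups every file named `validators.py` under a root directory by its SHA-256 digest. The search root, and the assumption that every copy (including the one in `ccb_build`) keeps the file name `validators.py`, are illustrative and not taken from the repository layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, the same value `sha256sum` prints first."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 64 KiB chunks so large files are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def copies_by_digest(root: Path, name: str = "validators.py") -> dict[str, list[Path]]:
    """Group every file called `name` under `root` by SHA-256 digest.
    More than one key means the copies have drifted apart."""
    groups: dict[str, list[Path]] = {}
    for p in sorted(root.rglob(name)):
        if p.is_file():
            groups.setdefault(sha256_of(p), []).append(p)
    return groups
```

For example, `copies_by_digest(Path("."))` returning more than one key indicates that at least one copy was not updated along with the others.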
