docs: record h2 order-control seed stability #313
Code Review
This pull request documents the results of a seed-stability scout for the H2 output-cloud geometry candidate using seed 177. The findings confirm that the signal remains strong (AUC = 0.956192) and is not a single-seed artifact, while label-shuffle sanity tests remain at random levels. The changes include updates to AGENTS.md, ROADMAP.md, and various evidence documents to reflect these results, alongside the addition of two new JSON artifact files. Feedback was provided to improve cross-platform compatibility by using forward slashes instead of Windows-style backslashes in file paths within the new JSON artifacts.
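The label-shuffle sanity test mentioned above can be illustrated with a minimal sketch. It uses synthetic Gaussian scores, not the PR's response cache, and does not reproduce the H2 output-cloud geometry scorer; the helper names `auc` and `shuffled_auc` are hypothetical. The idea is only that an informative score keeps a high AUC under the original labels and drops to about 0.5 once the labels are shuffled.

```python
import random


def auc(scores, labels):
    """Pairwise AUC: chance that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffled_auc(scores, labels, trials=100, seed=0):
    """Mean AUC over repeated random permutations of the labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += auc(scores, shuffled)
    return total / trials


# Synthetic stand-in data (an assumption for illustration only):
# 128 positives and 128 negatives with shifted Gaussian scores.
data_rng = random.Random(0)
labels = [1] * 128 + [0] * 128
scores = [data_rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in labels]

original = auc(scores, labels)
control = shuffled_auc(scores, labels)
print(f"original-label AUC: {original:.3f}")
print(f"label-shuffle AUC:  {control:.3f}")
```

If the shuffled-label AUC stayed well above 0.5, that would suggest leakage in the evaluation rather than a real signal.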
| "track": "black-box", | ||
| "method": "H2 output-cloud geometry scorer", | ||
| "mode": "cpu-cache-review", | ||
| "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-256-shared-position-seed177-20260525-r1\\response-cache.npz", |
The response_cache path uses Windows-style backslashes. For better cross-platform compatibility and consistency with other documentation in this repository (e.g., .gitignore and ROADMAP.md), it is recommended to use forward slashes.
Suggested change:
- "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-256-shared-position-seed177-20260525-r1\\response-cache.npz",
+ "response_cache": "workspaces/black-box/runs/h2-response-strength-256-shared-position-seed177-20260525-r1/response-cache.npz",
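One way to apply this kind of suggestion when the artifact is written is to normalize the path before serializing. The sketch below assumes the artifact is built as a plain Python dict; the helper name `to_posix_path` is hypothetical and not part of the repository's scripts. It uses only the standard library's `pathlib.PureWindowsPath`, which parses backslash separators on any platform.

```python
import json
from pathlib import PureWindowsPath


def to_posix_path(path: str) -> str:
    """Return the path with forward slashes, whichever separator it used."""
    return PureWindowsPath(path).as_posix()


# Path copied from the reviewed artifact (Windows-style separators).
windows_path = (
    "workspaces\\black-box\\runs\\"
    "h2-response-strength-256-shared-position-seed177-20260525-r1\\"
    "response-cache.npz"
)

artifact = {
    "track": "black-box",
    "method": "H2 output-cloud geometry scorer",
    "mode": "cpu-cache-review",
    "response_cache": to_posix_path(windows_path),
}

print(json.dumps(artifact, indent=2))
```

Because forward slashes need no escaping in JSON, the normalized value also avoids the doubled backslashes in the serialized file.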
| "track": "black-box", | ||
| "method": "H2 output-cloud geometry scorer", | ||
| "mode": "cpu-cache-review", | ||
| "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-256-shared-position-seed177-20260525-r1\\response-cache.npz", |
The response_cache path uses Windows-style backslashes. For better cross-platform compatibility and consistency with other documentation in this repository, it is recommended to use forward slashes.
Suggested change:
- "response_cache": "workspaces\\black-box\\runs\\h2-response-strength-256-shared-position-seed177-20260525-r1\\response-cache.npz",
+ "response_cache": "workspaces/black-box/runs/h2-response-strength-256-shared-position-seed177-20260525-r1/response-cache.npz",
Summary
256 / 256 shared-position seed 177 stability scout for H2 output-cloud geometry
177 original-label and label-shuffle reviews

Decision
512 / 512 shared-position rerun selected by default

Verification
python -X utf8 scripts/check_markdown_links.py
python -X utf8 scripts/check_public_surface.py
python -X utf8 scripts/export_admitted_evidence_bundle.py --check
python -X utf8 scripts/run_pr_checks.py