Tighten effectiveness evidence report traceability #33
baskduf
left a comment
Thanks for tightening the evidence report. I don’t think this is ready to approve yet because the main verification claim is still broken.
Blocking:
`docs/examples/effectiveness-small-evidence-report.md` is not actually included by `scripts/check_effectiveness_plan.py`. The checker only treats Markdown files as reports when the filename contains the contiguous substring `effectiveness-report`, but this filename contains `effectiveness-small-evidence-report`, so `is_report(...)` returns false. That means CI can pass while skipping this new report.
Please either rename the file to something like `effectiveness-report-small-evidence.md`, or update the checker and tests to intentionally include this naming pattern.
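To make the mismatch concrete, here is a minimal sketch of the matching rule as described above. It is not the actual code in `scripts/check_effectiveness_plan.py`: the constant name `REPORT_MARKER` and the `.md` suffix check are assumptions, and only the contiguous-substring behavior comes from this review.

```python
from pathlib import Path

# Assumed constant name; the substring itself is the one named in the review.
REPORT_MARKER = "effectiveness-report"


def is_report(path: str) -> bool:
    """Hypothetical reconstruction: a Markdown file counts as a report only
    when its filename contains the contiguous marker substring."""
    p = Path(path)
    return p.suffix == ".md" and REPORT_MARKER in p.name


current_name = "docs/examples/effectiveness-small-evidence-report.md"
suggested_name = "docs/examples/effectiveness-report-small-evidence.md"

print(is_report(current_name))    # False: "small-evidence" splits the marker
print(is_report(suggested_name))  # True: the marker appears contiguously
```

Under this rule the rename is the smaller change, because it needs no new matching logic in the checker or its tests.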
Also worth fixing before merge:
- The task outcome records now include `jihwan4155/recipe-api@99af81...`, but that repo/commit is not accessible to me via GitHub, and the aggregate report still says `Repository refs compared: local practice branch snapshots`. This weakens the traceability goal from `docs/evaluation.md`.
- The aggregate `Review gaps detected` count looks ambiguous: the report says 3, but task `001` lists 3 review gaps and task `002` also records the missing PATCH tests as a review gap. Please reconcile the count or narrow the metric definition.
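The count discrepancy can be checked with simple arithmetic. The sketch below uses only the numbers mentioned in this comment; the value for task `002` is a lower bound (only the missing PATCH tests are named here), and the dictionary layout is an assumption rather than the report's real schema.

```python
# Per-task review-gap counts as described in the review comment.
# Task 002's count is a lower bound: only the missing PATCH tests are mentioned.
review_gaps_by_task = {"001": 3, "002": 1}
reported_aggregate = 3  # the value the aggregate report currently states


def reconcile(per_task: dict, reported: int) -> tuple:
    """Return the summed per-task count and whether it matches the reported one."""
    total = sum(per_task.values())
    return total, total == reported


total, matches = reconcile(review_gaps_by_task, reported_aggregate)
print(total, matches)  # 4 False
```

If the aggregate is meant to count something narrower than the sum of per-task gaps, stating that definition in the report would resolve the ambiguity without changing the number.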
Checks I ran on the PR head:
- `python3 -m unittest discover -s tests`
- `python3 -m py_compile ...`
- `python3 scripts/check_docs_drift.py`
- `python3 scripts/check_structure.py`
- `python3 scripts/check_encoding_hygiene.py`
- `python3 scripts/check_effectiveness_plan.py`
- `python3 scripts/check_failure_memory.py`
- `python3 scripts/check_decision_memory.py --base ca367dbd3da9e89aa2653cf26bcdd00180b792a9`
- `python3 scripts/harness_doctor.py --target .`
All passed, but the first issue means the new evidence report is currently not being validated.
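Because every check passed while the report was silently skipped, one possible guard is a test that flags report-like filenames the checker would ignore. This is only a sketch: `skipped_report_like` is a hypothetical helper, and the "contains both `effectiveness` and `report`" heuristic is an assumption, not something the repository defines.

```python
def skipped_report_like(names, marker="effectiveness-report"):
    """Return Markdown filenames that look like effectiveness reports
    but do not contain the contiguous marker the checker matches on."""
    skipped = []
    for name in names:
        lower = name.lower()
        looks_like_report = (
            lower.endswith(".md")
            and "effectiveness" in lower
            and "report" in lower
        )
        if looks_like_report and marker not in lower:
            skipped.append(name)
    return skipped


example_names = [
    "effectiveness-small-evidence-report.md",  # the name used in this PR
    "effectiveness-report-small-evidence.md",  # the suggested rename
    "evaluation.md",
]
print(skipped_report_like(example_names))
# ['effectiveness-small-evidence-report.md']
```

A unit test could assert that this list is empty for the files under `docs/examples`, so a future misnamed report fails CI instead of passing unvalidated.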
I don’t think this is ready to approve yet, but this should be a small fix. The blocking issue is that the new report is still not included by Please either rename it to something like Non-blocking: it would also help to tighten the source refs / aggregate count wording, but the validation mismatch is the main thing blocking approval.
I updated the PR to address the blocking validation issue and the additional evidence-quality concerns. Changes made:
Checks run:
All checks pass locally.
The previous blockers look resolved now.
I’m approving this from a review standpoint. Please make sure the GitHub Actions
baskduf
left a comment
Previous blockers are resolved: the effectiveness report is discovered by the checker, the review-gap count is reconciled, and the referenced source commit is now reachable. Local harness checks passed in review.
I updated the PR to address the requested changes:
- Renamed `docs/examples/effectiveness-small-evidence-report.md` so it is included by `scripts/check_effectiveness_plan.py`.
- Updated the `docs/evaluation.md` link to the renamed report.
Checks run:
- `python scripts/check_docs_drift.py`
- `python scripts/check_structure.py`
- `python scripts/check_encoding_hygiene.py`
- `python scripts/check_effectiveness_plan.py`
- `python scripts/check_decision_memory.py`
- `python scripts/harness_doctor.py --target .`
- `python -m unittest discover -s tests`
All checks passed.
`harness_doctor.py --target .` reports 100/100 baseline evidence.