Tighten effectiveness evidence report traceability by jihwan4155 · Pull Request #33 · baskduf/harness-starter-kit

jihwan4155 · 2026-06-04T00:36:35Z

I updated the PR to address the requested changes:

Renamed the report to docs/examples/effectiveness-small-evidence-report.md so it is included by scripts/check_effectiveness_plan.py.
Updated the docs/evaluation.md link to the renamed report.
Strengthened traceability in the task outcome records by replacing weak local refs with stable repository/commit references.
Narrowed the drift metric by separating the single docs-drift violation from broader review/feedback-loop gaps.
Reworded the report claim to stay narrow: operational outcome evidence, not proof of general agent effectiveness.

Checks run:

python scripts/check_docs_drift.py
python scripts/check_structure.py
python scripts/check_encoding_hygiene.py
python scripts/check_effectiveness_plan.py
python scripts/check_decision_memory.py
python scripts/harness_doctor.py --target .
python -m unittest discover -s tests

All checks passed. harness_doctor.py --target . reports 100/100 baseline evidence.

baskduf

Thanks for tightening the evidence report. I don’t think this is ready to approve yet because the main verification claim is still broken.

Blocking:

docs/examples/effectiveness-small-evidence-report.md is not actually included by scripts/check_effectiveness_plan.py. The checker only treats Markdown files as reports when the filename contains the contiguous substring effectiveness-report, but this filename contains effectiveness-small-evidence-report, so is_report(...) returns false. That means CI can pass while skipping this new report.
Please either rename the file to something like effectiveness-report-small-evidence.md, or update the checker and tests to intentionally include this naming pattern.
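
To illustrate the mismatch, here is a minimal sketch of a contiguous-substring check. The real is_report(...) in scripts/check_effectiveness_plan.py may differ; the function body and the REPORT_MARKER name below are assumptions based only on the description above.

```python
from pathlib import Path

# Assumed marker, taken from the description in this review; the real
# checker may define or apply it differently.
REPORT_MARKER = "effectiveness-report"


def is_report(path: str) -> bool:
    """Hypothetical stand-in for the checker's report detection."""
    p = Path(path)
    # Only Markdown files whose name contains the marker contiguously count.
    return p.suffix == ".md" and REPORT_MARKER in p.name


# The current filename splits the marker with "small-evidence", so it is skipped.
skipped = is_report("docs/examples/effectiveness-small-evidence-report.md")
# The suggested rename keeps the marker contiguous, so it is picked up.
included = is_report("docs/examples/effectiveness-report-small-evidence.md")
print(skipped, included)  # False True
```

Under these assumptions, a checker that silently skips non-matching files reports success either way, which is why a green run alone does not show that the report was validated.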

Also worth fixing before merge:

The task outcome records now include jihwan4155/recipe-api@99af81..., but that repo/commit is not accessible to me via GitHub, and the aggregate report still says Repository refs compared: local practice branch snapshots. This weakens the traceability goal from docs/evaluation.md.
The aggregate Review gaps detected count looks ambiguous: the report says 3, but task 001 lists 3 review gaps and task 002 also records the missing PATCH tests as a review gap. Please reconcile the count or narrow the metric definition.
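
One way to keep the aggregate consistent is to derive it from the task records instead of entering it by hand. The sketch below uses the counts mentioned above: three review gaps in task 001, at least one in task 002 (assumed here to be exactly one), and a reported aggregate of 3. The dictionary layout and the reconcile name are hypothetical, not the repository's actual record format.

```python
# Hypothetical per-task review-gap counts, taken from the review comment above.
# Task 002 is assumed to record exactly one gap (the missing PATCH tests).
task_review_gaps = {"001": 3, "002": 1}
reported_aggregate = 3  # value stated in the aggregate report at review time


def reconcile(per_task: dict[str, int], reported: int) -> tuple[bool, int]:
    """Return whether the reported total matches the sum of per-task counts."""
    actual = sum(per_task.values())
    return actual == reported, actual


matches, actual = reconcile(task_review_gaps, reported_aggregate)
if not matches:
    print(f"Review gaps: report says {reported_aggregate}, task records sum to {actual}")
```

If the metric is meant to count only a subset of gaps, the alternative is to narrow the definition so that the per-task records and the aggregate count the same thing.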

Checks I ran on the PR head:

python3 -m unittest discover -s tests
python3 -m py_compile ...
python3 scripts/check_docs_drift.py
python3 scripts/check_structure.py
python3 scripts/check_encoding_hygiene.py
python3 scripts/check_effectiveness_plan.py
python3 scripts/check_failure_memory.py
python3 scripts/check_decision_memory.py --base ca367dbd3da9e89aa2653cf26bcdd00180b792a9
python3 scripts/harness_doctor.py --target .

All passed, but the first issue means the new evidence report is currently not being validated.

baskduf · 2026-06-04T02:30:40Z

I don’t think this is ready to approve yet, but this should be a small fix.

The blocking issue is that the new report is still not included by scripts/check_effectiveness_plan.py: the checker only recognizes Markdown files whose names contain the contiguous substring effectiveness-report, while this file is named effectiveness-small-evidence-report.md. So CI can pass while skipping the new report.

Please either rename it to something like effectiveness-report-small-evidence.md, or update the checker/tests to intentionally include this naming pattern.

Non-blocking: it would also help to tighten the source refs / aggregate count wording, but the validation mismatch is the main thing blocking approval.

jihwan4155 · 2026-06-04T03:02:57Z

I updated the PR to address the blocking validation issue and the additional evidence-quality concerns.

Changes made:

Renamed the report to docs/examples/effectiveness-report-small-evidence.md so it is included by scripts/check_effectiveness_plan.py.
Updated the docs/evaluation.md link to the renamed report.
Updated the report language to keep the claim narrow and separate harness health from observed task outcomes.
Separated the single docs-drift violation from broader review/feedback-loop gaps.
Reconciled the review gap count so the aggregate report matches the task outcome records.
Updated the task outcome records/report traceability language.

Checks run:

python scripts/check_docs_drift.py
python scripts/check_structure.py
python scripts/check_encoding_hygiene.py
python scripts/check_effectiveness_plan.py
python scripts/check_failure_memory.py
python scripts/check_decision_memory.py
python scripts/harness_doctor.py --target .
python -m unittest discover -s tests

All checks pass locally. python -m unittest discover -s tests ran 116 tests with OK (skipped=1).

baskduf · 2026-06-04T04:18:57Z

The previous blockers look resolved now.

The effectiveness report is discovered by check_effectiveness_plan.py.
The review-gap count is reconciled.
The referenced source commit jihwan4155/recipe-api@99af81bf0da4a8bfecb19e5ca0af817b276f49b6 is now reachable.
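
Reachability has to be checked against GitHub itself, but the shape of a reference can be validated offline. A minimal sketch, assuming refs are written as owner/repo@sha with a full 40-character SHA, as in the line above; the regex and the parse_ref name are illustrative and not part of the harness.

```python
import re

# Assumed ref shape: owner/repo@<40 lowercase hex chars>.
FULL_REF = re.compile(
    r"^(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)@(?P<sha>[0-9a-f]{40})$"
)


def parse_ref(ref: str):
    """Return (owner, repo, sha) for a full-SHA ref, or None otherwise."""
    m = FULL_REF.match(ref)
    return (m["owner"], m["repo"], m["sha"]) if m else None


# The full reference from this comment parses cleanly.
full = parse_ref("jihwan4155/recipe-api@99af81bf0da4a8bfecb19e5ca0af817b276f49b6")
# An abbreviated SHA is rejected, since it is less stable as a long-term ref.
short = parse_ref("jihwan4155/recipe-api@99af81")
```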

I’m approving this from a review standpoint. Please make sure the GitHub Actions Harness Check is approved/re-run and passing before merge, since the latest PR run is currently action-required rather than green.

baskduf

Previous blockers are resolved: the effectiveness report is discovered by the checker, the review-gap count is reconciled, and the referenced source commit is now reachable. Local harness checks passed in review.

jihwan4155 added 2 commits June 3, 2026 18:57

Add small harness effectiveness evidence pass

4191f7f

Tighten effectiveness evidence report traceability

ae1477c

baskduf requested changes Jun 4, 2026

Fix effectiveness report validation and traceability

bc63ca3

baskduf approved these changes Jun 4, 2026

baskduf merged commit 2e7bad5 into baskduf:main Jun 4, 2026
2 checks passed

baskduf merged 3 commits into baskduf:main from jihwan4155:evidence/effectiveness-small-pass
