Round 76: standardize 06-repo prompt contracts + eval anti-drift

johnteee · johnteee · commit 3cf341c7f9d1 · 2026-06-25T15:40:51.000+08:00
diff --git a/reflective-prompt-library/06-repo/AGENTS.md b/reflective-prompt-library/06-repo/AGENTS.md
@@ -1,5 +1,23 @@
 # AGENTS.md
 
+## Purpose
+
+Repository-level harness policy for reflective engineering agents. Primary workflow surfaces: `reflective-dispatch`, `reflective-implement`, and `reflective-spec-plan`. Pairs with `01-thinking/why-what-how-done.md`, `01-thinking/critical-thinking-check.md`, and `01-thinking/socratic-reviewer.md`.
+
+## Scope
+
+- In scope: strictness-first routing, nine frozen workflow skills, evidence-backed completion, project-knowledge authority boundary.
+- Out of scope: multi-agent runtime, tenth core skill without promotion gate, silent rigor downgrade.
+
+## Acceptance Criteria
+
+- Non-trivial tasks produce goal, scope, acceptance criteria, and test-backed completion evidence.
+- High-blast-radius work stops for human review before irreversible action.
+
+## Falsifiability
+
+Name one task that would fail if this harness policy were ignored at session start.
+
 ## Mission
 
 This repository uses Reflective Engineering Agent Protocol.
diff --git a/reflective-prompt-library/06-repo/PROJECT_KNOWLEDGE.template.md b/reflective-prompt-library/06-repo/PROJECT_KNOWLEDGE.template.md
@@ -1,5 +1,28 @@
 Language: English
 
+## Purpose
+
+Scaffold for project-design judgement layer (non-authoritative). Primary workflow surfaces: `reflective-handoff-retro` and `reflective-brief`. Pairs with `01-thinking/socratic-reviewer.md` and `01-thinking/falsifiability.md`.
+
+## Scope
+
+- In scope: governing principles, active direction, durable lessons, decision index, growth gate.
+- Out of scope: agent operating rules (belong in `AGENTS.md` or `SKILL.md`).
+
+## Acceptance Criteria
+
+- Each lesson and principle has evidence and a review or retirement trigger.
+- Milestones carry verifiable targets; done milestones reflow before retirement.
+
+## Falsifiability
+
+State what evidence would retire a recorded principle as stale.
+
+## Human Review
+
+Require human approval before promoting lessons into executable skills or overriding `AGENTS.md`.
+
+
 # Project Knowledge — [Project Name]
 
 > **NON-AUTHORITATIVE FILE.** This artifact records project-design judgement:
diff --git a/reflective-prompt-library/06-repo/codex-opencode.md b/reflective-prompt-library/06-repo/codex-opencode.md
@@ -1,5 +1,23 @@
 # Codex / OpenCode Task Prompt
 
+## Purpose
+
+Repo-aware coding task harness for inspection-first implementation. Primary workflow surfaces: `reflective-implement` and `reflective-dispatch`. Pairs with `01-thinking/why-what-how-done.md` and `01-thinking/falsifiability.md`.
+
+## Scope
+
+- In scope: repository inspection, brief plan, smallest safe edits, tests, final report.
+- Out of scope: changing task requirements or weakening tests without explicit approval.
+
+## Acceptance Criteria
+
+- Final report lists goal, files changed, acceptance criteria status, tests run, and risks.
+- Failures and skipped checks are reported honestly.
+
+## Falsifiability
+
+Name one completion claim that would be invalid without listing tests run.
+
 ```markdown
 You are a repo-aware coding agent.
 
diff --git a/reflective-prompt-library/06-repo/cursor-rules.md b/reflective-prompt-library/06-repo/cursor-rules.md
@@ -1,5 +1,27 @@
 # Cursor Rules
 
+## Purpose
+
+IDE-native editing harness for small, safe, reviewable changes. Primary workflow surfaces: `reflective-implement` and `reflective-review`. Pairs with `01-thinking/critical-thinking-check.md` and `01-thinking/falsifiability.md`.
+
+## Scope
+
+- In scope: goal restatement, minimal diffs, acceptance criteria traceability, post-edit test plan.
+- Out of scope: silent refactors outside stated scope (escalate to `reflective-minimality` when disputed).
+
+## Acceptance Criteria
+
+- Every edit maps to stated acceptance criteria with explicit risks and follow-up tests.
+- Credential, schema, billing, or destructive changes trigger a human review stop.
+
+## Falsifiability
+
+State what test or diff review would reject the change as out of scope.
+
+## Human Review
+
+Require human approval before editing credential handling, schema, billing, public API, or destructive operations.
+
 ```markdown
 You are working inside an IDE. Prioritize small, safe, reviewable edits.
 
diff --git a/reflective-prompt-library/PROJECT_KNOWLEDGE.md b/reflective-prompt-library/PROJECT_KNOWLEDGE.md
@@ -75,6 +75,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 > Pointers to the causal trail — plans, reflections, tests, commits. Detail is
 > not duplicated here; this is a map, not an archive.
 
+- 2026-06-25 Round 76 panel — standardize `06-repo/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_repo_prompts_eval_harness.py` → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 75 panel — standardize `05-domain/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_domain_prompts_eval_harness.py` → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 74 panel — standardize `03-context/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_context_prompts_eval_harness.py` → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 73 panel — standardize `04-agent/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_agent_prompts_eval_harness.py` → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
diff --git a/reflective-prompt-library/README.md b/reflective-prompt-library/README.md
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–75, options A–DT) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–76, options A–DW) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
diff --git a/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md b/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py` (370+ pytest anti-drift suite in CI)
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py` (400+ pytest anti-drift suite in CI)
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–75; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–76; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
diff --git a/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md b/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
@@ -1877,4 +1877,61 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 ## Panel status (updated)
 
-**Resealed 2026-06-25** after **Round 75** (options DR–DT). Domain-prompt contract pass complete; all composable prompt categories (`00-core`–`05-domain`) now standardized. `06-repo` template Purpose sweep remains recurrence-gated.
+## Round 76 — Repository template contract review (2026-06-25)
+
+User directive (repeat): review prompts, plans, skills, and Socratic/critical-thinking lenses in parallel until all roles agree, then implement.
+
+### DU: Standardize `06-repo/` prompt contracts + cross-links?
+
+| Lens | Position |
+| --- | --- |
+| Opus | **Agree** — host-install templates are the last unstructured prompt layer; AGENTS.md is canonical harness surface |
+| Codex | **Agree** — four files bounded; falsifiable via `test_repo_prompts_eval_harness.py` + cross-link pytest |
+| Gemini | **Agree** — IDE/Codex templates are cost-relevant entry points; defer governance pytest mirrors |
+| Composer | **Agree** — AGENTS harness policy already cited; contract headers close eval_harness gap |
+| Sakana | **Agree** — no tenth skill; repo templates support existing nine |
+| GLM | **Agree** — English contracts outside localized fences; Human Review on high-blast-radius templates |
+
+**Socratic Q:** Why `06-repo` after `05-domain`?
+**Answer:** Repo templates are host-install artifacts with distinct authority boundaries (`AGENTS.md` vs `PROJECT_KNOWLEDGE.md`); completing them finishes the prompt-library contract sweep.
+
+**Consensus:** **Agree** — Purpose/Scope/Acceptance Criteria/Falsifiability on all four `06-repo/` templates; thinking + workflow cross-links; `test_repo_prompts_eval_harness.py`; extend `test_prompt_cross_links.py`; preserve existing Harness Policy section in AGENTS.md.
+
+### DV: Governance pytest mirrors (`validate_links`, `validate_governance`, `lint_skills`) now?
+
+| Lens | Position |
+| --- | --- |
+| All six | **Reject** — recurrence-gated (option DH); `make validate` already covers these |
+
+### DW: Router / holdout / tenth skill?
+
+| Lens | Position |
+| --- | --- |
+| All six | **Reject** — ROUTE-001/002/003 at 100%; nine-skill freeze holds |
+
+### Round 76 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| DU | Repo template contracts + cross-links | **Agree** | 4 files + pytest anti-drift |
+| DV | Governance pytest mirrors | **Reject** | backlog (DH) |
+| DW | Router/holdout/tenth skill | **Reject** | no change |
+
+**All roles agree.**
+
+## Implemented Changes (Round 76)
+
+- `06-repo/*.md`: Purpose, Scope, Acceptance Criteria, Falsifiability + workflow skill mapping; thinking lens links; Human Review where applicable
+- `plans/tests/test_repo_prompts_eval_harness.py`: structural heading checks + eval_harness score floor (≥ 80%) anti-drift; AGENTS harness-policy guard
+- `plans/tests/test_prompt_cross_links.py`: repo ↔ thinking ↔ skill cross-links
+- `QUALITY_GATES_SUMMARY.md`: repo prompt test mention; pytest floor 400+
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 76 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 76 sync
+
+## Verification (Round 76)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+## Panel status (updated)
+
+**Resealed 2026-06-25** after **Round 76** (options DU–DW). Repository-template contract pass complete; full prompt-library contract sweep (`00-core`–`06-repo`) finished. Governance pytest mirrors remain recurrence-gated.
diff --git a/reflective-prompt-library/plans/tests/test_prompt_cross_links.py b/reflective-prompt-library/plans/tests/test_prompt_cross_links.py
@@ -1,4 +1,4 @@
-"""Anti-drift: thinking lenses, engineering/agent/context/domain prompts, and workflow skills cross-link."""
+"""Anti-drift: thinking lenses, engineering/agent/context/domain/repo prompts, and workflow skills cross-link."""
 
 from pathlib import Path
 
@@ -10,6 +10,7 @@
 AGENT_DIR = LIBRARY_ROOT / "04-agent"
 CONTEXT_DIR = LIBRARY_ROOT / "03-context"
 DOMAIN_DIR = LIBRARY_ROOT / "05-domain"
+REPO_DIR = LIBRARY_ROOT / "06-repo"
 SKILLS_DIR = LIBRARY_ROOT / "skills"
 
 ENGINEERING_THINKING_LINKS: dict[str, tuple[str, ...]] = {
@@ -183,6 +184,39 @@
 
 DOMAIN_PROMPTS = tuple(sorted(DOMAIN_DIR.glob("*.md")))
 
+REPO_THINKING_LINKS: dict[str, tuple[str, ...]] = {
+    "AGENTS.md": (
+        "01-thinking/why-what-how-done.md",
+        "01-thinking/critical-thinking-check.md",
+        "01-thinking/socratic-reviewer.md",
+    ),
+    "cursor-rules.md": (
+        "01-thinking/critical-thinking-check.md",
+        "01-thinking/falsifiability.md",
+    ),
+    "codex-opencode.md": (
+        "01-thinking/why-what-how-done.md",
+        "01-thinking/falsifiability.md",
+    ),
+    "PROJECT_KNOWLEDGE.template.md": (
+        "01-thinking/socratic-reviewer.md",
+        "01-thinking/falsifiability.md",
+    ),
+}
+
+REPO_SKILL_LINKS: dict[str, tuple[str, ...]] = {
+    "AGENTS.md": (
+        "reflective-dispatch",
+        "reflective-implement",
+        "reflective-spec-plan",
+    ),
+    "cursor-rules.md": ("reflective-implement", "reflective-review"),
+    "codex-opencode.md": ("reflective-implement", "reflective-dispatch"),
+    "PROJECT_KNOWLEDGE.template.md": ("reflective-handoff-retro", "reflective-brief"),
+}
+
+REPO_PROMPTS = tuple(sorted(REPO_DIR.glob("*.md")))
+
 CONTEXT_PROMPTS = tuple(sorted(CONTEXT_DIR.glob("*.md")))
 
 
@@ -300,3 +334,26 @@ def test_thinking_lens_files_exist_for_domain_links():
     for ref in linked:
         assert (LIBRARY_ROOT / ref).is_file(), f"missing thinking lens file {ref}"
 
+@pytest.mark.parametrize("prompt_name,thinking_refs", REPO_THINKING_LINKS.items())
+def test_repo_prompt_links_thinking_lens(prompt_name: str, thinking_refs: tuple[str, ...]):
+    path = REPO_DIR / prompt_name
+    preamble = _preamble(path)
+    for ref in thinking_refs:
+        assert ref in preamble, f"{prompt_name} preamble should reference {ref}"
+
+
+def test_all_repo_prompts_have_thinking_cross_link():
+    assert set(REPO_THINKING_LINKS) == {p.name for p in REPO_PROMPTS}
+
+
+@pytest.mark.parametrize("prompt_name,skill_refs", REPO_SKILL_LINKS.items())
+def test_repo_prompt_maps_workflow_skill(prompt_name: str, skill_refs: tuple[str, ...]):
+    preamble = _preamble(REPO_DIR / prompt_name)
+    for skill in skill_refs:
+        assert skill in preamble, f"{prompt_name} preamble should reference {skill}"
+
+
+def test_thinking_lens_files_exist_for_repo_links():
+    linked = {ref for refs in REPO_THINKING_LINKS.values() for ref in refs}
+    for ref in linked:
+        assert (LIBRARY_ROOT / ref).is_file(), f"missing thinking lens file {ref}"
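
Reviewer note: the repo tests above rely on a `_preamble` helper that this hunk does not show. Below is a minimal sketch of the pattern, assuming `_preamble` returns the text before the first code fence (the same split that `test_repo_prompts_eval_harness.py` performs inline). The names `preamble_of`, `missing_refs`, `unmapped_files`, and the sample document are hypothetical and only illustrate the idea.

```python
FENCE = "`" * 3  # built indirectly so this sketch can itself live inside a fenced block


def preamble_of(text: str) -> str:
    """Return everything before the first code fence (assumed `_preamble` behaviour)."""
    return text.split(FENCE, 1)[0]


def missing_refs(text: str, refs: tuple[str, ...]) -> list[str]:
    """List the references that do not appear in the preamble."""
    preamble = preamble_of(text)
    return [ref for ref in refs if ref not in preamble]


def unmapped_files(mapping: dict[str, tuple[str, ...]], files: set[str]) -> set[str]:
    """Names present on only one side: files without an entry, or entries without a file."""
    return set(mapping) ^ files


# Hypothetical template: one lens is linked in the preamble, the other only inside the fence.
sample = (
    "# Cursor Rules\n\n## Purpose\n\n"
    "Pairs with `01-thinking/critical-thinking-check.md`.\n\n"
    + FENCE + "markdown\nSee 01-thinking/falsifiability.md\n" + FENCE + "\n"
)
refs = ("01-thinking/critical-thinking-check.md", "01-thinking/falsifiability.md")

print(missing_refs(sample, refs))   # ['01-thinking/falsifiability.md']
print(unmapped_files({"AGENTS.md": ()}, {"AGENTS.md", "new-template.md"}))  # {'new-template.md'}
```

The second call mirrors the set-equality guard in `test_all_repo_prompts_have_thinking_cross_link`: dropping a new template into `06-repo/` without a mapping entry fails the suite. `REPO_SKILL_LINKS` has no equivalent guard in this diff.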
diff --git a/reflective-prompt-library/plans/tests/test_readme_governance.py b/reflective-prompt-library/plans/tests/test_readme_governance.py
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "75"
-CURRENT_PANEL_OPTIONS = "A–DT"
+CURRENT_PANEL_ROUND = "76"
+CURRENT_PANEL_OPTIONS = "A–DW"
 
 
 @pytest.fixture(scope="module")
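
Reviewer note: the assertions that consume `CURRENT_PANEL_ROUND` and `CURRENT_PANEL_OPTIONS` sit outside this hunk. One way such a sync check could work is sketched below, assuming it matches the README sentence changed in this commit. The regex and the `panel_marker` function are hypothetical, not the test's actual code.

```python
import re

CURRENT_PANEL_ROUND = "76"
CURRENT_PANEL_OPTIONS = "A–DW"

# Matches e.g. "(Rounds 1–76, options A–DW)"; the range separators are en dashes (U+2013).
PANEL_MARKER = re.compile(r"Rounds 1–(\d+), options (A–[A-Z]+)")


def panel_marker(readme_text):
    """Return (round, options) from the README panel sentence, or None if it is absent."""
    match = PANEL_MARKER.search(readme_text)
    return (match.group(1), match.group(2)) if match else None


readme = (
    "Multi-agent Socratic consensus on project goals and the nine skills "
    "(Rounds 1–76, options A–DW) is recorded in the panel plan."
)
stale = readme.replace("1–76", "1–75").replace("A–DW", "A–DT")

print(panel_marker(readme) == (CURRENT_PANEL_ROUND, CURRENT_PANEL_OPTIONS))  # True
print(panel_marker(stale) == (CURRENT_PANEL_ROUND, CURRENT_PANEL_OPTIONS))   # False
```

Keeping the round and option range as module constants means a panel round that updates the README but not the test (or the reverse) fails CI instead of drifting silently.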
diff --git a/reflective-prompt-library/plans/tests/test_repo_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_repo_prompts_eval_harness.py
@@ -0,0 +1,68 @@
+"""Anti-drift: 06-repo templates must satisfy eval_harness structural rubric."""
+
+import sys
+from pathlib import Path
+
+import pytest
+
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from eval_harness import EvalHarness  # noqa: E402
+
+REPO_DIR = Path(__file__).parent.parent.parent / "06-repo"
+REPO_ROOT = str(Path(__file__).parent.parent.parent.parent)  # repository root, parent of reflective-prompt-library/
+MIN_SCORE = 80.0
+
+REQUIRED_HEADINGS = (
+    "## Purpose",
+    "## Scope",
+    "## Acceptance Criteria",
+    "## Falsifiability",
+)
+
+REPO_PROMPTS = tuple(sorted(REPO_DIR.glob("*.md")))
+
+
+@pytest.fixture(scope="module")
+def harness() -> EvalHarness:
+    return EvalHarness(repo_root=REPO_ROOT)
+
+
+@pytest.mark.parametrize("prompt_path", REPO_PROMPTS, ids=lambda p: p.name)
+def test_repo_prompt_has_contract_headings(prompt_path: Path):
+    text = prompt_path.read_text(encoding="utf-8")
+    preamble = text.split("```", 1)[0]  # headings must precede the first fenced template block
+    for heading in REQUIRED_HEADINGS:
+        assert heading in preamble, f"{prompt_path.name} missing {heading} outside template block"
+
+
+@pytest.mark.parametrize("prompt_path", REPO_PROMPTS, ids=lambda p: p.name)
+def test_repo_prompt_meets_eval_harness_floor(prompt_path: Path, harness: EvalHarness):
+    rel = str(prompt_path.relative_to(REPO_ROOT))
+    result = harness.evaluate_file(rel)
+    assert result["score"] >= MIN_SCORE, (
+        f"{prompt_path.name} eval_harness score {result['score']}% < {MIN_SCORE}%: "
+        f"{[(c['id'], c['result']) for c in result['checks']]}"
+    )
+
+
+def test_repo_prompts_reference_workflow_skills():
+    for prompt_path in REPO_PROMPTS:
+        text = prompt_path.read_text(encoding="utf-8")
+        assert "reflective-" in text, f"{prompt_path.name} should map to at least one workflow skill"
+
+
+def test_repo_prompts_cover_harness_surfaces():
+    text = "\n".join(p.read_text(encoding="utf-8") for p in REPO_PROMPTS)
+    for skill in (
+        "reflective-dispatch",
+        "reflective-implement",
+        "reflective-handoff-retro",
+    ):
+        assert skill in text, f"06-repo should reference {skill}"
+
+
+def test_agents_md_retains_harness_policy_section():
+    text = (REPO_DIR / "AGENTS.md").read_text(encoding="utf-8")
+    assert "## Harness Policy (Nine Skills)" in text
+    assert "make all" in text
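
Reviewer note: a small, self-contained sketch of the heading check in `test_repo_prompt_has_contract_headings`, run against an in-memory document rather than the real `06-repo/` files. The `missing_headings` helper and the sample template are hypothetical. Only `REQUIRED_HEADINGS` and the split on the first fence come from the test above.

```python
REQUIRED_HEADINGS = (
    "## Purpose",
    "## Scope",
    "## Acceptance Criteria",
    "## Falsifiability",
)
FENCE = "`" * 3  # built indirectly so this sketch can itself live inside a fenced block


def missing_headings(text):
    """Return the contract headings that do not appear before the first code fence."""
    preamble = text.split(FENCE, 1)[0]
    return [heading for heading in REQUIRED_HEADINGS if heading not in preamble]


# Two headings sit in the preamble; the other two appear only inside the template block.
template = (
    "# Codex / OpenCode Task Prompt\n\n"
    "## Purpose\n\nInspection-first harness.\n\n"
    "## Scope\n\n- In scope: smallest safe edits.\n\n"
    + FENCE + "markdown\n## Acceptance Criteria\n## Falsifiability\n" + FENCE + "\n"
)

print(missing_headings(template))  # ['## Acceptance Criteria', '## Falsifiability']
```

Because the check is a plain substring match, a deeper heading such as `### Scope` would also satisfy `## Scope`. If that matters, a line-anchored comparison would be stricter.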