Round 81: thinking-lens Human Review + Escalation route guards

johnteee · johnteee · commit 0f09da31619d · 2026-06-25T16:07:15.000+08:00
- Add Human Review preambles to socratic-reviewer and why-what-how-done
- Guard all 01-thinking lenses with Human Review pytest
- Validate Escalation cites only CORE_SKILLS (reflective-risk terminal exempt)
- Reseal panel, governance docs, GLOSSARY steps 12-13; pytest floor 450+
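
The Escalation check named in the third bullet can be sketched on in-memory strings. This is a minimal illustration, not the repository's test: the `CORE_SKILLS` tuple below is a hypothetical two-item subset, `invalid_escalation_targets` is a helper name invented here, and `reflective-oracle` is a made-up skill used only to trigger a failure. The regular expression is the one the commit adds.

```python
import re

# Hypothetical subset of the real CORE_SKILLS tuple, for illustration only.
CORE_SKILLS = ("reflective-dispatch", "reflective-risk")

# Same pattern as in the commit: backticked names that start with "reflective-".
ESCALATION_SKILL_PATTERN = re.compile(r"`(reflective-[a-z-]+)`")


def invalid_escalation_targets(escalation_text: str) -> list[str]:
    """Return cited reflective-* names that are not in CORE_SKILLS, sorted."""
    targets = ESCALATION_SKILL_PATTERN.findall(escalation_text)
    return sorted({t for t in targets if t not in CORE_SKILLS})


# One bullet that cites a known skill, one that cites an invented route.
good = "Escalation: hand off to `reflective-risk` when blast radius is high."
bad = "Escalation: hand off to `reflective-oracle` or `reflective-risk`."

print(invalid_escalation_targets(good))  # []
print(invalid_escalation_targets(bad))   # ['reflective-oracle']
```

Only backticked names are matched, so a skill mentioned in plain prose would pass unnoticed under this sketch.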
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–80)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–81)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:
diff --git a/reflective-prompt-library/01-thinking/socratic-reviewer.md b/reflective-prompt-library/01-thinking/socratic-reviewer.md
@@ -22,6 +22,10 @@ Clarify the real question before choosing a direction. Primary workflow surfaces
 
 If the session cannot name what evidence would prove the current framing wrong, stop and return to Clarify instead of recommending action.
 
+## Human Review
+
+Escalate to `reflective-risk` with an explicit Human Review gate when the work implies irreversible or high-blast-radius action.
+
 ```markdown
 你是 Socratic Questioner。你的目標不是立刻給答案，而是幫我逼近真正問題。
 
diff --git a/reflective-prompt-library/01-thinking/why-what-how-done.md b/reflective-prompt-library/01-thinking/why-what-how-done.md
@@ -22,6 +22,10 @@ Gate a task through Why / What / How / Done before choosing strictness or workfl
 
 Done gate must name evidence that would prove the task should not proceed or should be rolled back.
 
+## Human Review
+
+Escalate to `reflective-risk` with an explicit Human Review gate when the work implies irreversible or high-blast-radius action.
+
 ```markdown
 請把任務通過 Why / What / How / Done 四層檢查。
 
diff --git a/reflective-prompt-library/GLOSSARY.md b/reflective-prompt-library/GLOSSARY.md
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–80). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–81). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
@@ -352,4 +352,5 @@ Ongoing upkeep after panel close (Rounds 1–80). Not agent instructions — ope
 9. When adding benchmark golden tasks, keep `test_benchmark_covers_all_nine_workflows` green and bump `MIN_TASK_COUNT` in `validate_benchmark_fixture.py` if the floor rises.
 10. When changing thinking-lens ↔ skill cross-links, update `SKILL_THINKING_SOURCES` and consumer lists in `01-thinking/` Purpose preambles; run `test_prompt_cross_links.py` (including reciprocal `THINKING_LENS_SKILL_CONSUMERS`).
 11. When changing Module Contract subsections on workflow skills, keep `Escalation:` present and run `test_skill_module_contract.py`.
-
+12. When adding or editing `01-thinking/` lenses, keep `## Human Review` in the preamble (routes to `reflective-risk`) and run `test_thinking_prompts_eval_harness.py`.
+13. When editing workflow skill Escalation bullets, cite only frozen `reflective-*` skills; run `test_skill_module_contract.py` escalation route guard.
diff --git a/reflective-prompt-library/PROJECT_KNOWLEDGE.md b/reflective-prompt-library/PROJECT_KNOWLEDGE.md
@@ -75,6 +75,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 > Pointers to the causal trail — plans, reflections, tests, commits. Detail is
 > not duplicated here; this is a map, not an archive.
 
+- 2026-06-25 Round 81 panel — thinking-lens Human Review preamble guards + Escalation route-target anti-drift → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 80 panel — Module Contract Escalation anti-drift + thinking-lens preamble consumer guards → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 79 panel — bidirectional thinking-lens ↔ workflow skill preamble cross-links + reciprocal pytest → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 78 panel — complete nine-skill thinking-lens cross-links + Module Contract anti-drift → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
diff --git a/reflective-prompt-library/README.md b/reflective-prompt-library/README.md
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–80, options A–EL) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–81, options A–EP) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
diff --git a/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md b/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 440+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks in `test_prompt_cross_links.py`
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 450+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks in `test_prompt_cross_links.py`; Human Review + Escalation route-target guards in thinking/skill contract tests
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–80; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–81; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
diff --git a/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md b/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
@@ -2151,3 +2151,48 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 **Resealed 2026-06-25** after **Round 80** (options EI–EL). Module Contract Escalation anti-drift closed; thinking-lens preamble consumer guards complete. Holdout expansion remains recurrence-gated maintenance.
 
+---
+
+## Round 81 — Thinking-lens Human Review + Escalation route-target guards (2026-06-25)
+
+**Options EM–EP** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 81 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| EM | `## Human Review` preamble on all `01-thinking/` lenses + pytest | **Agree** |
+| EN | Escalation route-target anti-drift (Escalation cites only `reflective-*` skills in `CORE_SKILLS`; terminal `reflective-risk` exempt) | **Agree** |
+| EO | ROUTE holdout expansion | **Defer** |
+| EP | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 81 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| EM | Human Review on thinking lenses | **Agree** | preamble + `test_thinking_prompt_has_human_review_section` |
+| EN | Escalation route targets | **Agree** | `test_core_skill_escalation_routes_to_valid_workflow_skills` |
+| EO | Holdout expansion | **Defer** | maintenance |
+| EP | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+**All roles agree.**
+
+## Implemented Changes (Round 81)
+
+- `01-thinking/socratic-reviewer.md`, `why-what-how-done.md`: `## Human Review` preamble routes to `reflective-risk`
+- `plans/tests/test_thinking_prompts_eval_harness.py`: Human Review preamble guard on all five lenses
+- `plans/tests/test_skill_module_contract.py`: Escalation route-target guard; `reflective-risk` terminal-gate exemption
+- `GLOSSARY.md`: playbook Rounds 1–81; steps 12–13 for Human Review + Escalation route targets
+- `QUALITY_GATES_SUMMARY.md`: 450+ pytest floor; Human Review / Escalation route notes; panel Rounds 1–81
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 81 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 81 sync
+
+## Verification (Round 81)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+## Panel status (updated)
+
+**Resealed 2026-06-25** after **Round 81** (options EM–EP). Thinking-lens Human Review preambles complete; Escalation route-target anti-drift closed. Holdout expansion remains recurrence-gated maintenance.
+
+
diff --git a/reflective-prompt-library/plans/tests/test_glossary_structure.py b/reflective-prompt-library/plans/tests/test_glossary_structure.py
@@ -30,10 +30,10 @@ def test_round_boundary_terms_present(glossary_text: str):
         assert heading in glossary_text, f"missing glossary section: {heading}"
 
 
-def test_maintenance_playbook_references_round_80(glossary_text: str):
+def test_maintenance_playbook_references_round_81(glossary_text: str):
     playbook = glossary_text.split("## Governance Maintenance Playbook", 1)[1]
-    assert "Rounds 1–80" in playbook or "Rounds 1-79" in playbook
-    assert "Rounds 1–79" not in playbook and "Rounds 1-78" not in playbook
+    assert "Rounds 1–81" in playbook or "Rounds 1-81" in playbook
+    assert "Rounds 1–80" not in playbook and "Rounds 1-80" not in playbook
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_readme_governance.py b/reflective-prompt-library/plans/tests/test_readme_governance.py
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "80"
-CURRENT_PANEL_OPTIONS = "A–EL"
+CURRENT_PANEL_ROUND = "81"
+CURRENT_PANEL_OPTIONS = "A–EP"
 
 
 @pytest.fixture(scope="module")
diff --git a/reflective-prompt-library/plans/tests/test_skill_module_contract.py b/reflective-prompt-library/plans/tests/test_skill_module_contract.py
@@ -14,6 +14,29 @@
 
 REQUIRED_SUBSECTIONS = ("Trigger", "Methods", "Output", "Never", "Escalation")
 
+ESCALATION_SKILL_PATTERN = re.compile(r"`(reflective-[a-z-]+)`")
+
+
+def _module_contract_block(content: str) -> str:
+    marker = "## Module Contract"
+    assert marker in content
+    block = content.split(marker, 1)[1]
+    next_heading = re.search(r"\n## [^#]", block)
+    return block[: next_heading.start()] if next_heading else block
+
+
+def _escalation_section(module_contract: str) -> str:
+    match = re.search(
+        r"(?:##\s*Escalation|^\*?\*?Escalation:)(.*)",
+        module_contract,
+        re.MULTILINE | re.IGNORECASE | re.DOTALL,
+    )
+    assert match, "missing Escalation subsection"
+    tail = match.group(1)
+    next_sub = re.search(r"\n(?:##\s+\S|###\s+\S)", tail)
+    return tail[: next_sub.start()] if next_sub else tail
+
+
 
 @pytest.mark.parametrize("skill_name", CORE_SKILLS)
 def test_core_skill_has_module_contract(skill_name: str):
@@ -29,3 +52,19 @@ def test_core_skill_has_contract_subsections(skill_name: str):
         assert re.search(pattern, content, re.MULTILINE | re.IGNORECASE), (
             f"{skill_name} missing {subsection}"
         )
+
+@pytest.mark.parametrize("skill_name", CORE_SKILLS)
+def test_core_skill_escalation_routes_to_valid_workflow_skills(skill_name: str):
+    """Escalation bullets must name frozen workflow skills, not invented routes."""
+    content = (SKILLS_DIR / skill_name / "SKILL.md").read_text(encoding="utf-8")
+    escalation = _escalation_section(_module_contract_block(content))
+    targets = ESCALATION_SKILL_PATTERN.findall(escalation)
+    if skill_name == "reflective-risk":
+        assert "Human Review" in escalation, (
+            f"{skill_name} Escalation should require Human Review as terminal gate"
+        )
+        return
+    assert targets, f"{skill_name} Escalation should cite at least one workflow skill"
+    invalid = sorted({t for t in targets if t not in CORE_SKILLS})
+    assert not invalid, f"{skill_name} Escalation cites unknown skills: {invalid}"
+
diff --git a/reflective-prompt-library/plans/tests/test_thinking_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_thinking_prompts_eval_harness.py
@@ -59,3 +59,12 @@ def test_thinking_prompts_have_primary_workflow_surfaces_line():
             f"{prompt_path.name} Purpose should list Primary workflow surfaces"
         )
 
+@pytest.mark.parametrize("prompt_path", THINKING_PROMPTS, ids=lambda p: p.name)
+def test_thinking_prompt_has_human_review_section(prompt_path: Path):
+    """All 01-thinking lenses declare Human Review escalation outside zh-TW templates."""
+    preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+    assert "## Human Review" in preamble, f"{prompt_path.name} missing Human Review preamble"
+    assert "reflective-risk" in preamble, (
+        f"{prompt_path.name} Human Review should route to reflective-risk"
+    )
+
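
The Human Review guard in the last hunk inspects only the text before the first code fence, so a heading that appears inside the fenced prompt template does not count. The behavior can be sketched on in-memory strings, assuming a hypothetical helper `human_review_problems` and made-up sample lens text; neither exists in the repository.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def human_review_problems(prompt_text: str) -> list[str]:
    """Return the reasons a lens file would fail the Human Review guard."""
    # Everything before the first code fence is treated as the preamble.
    preamble = prompt_text.split(FENCE, 1)[0]
    problems = []
    if "## Human Review" not in preamble:
        problems.append("missing Human Review preamble")
    if "reflective-risk" not in preamble:
        problems.append("Human Review should route to reflective-risk")
    return problems


# Heading and route both sit in the preamble: the guard passes.
passing = (
    "## Purpose\n\nClarify the real question.\n\n"
    "## Human Review\n\nEscalate to `reflective-risk`.\n\n"
    f"{FENCE}markdown\ntemplate body\n{FENCE}\n"
)

# Heading and route sit inside the fenced template: the guard ignores them.
failing = (
    "## Purpose\n\nClarify the real question.\n\n"
    f"{FENCE}markdown\n## Human Review\nreflective-risk\n{FENCE}\n"
)

print(human_review_problems(passing))  # []
print(len(human_review_problems(failing)))  # 2
```

Splitting at the first fence keeps the check independent of the template language, at the cost of also ignoring any Human Review text placed after the template.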