Round 80: Escalation anti-drift + thinking-lens preamble guards

johnteee · johnteee · commit a48a8baddc20 · 2026-06-25T16:04:04.000+08:00
- Require Escalation in test_skill_module_contract for all nine skills
- Normalize reflective-minimality Module Contract subsection headers
- Add Primary workflow surfaces preamble guard and consumer-map completeness pytest
- Reseal panel record; sync governance docs; add GLOSSARY playbook step 11
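The consumer-map completeness check inverts the skill-to-lens table and then compares it, as a set, against the lens files on disk. Below is a minimal sketch under simplified assumptions. The lens file names (`socratic.md`, `premortem.md`, `first-principles.md`) and the helper names `invert_skill_thinking_sources` and `consumer_map_gaps` are hypothetical. They are not the actual contents of `test_prompt_cross_links.py`.

```python
from collections import defaultdict

# Hypothetical skill -> lens citations; the real SKILL_THINKING_SOURCES
# in test_prompt_cross_links.py is not shown in this commit.
SKILL_THINKING_SOURCES: dict[str, tuple[str, ...]] = {
    "reflective-brief": ("01-thinking/socratic.md",),
    "reflective-risk": ("01-thinking/socratic.md", "01-thinking/premortem.md"),
}


def invert_skill_thinking_sources(
    sources: dict[str, tuple[str, ...]],
) -> dict[str, tuple[str, ...]]:
    """Invert skill -> lenses into lens -> consumer skills, sorted for stable output."""
    consumers: defaultdict[str, set[str]] = defaultdict(set)
    for skill, lenses in sources.items():
        for lens in lenses:
            consumers[lens].add(skill)
    return {lens: tuple(sorted(skills)) for lens, skills in consumers.items()}


def consumer_map_gaps(
    consumer_map: dict[str, tuple[str, ...]], lens_files: list[str]
) -> tuple[set[str], set[str]]:
    """Return (lens files no skill cites, cited refs with no lens file).

    Both sets empty is equivalent to set(consumer_map) == expected.
    """
    expected = {f"01-thinking/{name}" for name in lens_files}
    tracked = set(consumer_map)
    return expected - tracked, tracked - expected


consumer_map = invert_skill_thinking_sources(SKILL_THINKING_SOURCES)
untracked, dangling = consumer_map_gaps(
    consumer_map, ["socratic.md", "premortem.md", "first-principles.md"]
)
print(consumer_map["01-thinking/socratic.md"])  # ('reflective-brief', 'reflective-risk')
print(untracked)  # {'01-thinking/first-principles.md'}
print(dangling)  # set()
```

Checking both set differences mirrors the set equality asserted in `test_all_thinking_lenses_tracked_in_consumer_map`. That assertion fails on a lens no skill cites and on a citation to a lens file that does not exist.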
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–79)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–80)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:
diff --git a/reflective-prompt-library/GLOSSARY.md b/reflective-prompt-library/GLOSSARY.md
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–79). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–80). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
@@ -351,4 +351,5 @@ Ongoing upkeep after panel close (Rounds 1–79). Not agent instructions — ope
 8. Keep `CONTRIBUTING.md` Routing Maintenance aligned with `ROUTING_CONTRACT.md` R8–R12 when boundaries or cheatsheet parity steps change.
 9. When adding benchmark golden tasks, keep `test_benchmark_covers_all_nine_workflows` green and bump `MIN_TASK_COUNT` in `validate_benchmark_fixture.py` if the floor rises.
 10. When changing thinking-lens ↔ skill cross-links, update `SKILL_THINKING_SOURCES` and consumer lists in `01-thinking/` Purpose preambles; run `test_prompt_cross_links.py` (including reciprocal `THINKING_LENS_SKILL_CONSUMERS`).
+11. When changing Module Contract subsections on workflow skills, keep `Escalation:` present and run `test_skill_module_contract.py`.
 
diff --git a/reflective-prompt-library/PROJECT_KNOWLEDGE.md b/reflective-prompt-library/PROJECT_KNOWLEDGE.md
@@ -75,6 +75,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 > Pointers to the causal trail — plans, reflections, tests, commits. Detail is
 > not duplicated here; this is a map, not an archive.
 
+- 2026-06-25 Round 80 panel — Module Contract Escalation anti-drift + thinking-lens preamble consumer guards → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 79 panel — bidirectional thinking-lens ↔ workflow skill preamble cross-links + reciprocal pytest → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 78 panel — complete nine-skill thinking-lens cross-links + Module Contract anti-drift → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 
diff --git a/reflective-prompt-library/README.md b/reflective-prompt-library/README.md
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–79, options A–EH) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–80, options A–EL) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
diff --git a/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md b/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (440+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks in `test_prompt_cross_links.py`
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 440+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks in `test_prompt_cross_links.py`
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–79; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–80; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
diff --git a/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md b/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
@@ -2106,5 +2106,48 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 **Resealed 2026-06-25** after **Round 79** (options EE–EH). Bidirectional thinking-lens ↔ workflow skill cross-links complete; reciprocal pytest guards Purpose preambles. Holdout expansion and Escalation subsection anti-drift remain recurrence-gated maintenance.
 
+---
+
+## Round 80 — Escalation subsection anti-drift + thinking-lens preamble guards (2026-06-25)
+
+**Options EI–EL** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 80 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| EI | Escalation subsection anti-drift for all nine `SKILL.md` Module Contracts + canonical `Escalation:` format on `reflective-minimality` | **Agree** |
+| EJ | Require `Primary workflow surfaces` on all `01-thinking/` lenses + consumer-map completeness pytest | **Agree** |
+| EK | ROUTE holdout expansion | **Defer** |
+| EL | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 80 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| EI | Escalation anti-drift | **Agree** | `test_skill_module_contract.py` + minimality format |
+| EJ | Thinking-lens preamble guards | **Agree** | `test_thinking_prompts_eval_harness.py` + consumer-map test |
+| EK | Holdout expansion | **Defer** | maintenance |
+| EL | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+**All roles agree on every verdict.**
+
+## Implemented Changes (Round 80)
+
+- `skills/reflective-minimality/SKILL.md`: canonical `Output:` / `Never:` / `Escalation:` Module Contract subsections
+- `plans/tests/test_skill_module_contract.py`: require `Escalation` alongside Trigger/Methods/Output/Never
+- `plans/tests/test_thinking_prompts_eval_harness.py`: `Primary workflow surfaces` preamble guard
+- `plans/tests/test_prompt_cross_links.py`: `test_all_thinking_lenses_tracked_in_consumer_map`
+- `GLOSSARY.md`: playbook Rounds 1–80; step 11 for Module Contract Escalation upkeep
+- `QUALITY_GATES_SUMMARY.md`: Escalation anti-drift note; panel Rounds 1–80
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 80 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`, `test_glossary_structure.py`: panel Round 80 sync
+
+## Verification (Round 80)
+
+- `make all`: pytest suite green; ROUTE-001/002/003 at 100% consistency
+
+## Panel status (updated)
 
+**Resealed 2026-06-25** after **Round 80** (options EI–EL). Module Contract Escalation anti-drift closed; thinking-lens preamble consumer guards complete. Holdout expansion remains recurrence-gated maintenance.
 
diff --git a/reflective-prompt-library/plans/tests/test_glossary_structure.py b/reflective-prompt-library/plans/tests/test_glossary_structure.py
@@ -30,10 +30,10 @@ def test_round_boundary_terms_present(glossary_text: str):
         assert heading in glossary_text, f"missing glossary section: {heading}"
 
 
-def test_maintenance_playbook_references_round_79(glossary_text: str):
+def test_maintenance_playbook_references_round_80(glossary_text: str):
     playbook = glossary_text.split("## Governance Maintenance Playbook", 1)[1]
-    assert "Rounds 1–79" in playbook or "Rounds 1-79" in playbook
-    assert "Rounds 1–78" not in playbook and "Rounds 1-78" not in playbook
+    assert "Rounds 1–80" in playbook or "Rounds 1-80" in playbook
+    assert "Rounds 1–79" not in playbook and "Rounds 1-79" not in playbook
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_prompt_cross_links.py b/reflective-prompt-library/plans/tests/test_prompt_cross_links.py
@@ -288,6 +288,11 @@ def _invert_skill_thinking_sources() -> dict[str, tuple[str, ...]]:
 
 THINKING_LENS_SKILL_CONSUMERS = _invert_skill_thinking_sources()
 
+def test_all_thinking_lenses_tracked_in_consumer_map():
+    """Every 01-thinking lens must be cited by a skill, and every cited lens must exist."""
+    expected = {f"01-thinking/{path.name}" for path in THINKING_PROMPTS}
+    assert set(THINKING_LENS_SKILL_CONSUMERS) == expected
+
 
 @pytest.mark.parametrize("lens_ref,consumer_skills", THINKING_LENS_SKILL_CONSUMERS.items())
 def test_thinking_lens_preamble_lists_consumer_skills(lens_ref: str, consumer_skills: tuple[str, ...]):
diff --git a/reflective-prompt-library/plans/tests/test_readme_governance.py b/reflective-prompt-library/plans/tests/test_readme_governance.py
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "79"
-CURRENT_PANEL_OPTIONS = "A–EH"
+CURRENT_PANEL_ROUND = "80"
+CURRENT_PANEL_OPTIONS = "A–EL"
 
 
 @pytest.fixture(scope="module")
diff --git a/reflective-prompt-library/plans/tests/test_skill_module_contract.py b/reflective-prompt-library/plans/tests/test_skill_module_contract.py
@@ -12,7 +12,7 @@
 
 SKILLS_DIR = Path(__file__).parent.parent.parent / "skills"
 
-REQUIRED_SUBSECTIONS = ("Trigger", "Methods", "Output", "Never")
+REQUIRED_SUBSECTIONS = ("Trigger", "Methods", "Output", "Never", "Escalation")
 
 
 @pytest.mark.parametrize("skill_name", CORE_SKILLS)
diff --git a/reflective-prompt-library/plans/tests/test_thinking_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_thinking_prompts_eval_harness.py
@@ -50,3 +50,12 @@ def test_thinking_prompts_reference_workflow_skills():
     for prompt_path in THINKING_PROMPTS:
         text = prompt_path.read_text(encoding="utf-8")
         assert "reflective-" in text, f"{prompt_path.name} should map to at least one workflow skill"
+
+def test_thinking_prompts_have_primary_workflow_surfaces_line():
+    """All 01-thinking lenses name consumer workflow skills in Purpose preambles."""
+    for prompt_path in THINKING_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        assert "Primary workflow surfaces" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surfaces"
+        )
+
diff --git a/reflective-prompt-library/skills/reflective-minimality/SKILL.md b/reflective-prompt-library/skills/reflective-minimality/SKILL.md
@@ -39,7 +39,7 @@ Methods:
 - Complexity audit: scan for one-implementation abstractions, avoidable dependencies, wrapper-only delegation, dead flags, and hand-rolled standard library behavior.
 - Runnable check: keep one minimal check for non-trivial logic.
 
-### Output
+Output:
 
 - Minimality decision: skip, delete, reuse, shrink, or implement minimum.
 - Cut list: unnecessary files, abstractions, dependencies, flags, wrappers, or prose.
@@ -49,15 +49,15 @@ Methods:
 - Debt markers: only when an intentional shortcut has a known ceiling and upgrade trigger.
 - Debt ledger: grouped marker list with no-trigger risks when requested.
 
-### Never
+Never:
 
 - Do not use minimality to avoid explicit acceptance criteria.
 - Do not remove trust-boundary validation, auth, privacy, security, data-loss prevention, required accessibility, compatibility constraints, or required tests.
 - Do not add a new dependency when standard library, platform-native behavior, existing dependency, or small local code is enough.
 - Do not add an abstraction for one implementation, a factory for one product, or config for a value that does not vary.
 - Do not mark a shortcut without both a ceiling and an observable upgrade trigger.
 
-### Escalation
+Escalation:
 
 - If the goal or acceptance criteria are unclear, route to `reflective-brief` or `reflective-spec-plan`.
 - If simplification touches high-risk behavior, route to `reflective-risk`.
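The switch above from `### Output` headings to `Output:` lines matters most for a line-based subsection check, which would only recognize the `Label:` form. Below is a minimal sketch that assumes the required subsections appear as standalone `Label:` lines. The matching logic in `test_skill_module_contract.py` is not included in this diff, and `missing_subsections` is a hypothetical helper.

```python
import re

REQUIRED_SUBSECTIONS = ("Trigger", "Methods", "Output", "Never", "Escalation")


def missing_subsections(skill_text: str) -> list[str]:
    """Return required labels that do not appear as a standalone `Label:` line.

    Assumption: the canonical `Label:` form is required, so a legacy
    `### Label` heading does not satisfy the check.
    """
    found = set(re.findall(r"^(\w+):[ \t]*$", skill_text, flags=re.MULTILINE))
    return [name for name in REQUIRED_SUBSECTIONS if name not in found]


sample = """Trigger:
- ...
Methods:
- ...
### Output
- ...
Never:
- ...
"""
print(missing_subsections(sample))  # ['Output', 'Escalation']
```

A fuller version would first extract the Module Contract section, so that an unrelated `Label:` line elsewhere in `SKILL.md` could not satisfy the check.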