Round 85: composable Primary workflow surface preamble guards

johnteee · johnteee · commit 2b3de127e67c · 2026-06-25T16:29:45.000+08:00
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–84)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–85)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:
diff --git a/reflective-prompt-library/GLOSSARY.md b/reflective-prompt-library/GLOSSARY.md
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–84). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–85). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
diff --git a/reflective-prompt-library/PROJECT_KNOWLEDGE.md b/reflective-prompt-library/PROJECT_KNOWLEDGE.md
@@ -72,6 +72,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 
 ## Decision Index
 
+- 2026-06-25 Round 85 panel — composable prompt Primary workflow surface preamble guards (`test_*_prompts_eval_harness.py`) + Supporting-lens exemption → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 84 panel — `00-core` Primary workflow surface parity + primary-line trim → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 83 panel — composable prompt Primary workflow surface parity (`02-engineering`–`06-repo`) + supporting-lens exemption → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 > Pointers to the causal trail — plans, reflections, tests, commits. Detail is
diff --git a/reflective-prompt-library/README.md b/reflective-prompt-library/README.md
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–84, options A–FB) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–85, options A–FF) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
diff --git a/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md b/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 520+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 530+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards in `test_*_prompts_eval_harness.py`
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–84; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–85; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
diff --git a/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md b/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
@@ -2321,8 +2321,46 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 - `make all`: pytest + ROUTE-001/002/003 100%
 
-## Panel status (updated)
+---
+
+## Round 85 — Composable prompt Primary workflow surface preamble guards (2026-06-25)
+
+**Options FC–FF** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 85 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| FC | `Primary workflow surface(s)` / Supporting-lens preamble guards in all composable `test_*_prompts_eval_harness.py` files | **Agree** |
+| FD | GLOSSARY playbook step 17 + governance sync | **Agree** |
+| FE | ROUTE holdout expansion | **Defer** |
+| FF | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 85 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| FC | Composable preamble guards | **Agree** | mirror `test_thinking_prompts_eval_harness.py`; Supporting-lens exemption for `runtime-trust-boundary.md` |
+| FD | Playbook + docs | **Agree** | step 17; panel round 85 sync |
+| FE | Holdout expansion | **Defer** | maintenance |
+| FF | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+**All roles agree.**
+
+## Implemented Changes (Round 85)
 
-**Resealed 2026-06-25** after **Round 84** (options EY–FB). `00-core` Primary workflow surface lines now match `CORE_SKILL_LINKS` exactly; full prompt library (`00-core`–`06-repo` + `01-thinking` graph parity) closed. Holdout expansion remains recurrence-gated maintenance.
+- `plans/tests/test_core_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`: Primary workflow surface preamble guard
+- `plans/tests/test_agent_prompts_eval_harness.py`: Primary vs Supporting-lens preamble guard (`runtime-trust-boundary.md` exemption)
+- `GLOSSARY.md`: playbook Rounds 1–85; step 17 for composable preamble guards
+- `QUALITY_GATES_SUMMARY.md`: preamble guard note; panel Rounds 1–85; 530+ pytest floor
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 85 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 85 sync
+
+## Verification (Round 85)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+## Panel status (updated)
 
+**Resealed 2026-06-25** after **Round 85** (options FC–FF). Composable prompts now have `eval_harness` preamble guards matching the thinking-lens pattern; full library parity (graph + preamble) is closed. Holdout expansion remains recurrence-gated maintenance.
 
diff --git a/reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py
@@ -21,6 +21,7 @@
 )
 
 AGENT_PROMPTS = tuple(sorted(AGENT_DIR.glob("*.md")))
+SUPPORTING_LENS_AGENT_PROMPTS = frozenset({"runtime-trust-boundary.md"})
 
 
 @pytest.fixture(scope="module")
@@ -62,3 +63,17 @@ def test_agent_prompts_cover_agent_workflow_surfaces():
         "reflective-research",
     ):
         assert skill in text, f"04-agent should reference {skill}"
+
+def test_agent_prompts_have_workflow_surface_preamble_line():
+    """04-agent prompts use Primary workflow surface(s) or Supporting lens (trust boundary)."""
+    for prompt_path in AGENT_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        if prompt_path.name in SUPPORTING_LENS_AGENT_PROMPTS:
+            assert "Supporting lens for" in preamble, (
+                f"{prompt_path.name} Purpose should use Supporting lens for workflow skills"
+            )
+        else:
+            assert "Primary workflow surface" in preamble, (
+                f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+            )
+
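Reviewer note: the agent guard above reduces to two steps. It takes everything before the first code fence as the preamble, then looks for the marker string that the prompt's role requires. The following is a minimal standalone sketch of that logic, not the repository's code: `preamble_of`, `required_marker`, `has_required_marker`, and the sample prompt bodies are hypothetical names and data introduced only for illustration.

```python
FENCE = "`" * 3  # built indirectly so this sketch can sit inside a fenced block

# Mirrors SUPPORTING_LENS_AGENT_PROMPTS from the diff above.
SUPPORTING_LENS_PROMPTS = frozenset({"runtime-trust-boundary.md"})


def preamble_of(text: str) -> str:
    """Return the text before the first code fence (all of it if there is no fence)."""
    return text.split(FENCE, 1)[0]


def required_marker(prompt_name: str) -> str:
    """Pick the marker string a prompt's preamble is expected to contain."""
    if prompt_name in SUPPORTING_LENS_PROMPTS:
        return "Supporting lens for"
    return "Primary workflow surface"


def has_required_marker(prompt_name: str, text: str) -> bool:
    """Check the preamble only; text inside the fenced body is ignored."""
    return required_marker(prompt_name) in preamble_of(text)


# Hypothetical prompt bodies, not copied from the library.
good = "## Purpose\nPrimary workflow surface: reflective-review\n" + FENCE + "\nbody\n" + FENCE
hidden = "## Purpose\nNo marker here\n" + FENCE + "\nPrimary workflow surface\n" + FENCE

print(has_required_marker("example.md", good))    # True
print(has_required_marker("example.md", hidden))  # False: marker appears only inside the fence
print(has_required_marker("runtime-trust-boundary.md", good))  # False: exempt file needs the other marker
```

Because the check is a plain substring test on the preamble, it confirms that a marker line exists but says nothing about which skills the line names; the diff leaves that parity to `test_prompt_cross_links.py`.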
diff --git a/reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py
@@ -61,3 +61,12 @@ def test_context_prompts_cover_context_workflow_surfaces():
         "reflective-research",
     ):
         assert skill in text, f"03-context should reference {skill}"
+
+def test_context_prompts_have_primary_workflow_surfaces_line():
+    """All 03-context prompts declare Primary workflow surface(s) in Purpose preambles."""
+    for prompt_path in CONTEXT_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        assert "Primary workflow surface" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+        )
+
diff --git a/reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py
@@ -56,3 +56,12 @@ def test_core_prompts_cover_brief_and_dispatch():
     text = "\n".join(p.read_text(encoding="utf-8") for p in CORE_PROMPTS)
     assert "reflective-brief" in text
     assert "reflective-dispatch" in text
+
+
+def test_core_prompts_have_primary_workflow_surfaces_line():
+    """All 00-core prompts declare Primary workflow surface(s) in Purpose preambles."""
+    for prompt_path in CORE_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        assert "Primary workflow surface" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+        )
diff --git a/reflective-prompt-library/plans/tests/test_domain_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_domain_prompts_eval_harness.py
@@ -68,3 +68,12 @@ def test_high_risk_prompt_has_human_review_section():
     text = (DOMAIN_DIR / "high-risk.md").read_text(encoding="utf-8")
     preamble = text.split("```", 1)[0]
     assert "## Human Review" in preamble, "high-risk.md preamble should include Human Review"
+
+def test_domain_prompts_have_primary_workflow_surfaces_line():
+    """All 05-domain prompts declare Primary workflow surface(s) in Purpose preambles."""
+    for prompt_path in DOMAIN_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        assert "Primary workflow surface" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+        )
+
diff --git a/reflective-prompt-library/plans/tests/test_engineering_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_engineering_prompts_eval_harness.py
@@ -62,3 +62,12 @@ def test_engineering_prompts_cover_core_workflows():
         "reflective-review",
     ):
         assert skill in text, f"02-engineering should reference {skill}"
+
+def test_engineering_prompts_have_primary_workflow_surfaces_line():
+    """All 02-engineering prompts declare Primary workflow surface(s) in Purpose preambles."""
+    for prompt_path in ENGINEERING_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        assert "Primary workflow surface" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+        )
+
diff --git a/reflective-prompt-library/plans/tests/test_glossary_structure.py b/reflective-prompt-library/plans/tests/test_glossary_structure.py
@@ -30,10 +30,10 @@ def test_round_boundary_terms_present(glossary_text: str):
         assert heading in glossary_text, f"missing glossary section: {heading}"
 
 
-def test_maintenance_playbook_references_round_84(glossary_text: str):
+def test_maintenance_playbook_references_round_85(glossary_text: str):
     playbook = glossary_text.split("## Governance Maintenance Playbook", 1)[1]
-    assert "Rounds 1–84" in playbook or "Rounds 1-83" in playbook
-    assert "Rounds 1–83" not in playbook and "Rounds 1-82" not in playbook
+    assert "Rounds 1–85" in playbook or "Rounds 1-85" in playbook
+    assert "Rounds 1–84" not in playbook and "Rounds 1-84" not in playbook
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_readme_governance.py b/reflective-prompt-library/plans/tests/test_readme_governance.py
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "84"
-CURRENT_PANEL_OPTIONS = "A–FB"
+CURRENT_PANEL_ROUND = "85"
+CURRENT_PANEL_OPTIONS = "A–FF"
 
 
 @pytest.fixture(scope="module")
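Reviewer note: every round bump in this commit edits hard-coded strings such as `"Rounds 1–85"`, in both en-dash and ASCII-hyphen spellings. One possible alternative, sketched here under assumptions, derives both the required and the stale pattern from a single round number. `round_range_pattern` and `references_current_round_only` are hypothetical helpers, and the round is held as an `int` here although the diff stores `CURRENT_PANEL_ROUND` as the string `"85"`.

```python
import re

CURRENT_PANEL_ROUND = 85  # int for arithmetic; the diff stores this as the string "85"


def round_range_pattern(last_round: int) -> re.Pattern:
    """Match 'Rounds 1–N' or 'Rounds 1-N' for exactly N == last_round."""
    # [–-] accepts an en dash or an ASCII hyphen; (?!\d) stops 85 matching inside 850.
    return re.compile(rf"Rounds 1[–-]{last_round}(?!\d)")


def references_current_round_only(text: str, current: int) -> bool:
    """True when text cites the current round range and not the previous one."""
    has_current = round_range_pattern(current).search(text) is not None
    has_stale = round_range_pattern(current - 1).search(text) is not None
    return has_current and not has_stale


print(references_current_round_only("Upkeep after panel close (Rounds 1–85).", CURRENT_PANEL_ROUND))  # True
print(references_current_round_only("Upkeep after panel close (Rounds 1-85).", CURRENT_PANEL_ROUND))  # True
print(references_current_round_only("Upkeep after panel close (Rounds 1–84).", CURRENT_PANEL_ROUND))  # False
```

Like the glossary test in this diff, the sketch only rejects the immediately preceding round; older stale references would need a wider check.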
diff --git a/reflective-prompt-library/plans/tests/test_repo_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_repo_prompts_eval_harness.py
@@ -66,3 +66,12 @@ def test_agents_md_retains_harness_policy_section():
     text = (REPO_DIR / "AGENTS.md").read_text(encoding="utf-8")
     assert "## Harness Policy (Nine Skills)" in text
     assert "make all" in text
+
+def test_repo_prompts_have_primary_workflow_surfaces_line():
+    """All 06-repo prompts declare Primary workflow surface(s) in Purpose preambles."""
+    for prompt_path in REPO_PROMPTS:
+        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
+        assert "Primary workflow surface" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+        )
+
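Reviewer note: the five `*_have_primary_workflow_surfaces_line` tests in this commit share one body and differ only in the prompt tuple they iterate. Each loop also stops at the first failing assertion, so a run reports one offending file at a time. The sketch below shows one way a shared helper could list every offender in a single pass. It is an illustration under assumptions, not a proposed patch: `prompts_missing_marker` and the temporary sample files are hypothetical.

```python
import tempfile
from pathlib import Path

FENCE = "`" * 3  # built indirectly so this sketch can sit inside a fenced block
MARKER = "Primary workflow surface"


def prompts_missing_marker(prompt_dir: Path, marker: str = MARKER) -> list[str]:
    """Return sorted names of *.md prompts whose preamble lacks the marker."""
    missing = []
    for prompt_path in sorted(prompt_dir.glob("*.md")):
        # Same preamble rule as the tests above: text before the first fence.
        preamble = prompt_path.read_text(encoding="utf-8").split(FENCE, 1)[0]
        if marker not in preamble:
            missing.append(prompt_path.name)
    return missing


# Hypothetical prompt files written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("Purpose\nPrimary workflow surface: x\n", encoding="utf-8")
    (root / "b.md").write_text("Purpose only\n", encoding="utf-8")
    (root / "c.md").write_text(
        "Intro\n" + FENCE + "\nPrimary workflow surface\n" + FENCE, encoding="utf-8"
    )
    result = prompts_missing_marker(root)

print(result)  # ['b.md', 'c.md']
```

A test built on such a helper could assert that the returned list is empty and include the list in the failure message, which would name all prompts needing a fix at once.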