Round 91: cross-category Human Review library registry

johnteee · johnteee · commit 0f799dbd89bb · 2026-06-25T16:59:12.000+08:00
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–90)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–91)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:
diff --git a/reflective-prompt-library/GLOSSARY.md b/reflective-prompt-library/GLOSSARY.md
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–90). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–91). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
@@ -363,3 +363,4 @@ Ongoing upkeep after panel close (Rounds 1–90). Not agent instructions — ope
 20. When adding or editing risk-bearing `00-core/` prompts with `## Human Review`, keep preamble escalation routed to `reflective-risk` and run `test_core_prompts_eval_harness.py` Human Review guards via `prompt_eval_helpers.py`.
 21. When editing `00-core/` Human Review coverage, keep `CORE_HUMAN_REVIEW_REQUIRED` and `CORE_HUMAN_REVIEW_EXEMPT` in `test_core_prompts_eval_harness.py` aligned with preamble `## Human Review` sections; run core HR parity tests.
 22. When editing Human Review coverage on thinking lenses or composable prompts (`01-thinking`–`06-repo`), keep frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` sets in `test_*_prompts_eval_harness.py` aligned with preamble `## Human Review` sections; use `prompt_eval_helpers.assert_human_review_*` parity helpers and run HR set partition tests.
+23. When adding composable prompts or new categories, keep `PROMPT_LIBRARY_CATEGORIES` and `test_human_review_library_registry.py` aligned so frozen HR sets cover every `00-core`–`06-repo` prompt exactly once.
diff --git a/reflective-prompt-library/PROJECT_KNOWLEDGE.md b/reflective-prompt-library/PROJECT_KNOWLEDGE.md
@@ -73,6 +73,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 ## Decision Index
 
 - 2026-06-25 Round 85 panel — composable prompt Primary workflow surface preamble guards (`test_*_prompts_eval_harness.py`) + Supporting-lens exemption → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
+- 2026-06-25 Round 91 panel — cross-category Human Review library registry (`test_human_review_library_registry.py`, `PROMPT_LIBRARY_CATEGORIES`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 90 panel — library-wide Human Review required/exempt set parity (`01-thinking`–`06-repo`) + DRY `prompt_eval_helpers` HR set guards → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 89 panel — `00-core` Human Review required/exempt set parity (`CORE_HUMAN_REVIEW_REQUIRED` / `CORE_HUMAN_REVIEW_EXEMPT`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 88 panel — `00-core` Human Review preamble guards on risk-bearing prompts + `test_core_prompts_eval_harness.py` → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
diff --git a/reflective-prompt-library/README.md b/reflective-prompt-library/README.md
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–90, options A–GB) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–91, options A–GE) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
diff --git a/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md b/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 580+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90)
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 590+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90)
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–90; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–91; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
diff --git a/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md b/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
@@ -2586,3 +2586,49 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 **Resealed 2026-06-25** after **Round 90** (options FX–GB). Human Review coverage is now explicit via frozen required/exempt sets across all prompt categories (`00-core`–`06-repo`). Holdout expansion remains recurrence-gated maintenance.
 
+---
+
+## Round 91 — cross-category Human Review library registry (2026-06-25)
+
+**Options GC–GG** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 91 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| GC | `PROMPT_LIBRARY_CATEGORIES` + `test_human_review_library_registry.py` cross-category HR registry pytest | **Agree** |
+| GD | Remove duplicate `*_PROMPTS_WITH_HUMAN_REVIEW` assignments in composable harness files | **Agree** |
+| GE | GLOSSARY playbook step 23 + governance sync | **Agree** |
+| GF | ROUTE holdout expansion | **Defer** |
+| GG | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 91 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| GC | HR library registry | **Agree** | `PROMPT_LIBRARY_CATEGORIES`; registry imports all frozen HR sets; library glob parity |
+| GD | Harness dedupe | **Agree** | drop duplicate `prompts_with_human_review` lines |
+| GE | Playbook + docs | **Agree** | step 23; panel round 91 sync |
+| GF | Holdout expansion | **Defer** | maintenance |
+| GG | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+**All roles agree.**
+
+## Implemented Changes (Round 91)
+
+- `plans/tests/prompt_eval_helpers.py`: `PROMPT_LIBRARY_CATEGORIES` tuple
+- `plans/tests/test_human_review_library_registry.py`: cross-category HR registry + library glob parity
+- `plans/tests/test_{agent,context,domain,engineering,repo}_prompts_eval_harness.py`: dedupe duplicate HR prompt tuples
+- `GLOSSARY.md`: playbook Rounds 1–91; step 23 for HR library registry
+- `QUALITY_GATES_SUMMARY.md`: HR registry note; panel Rounds 1–91; 590+ pytest floor
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 91 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 91 sync
+
+## Verification (Round 91)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+## Panel status (updated)
+
+**Resealed 2026-06-25** after **Round 91** (options GC–GG). Human Review frozen sets are now cross-checked by a single library registry (`00-core`–`06-repo`). Holdout expansion remains recurrence-gated maintenance.
+
diff --git a/reflective-prompt-library/plans/tests/prompt_eval_helpers.py b/reflective-prompt-library/plans/tests/prompt_eval_helpers.py
@@ -5,6 +5,16 @@
 
 HUMAN_REVIEW_HEADING = re.compile(r"^## Human Review\s*$", re.MULTILINE)
 
+PROMPT_LIBRARY_CATEGORIES = (
+    "00-core",
+    "01-thinking",
+    "02-engineering",
+    "03-context",
+    "04-agent",
+    "05-domain",
+    "06-repo",
+)
+
 
 def prompt_preamble(prompt_path: Path) -> str:
     return prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
diff --git a/reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py
@@ -39,7 +39,6 @@
     "workflow-recipes.md",
 })
 
-AGENT_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(AGENT_PROMPTS)
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py
@@ -36,7 +36,6 @@
     "medium-context.md",
 })
 
-CONTEXT_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(CONTEXT_PROMPTS)
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_domain_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_domain_prompts_eval_harness.py
@@ -36,7 +36,6 @@
     "writing-article.md",
 })
 
-DOMAIN_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(DOMAIN_PROMPTS)
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_engineering_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_engineering_prompts_eval_harness.py
@@ -37,7 +37,6 @@
     "usage-first.md",
 })
 
-ENGINEERING_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(ENGINEERING_PROMPTS)
 
 
 
diff --git a/reflective-prompt-library/plans/tests/test_glossary_structure.py b/reflective-prompt-library/plans/tests/test_glossary_structure.py
@@ -30,10 +30,11 @@ def test_round_boundary_terms_present(glossary_text: str):
         assert heading in glossary_text, f"missing glossary section: {heading}"
 
 
-def test_maintenance_playbook_references_round_90(glossary_text: str):
+def test_maintenance_playbook_references_round_91(glossary_text: str):
     playbook = glossary_text.split("## Governance Maintenance Playbook", 1)[1]
-    assert "Rounds 1–90" in playbook
-    assert "Rounds 1–89" not in playbook and "Rounds 1-89" not in playbook
+    assert "Rounds 1–91" in playbook
+    assert "Rounds 1–90" not in playbook and "Rounds 1-90" not in playbook
+
 
 
 
@@ -43,7 +44,7 @@ def test_maintenance_playbook_steps_on_separate_lines(glossary_text: str):
     assert re.search(r"guards\.\d+\.", playbook) is None, (
         "playbook steps merged without newline between numbers"
     )
-    for step in ("17.", "18.", "19.", "20.", "21.", "22."):
+    for step in ("17.", "18.", "19.", "20.", "21.", "22.", "23."):
         assert step in playbook
 
 
diff --git a/reflective-prompt-library/plans/tests/test_human_review_library_registry.py b/reflective-prompt-library/plans/tests/test_human_review_library_registry.py
@@ -0,0 +1,129 @@
+"""Cross-category Human Review registry anti-drift for composable prompt library."""
+
+import sys
+from pathlib import Path
+
+import pytest
+
+sys.path.insert(0, str(Path(__file__).parent))
+
+from prompt_eval_helpers import (  # noqa: E402
+    PROMPT_LIBRARY_CATEGORIES,
+    prompts_with_human_review,
+)
+from test_agent_prompts_eval_harness import (  # noqa: E402
+    AGENT_HUMAN_REVIEW_EXEMPT,
+    AGENT_HUMAN_REVIEW_REQUIRED,
+    AGENT_PROMPTS,
+)
+from test_context_prompts_eval_harness import (  # noqa: E402
+    CONTEXT_HUMAN_REVIEW_EXEMPT,
+    CONTEXT_HUMAN_REVIEW_REQUIRED,
+    CONTEXT_PROMPTS,
+)
+from test_core_prompts_eval_harness import (  # noqa: E402
+    CORE_HUMAN_REVIEW_EXEMPT,
+    CORE_HUMAN_REVIEW_REQUIRED,
+    CORE_PROMPTS,
+)
+from test_domain_prompts_eval_harness import (  # noqa: E402
+    DOMAIN_HUMAN_REVIEW_EXEMPT,
+    DOMAIN_HUMAN_REVIEW_REQUIRED,
+    DOMAIN_PROMPTS,
+)
+from test_engineering_prompts_eval_harness import (  # noqa: E402
+    ENGINEERING_HUMAN_REVIEW_EXEMPT,
+    ENGINEERING_HUMAN_REVIEW_REQUIRED,
+    ENGINEERING_PROMPTS,
+)
+from test_repo_prompts_eval_harness import (  # noqa: E402
+    REPO_HUMAN_REVIEW_EXEMPT,
+    REPO_HUMAN_REVIEW_REQUIRED,
+    REPO_PROMPTS,
+)
+from test_thinking_prompts_eval_harness import (  # noqa: E402
+    THINKING_HUMAN_REVIEW_EXEMPT,
+    THINKING_HUMAN_REVIEW_REQUIRED,
+    THINKING_PROMPTS,
+)
+
+LIBRARY_ROOT = Path(__file__).parent.parent.parent
+
+HUMAN_REVIEW_CATEGORY_REGISTRY = (
+    ("00-core", CORE_PROMPTS, CORE_HUMAN_REVIEW_REQUIRED, CORE_HUMAN_REVIEW_EXEMPT),
+    (
+        "01-thinking",
+        THINKING_PROMPTS,
+        THINKING_HUMAN_REVIEW_REQUIRED,
+        THINKING_HUMAN_REVIEW_EXEMPT,
+    ),
+    (
+        "02-engineering",
+        ENGINEERING_PROMPTS,
+        ENGINEERING_HUMAN_REVIEW_REQUIRED,
+        ENGINEERING_HUMAN_REVIEW_EXEMPT,
+    ),
+    (
+        "03-context",
+        CONTEXT_PROMPTS,
+        CONTEXT_HUMAN_REVIEW_REQUIRED,
+        CONTEXT_HUMAN_REVIEW_EXEMPT,
+    ),
+    ("04-agent", AGENT_PROMPTS, AGENT_HUMAN_REVIEW_REQUIRED, AGENT_HUMAN_REVIEW_EXEMPT),
+    (
+        "05-domain",
+        DOMAIN_PROMPTS,
+        DOMAIN_HUMAN_REVIEW_REQUIRED,
+        DOMAIN_HUMAN_REVIEW_EXEMPT,
+    ),
+    ("06-repo", REPO_PROMPTS, REPO_HUMAN_REVIEW_REQUIRED, REPO_HUMAN_REVIEW_EXEMPT),
+)
+
+
+def test_human_review_registry_lists_all_prompt_categories():
+    assert tuple(cat for cat, *_ in HUMAN_REVIEW_CATEGORY_REGISTRY) == PROMPT_LIBRARY_CATEGORIES
+
+
+@pytest.mark.parametrize(
+    "category,prompts,required,exempt",
+    HUMAN_REVIEW_CATEGORY_REGISTRY,
+    ids=[row[0] for row in HUMAN_REVIEW_CATEGORY_REGISTRY],
+)
+def test_human_review_registry_category_partition(
+    category: str, prompts: tuple[Path, ...], required: frozenset[str], exempt: frozenset[str]
+):
+    all_names = frozenset(p.name for p in prompts)
+    assert required | exempt == all_names, f"{category}: required ∪ exempt != all prompts"
+    assert not required & exempt, f"{category}: required and exempt overlap"
+
+
+def test_human_review_registry_library_wide_unique_filenames():
+    basenames: list[str] = []
+    for _category, prompts, _required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY:
+        basenames.extend(p.name for p in prompts)
+    assert len(basenames) == len(frozenset(basenames)), (
+        "duplicate prompt basenames across categories"
+    )
+
+
+def test_human_review_registry_matches_library_glob():
+    globbed: list[Path] = []
+    for category in PROMPT_LIBRARY_CATEGORIES:
+        globbed.extend(sorted((LIBRARY_ROOT / category).glob("*.md")))
+    registry_paths = [
+        p for _category, prompts, _required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY
+        for p in prompts
+    ]
+    assert sorted(globbed) == sorted(registry_paths)
+
+
+def test_human_review_registry_required_union_matches_detection():
+    detected: set[str] = set()
+    for category, prompts, required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY:
+        category_detected = {p.name for p in prompts_with_human_review(prompts)}
+        assert category_detected == required, (
+            f"{category}: detected HR preambles {sorted(category_detected)} "
+            f"!= frozen required {sorted(required)}"
+        )
+        detected |= category_detected
+    assert len(detected) > 0
diff --git a/reflective-prompt-library/plans/tests/test_readme_governance.py b/reflective-prompt-library/plans/tests/test_readme_governance.py
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "90"
-CURRENT_PANEL_OPTIONS = "A–GB"
+CURRENT_PANEL_ROUND = "91"
+CURRENT_PANEL_OPTIONS = "A–GE"
 
 
 @pytest.fixture(scope="module")
diff --git a/reflective-prompt-library/plans/tests/test_repo_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_repo_prompts_eval_harness.py
@@ -33,7 +33,6 @@
     "codex-opencode.md",
 })
 
-REPO_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(REPO_PROMPTS)
 
 
 
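
The invariant behind playbook step 23 and `test_human_review_library_registry.py` can be sketched in isolation. This is a minimal, self-contained illustration and not the repository's code: the helper `check_registry`, the `EXAMPLE_REGISTRY` data, and the prompt file names are hypothetical. Only the category names and the idea that the required and exempt sets partition each category's prompts come from the commit.

```python
from collections import Counter

# Hypothetical stand-in for the frozen per-category sets; the file names are
# invented for illustration and do not come from the repository.
EXAMPLE_REGISTRY = (
    # (category, all prompt basenames, HR required, HR exempt)
    ("00-core", ("alpha.md", "beta.md"), frozenset({"alpha.md"}), frozenset({"beta.md"})),
    ("01-thinking", ("gamma.md",), frozenset(), frozenset({"gamma.md"})),
)


def check_registry(registry, categories):
    """Return a list of human-readable problems; an empty list means no drift."""
    problems = []
    # The registry must list exactly the expected categories, in order.
    if tuple(row[0] for row in registry) != tuple(categories):
        problems.append("registry categories differ from expected categories")
    for category, prompts, required, exempt in registry:
        names = frozenset(prompts)
        # required and exempt must together cover every prompt in the category...
        if required | exempt != names:
            problems.append(f"{category}: required | exempt != all prompts")
        # ...and must not share any prompt.
        if required & exempt:
            problems.append(f"{category}: required and exempt overlap")
    # Every basename may appear only once across the whole library.
    counts = Counter(name for _, prompts, _, _ in registry for name in prompts)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate basenames: {duplicates}")
    return problems


if __name__ == "__main__":
    print(check_registry(EXAMPLE_REGISTRY, ("00-core", "01-thinking")))
```

Collecting problems into a list instead of failing on the first one is a choice made for this sketch only. The tests in the commit use separate pytest assertions, parametrized per category, so each category reports its own failure.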
