
Commit 8b53f9d

Round 100: library registry helper DRY + cross-registry migration
Six-lens panel unanimous on HV–HZ: extract shared assert_library_wide_unique_basenames, assert_registry_matches_library_glob, and sorted_all_library_prompts helpers; migrate all *_library_registry.py glob/unique guards; add test_prompt_library_registry_helpers_library_registry.py; align test_prompt_cross_links.py and promotion contract with shared path helpers. Governance synced to Round 100 (options A–HV); 702 pytest, ROUTE 100%.
1 parent db36f81 commit 8b53f9d

21 files changed

Lines changed: 277 additions & 133 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance

 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–99)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–100)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook

 The repository contains:

reflective-prompt-library/GLOSSARY.md

Lines changed: 2 additions & 1 deletion
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol

 ## Governance Maintenance Playbook / 治理維護手冊

-Ongoing upkeep after panel close (Rounds 1–99). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–100). Not agent instructions — operator checklist.

 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.

@@ -372,3 +372,4 @@ Ongoing upkeep after panel close (Rounds 1–99). Not agent instructions — ope
 29. When editing per-category `reference_workflow_skills` guards, use `assert_prompt_references_workflow_skill` in `prompt_eval_helpers.py` (preamble-scoped, not fenced templates); run `test_prompt_workflow_skill_reference_library_registry.py` plus per-category harness guards.
 30. When editing per-category eval_harness fixtures, keep `PROMPT_LIBRARY_REPO_ROOT` and `make_category_eval_harness_fixture` in `prompt_eval_helpers.py`; run `test_prompt_eval_harness_fixture_library_registry.py` plus per-category harness guards.
 31. When editing per-category `*_DIR` / `*_PROMPTS` tuples, use `category_prompt_dir` and `sorted_category_prompts` in `prompt_eval_helpers.py`; run `test_prompt_category_paths_library_registry.py` plus per-category harness guards.
+32. When editing cross-category library registries, use `assert_library_wide_unique_basenames`, `assert_registry_matches_library_glob`, and `sorted_all_library_prompts` in `prompt_eval_helpers.py`; run `test_prompt_library_registry_helpers_library_registry.py` plus per-registry glob/unique guards.

reflective-prompt-library/PROJECT_KNOWLEDGE.md

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age

 ## Decision Index

+- 2026-06-25 Round 100 panel — cross-category library registry helper DRY (`test_prompt_library_registry_helpers_library_registry.py`, `assert_library_wide_unique_basenames`, `assert_registry_matches_library_glob`, `sorted_all_library_prompts`; migrate all `*_library_registry.py` glob/unique guards) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 99 panel — cross-category prompt path library registry (`test_prompt_category_paths_library_registry.py`, DRY `category_prompt_dir` / `sorted_category_prompts`; preamble-scoped `assert_prompt_references_workflow_skill`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 98 panel — cross-category eval_harness fixture library registry (`test_prompt_eval_harness_fixture_library_registry.py`, DRY `make_category_eval_harness_fixture`, `PROMPT_LIBRARY_REPO_ROOT`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 97 panel — cross-category workflow skill reference library registry (`test_prompt_workflow_skill_reference_library_registry.py`, DRY `assert_prompt_references_workflow_skill`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)

reflective-prompt-library/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
3030

3131
## Governance Panel Record
3232

33-
Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–99, options A–HU) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
33+
Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–100, options A–HV) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
3434

3535
## Directory Map
3636

reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_prompt_workflow_skill_reference_library_registry.py`, `test_prompt_eval_harness_fixture_library_registry.py`, `test_prompt_category_paths_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 680+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96); workflow skill reference registry (`assert_prompt_references_workflow_skill`, Round 97); eval_harness fixture registry (`make_category_eval_harness_fixture`, Round 98); category path registry (`category_prompt_dir` / `sorted_category_prompts`, Round 99); workflow skill reference helper preamble-aligned (Round 99)
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_prompt_workflow_skill_reference_library_registry.py`, `test_prompt_eval_harness_fixture_library_registry.py`, `test_prompt_category_paths_library_registry.py`, `test_prompt_library_registry_helpers_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 690+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96); workflow skill reference registry (`assert_prompt_references_workflow_skill`, Round 97); eval_harness fixture registry (`make_category_eval_harness_fixture`, Round 98); category path registry (`category_prompt_dir` / `sorted_category_prompts`, Round 99); workflow skill reference helper preamble-aligned (Round 99); library registry helper DRY (`assert_library_wide_unique_basenames` / `assert_registry_matches_library_glob`, Round 100)

 ### Ongoing maintenance (not blockers)

@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions

-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–99; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–100; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.

reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md

Lines changed: 54 additions & 0 deletions
@@ -3040,3 +3040,57 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 ---

 **Resealed 2026-06-25** after **Round 99** (options HQ–HU). Composable prompt category paths are now library-registry checked across all `00-core``06-repo` categories with shared `category_prompt_dir` / `sorted_category_prompts`; workflow skill reference guards are preamble-scoped as documented. Holdout expansion remains recurrence-gated maintenance.
+
+## Round 100 — cross-category library registry helper DRY (2026-06-25)
+
+**Options HV–HZ** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 100 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| HV | DRY `assert_library_wide_unique_basenames` + `assert_registry_matches_library_glob` + `sorted_all_library_prompts` in `prompt_eval_helpers.py` | **Agree** |
+| HW | `test_prompt_library_registry_helpers_library_registry.py` — registry helper parity + module guard | **Agree** |
+| HX | Migrate all `*_library_registry.py` glob/unique guards + `test_prompt_cross_links.py` paths; GLOSSARY step 32 + governance sync | **Agree** |
+| HY | ROUTE holdout expansion | **Defer** |
+| HZ | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 100 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| HV | Library registry helper DRY | **Agree** | `assert_library_wide_unique_basenames` + `assert_registry_matches_library_glob` |
+| HW | Registry helper library registry | **Agree** | `test_prompt_library_registry_helpers_library_registry.py` |
+| HX | Registry migration + playbook | **Agree** | DRY all `*_library_registry.py`; step 32 |
+| HY | Holdout expansion | **Defer** | maintenance |
+| HZ | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+### Socratic rationale (Round 100)
+
+- **Opus:** Round 99 closed category path helpers; nine cross-category registry files still duplicated library-wide unique-basename and glob-parity loops with local `LIBRARY_ROOT` paths.
+- **Codex:** Shared `assert_registry_matches_library_glob` ensures every registry uses `sorted_category_prompts` semantics; `sorted_all_library_prompts` gives one canonical library-wide tuple.
+- **Gemini:** Deterministic helper extraction; no prompt content churn.
+- **Composer:** Mirrors R91–R99 registry pattern; one helper trio + one registry test file + migration sweep.
+- **Sakana:** Registry glob parity now falsifies if any module reintroduces ad-hoc `Path(__file__).parent` globs.
+- **GLM:** Playbook step 32 gives operators a single checklist line for library registry edits.
+
+**All roles agree.**
+
+## Implemented Changes (Round 100)
+
+- `plans/tests/prompt_eval_helpers.py`: `sorted_all_library_prompts`, `library_skills_dir`, `assert_library_wide_unique_basenames`, `assert_registry_matches_library_glob`
+- `plans/tests/test_*_library_registry.py`: DRY unique/glob guards via shared helpers; remove local `LIBRARY_ROOT`
+- `plans/tests/test_prompt_library_registry_helpers_library_registry.py`: cross-category registry helper registry
+- `plans/tests/test_prompt_cross_links.py`, `test_project_knowledge_promotion_contract.py`: shared `category_prompt_dir` / `library_skills_dir`
+- `GLOSSARY.md`: playbook Rounds 1–100; step 32 for library registry helper registry
+- `QUALITY_GATES_SUMMARY.md`: registry helper note; panel Rounds 1–100; 690+ pytest floor
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 100 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 100 sync
+
+## Verification (Round 100)
+
+- `make all`: 702 pytest + ROUTE-001/002/003 100%
+
+---
+
+**Resealed 2026-06-25** after **Round 100** (options HV–HZ). Cross-category library registries now share `assert_library_wide_unique_basenames` and `assert_registry_matches_library_glob` with a library-wide helper registry guard. Holdout expansion remains recurrence-gated maintenance.
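
The panel record mentions a "module guard" that fails when a registry module reintroduces its own root path, but the guard itself is not shown in this diff. The following is a minimal sketch of one way such a check could work, assuming a guard that scans registry test modules for a module-level `LIBRARY_ROOT` assignment. The names `LOCAL_ROOT_PATTERN` and `modules_with_local_root` and the demo files are hypothetical and are not taken from the commit.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a "local root" is any module-level `LIBRARY_ROOT = ...` assignment.
LOCAL_ROOT_PATTERN = re.compile(r"^LIBRARY_ROOT\s*=", flags=re.MULTILINE)


def modules_with_local_root(tests_dir: Path) -> list[str]:
    """Return registry test modules that still define their own LIBRARY_ROOT."""
    return sorted(
        path.name
        for path in tests_dir.glob("test_*_library_registry.py")
        if LOCAL_ROOT_PATTERN.search(path.read_text(encoding="utf-8"))
    )


# Demo against a throwaway directory with one migrated and one stale module.
with tempfile.TemporaryDirectory() as tmp:
    tests_dir = Path(tmp)
    (tests_dir / "test_clean_library_registry.py").write_text(
        "from prompt_eval_helpers import assert_registry_matches_library_glob\n",
        encoding="utf-8",
    )
    (tests_dir / "test_stale_library_registry.py").write_text(
        "from pathlib import Path\n"
        "LIBRARY_ROOT = Path(__file__).parent.parent.parent\n",
        encoding="utf-8",
    )
    OFFENDERS = modules_with_local_root(tests_dir)
```

A text scan like this is deliberately crude: it would catch a reintroduced `LIBRARY_ROOT` constant but not an inline `Path(__file__).parent` glob, so it complements the glob-parity assertion and does not replace it.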

reflective-prompt-library/plans/tests/prompt_eval_helpers.py

Lines changed: 30 additions & 0 deletions
@@ -39,6 +39,36 @@ def sorted_category_prompts(category: str) -> tuple[Path, ...]:
     """Return sorted markdown prompt paths for a library category."""
     return tuple(sorted(category_prompt_dir(category).glob("*.md")))

+
+def sorted_all_library_prompts() -> tuple[Path, ...]:
+    """Return every composable prompt path across all library categories."""
+    paths: list[Path] = []
+    for category in PROMPT_LIBRARY_CATEGORIES:
+        paths.extend(sorted_category_prompts(category))
+    return tuple(paths)
+
+
+def library_skills_dir() -> Path:
+    """Resolve the workflow skills directory under the prompt library root."""
+    return PROMPT_LIBRARY_ROOT / "skills"
+
+
+def assert_library_wide_unique_basenames(
+    prompt_paths: tuple[Path, ...] | list[Path],
+) -> None:
+    """Composable prompt basenames must be unique across all categories."""
+    basenames = [p.name for p in prompt_paths]
+    assert len(basenames) == len(frozenset(basenames)), (
+        "duplicate prompt basenames across categories"
+    )
+
+
+def assert_registry_matches_library_glob(
+    registry_paths: tuple[Path, ...] | list[Path],
+) -> None:
+    """Registry prompt tuples must match sorted_category_prompts across the library."""
+    assert sorted(sorted_all_library_prompts()) == sorted(registry_paths)
+

 CATEGORY_EVAL_HARNESS_FIXTURE_MARKER = "_from_category_eval_harness_fixture"

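
The helpers above can be exercised outside the repository with a small, self-contained sketch. It assumes a throwaway library with two hypothetical categories, and it swaps the fixed assertion message for a function that names the colliding basenames, which may make failures easier to triage. `CATEGORIES`, `sorted_all_prompts`, `duplicate_basenames`, and `registry_matches_glob` are illustrative stand-ins, not the committed API.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical stand-in for PROMPT_LIBRARY_CATEGORIES.
CATEGORIES = ("00-core", "01-thinking")


def sorted_all_prompts(root: Path) -> tuple[Path, ...]:
    """Collect sorted *.md paths category by category, like the helper in the diff."""
    paths: list[Path] = []
    for category in CATEGORIES:
        paths.extend(sorted((root / category).glob("*.md")))
    return tuple(paths)


def duplicate_basenames(paths) -> list[str]:
    """Return basenames that occur more than once (empty list means all unique)."""
    counts = Counter(p.name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


def registry_matches_glob(root: Path, registry_paths) -> bool:
    """True when the registry lists exactly the prompts found on disk."""
    return sorted(sorted_all_prompts(root)) == sorted(registry_paths)


def build_demo_library(root: Path) -> None:
    """Create two categories that deliberately share the basename a.md."""
    for category, names in (("00-core", ("a.md", "b.md")), ("01-thinking", ("a.md",))):
        (root / category).mkdir(parents=True)
        for name in names:
            (root / category / name).write_text("# demo\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    build_demo_library(demo_root)
    all_prompts = sorted_all_prompts(demo_root)
    DUPLICATES = duplicate_basenames(all_prompts)
    FULL_REGISTRY_OK = registry_matches_glob(demo_root, list(all_prompts))
    STALE_REGISTRY_OK = registry_matches_glob(demo_root, list(all_prompts)[:-1])
```

In this demo the shared `a.md` is reported as a duplicate, the full registry matches the glob, and a registry missing one prompt does not.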

reflective-prompt-library/plans/tests/test_glossary_structure.py

Lines changed: 3 additions & 3 deletions
@@ -32,8 +32,8 @@ def test_round_boundary_terms_present(glossary_text: str):

 def test_maintenance_playbook_references_round_99(glossary_text: str):
     playbook = glossary_text.split("## Governance Maintenance Playbook", 1)[1]
-    assert "Rounds 1–99" in playbook
-    assert "Rounds 1–98" not in playbook and "Rounds 1-91" not in playbook
+    assert "Rounds 1–100" in playbook
+    assert "Rounds 1–99" not in playbook and "Rounds 1-91" not in playbook



@@ -44,7 +44,7 @@ def test_maintenance_playbook_steps_on_separate_lines(glossary_text: str):
     assert re.search(r"guards\.\d+\.", playbook) is None, (
         "playbook steps merged without newline between numbers"
     )
-    for step in ("17.", "18.", "19.", "20.", "21.", "22.", "23.", "24.", "25.", "26.", "27.", "28.", "29.", "30.", "31."):
+    for step in ("17.", "18.", "19.", "20.", "21.", "22.", "23.", "24.", "25.", "26.", "27.", "28.", "29.", "30.", "31.", "32."):
         assert step in playbook

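
The step tuple in this test grows by one literal each round, and a plain substring check such as `"32." in playbook` would also pass on text like `132.`. As a possible alternative, here is a sketch that derives the expected steps from a range and matches them only at the start of a line. The function name `missing_playbook_steps` and the demo text are hypothetical and not part of the commit.

```python
import re


def missing_playbook_steps(playbook: str, first: int, last: int) -> list[int]:
    """Return step numbers in [first, last] that have no line starting with 'N. '."""
    present = {
        int(match.group(1))
        for match in re.finditer(r"^(\d+)\.\s", playbook, flags=re.MULTILINE)
    }
    return [n for n in range(first, last + 1) if n not in present]


DEMO_PLAYBOOK = (
    "30. When editing fixtures, run guards.\n"
    "31. When editing paths, run guards.\n"
    "32. When editing registries, run guards.\n"
)
MISSING_NONE = missing_playbook_steps(DEMO_PLAYBOOK, 30, 32)
MISSING_ONE = missing_playbook_steps(DEMO_PLAYBOOK, 30, 33)
```

With a helper like this, a new round would only need to bump the upper bound.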

reflective-prompt-library/plans/tests/test_human_review_library_registry.py

Lines changed: 6 additions & 15 deletions
@@ -10,6 +10,8 @@
 from prompt_eval_helpers import ( # noqa: E402
     PROMPT_LIBRARY_CATEGORIES,
     prompts_with_human_review,
+    assert_library_wide_unique_basenames,
+    assert_registry_matches_library_glob,
 )
 from test_agent_prompts_eval_harness import ( # noqa: E402
     AGENT_HUMAN_REVIEW_EXEMPT,
@@ -47,7 +49,6 @@
     THINKING_PROMPTS,
 )

-LIBRARY_ROOT = Path(__file__).parent.parent.parent

 HUMAN_REVIEW_CATEGORY_REGISTRY = (
     ("00-core", CORE_PROMPTS, CORE_HUMAN_REVIEW_REQUIRED, CORE_HUMAN_REVIEW_EXEMPT),
@@ -98,23 +99,13 @@ def test_human_review_registry_category_partition(


 def test_human_review_registry_library_wide_unique_filenames():
-    basenames: list[str] = []
-    for _category, prompts, _required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY:
-        basenames.extend(p.name for p in prompts)
-    assert len(basenames) == len(frozenset(basenames)), (
-        "duplicate prompt basenames across categories"
-    )
+    registry_paths = [p for _category, prompts, _required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY for p in prompts]
+    assert_library_wide_unique_basenames(registry_paths)


 def test_human_review_registry_matches_library_glob():
-    globbed: list[Path] = []
-    for category in PROMPT_LIBRARY_CATEGORIES:
-        globbed.extend(sorted((LIBRARY_ROOT / category).glob("*.md")))
-    registry_paths = [
-        p for _category, prompts, _required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY
-        for p in prompts
-    ]
-    assert sorted(globbed) == sorted(registry_paths)
+    registry_paths = [p for _category, prompts, _required, _exempt in HUMAN_REVIEW_CATEGORY_REGISTRY for p in prompts]
+    assert_registry_matches_library_glob(registry_paths)


 def test_human_review_registry_required_union_matches_detection():
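
Both migrated tests repeat the same flattening comprehension over the registry rows. One possible follow-up, sketched here under the assumption that every registry row keeps its prompt tuple in the second position, is a small flattening helper. `flatten_registry_prompts` and `DEMO_REGISTRY` are hypothetical names used only for illustration.

```python
from itertools import chain
from pathlib import Path


def flatten_registry_prompts(registry) -> list[Path]:
    """Flatten (category, prompts, *rest) rows into one list of prompt paths."""
    return list(chain.from_iterable(row[1] for row in registry))


# Demo rows shaped like HUMAN_REVIEW_CATEGORY_REGISTRY: (category, prompts, required, exempt).
DEMO_REGISTRY = (
    ("00-core", (Path("00-core/a.md"), Path("00-core/b.md")), frozenset(), frozenset()),
    ("06-repo", (Path("06-repo/c.md"),), frozenset(), frozenset()),
)
FLATTENED = flatten_registry_prompts(DEMO_REGISTRY)
```

Indexing by position keeps the helper usable for registries whose rows carry a different number of trailing fields.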

reflective-prompt-library/plans/tests/test_project_knowledge_promotion_contract.py

Lines changed: 9 additions & 3 deletions
@@ -1,11 +1,17 @@
 """Regression checks for the project-knowledge promotion surface."""

+import sys
 from pathlib import Path

+sys.path.insert(0, str(Path(__file__).parent))

-LIBRARY_ROOT = Path(__file__).parent.parent.parent
-HANDOFF_SKILL = LIBRARY_ROOT / "skills" / "reflective-handoff-retro" / "SKILL.md"
-PROJECT_TEMPLATE = LIBRARY_ROOT / "06-repo" / "PROJECT_KNOWLEDGE.template.md"
+from prompt_eval_helpers import ( # noqa: E402
+    category_prompt_dir,
+    library_skills_dir,
+)
+
+HANDOFF_SKILL = library_skills_dir() / "reflective-handoff-retro" / "SKILL.md"
+PROJECT_TEMPLATE = category_prompt_dir("06-repo") / "PROJECT_KNOWLEDGE.template.md"


 def test_handoff_skill_exposes_complete_promotion_candidate_contract():
