
Commit db36f81

Round 99: category prompt path library registry + preamble fix
Panel consensus (HQ–HU): DRY category_prompt_dir/sorted_category_prompts across all eval_harness modules, add test_prompt_category_paths_library_registry, align assert_prompt_references_workflow_skill with preamble scope, and sync governance docs to round 99 (682 pytest, ROUTE 100%).
1 parent febb1c4 commit db36f81

17 files changed

Lines changed: 174 additions & 33 deletions

README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
2121
## Governance
2222

2323
- **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
24-
- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–98)
24+
- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–99)
2525
- **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
2626

2727
The repository contains:

reflective-prompt-library/GLOSSARY.md

Lines changed: 2 additions & 1 deletion
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
337337

338338
## Governance Maintenance Playbook / 治理維護手冊
339339

340-
Ongoing upkeep after panel close (Rounds 1–98). Not agent instructions — operator checklist.
340+
Ongoing upkeep after panel close (Rounds 1–99). Not agent instructions — operator checklist.
341341

342342
**Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
343343

@@ -371,3 +371,4 @@ Ongoing upkeep after panel close (Rounds 1–98). Not agent instructions — ope
371371
28. When editing eval_harness score floors, keep `PROMPT_EVAL_MIN_SCORE` in `prompt_eval_helpers.py` and use `assert_prompt_meets_eval_harness_floor` in per-category `test_*_prompts_eval_harness.py` guards; run `test_prompt_eval_harness_score_library_registry.py`.
372372
29. When editing per-category `reference_workflow_skills` guards, use `assert_prompt_references_workflow_skill` in `prompt_eval_helpers.py` (preamble-scoped, not fenced templates); run `test_prompt_workflow_skill_reference_library_registry.py` plus per-category harness guards.
373373
30. When editing per-category eval_harness fixtures, keep `PROMPT_LIBRARY_REPO_ROOT` and `make_category_eval_harness_fixture` in `prompt_eval_helpers.py`; run `test_prompt_eval_harness_fixture_library_registry.py` plus per-category harness guards.
374+
31. When editing per-category `*_DIR` / `*_PROMPTS` tuples, use `category_prompt_dir` and `sorted_category_prompts` in `prompt_eval_helpers.py`; run `test_prompt_category_paths_library_registry.py` plus per-category harness guards.
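A minimal sketch of what step 31 asks for, assuming a throwaway directory tree. The two helpers mirror the ones this commit adds to `prompt_eval_helpers.py`, but here they take an explicit `root` argument and use a stand-in `CATEGORIES` tuple so the example runs anywhere; `root`, `CATEGORIES`, and `demo` are illustrative names, not part of the library.

```python
import tempfile
from pathlib import Path

# Stand-in for PROMPT_LIBRARY_CATEGORIES (assumption: the real tuple is defined
# in prompt_eval_helpers.py and is not shown in this commit).
CATEGORIES = ("00-core", "04-agent")


def category_prompt_dir(root: Path, category: str) -> Path:
    """Resolve a category directory under an explicit root, rejecting unknown names."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown prompt category: {category}")
    return root / category


def sorted_category_prompts(root: Path, category: str) -> tuple[Path, ...]:
    """Return the category's markdown files as a sorted, immutable tuple."""
    return tuple(sorted(category_prompt_dir(root, category).glob("*.md")))


def demo() -> list[str]:
    """Build a throwaway library and list the 00-core prompt file names."""
    layout = {"00-core": ["b.md", "a.md", "notes.txt"], "04-agent": ["z.md"]}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for category, names in layout.items():
            (root / category).mkdir()
            for name in names:
                (root / category / name).write_text("# demo\n", encoding="utf-8")
        # Only *.md files are collected; notes.txt is ignored by the glob.
        return [p.name for p in sorted_category_prompts(root, "00-core")]


if __name__ == "__main__":
    print(demo())
```

Under this pattern a per-category harness module only names its category string once per constant, so a typo surfaces as a `ValueError` at import time rather than as an empty prompt tuple.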

reflective-prompt-library/PROJECT_KNOWLEDGE.md

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
7272

7373
## Decision Index
7474

75+
- 2026-06-25 Round 99 panel — cross-category prompt path library registry (`test_prompt_category_paths_library_registry.py`, DRY `category_prompt_dir` / `sorted_category_prompts`; preamble-scoped `assert_prompt_references_workflow_skill`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
7576
- 2026-06-25 Round 98 panel — cross-category eval_harness fixture library registry (`test_prompt_eval_harness_fixture_library_registry.py`, DRY `make_category_eval_harness_fixture`, `PROMPT_LIBRARY_REPO_ROOT`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
7677
- 2026-06-25 Round 97 panel — cross-category workflow skill reference library registry (`test_prompt_workflow_skill_reference_library_registry.py`, DRY `assert_prompt_references_workflow_skill`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
7778
- 2026-06-25 Round 96 panel — cross-category eval_harness score floor library registry (`test_prompt_eval_harness_score_library_registry.py`, DRY `assert_prompt_meets_eval_harness_floor`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)

reflective-prompt-library/README.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
3030

3131
## Governance Panel Record
3232

33-
Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–98, options A–HP) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
33+
Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–99, options A–HU) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
3434

3535
## Directory Map
3636

reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
314314
2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
315315
3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
316316
4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
317-
5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_prompt_workflow_skill_reference_library_registry.py`, `test_prompt_eval_harness_fixture_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 670+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96); workflow skill reference registry (`assert_prompt_references_workflow_skill`, Round 97); eval_harness fixture registry (`make_category_eval_harness_fixture`, Round 98)
317+
5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_prompt_workflow_skill_reference_library_registry.py`, `test_prompt_eval_harness_fixture_library_registry.py`, `test_prompt_category_paths_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 680+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96); workflow skill reference registry (`assert_prompt_references_workflow_skill`, Round 97); eval_harness fixture registry (`make_category_eval_harness_fixture`, Round 98); category path registry (`category_prompt_dir` / `sorted_category_prompts`, Round 99); workflow skill reference helper preamble-aligned (Round 99)
318318

319319
### Ongoing maintenance (not blockers)
320320

@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
384384
- ✅ Benchmark fixture gate plus optional manual benchmark runs
385385
- ✅ Research-backed design decisions
386386

387-
The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–98; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
387+
The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–99; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.

reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md

Lines changed: 53 additions & 0 deletions
@@ -2987,3 +2987,56 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
29872987
---
29882988

29892989
**Resealed 2026-06-25** after **Round 98** (options HL–HP). Eval_harness fixtures are now library-registry checked across all `00-core`–`06-repo` categories with shared `make_category_eval_harness_fixture` and `PROMPT_LIBRARY_REPO_ROOT`. Holdout expansion remains recurrence-gated maintenance.
2990+
2991+
## Round 99 — cross-category prompt path library registry + preamble fix (2026-06-25)
2992+
2993+
**Options HQ–HU** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
2994+
2995+
### Round 99 options
2996+
2997+
| ID | Proposal | Verdict |
2998+
| --- | --- | --- |
2999+
| HQ | DRY `category_prompt_dir` + `sorted_category_prompts` in `prompt_eval_helpers.py` | **Agree** |
3000+
| HR | `test_prompt_category_paths_library_registry.py` — category dir + prompt tuple parity registry | **Agree** |
3001+
| HS | Fix `assert_prompt_references_workflow_skill` to preamble scope + GLOSSARY step 31 + governance sync | **Agree** |
3002+
| HT | ROUTE holdout expansion | **Defer** |
3003+
| HU | Router / tenth skill / benchmark CI | **Reject** |
3004+
3005+
### Round 99 verdict table
3006+
3007+
| ID | Option | Verdict | Action |
3008+
| --- | --- | --- | --- |
3009+
| HQ | Category prompt path DRY helpers | **Agree** | `category_prompt_dir` + `sorted_category_prompts` |
3010+
| HR | Category path library registry | **Agree** | `test_prompt_category_paths_library_registry.py` |
3011+
| HS | Preamble scope + playbook | **Agree** | `prompt_preamble` in workflow skill reference helper; step 31 |
3012+
| HT | Holdout expansion | **Defer** | maintenance |
3013+
| HU | Router/tenth skill/benchmark CI | **Reject** | no change |
3014+
3015+
### Socratic rationale (Round 99)
3016+
3017+
- **Opus:** Round 98 closed eval_harness fixtures; seven harness files still duplicate `Path(__file__).parent.parent.parent / "0X-category"` glob tuples with no library-wide falsifiability; Round 97 docs claimed preamble-scoped workflow skill references but helper still scanned full file bodies.
3018+
- **Codex:** Shared `PROMPT_LIBRARY_ROOT` + `category_prompt_dir` prevents off-by-one parent drift; preamble fix aligns implementation with `test_prompt_cross_links.py` and GLOSSARY step 29.
3019+
- **Gemini:** Deterministic path registry; no prompt content churn.
3020+
- **Composer:** Mirrors R91–R98 registry pattern; one helper pair + one registry file.
3021+
- **Sakana:** Category path parity documents that all harness modules resolve the same composable prompt globs.
3022+
- **GLM:** Preamble-scoped workflow skill check avoids false passes from fenced template mentions; playbook step 31 gives operators a single checklist line.
3023+
3024+
**All roles agree.**
3025+
3026+
## Implemented Changes (Round 99)
3027+
3028+
- `plans/tests/prompt_eval_helpers.py`: `PROMPT_LIBRARY_ROOT`, `category_prompt_dir`, `sorted_category_prompts`; `assert_prompt_references_workflow_skill` preamble-scoped
3029+
- `plans/tests/test_*_prompts_eval_harness.py`: DRY category dirs + prompt tuples via shared helpers
3030+
- `plans/tests/test_prompt_category_paths_library_registry.py`: cross-category path registry
3031+
- `GLOSSARY.md`: playbook Rounds 1–99; step 31 for category path library registry
3032+
- `QUALITY_GATES_SUMMARY.md`: category path registry note; preamble fix note; panel Rounds 1–99; 680+ pytest floor
3033+
- `PROJECT_KNOWLEDGE.md`: Decision Index Round 99 entry
3034+
- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 99 sync
3035+
3036+
## Verification (Round 99)
3037+
3038+
- `make all`: 682 pytest + ROUTE-001/002/003 100%
3039+
3040+
---
3041+
3042+
**Resealed 2026-06-25** after **Round 99** (options HQ–HU). Composable prompt category paths are now library-registry checked across all `00-core`–`06-repo` categories with shared `category_prompt_dir` / `sorted_category_prompts`; workflow skill reference guards are preamble-scoped as documented. Holdout expansion remains recurrence-gated maintenance.
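The "off-by-one parent drift" that the Round 99 rationale mentions can be illustrated with a small parity check. This is only a sketch: the contents of `test_prompt_category_paths_library_registry.py` are not shown in this commit, so `LIBRARY_ROOT`, `REGISTRY`, `find_path_drift`, and the `SimpleNamespace` stand-ins for harness modules are all hypothetical.

```python
from pathlib import Path
from types import SimpleNamespace

# Hypothetical library root; the real helper derives its root from __file__.
LIBRARY_ROOT = Path("/repo/reflective-prompt-library")


def category_prompt_dir(category: str) -> Path:
    """Simplified stand-in for the shared helper (no category validation here)."""
    return LIBRARY_ROOT / category


def find_path_drift(registry: dict[str, tuple[object, str]]) -> list[str]:
    """Return categories whose module-level <PREFIX>_DIR differs from the shared helper."""
    drifted = []
    for category, (module, prefix) in registry.items():
        if getattr(module, f"{prefix}_DIR") != category_prompt_dir(category):
            drifted.append(category)
    return drifted


# Stand-ins for harness modules (assumption: each exposes <PREFIX>_DIR).
agent_module = SimpleNamespace(AGENT_DIR=category_prompt_dir("04-agent"))
# One `.parent` too many: the kind of drift a hand-written parent chain invites.
context_module = SimpleNamespace(CONTEXT_DIR=LIBRARY_ROOT.parent / "03-context")

REGISTRY = {
    "04-agent": (agent_module, "AGENT"),
    "03-context": (context_module, "CONTEXT"),
}

drift = find_path_drift(REGISTRY)
print(drift)
```

A registry test built this way would fail only for the module whose directory was resolved by hand, which is the falsifiability the panel asked for.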

reflective-prompt-library/plans/tests/prompt_eval_helpers.py

Lines changed: 16 additions & 3 deletions
@@ -25,6 +25,19 @@
2525
PROMPT_EVAL_MIN_SCORE = 80.0
2626

2727
PROMPT_LIBRARY_REPO_ROOT = str(Path(__file__).resolve().parent.parent.parent.parent)
28+
PROMPT_LIBRARY_ROOT = Path(__file__).resolve().parent.parent.parent
29+
30+
31+
def category_prompt_dir(category: str) -> Path:
32+
    """Resolve a composable prompt category directory under the library root."""
33+
    if category not in PROMPT_LIBRARY_CATEGORIES:
34+
        raise ValueError(f"unknown prompt category: {category}")
35+
    return PROMPT_LIBRARY_ROOT / category
36+
37+
38+
def sorted_category_prompts(category: str) -> tuple[Path, ...]:
39+
    """Return sorted markdown prompt paths for a library category."""
40+
    return tuple(sorted(category_prompt_dir(category).glob("*.md")))
2841

2942
CATEGORY_EVAL_HARNESS_FIXTURE_MARKER = "_from_category_eval_harness_fixture"
3043

@@ -148,9 +161,9 @@ def assert_category_workflow_skill_coverage(
148161

149162

150163
def assert_prompt_references_workflow_skill(prompt_path: Path) -> None:
151-
    """Every composable prompt must mention at least one reflective-* workflow skill."""
152-
    body = prompt_path.read_text(encoding="utf-8")
153-
    assert "reflective-" in body, (
164+
    """Every composable prompt must mention at least one reflective-* workflow skill in preamble."""
165+
    preamble = prompt_preamble(prompt_path)
166+
    assert "reflective-" in preamble, (
154167
        f"{prompt_path.name} should map to at least one workflow skill"
155168
    )
156169
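Why the preamble scope matters can be sketched as follows. The implementation of `prompt_preamble` is not part of this diff, so the example assumes the preamble is everything before the first fenced block; `prompt_preamble_text`, `references_workflow_skill`, and both sample prompts are hypothetical and operate on strings rather than paths.

```python
FENCE = "`" * 3  # built at runtime so the marker never appears literally in this example


def prompt_preamble_text(body: str) -> str:
    """Return the text before the first fenced block (assumed meaning of 'preamble')."""
    lines = []
    for line in body.splitlines():
        if line.lstrip().startswith(FENCE):
            break
        lines.append(line)
    return "\n".join(lines)


def references_workflow_skill(body: str, *, preamble_only: bool) -> bool:
    """Check for a reflective-* mention in the chosen scope."""
    scope = prompt_preamble_text(body) if preamble_only else body
    return "reflective-" in scope


# The skill is mentioned only inside the fenced template.
TEMPLATE_ONLY = "\n".join([
    "# Demo prompt",
    "Primary workflow surface: (none stated)",
    FENCE,
    "Use reflective-dispatch to route this task.",
    FENCE,
])

# The skill is mentioned in the preamble itself.
PREAMBLE_MENTION = "\n".join([
    "# Demo prompt",
    "Primary workflow surface: reflective-dispatch",
    FENCE,
    "Template body.",
    FENCE,
])

full_scan_passes = references_workflow_skill(TEMPLATE_ONLY, preamble_only=False)
preamble_scan_passes = references_workflow_skill(TEMPLATE_ONLY, preamble_only=True)
print(full_scan_passes, preamble_scan_passes)
```

With a full-body scan the template-only prompt passes even though its preamble names no skill; the preamble-scoped scan rejects it, which is the false pass the GLM lens describes.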

reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py

Lines changed: 3 additions & 3 deletions
@@ -9,15 +9,15 @@
99
sys.path.insert(0, str(Path(__file__).parent))
1010

1111
from eval_harness import EvalHarness # noqa: E402
12-
from prompt_eval_helpers import PROMPT_LIBRARY_REPO_ROOT, make_category_eval_harness_fixture, assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor # noqa: E402
12+
from prompt_eval_helpers import category_prompt_dir, sorted_category_prompts, PROMPT_LIBRARY_REPO_ROOT, make_category_eval_harness_fixture, assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor # noqa: E402
1313

1414
REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
1515
MIN_SCORE = PROMPT_EVAL_MIN_SCORE
1616

17-
AGENT_DIR = Path(__file__).parent.parent.parent / "04-agent"
17+
AGENT_DIR = category_prompt_dir("04-agent")
1818
REPO_ROOT = PROMPT_LIBRARY_REPO_ROOT
1919

20-
AGENT_PROMPTS = tuple(sorted(AGENT_DIR.glob("*.md")))
20+
AGENT_PROMPTS = sorted_category_prompts("04-agent")
2121
AGENT_COVER_WORKFLOW_SKILLS = (
2222
    "reflective-dispatch",
2323
    "reflective-spec-plan",

reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py

Lines changed: 3 additions & 3 deletions
@@ -9,15 +9,15 @@
99
sys.path.insert(0, str(Path(__file__).parent))
1010

1111
from eval_harness import EvalHarness # noqa: E402
12-
from prompt_eval_helpers import PROMPT_LIBRARY_REPO_ROOT, make_category_eval_harness_fixture, assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor # noqa: E402
12+
from prompt_eval_helpers import category_prompt_dir, sorted_category_prompts, PROMPT_LIBRARY_REPO_ROOT, make_category_eval_harness_fixture, assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor # noqa: E402
1313

1414
REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
1515
MIN_SCORE = PROMPT_EVAL_MIN_SCORE
1616

17-
CONTEXT_DIR = Path(__file__).parent.parent.parent / "03-context"
17+
CONTEXT_DIR = category_prompt_dir("03-context")
1818
REPO_ROOT = PROMPT_LIBRARY_REPO_ROOT
1919

20-
CONTEXT_PROMPTS = tuple(sorted(CONTEXT_DIR.glob("*.md")))
20+
CONTEXT_PROMPTS = sorted_category_prompts("03-context")
2121
CONTEXT_COVER_WORKFLOW_SKILLS = (
2222
    "reflective-dispatch",
2323
    "reflective-brief",

reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py

Lines changed: 4 additions & 2 deletions
@@ -10,6 +10,8 @@
1010

1111
from eval_harness import EvalHarness # noqa: E402
1212
from prompt_eval_helpers import (
13+
    category_prompt_dir,
14+
    sorted_category_prompts,
1315
    PROMPT_LIBRARY_REPO_ROOT,
1416
    make_category_eval_harness_fixture,
1517
    PROMPT_CONTRACT_HEADINGS,
@@ -28,10 +30,10 @@
2830
REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
2931
MIN_SCORE = PROMPT_EVAL_MIN_SCORE
3032

31-
CORE_DIR = Path(__file__).parent.parent.parent / "00-core"
33+
CORE_DIR = category_prompt_dir("00-core")
3234
REPO_ROOT = PROMPT_LIBRARY_REPO_ROOT
3335

34-
CORE_PROMPTS = tuple(sorted(CORE_DIR.glob("*.md")))
36+
CORE_PROMPTS = sorted_category_prompts("00-core")
3537
CORE_COVER_WORKFLOW_SKILLS = (
3638
    "reflective-brief",
3739
    "reflective-dispatch",
