Commit 19242e4

Round 94: cross-category Primary workflow surface preamble library registry
Centralize (DRY) `assert_primary_workflow_surface_preamble` and the supporting-lens exemption map in `prompt_eval_helpers`; add `test_prompt_primary_workflow_surface_library_registry` to check library-wide preamble parity across all 49 composable prompts.
1 parent 48254ba commit 19242e4

17 files changed

Lines changed: 197 additions & 50 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–93)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–94)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:

reflective-prompt-library/GLOSSARY.md

Lines changed: 2 additions & 1 deletion
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–93). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–94). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
@@ -366,3 +366,4 @@ Ongoing upkeep after panel close (Rounds 1–93). Not agent instructions — ope
 23. When adding composable prompts or new categories, keep `PROMPT_LIBRARY_CATEGORIES` and `test_human_review_library_registry.py` aligned so frozen HR sets cover every `00-core`–`06-repo` prompt exactly once.
 24. When adding composable prompts or editing `*_SKILL_LINKS` / `*_THINKING_LINKS`, keep per-category dict keys aligned with prompt globs and run `test_prompt_skill_links_library_registry.py` plus `test_all_*_prompts_have_skill_link` in `test_prompt_cross_links.py`.
 25. When adding composable prompts or editing eval_harness contract preambles, keep `PROMPT_CONTRACT_HEADINGS` / `PROMPT_EVAL_MIN_SCORE` in `prompt_eval_helpers.py` and run `test_prompt_contract_library_registry.py` plus per-category `test_*_prompts_eval_harness.py` guards.
+26. When editing composable prompt Purpose preambles, keep `Primary workflow surface(s)` / Supporting-lens lines via `assert_primary_workflow_surface_preamble` in `prompt_eval_helpers.py`; update `SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY` for exemptions; run `test_prompt_primary_workflow_surface_library_registry.py` plus per-category `test_*_prompts_eval_harness.py` guards.
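
Step 26 asks maintainers to update the exemption map whenever a Supporting-lens prompt changes. The sketch below shows one way a stale entry might be detected after a rename. It is illustrative only: `stale_exemptions` and `build_demo_library` are hypothetical names, the file `brief.md` is invented for the demo, and the registry test added by this commit is not shown on this page and may check parity differently.

```python
from pathlib import Path
import tempfile

# Same shape as SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY in the commit.
EXEMPTIONS: dict[str, frozenset[str]] = {
    "04-agent": frozenset({"runtime-trust-boundary.md"}),
}


def stale_exemptions(library_root: Path, exemptions: dict[str, frozenset[str]]) -> list[str]:
    """Return 'category/name' entries in the map that match no prompt file on disk."""
    stale: list[str] = []
    for category, names in sorted(exemptions.items()):
        on_disk = {p.name for p in (library_root / category).glob("*.md")}
        stale.extend(f"{category}/{name}" for name in sorted(names - on_disk))
    return stale


def build_demo_library(root: Path) -> None:
    """Create a throwaway library; 'brief.md' is a made-up file name."""
    layout = {
        "00-core": ["brief.md"],
        "04-agent": ["agent-selection.md", "runtime-trust-boundary.md"],
    }
    for category, names in layout.items():
        (root / category).mkdir(parents=True)
        for name in names:
            (root / category / name).write_text("# Purpose\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_library(root)
    ok = stale_exemptions(root, EXEMPTIONS)
    # Simulate a rename that was not mirrored in the exemption map.
    (root / "04-agent" / "runtime-trust-boundary.md").rename(
        root / "04-agent" / "trust-boundary.md"
    )
    drifted = stale_exemptions(root, EXEMPTIONS)

print(ok)
print(drifted)
```

A check of this kind fails loudly when a prompt is renamed or removed but its exemption lingers, which is the drift step 26 is meant to prevent.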

reflective-prompt-library/PROJECT_KNOWLEDGE.md

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 ## Decision Index
 
 - 2026-06-25 Round 85 panel — composable prompt Primary workflow surface preamble guards (`test_*_prompts_eval_harness.py`) + Supporting-lens exemption → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
+- 2026-06-25 Round 94 panel — cross-category Primary workflow surface preamble library registry (`test_prompt_primary_workflow_surface_library_registry.py`, DRY `assert_primary_workflow_surface_preamble`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 93 panel — cross-category eval_harness contract heading library registry (`test_prompt_contract_library_registry.py`, DRY `PROMPT_CONTRACT_HEADINGS`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 92 panel — cross-category skill/thinking cross-link library registry (`test_prompt_skill_links_library_registry.py`) + missing `test_all_*_prompts_have_skill_link` guards → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 91 panel — cross-category Human Review library registry (`test_human_review_library_registry.py`, `PROMPT_LIBRARY_CATEGORIES`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)

reflective-prompt-library/README.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–93, options A–GQ) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–94, options A–GV) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 615+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93)
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 630+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93)
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–93; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–94; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.

reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md

Lines changed: 50 additions & 0 deletions
@@ -2728,4 +2728,54 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 **Resealed 2026-06-25** after **Round 93** (options GM–GQ). Eval_harness contract headings are now library-registry checked across all `00-core`–`06-repo` prompts with shared `PROMPT_CONTRACT_HEADINGS`. Holdout expansion remains recurrence-gated maintenance.
 
+## Round 94 — cross-category Primary workflow surface preamble library registry (2026-06-25)
+
+**Options GR–GV** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 94 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| GR | DRY `assert_primary_workflow_surface_preamble` + `SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY` in `prompt_eval_helpers.py` | **Agree** |
+| GS | `test_prompt_primary_workflow_surface_library_registry.py` cross-category registry + library glob parity | **Agree** |
+| GT | GLOSSARY playbook step 26 + governance sync | **Agree** |
+| GU | ROUTE holdout expansion | **Defer** |
+| GV | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 94 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| GR | Primary surface preamble helper | **Agree** | shared helper + supporting-lens map |
+| GS | Primary surface library registry | **Agree** | `test_prompt_primary_workflow_surface_library_registry.py` |
+| GT | Playbook + docs | **Agree** | step 26; panel round 94 sync |
+| GU | Holdout expansion | **Defer** | maintenance |
+| GV | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+### Socratic rationale (Round 94)
+
+- **Opus:** Rounds 91–93 closed HR, cross-link, and contract registries; Primary workflow surface preamble guards remain duplicated across seven harness files with no library-wide falsifiability.
+- **Codex:** Centralizing `assert_primary_workflow_surface_preamble` prevents per-category drift; `runtime-trust-boundary.md` Supporting-lens exemption belongs in one map.
+- **Gemini:** Registry adds one sweep over 49 prompts without extra CI cost beyond pytest.
+- **Composer:** IDE contributors edit Purpose preambles often — one helper matches Round 87 HR DRY pattern.
+- **Sakana:** Thinking lenses use plural "surfaces"; substring guard still matches without a separate code path.
+- **GLM:** Supporting-lens map is English-canonical; no TW SKILL translation needed.
+
+## Implemented Changes (Round 94)
+
+- `plans/tests/prompt_eval_helpers.py`: `SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY`, `assert_primary_workflow_surface_preamble`
+- `plans/tests/test_*_prompts_eval_harness.py`: DRY primary-surface preamble guards
+- `plans/tests/test_prompt_primary_workflow_surface_library_registry.py`: cross-category registry + library glob parity
+- `GLOSSARY.md`: playbook Rounds 1–94; step 26 for primary-surface library registry
+- `QUALITY_GATES_SUMMARY.md`: primary-surface registry note; panel Rounds 1–94; 630+ pytest floor
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 94 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 94 sync
+
+## Verification (Round 94)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+---
+
+**Resealed 2026-06-25** after **Round 94** (options GR–GV). Primary workflow surface preambles are now library-registry checked across all `00-core`–`06-repo` prompts with shared `assert_primary_workflow_surface_preamble` and Supporting-lens exemptions in one map. Holdout expansion remains recurrence-gated maintenance.
 
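
The Sakana point in the rationale (a single substring guard covers both the singular and plural wording) can be illustrated with a short sketch. It assumes the preamble is the text before the first code fence, as in the inline checks this commit removes from the harness files. `preamble_of` and `declares_primary_surface` are hypothetical names, and the sample declaration lines are invented for the demo.

```python
FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences


def preamble_of(text: str) -> str:
    """Assumed definition: everything before the first code fence."""
    return text.split(FENCE, 1)[0]


def declares_primary_surface(text: str) -> bool:
    """Substring guard: matches 'surface', 'surfaces', and 'surface(s)' alike."""
    return "Primary workflow surface" in preamble_of(text)


singular = "## Purpose\nPrimary workflow surface: reflective-dispatch\n"
plural = "## Purpose\nPrimary workflow surfaces: reflective-dispatch, reflective-risk\n"
# The declaration appears only inside the fenced body, so the preamble lacks it.
only_in_body = (
    "## Purpose\nNo declaration here.\n"
    + FENCE + "\nPrimary workflow surface: reflective-dispatch\n" + FENCE + "\n"
)

results = {
    "singular": declares_primary_surface(singular),
    "plural": declares_primary_surface(plural),
    "only_in_body": declares_primary_surface(only_in_body),
}
print(results)
```

Because the guard is a plain substring test, it accepts any wording that starts with the canonical phrase; it does not validate which surfaces are listed.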

reflective-prompt-library/plans/tests/prompt_eval_helpers.py

Lines changed: 27 additions & 0 deletions
@@ -91,3 +91,30 @@ def assert_human_review_sets_partition(
         f"required ∪ exempt {sorted(required | exempt)} != all prompts {sorted(all_names)}"
     )
     assert not required & exempt, "required and exempt Human Review sets must not overlap"
+
+SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY: dict[str, frozenset[str]] = {
+    "04-agent": frozenset({"runtime-trust-boundary.md"}),
+}
+
+
+def supporting_lens_exempt_for_category(category: str) -> frozenset[str]:
+    """Prompts that declare Supporting lens for instead of Primary workflow surface(s)."""
+    return SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY.get(category, frozenset())
+
+
+def assert_primary_workflow_surface_preamble(
+    prompt_path: Path,
+    *,
+    category: str,
+) -> None:
+    """Purpose preambles must declare Primary workflow surface(s) or Supporting lens."""
+    preamble = prompt_preamble(prompt_path)
+    if prompt_path.name in supporting_lens_exempt_for_category(category):
+        assert "Supporting lens for" in preamble, (
+            f"{prompt_path.name} Purpose should use Supporting lens for workflow skills"
+        )
+    else:
+        assert "Primary workflow surface" in preamble, (
+            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
+        )
+
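
To see how the helper behaves, the following self-contained sketch re-creates it against throwaway files. Two things are assumptions rather than facts from this diff: `prompt_preamble` is not shown here, so it is approximated as "text before the first code fence" (matching the inline code the harness files used previously), and `check` is a hypothetical wrapper added only for the demo.

```python
from pathlib import Path
import tempfile

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences

SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY: dict[str, frozenset[str]] = {
    "04-agent": frozenset({"runtime-trust-boundary.md"}),
}


def prompt_preamble(prompt_path: Path) -> str:
    """Assumed behaviour: return everything before the first code fence."""
    return prompt_path.read_text(encoding="utf-8").split(FENCE, 1)[0]


def supporting_lens_exempt_for_category(category: str) -> frozenset[str]:
    return SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY.get(category, frozenset())


def assert_primary_workflow_surface_preamble(prompt_path: Path, *, category: str) -> None:
    preamble = prompt_preamble(prompt_path)
    if prompt_path.name in supporting_lens_exempt_for_category(category):
        assert "Supporting lens for" in preamble, (
            f"{prompt_path.name} Purpose should use Supporting lens for workflow skills"
        )
    else:
        assert "Primary workflow surface" in preamble, (
            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
        )


def check(name: str, body: str, category: str) -> str:
    """Hypothetical demo wrapper: write a prompt, return 'ok' or the failure message."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / name
        path.write_text(body, encoding="utf-8")
        try:
            assert_primary_workflow_surface_preamble(path, category=category)
        except AssertionError as exc:
            return str(exc)
    return "ok"


primary_ok = check("agent-selection.md", "Primary workflow surface(s): demo\n", "04-agent")
lens_ok = check("runtime-trust-boundary.md", "Supporting lens for workflow skills.\n", "04-agent")
# The exemption is keyed by category, so the same file fails elsewhere.
wrong_category = check(
    "runtime-trust-boundary.md", "Supporting lens for workflow skills.\n", "03-context"
)
print(primary_ok, lens_ok, wrong_category, sep="\n")
```

The last case shows the effect of keying exemptions by category: a Supporting-lens prompt passes only in the category where the map lists it.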

reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py

Lines changed: 2 additions & 11 deletions
@@ -9,7 +9,7 @@
 sys.path.insert(0, str(Path(__file__).parent))
 
 from eval_harness import EvalHarness  # noqa: E402
-from prompt_eval_helpers import assert_human_review_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings  # noqa: E402
+from prompt_eval_helpers import assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings  # noqa: E402
 
 REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
 MIN_SCORE = PROMPT_EVAL_MIN_SCORE
@@ -19,7 +19,6 @@
 
 AGENT_PROMPTS = tuple(sorted(AGENT_DIR.glob("*.md")))
 AGENT_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(AGENT_PROMPTS)
-SUPPORTING_LENS_AGENT_PROMPTS = frozenset({"runtime-trust-boundary.md"})
 AGENT_HUMAN_REVIEW_REQUIRED = frozenset({
     "agent-scaffold-provenance.md",
     "agent-selection.md",
@@ -77,15 +76,7 @@ def test_agent_prompts_cover_agent_workflow_surfaces():
 def test_agent_prompts_have_workflow_surface_preamble_line():
     """04-agent prompts use Primary workflow surface(s) or Supporting lens (trust boundary)."""
     for prompt_path in AGENT_PROMPTS:
-        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
-        if prompt_path.name in SUPPORTING_LENS_AGENT_PROMPTS:
-            assert "Supporting lens for" in preamble, (
-                f"{prompt_path.name} Purpose should use Supporting lens for workflow skills"
-            )
-        else:
-            assert "Primary workflow surface" in preamble, (
-                f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
-            )
+        assert_primary_workflow_surface_preamble(prompt_path, category="04-agent")
 
 
 @pytest.mark.parametrize(

reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py

Lines changed: 2 additions & 5 deletions
@@ -9,7 +9,7 @@
 sys.path.insert(0, str(Path(__file__).parent))
 
 from eval_harness import EvalHarness  # noqa: E402
-from prompt_eval_helpers import assert_human_review_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings  # noqa: E402
+from prompt_eval_helpers import assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings  # noqa: E402
 
 REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
 MIN_SCORE = PROMPT_EVAL_MIN_SCORE
@@ -73,10 +73,7 @@ def test_context_prompts_cover_context_workflow_surfaces():
 def test_context_prompts_have_primary_workflow_surfaces_line():
     """All 03-context prompts declare Primary workflow surface(s) in Purpose preambles."""
     for prompt_path in CONTEXT_PROMPTS:
-        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
-        assert "Primary workflow surface" in preamble, (
-            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
-        )
+        assert_primary_workflow_surface_preamble(prompt_path, category="03-context")
 
 
 @pytest.mark.parametrize(

reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py

Lines changed: 2 additions & 4 deletions
@@ -12,6 +12,7 @@
 from prompt_eval_helpers import (
     PROMPT_CONTRACT_HEADINGS,
     PROMPT_EVAL_MIN_SCORE,
+    assert_primary_workflow_surface_preamble,
     assert_prompt_contract_headings,  # noqa: E402
     assert_human_review_exempt_have_no_preamble_section,
     assert_human_review_preamble,
@@ -79,10 +80,7 @@ def test_core_prompts_cover_brief_and_dispatch():
 def test_core_prompts_have_primary_workflow_surfaces_line():
     """All 00-core prompts declare Primary workflow surface(s) in Purpose preambles."""
     for prompt_path in CORE_PROMPTS:
-        preamble = prompt_path.read_text(encoding="utf-8").split("```", 1)[0]
-        assert "Primary workflow surface" in preamble, (
-            f"{prompt_path.name} Purpose should list Primary workflow surface(s)"
-        )
+        assert_primary_workflow_surface_preamble(prompt_path, category="00-core")
 
 
 @pytest.mark.parametrize(
