Commit 6dd08db

Round 89: freeze 00-core Human Review required/exempt sets
Codify CORE_HUMAN_REVIEW_REQUIRED and CORE_HUMAN_REVIEW_EXEMPT in test_core_prompts_eval_harness.py with partition and detection parity tests. Add GLOSSARY playbook step 21 and sync panel/governance docs through Round 89 (options FT–FW).
1 parent acf0df4 commit 6dd08db

9 files changed

Lines changed: 93 additions & 11 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–88)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–89)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:

reflective-prompt-library/GLOSSARY.md

Lines changed: 2 additions & 1 deletion
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–88). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–89). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
@@ -361,3 +361,4 @@ Ongoing upkeep after panel close (Rounds 1–88). Not agent instructions — ope
 18. When adding or editing composable prompts (`02-engineering``06-repo`) with `## Human Review`, keep preamble escalation routed to `reflective-risk` and run Human Review guards in `test_*_prompts_eval_harness.py` (exact heading match via `prompt_eval_helpers.py`).
 19. When editing Human Review guards, use `prompt_eval_helpers.assert_human_review_preamble` in all `test_*_prompts_eval_harness.py` files (thinking lenses + composable categories).
 20. When adding or editing risk-bearing `00-core/` prompts with `## Human Review`, keep preamble escalation routed to `reflective-risk` and run `test_core_prompts_eval_harness.py` Human Review guards via `prompt_eval_helpers.py`.
+21. When editing `00-core/` Human Review coverage, keep `CORE_HUMAN_REVIEW_REQUIRED` and `CORE_HUMAN_REVIEW_EXEMPT` in `test_core_prompts_eval_harness.py` aligned with preamble `## Human Review` sections; run core HR parity tests.
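
Step 21 depends on `has_human_review_preamble` and `prompts_with_human_review` from `prompt_eval_helpers.py`, which this commit does not show. Purely as an illustration, a preamble check that ignores headings inside fenced templates could look like the sketch below. The name `heading_outside_fences` and the fence-toggling logic are assumptions made for this example, not the library's actual helper.

```python
# Hypothetical stand-in for a preamble check; NOT the real prompt_eval_helpers code.
FENCE = "`" * 3  # triple-backtick fence marker, built indirectly to keep this example readable


def heading_outside_fences(text: str, heading: str = "## Human Review") -> bool:
    """Return True if `heading` appears as an exact line outside any fenced block."""
    in_fence = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Opening or closing fence: flip state and skip the fence line itself.
            in_fence = not in_fence
            continue
        if not in_fence and stripped == heading:
            return True
    return False


# A prompt that declares the section in its preamble.
required_style = "# Prompt\n\n## Human Review\n\nEscalate via reflective-risk.\n"
# An opener that only mentions the heading inside a fenced template.
exempt_style = f"# Opener\n\n{FENCE}text\n## Human Review\n{FENCE}\n"

print(heading_outside_fences(required_style))  # True
print(heading_outside_fences(exempt_style))    # False
```

Under these assumptions, a prompt in `CORE_HUMAN_REVIEW_REQUIRED` would behave like `required_style`, and one in `CORE_HUMAN_REVIEW_EXEMPT` like `exempt_style`.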

reflective-prompt-library/PROJECT_KNOWLEDGE.md

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 ## Decision Index
 
 - 2026-06-25 Round 85 panel — composable prompt Primary workflow surface preamble guards (`test_*_prompts_eval_harness.py`) + Supporting-lens exemption → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
+- 2026-06-25 Round 89 panel — `00-core` Human Review required/exempt set parity (`CORE_HUMAN_REVIEW_REQUIRED` / `CORE_HUMAN_REVIEW_EXEMPT`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 88 panel — `00-core` Human Review preamble guards on risk-bearing prompts + `test_core_prompts_eval_harness.py`[record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 87 panel — Human Review helper DRY + GLOSSARY playbook step repair → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 86 panel — composable Human Review preamble guards + `reflective-risk` routing alignment → [record](plans/multi-agent-panel-consensus-2026-06-25.md)

reflective-prompt-library/README.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–88, options A–FS) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–89, options A–FW) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md

Lines changed: 1 addition & 1 deletion
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–88; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–89; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.

reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md

Lines changed: 41 additions & 0 deletions
@@ -2498,3 +2498,44 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 **Resealed 2026-06-25** after **Round 88** (options FP–FS). Full prompt library now has Human Review preamble guards on thinking lenses (R81), composable prompts (R86), and risk-bearing `00-core` prompts (R88). Holdout expansion remains recurrence-gated maintenance.
 
+---
+
+## Round 89 — `00-core` Human Review required/exempt set parity (2026-06-25)
+
+**Options FT–FW** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 89 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| FT | Frozen `CORE_HUMAN_REVIEW_REQUIRED` / `CORE_HUMAN_REVIEW_EXEMPT` sets + pytest parity in `test_core_prompts_eval_harness.py` | **Agree** |
+| FU | GLOSSARY playbook step 21 + governance sync | **Agree** |
+| FV | ROUTE holdout expansion | **Defer** |
+| FW | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 89 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| FT | Core HR set parity | **Agree** | codify 6 required + 3 exempt opener prompts |
+| FU | Playbook + docs | **Agree** | step 21; panel round 89 sync |
+| FV | Holdout expansion | **Defer** | maintenance |
+| FW | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+**All roles agree.**
+
+## Implemented Changes (Round 89)
+
+- `plans/tests/test_core_prompts_eval_harness.py`: `CORE_HUMAN_REVIEW_REQUIRED`, `CORE_HUMAN_REVIEW_EXEMPT`, partition + detection parity tests
+- `GLOSSARY.md`: playbook Rounds 1–89; step 21 for core HR required/exempt sets
+- `QUALITY_GATES_SUMMARY.md`: core HR set parity note; panel Rounds 1–89; 560+ pytest floor
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 89 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 89 sync
+
+## Verification (Round 89)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+## Panel status (updated)
+
+**Resealed 2026-06-25** after **Round 89** (options FT–FW). `00-core` Human Review coverage is now explicit via frozen required/exempt sets; full library HR contract parity closed (thinking R81, composable R86, core R88–R89). Holdout expansion remains recurrence-gated maintenance.

reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py

Lines changed: 40 additions & 1 deletion
@@ -9,7 +9,11 @@
 sys.path.insert(0, str(Path(__file__).parent))
 
 from eval_harness import EvalHarness  # noqa: E402
-from prompt_eval_helpers import assert_human_review_preamble, prompts_with_human_review  # noqa: E402
+from prompt_eval_helpers import (  # noqa: E402
+    assert_human_review_preamble,
+    has_human_review_preamble,
+    prompts_with_human_review,
+)
 
 CORE_DIR = Path(__file__).parent.parent.parent / "00-core"
 REPO_ROOT = str(Path(__file__).parent.parent.parent.parent)
@@ -23,6 +27,20 @@
 )
 
 CORE_PROMPTS = tuple(sorted(CORE_DIR.glob("*.md")))
+CORE_HUMAN_REVIEW_REQUIRED = frozenset({
+    "core-full.md",
+    "core-minimal.md",
+    "core-short.md",
+    "custom-instruction-en.md",
+    "custom-instruction-zh.md",
+    "important-task-full.md",
+})
+CORE_HUMAN_REVIEW_EXEMPT = frozenset({
+    "daily-minimal.md",
+    "global-controller.md",
+    "master-prompt.md",
+})
+
 CORE_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(CORE_PROMPTS)
 
 
@@ -76,3 +94,24 @@ def test_core_prompts_have_primary_workflow_surfaces_line():
 def test_core_prompt_has_human_review_section(prompt_path: Path):
     """Risk-bearing 00-core prompts declare Human Review escalation outside zh-TW templates."""
     assert_human_review_preamble(prompt_path)
+
+
+def test_core_human_review_required_set_matches_detection():
+    """Frozen required set must match prompts that declare ## Human Review in preambles."""
+    detected = {p.name for p in CORE_PROMPTS_WITH_HUMAN_REVIEW}
+    assert detected == CORE_HUMAN_REVIEW_REQUIRED
+
+
+def test_core_human_review_exempt_prompts_have_no_preamble_section():
+    """L1 opener prompts keep Human Review cues in fenced templates only."""
+    for name in CORE_HUMAN_REVIEW_EXEMPT:
+        assert not has_human_review_preamble(CORE_DIR / name), (
+            f"{name} should not declare ## Human Review in preamble (exempt opener)"
+        )
+
+
+def test_core_human_review_sets_partition_core_prompts():
+    """Required + exempt sets must cover all 00-core prompts without overlap."""
+    all_names = {p.name for p in CORE_PROMPTS}
+    assert CORE_HUMAN_REVIEW_REQUIRED | CORE_HUMAN_REVIEW_EXEMPT == all_names
+    assert not CORE_HUMAN_REVIEW_REQUIRED & CORE_HUMAN_REVIEW_EXEMPT
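
The partition test above fails with a bare set comparison, which does not say which file is the problem. As a hedged sketch only, the same check can be expressed as set differences that name the offending files. `PartitionReport` and `describe_partition` are hypothetical names introduced for this example; the two frozen sets are copied from this commit.

```python
from typing import FrozenSet, Iterable, NamedTuple

# Frozen sets copied from test_core_prompts_eval_harness.py in this commit.
REQUIRED = frozenset({
    "core-full.md",
    "core-minimal.md",
    "core-short.md",
    "custom-instruction-en.md",
    "custom-instruction-zh.md",
    "important-task-full.md",
})
EXEMPT = frozenset({
    "daily-minimal.md",
    "global-controller.md",
    "master-prompt.md",
})


class PartitionReport(NamedTuple):
    """Hypothetical diagnostic record; not part of the repository."""

    unclassified: FrozenSet[str]  # on disk, but in neither frozen set
    stale: FrozenSet[str]         # in a frozen set, but no longer on disk
    overlap: FrozenSet[str]       # listed as both required and exempt

    @property
    def ok(self) -> bool:
        return not (self.unclassified or self.stale or self.overlap)


def describe_partition(
    all_names: Iterable[str],
    required: FrozenSet[str],
    exempt: FrozenSet[str],
) -> PartitionReport:
    """Compare file names on disk against the two frozen sets."""
    names = frozenset(all_names)
    classified = required | exempt
    return PartitionReport(
        unclassified=names - classified,
        stale=classified - names,
        overlap=required & exempt,
    )


# Simulate adding a new 00-core prompt without classifying it.
report = describe_partition(REQUIRED | EXEMPT | {"new-prompt.md"}, REQUIRED, EXEMPT)
print(report.ok)                    # False
print(sorted(report.unclassified))  # ['new-prompt.md']
```

A report like this could be passed as the assertion message, so a failing run shows which prompt needs to be added to one of the two sets.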

reflective-prompt-library/plans/tests/test_glossary_structure.py

Lines changed: 4 additions & 4 deletions
@@ -30,10 +30,10 @@ def test_round_boundary_terms_present(glossary_text: str):
         assert heading in glossary_text, f"missing glossary section: {heading}"
 
 
-def test_maintenance_playbook_references_round_88(glossary_text: str):
+def test_maintenance_playbook_references_round_89(glossary_text: str):
     playbook = glossary_text.split("## Governance Maintenance Playbook", 1)[1]
-    assert "Rounds 1–88" in playbook
-    assert "Rounds 1–87" not in playbook and "Rounds 1-87" not in playbook
+    assert "Rounds 1–89" in playbook
+    assert "Rounds 1–88" not in playbook and "Rounds 1-88" not in playbook
 
 
 
@@ -43,7 +43,7 @@ def test_maintenance_playbook_steps_on_separate_lines(glossary_text: str):
     assert re.search(r"guards\.\d+\.", playbook) is None, (
         "playbook steps merged without newline between numbers"
     )
-    for step in ("17.", "18.", "19.", "20."):
+    for step in ("17.", "18.", "19.", "20.", "21."):
         assert step in playbook
 
 
reflective-prompt-library/plans/tests/test_readme_governance.py

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "88"
-CURRENT_PANEL_OPTIONS = "A–FS"
+CURRENT_PANEL_ROUND = "89"
+CURRENT_PANEL_OPTIONS = "A–FW"
 
 
 @pytest.fixture(scope="module")
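
Each round bump in this commit updates the same "Rounds 1–N" string across several docs and tests by hand. The sketch below shows one possible way to detect leftover references from a single constant. It assumes the docs always use the literal phrase "Rounds 1–N" with an en dash or a hyphen, and `stale_round_references` is a hypothetical helper, not something in the repository.

```python
import re
from typing import List

CURRENT_PANEL_ROUND = "89"  # value set in test_readme_governance.py by this commit

# Matches "Rounds 1–N" written with an en dash or a plain hyphen (assumed phrasing).
ROUND_RANGE = re.compile(r"Rounds 1[–-](\d+)")


def stale_round_references(text: str, current_round: str = CURRENT_PANEL_ROUND) -> List[int]:
    """Return the sorted round numbers in `text` that differ from the current round."""
    current = int(current_round)
    found = {int(n) for n in ROUND_RANGE.findall(text)}
    return sorted(found - {current})


synced = "Ongoing upkeep after panel close (Rounds 1–89)."
drifted = "Consensus (Rounds 1–88) and playbook (Rounds 1-87) and README (Rounds 1–89)."

print(stale_round_references(synced))   # []
print(stale_round_references(drifted))  # [87, 88]
```

A check like this would let the glossary and README tests assert on an empty list instead of naming the previous round explicitly, at the cost of depending on one fixed phrasing.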
