You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: reflective-prompt-library/GLOSSARY.md
+2-1Lines changed: 2 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
337
337
338
338
## Governance Maintenance Playbook / 治理維護手冊
339
339
340
-
Ongoing upkeep after panel close (Rounds 1–95). Not agent instructions — operator checklist.
340
+
Ongoing upkeep after panel close (Rounds 1–96). Not agent instructions — operator checklist.
341
341
342
342
**Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
343
343
@@ -368,3 +368,4 @@ Ongoing upkeep after panel close (Rounds 1–95). Not agent instructions — ope
368
368
25. When adding composable prompts or editing eval_harness contract preambles, keep `PROMPT_CONTRACT_HEADINGS` / `PROMPT_EVAL_MIN_SCORE` in `prompt_eval_helpers.py` and run `test_prompt_contract_library_registry.py` plus per-category `test_*_prompts_eval_harness.py` guards.
369
369
26. When editing composable prompt Purpose preambles, keep `Primary workflow surface(s)` / Supporting-lens lines via `assert_primary_workflow_surface_preamble` in `prompt_eval_helpers.py`; update `SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY` for exemptions; run `test_prompt_primary_workflow_surface_library_registry.py` plus per-category `test_*_prompts_eval_harness.py` guards.
370
370
27. When editing category workflow skill coverage tuples, keep frozen `*_COVER_WORKFLOW_SKILLS` in `test_*_prompts_eval_harness.py` aligned with `assert_category_workflow_skill_coverage`; `01-thinking` stays exempt (consumer graph); run `test_workflow_skill_coverage_library_registry.py`.
371
+
28. When editing eval_harness score floors, keep `PROMPT_EVAL_MIN_SCORE` in `prompt_eval_helpers.py` and use `assert_prompt_meets_eval_harness_floor` in per-category `test_*_prompts_eval_harness.py` guards; run `test_prompt_eval_harness_score_library_registry.py`.
Copy file name to clipboardExpand all lines: reflective-prompt-library/README.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
30
30
31
31
## Governance Panel Record
32
32
33
-
Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–95, options A–HA) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
33
+
Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–96, options A–HF) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 640+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95)
317
+
5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 650+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96)
318
318
319
319
### Ongoing maintenance (not blockers)
320
320
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
384
384
- ✅ Benchmark fixture gate plus optional manual benchmark runs
385
385
- ✅ Research-backed design decisions
386
386
387
-
The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–95; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
387
+
The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–96; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
Copy file name to clipboardExpand all lines: reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
+52-1Lines changed: 52 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2829,4 +2829,55 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
2829
2829
2830
2830
---
2831
2831
2832
-
**Resealed 2026-06-25** after **Round 95** (options GW–HA). Workflow skill coverage is now library-registry checked across all `00-core`–`06-repo` categories with shared `assert_category_workflow_skill_coverage` and frozen tuples per harness (`01-thinking` exempt via empty tuple). Holdout expansion remains recurrence-gated maintenance.
| HE | Holdout expansion |**Defer**| maintenance |
2854
+
| HF | Router/tenth skill/benchmark CI |**Reject**| no change |
2855
+
2856
+
### Socratic rationale (Round 96)
2857
+
2858
+
-**Opus:** Rounds 91–95 closed HR, cross-link, contract, primary-surface, and workflow-coverage registries; per-category `meets_eval_harness_floor` guards remain duplicated with no library-wide falsifiability.
2859
+
-**Codex:** Centralizing `assert_prompt_meets_eval_harness_floor` prevents score-floor drift; registry sweep catches regressions across all 49 prompts in one place.
2860
+
-**Gemini:** Reject duplicating `EvalHarness.evaluate_file` logic in seven harness files — token/cost of maintenance, not runtime.
2861
+
-**Composer:** IDE adopters need one playbook step when bumping `PROMPT_EVAL_MIN_SCORE`; library registry mirrors R93 contract pattern.
2862
+
-**Sakana:** Score floor is orthogonal to thinking consumer graph — all seven categories including `01-thinking` belong in registry.
**Resealed 2026-06-25** after **Round 96** (options HB–HF). Eval_harness score floors are now library-registry checked across all `00-core`–`06-repo` categories with shared `assert_prompt_meets_eval_harness_floor` and per-category `MIN_SCORE` aliases to `PROMPT_EVAL_MIN_SCORE`. Holdout expansion remains recurrence-gated maintenance.
0 commit comments