TeaEntityLab
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/reflective-prompt-library/GLOSSARY.md b/reflective-prompt-library/GLOSSARY.md
Lines changed: 2 additions & 2 deletions
diff --git a/reflective-prompt-library/PROJECT_KNOWLEDGE.md b/reflective-prompt-library/PROJECT_KNOWLEDGE.md
Lines changed: 1 addition & 0 deletions
diff --git a/reflective-prompt-library/README.md b/reflective-prompt-library/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md b/reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md
Lines changed: 2 additions & 2 deletions
diff --git a/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md b/reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md
Lines changed: 52 additions & 0 deletions
diff --git a/reflective-prompt-library/plans/tests/prompt_eval_helpers.py b/reflective-prompt-library/plans/tests/prompt_eval_helpers.py
Lines changed: 18 additions & 0 deletions
diff --git a/reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_agent_prompts_eval_harness.py
Lines changed: 3 additions & 5 deletions
diff --git a/reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_context_prompts_eval_harness.py
Lines changed: 3 additions & 5 deletions
diff --git a/reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py b/reflective-prompt-library/plans/tests/test_core_prompts_eval_harness.py
Lines changed: 4 additions & 4 deletions
@@ -21,7 +21,7 @@ Full library docs: [reflective-prompt-library/README.md](reflective-prompt-libra
 ## Governance
 
 - **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md) — quality gates, routing maintenance (R8–R12), `make all`
-- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–97)
+- **Panel record:** [multi-agent-panel-consensus](reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md) — six-lens Socratic consensus (Rounds 1–98)
 - **Operator playbook:** [GLOSSARY.md](reflective-prompt-library/GLOSSARY.md) — Governance Maintenance Playbook
 
 The repository contains:
 
@@ -337,7 +337,7 @@ Curated top-of-cheatsheet summary of high-confusion routing traps (ROUTE-002 hol
 
 ## Governance Maintenance Playbook / 治理維護手冊
 
-Ongoing upkeep after panel close (Rounds 1–97). Not agent instructions — operator checklist.
+Ongoing upkeep after panel close (Rounds 1–98). Not agent instructions — operator checklist.
 
 **Operational test:** Before router tuning, add fresh ROUTE-002/003 holdout phrases; run `make all`; record decisions in `PROJECT_KNOWLEDGE.md` Decision Index when governance surface changes.
 
@@ -369,5 +369,5 @@ Ongoing upkeep after panel close (Rounds 1–97). Not agent instructions — ope
 26. When editing composable prompt Purpose preambles, keep `Primary workflow surface(s)` / Supporting-lens lines via `assert_primary_workflow_surface_preamble` in `prompt_eval_helpers.py`; update `SUPPORTING_LENS_PRIMARY_SURFACE_BY_CATEGORY` for exemptions; run `test_prompt_primary_workflow_surface_library_registry.py` plus per-category `test_*_prompts_eval_harness.py` guards.
 27. When editing category workflow skill coverage tuples, keep frozen `*_COVER_WORKFLOW_SKILLS` in `test_*_prompts_eval_harness.py` aligned with `assert_category_workflow_skill_coverage`; `01-thinking` stays exempt (consumer graph); run `test_workflow_skill_coverage_library_registry.py`.
 28. When editing eval_harness score floors, keep `PROMPT_EVAL_MIN_SCORE` in `prompt_eval_helpers.py` and use `assert_prompt_meets_eval_harness_floor` in per-category `test_*_prompts_eval_harness.py` guards; run `test_prompt_eval_harness_score_library_registry.py`.
-28. When editing eval_harness score floors, keep `PROMPT_EVAL_MIN_SCORE` in `prompt_eval_helpers.py` and use `assert_prompt_meets_eval_harness_floor` in per-category `test_*_prompts_eval_harness.py` guards; run `test_prompt_eval_harness_score_library_registry.py`.
 29. When editing per-category `reference_workflow_skills` guards, use `assert_prompt_references_workflow_skill` in `prompt_eval_helpers.py` (preamble-scoped, not fenced templates); run `test_prompt_workflow_skill_reference_library_registry.py` plus per-category harness guards.
+30. When editing per-category eval_harness fixtures, keep `PROMPT_LIBRARY_REPO_ROOT` and `make_category_eval_harness_fixture` in `prompt_eval_helpers.py`; run `test_prompt_eval_harness_fixture_library_registry.py` plus per-category harness guards.
@@ -72,6 +72,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 
 ## Decision Index
 
+- 2026-06-25 Round 98 panel — cross-category eval_harness fixture library registry (`test_prompt_eval_harness_fixture_library_registry.py`, DRY `make_category_eval_harness_fixture`, `PROMPT_LIBRARY_REPO_ROOT`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 97 panel — cross-category workflow skill reference library registry (`test_prompt_workflow_skill_reference_library_registry.py`, DRY `assert_prompt_references_workflow_skill`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 96 panel — cross-category eval_harness score floor library registry (`test_prompt_eval_harness_score_library_registry.py`, DRY `assert_prompt_meets_eval_harness_floor`) → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 85 panel — composable prompt Primary workflow surface preamble guards (`test_*_prompts_eval_harness.py`) + Supporting-lens exemption → [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–97, options A–HK) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–98, options A–HP) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map
 
 
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_prompt_workflow_skill_reference_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 660+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96); workflow skill reference registry (`assert_prompt_references_workflow_skill`, Round 97)
+5. **Doc anti-drift** — `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_human_review_library_registry.py`, `test_prompt_skill_links_library_registry.py`, `test_prompt_contract_library_registry.py`, `test_prompt_primary_workflow_surface_library_registry.py`, `test_workflow_skill_coverage_library_registry.py`, `test_prompt_eval_harness_score_library_registry.py`, `test_prompt_workflow_skill_reference_library_registry.py`, `test_prompt_eval_harness_fixture_library_registry.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py`, `test_validate_governance.py`, `test_validate_links.py`, `test_lint_skills.py`, `test_skill_module_contract.py` (Escalation subsection + Trigger/Methods/Output/Never; 670+ pytest anti-drift suite in CI); reciprocal thinking-lens ↔ skill checks and `00-core` + composable `Primary workflow surface(s)` ↔ `*_SKILL_LINKS` parity in `test_prompt_cross_links.py` (including strict Primary workflow surfaces parity via `test_thinking_lens_primary_surfaces_match_consumer_graph`); Human Review + Escalation route-target guards in thinking/skill contract tests; composable `Primary workflow surface(s)` / Supporting-lens preamble guards and composable `## Human Review` preamble guards (route to `reflective-risk`) via `prompt_eval_helpers.assert_human_review_preamble` in `test_*_prompts_eval_harness.py`; frozen `*_HUMAN_REVIEW_REQUIRED` / `*_HUMAN_REVIEW_EXEMPT` set parity across all prompt categories (Round 90); library-wide contract heading registry (`PROMPT_CONTRACT_HEADINGS`, Round 93); workflow skill coverage registry (`*_COVER_WORKFLOW_SKILLS`, Round 95); eval_harness score floor registry (`PROMPT_EVAL_MIN_SCORE`, Round 96); workflow skill reference registry (`assert_prompt_references_workflow_skill`, Round 97); eval_harness fixture registry (`make_category_eval_harness_fixture`, Round 98)
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–97; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–98; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
@@ -2935,3 +2935,55 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 **Resealed 2026-06-25** after **Round 97** (options HG–HK). Workflow skill references are now library-registry checked across all `00-core`–`06-repo` categories with shared `assert_prompt_references_workflow_skill` (preamble-scoped). Holdout expansion remains recurrence-gated maintenance.
 
+## Round 98 — cross-category eval_harness fixture library registry (2026-06-25)
+
+**Options HL–HP** | Six-lens panel (Opus, Codex, Gemini, Composer, Sakana, GLM)
+
+### Round 98 options
+
+| ID | Proposal | Verdict |
+| --- | --- | --- |
+| HL | DRY `make_category_eval_harness_fixture` + `PROMPT_LIBRARY_REPO_ROOT` in `prompt_eval_helpers.py` | **Agree** |
+| HM | `test_prompt_eval_harness_fixture_library_registry.py` — fixture + `REPO_ROOT` parity registry | **Agree** |
+| HN | GLOSSARY playbook step 30 + governance sync | **Agree** |
+| HO | ROUTE holdout expansion | **Defer** |
+| HP | Router / tenth skill / benchmark CI | **Reject** |
+
+### Round 98 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| HL | EvalHarness fixture DRY factory | **Agree** | `make_category_eval_harness_fixture` + `PROMPT_LIBRARY_REPO_ROOT` |
+| HM | EvalHarness fixture library registry | **Agree** | `test_prompt_eval_harness_fixture_library_registry.py` |
+| HN | Playbook + docs | **Agree** | step 30; panel round 98 sync |
+| HO | Holdout expansion | **Defer** | maintenance |
+| HP | Router/tenth skill/benchmark CI | **Reject** | no change |
+
+### Socratic rationale (Round 98)
+
+- **Opus:** Round 97 closed workflow skill references; seven per-category harness files still duplicate identical module-scoped `EvalHarness` fixtures with no library-wide falsifiability.
+- **Codex:** Centralizing `make_category_eval_harness_fixture` prevents `repo_root` drift; `PROMPT_LIBRARY_REPO_ROOT` object-identity checks catch path miscalculations.
+- **Gemini:** Zero runtime cost; removes boilerplate only.
+- **Composer:** Mirrors R91–R97 registry pattern; one factory + one registry file.
+- **Sakana:** Fixture parity documents that all categories evaluate prompts against the same TeaPrompt root.
+- **GLM:** Playbook step 30 gives operators a single checklist line for harness fixture edits.
+
+**All roles agree.**
+
+## Implemented Changes (Round 98)
+
+- `plans/tests/prompt_eval_helpers.py`: `PROMPT_LIBRARY_REPO_ROOT`, `make_category_eval_harness_fixture`
+- `plans/tests/test_*_prompts_eval_harness.py`: DRY harness fixtures via shared factory
+- `plans/tests/test_prompt_eval_harness_fixture_library_registry.py`: cross-category fixture registry
+- `GLOSSARY.md`: playbook Rounds 1–98; step 30 for eval_harness fixture library registry; dedupe step 28
+- `QUALITY_GATES_SUMMARY.md`: fixture registry note; panel Rounds 1–98; 670+ pytest floor
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 98 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 98 sync
+
+## Verification (Round 98)
+
+- `make all`: 672 pytest + ROUTE-001/002/003 100%
+
+---
+
+**Resealed 2026-06-25** after **Round 98** (options HL–HP). Eval_harness fixtures are now library-registry checked across all `00-core`–`06-repo` categories with shared `make_category_eval_harness_fixture` and `PROMPT_LIBRARY_REPO_ROOT`. Holdout expansion remains recurrence-gated maintenance.
@@ -24,6 +24,24 @@
 
 PROMPT_EVAL_MIN_SCORE = 80.0
 
+PROMPT_LIBRARY_REPO_ROOT = str(Path(__file__).resolve().parent.parent.parent.parent)
+
+CATEGORY_EVAL_HARNESS_FIXTURE_MARKER = "_from_category_eval_harness_fixture"
+
+
+def make_category_eval_harness_fixture(repo_root: str):
+    """Return a module-scoped EvalHarness pytest fixture bound to repo_root."""
+    import pytest
+    from eval_harness import EvalHarness
+
+    @pytest.fixture(scope="module")
+    def harness() -> EvalHarness:
+        return EvalHarness(repo_root=repo_root)
+
+    setattr(harness, CATEGORY_EVAL_HARNESS_FIXTURE_MARKER, True)
+    return harness
+
+
 
 def assert_prompt_contract_headings(prompt_path: Path) -> None:
     """Contract headings must appear in preamble outside fenced template blocks."""
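
Review note on the hunk above: `PROMPT_LIBRARY_REPO_ROOT` climbs four directories from `prompt_eval_helpers.py`. The sketch below illustrates where that lands and why one missing hop is the kind of path miscalculation the panel mentions. The `/repo` checkout location is a hypothetical stand-in, and `PurePosixPath` is used only so the example does not touch the filesystem; the real constant also calls `.resolve()`, which this sketch does not model.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location, used only for illustration.
helpers_file = PurePosixPath(
    "/repo/reflective-prompt-library/plans/tests/prompt_eval_helpers.py"
)

# Four .parent hops, mirroring the expression in the diff.
chained = helpers_file.parent.parent.parent.parent

# parents[3] names the same ancestor by index.
indexed = helpers_file.parents[3]

# One hop too few stops at the library directory, not the repository root.
one_short = helpers_file.parents[2]

print(chained)             # /repo
print(indexed == chained)  # True
print(one_short)           # /repo/reflective-prompt-library
```

Whether `parents[3]` reads better than the chained form is a style choice; both resolve to the same directory.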
 
@@ -9,13 +9,13 @@
 sys.path.insert(0, str(Path(__file__).parent))
 
 from eval_harness import EvalHarness  # noqa: E402
-from prompt_eval_helpers import assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor  # noqa: E402
+from prompt_eval_helpers import PROMPT_LIBRARY_REPO_ROOT, make_category_eval_harness_fixture, assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor  # noqa: E402
 
 REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
 MIN_SCORE = PROMPT_EVAL_MIN_SCORE
 
 AGENT_DIR = Path(__file__).parent.parent.parent / "04-agent"
-REPO_ROOT = str(Path(__file__).parent.parent.parent.parent)
+REPO_ROOT = PROMPT_LIBRARY_REPO_ROOT
 
 AGENT_PROMPTS = tuple(sorted(AGENT_DIR.glob("*.md")))
 AGENT_COVER_WORKFLOW_SKILLS = (
@@ -43,9 +43,7 @@
 
 
 
-@pytest.fixture(scope="module")
-def harness() -> EvalHarness:
-    return EvalHarness(repo_root=REPO_ROOT)
+harness = make_category_eval_harness_fixture(REPO_ROOT)
 
 
 @pytest.mark.parametrize("prompt_path", AGENT_PROMPTS, ids=lambda p: p.name)
 
@@ -9,13 +9,13 @@
 sys.path.insert(0, str(Path(__file__).parent))
 
 from eval_harness import EvalHarness  # noqa: E402
-from prompt_eval_helpers import assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor  # noqa: E402
+from prompt_eval_helpers import PROMPT_LIBRARY_REPO_ROOT, make_category_eval_harness_fixture, assert_category_workflow_skill_coverage, assert_human_review_preamble, assert_primary_workflow_surface_preamble, prompts_with_human_review, assert_human_review_required_matches_detection, assert_human_review_exempt_have_no_preamble_section, assert_human_review_sets_partition, PROMPT_CONTRACT_HEADINGS, PROMPT_EVAL_MIN_SCORE, assert_prompt_contract_headings, assert_prompt_references_workflow_skill, assert_prompt_meets_eval_harness_floor  # noqa: E402
 
 REQUIRED_HEADINGS = PROMPT_CONTRACT_HEADINGS
 MIN_SCORE = PROMPT_EVAL_MIN_SCORE
 
 CONTEXT_DIR = Path(__file__).parent.parent.parent / "03-context"
-REPO_ROOT = str(Path(__file__).parent.parent.parent.parent)
+REPO_ROOT = PROMPT_LIBRARY_REPO_ROOT
 
 CONTEXT_PROMPTS = tuple(sorted(CONTEXT_DIR.glob("*.md")))
 CONTEXT_COVER_WORKFLOW_SKILLS = (
@@ -40,9 +40,7 @@
 
 
 
-@pytest.fixture(scope="module")
-def harness() -> EvalHarness:
-    return EvalHarness(repo_root=REPO_ROOT)
+harness = make_category_eval_harness_fixture(REPO_ROOT)
 
 
 @pytest.mark.parametrize("prompt_path", CONTEXT_PROMPTS, ids=lambda p: p.name)
 
@@ -10,6 +10,8 @@
 
 from eval_harness import EvalHarness  # noqa: E402
 from prompt_eval_helpers import (
+    PROMPT_LIBRARY_REPO_ROOT,
+    make_category_eval_harness_fixture,
     PROMPT_CONTRACT_HEADINGS,
     PROMPT_EVAL_MIN_SCORE,
     assert_primary_workflow_surface_preamble,
@@ -27,7 +29,7 @@
 MIN_SCORE = PROMPT_EVAL_MIN_SCORE
 
 CORE_DIR = Path(__file__).parent.parent.parent / "00-core"
-REPO_ROOT = str(Path(__file__).parent.parent.parent.parent)
+REPO_ROOT = PROMPT_LIBRARY_REPO_ROOT
 
 CORE_PROMPTS = tuple(sorted(CORE_DIR.glob("*.md")))
 CORE_COVER_WORKFLOW_SKILLS = (
@@ -51,9 +53,7 @@
 CORE_PROMPTS_WITH_HUMAN_REVIEW = prompts_with_human_review(CORE_PROMPTS)
 
 
-@pytest.fixture(scope="module")
-def harness() -> EvalHarness:
-    return EvalHarness(repo_root=REPO_ROOT)
+harness = make_category_eval_harness_fixture(REPO_ROOT)
 
 
 @pytest.mark.parametrize("prompt_path", CORE_PROMPTS, ids=lambda p: p.name)
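
Review note: `test_prompt_eval_harness_fixture_library_registry.py` is listed in the panel record but its body is not part of the hunks shown here. The following is a minimal sketch of what a fixture parity check of that kind might look like, under stated assumptions: `make_fixture`, `fixture_parity_problems`, the `/repo` root, and the `SimpleNamespace` stand-ins for category modules are all hypothetical, and the factory returns a plain tagged callable rather than a real pytest fixture so the example runs with the standard library alone. Only the marker string is taken from the diff.

```python
from types import SimpleNamespace

MARKER = "_from_category_eval_harness_fixture"  # marker value from the diff
SHARED_ROOT = "/repo"  # hypothetical stand-in for PROMPT_LIBRARY_REPO_ROOT


def make_fixture(repo_root):
    """Stand-in factory: returns a tagged callable, not a pytest fixture."""
    def harness():
        return {"repo_root": repo_root}

    setattr(harness, MARKER, True)
    return harness


def fixture_parity_problems(modules, shared_root, marker=MARKER):
    """List every module whose REPO_ROOT or harness bypasses the shared helpers."""
    problems = []
    for name, module in modules.items():
        # Identity check: the module must rebind the shared object itself.
        if getattr(module, "REPO_ROOT", None) is not shared_root:
            problems.append(f"{name}: REPO_ROOT is not the shared root object")
        # Marker check: the fixture must come from the shared factory.
        if not getattr(getattr(module, "harness", None), marker, False):
            problems.append(f"{name}: harness was not built by the shared factory")
    return problems


def hand_rolled():
    """A locally defined fixture body that skips the factory."""
    return {"repo_root": "/repo/reflective-prompt-library"}


modules = {
    "core": SimpleNamespace(REPO_ROOT=SHARED_ROOT, harness=make_fixture(SHARED_ROOT)),
    "agent": SimpleNamespace(
        REPO_ROOT="/repo/reflective-prompt-library", harness=hand_rolled
    ),
}

problems = fixture_parity_problems(modules, SHARED_ROOT)
for problem in problems:
    print(problem)
```

In this sketch the `core` stand-in passes both checks, while the `agent` stand-in is reported twice: once for a drifted root and once for a hand-rolled fixture.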