Commit 3cf341c

Round 76: standardize 06-repo prompt contracts + eval anti-drift
1 parent fe42bfb commit 3cf341c

11 files changed

Lines changed: 271 additions & 7 deletions

reflective-prompt-library/06-repo/AGENTS.md

Lines changed: 18 additions & 0 deletions
@@ -1,5 +1,23 @@
 # AGENTS.md
 
+## Purpose
+
+Repository-level harness policy for reflective engineering agents. Primary workflow surfaces: `reflective-dispatch`, `reflective-implement`, and `reflective-spec-plan`. Pairs with `01-thinking/why-what-how-done.md`, `01-thinking/critical-thinking-check.md`, and `01-thinking/socratic-reviewer.md`.
+
+## Scope
+
+- In scope: strictness-first routing, nine frozen workflow skills, evidence-backed completion, project-knowledge authority boundary.
+- Out of scope: multi-agent runtime, tenth core skill without promotion gate, silent rigor downgrade.
+
+## Acceptance Criteria
+
+- Non-trivial tasks produce goal, scope, acceptance criteria, and test-backed completion evidence.
+- High-blast-radius work stops for human review before irreversible action.
+
+## Falsifiability
+
+Name one task that would fail if this harness policy were ignored at session start.
+
 ## Mission
 
 This repository uses Reflective Engineering Agent Protocol.

reflective-prompt-library/06-repo/PROJECT_KNOWLEDGE.template.md

Lines changed: 23 additions & 0 deletions
@@ -1,5 +1,28 @@
 Language: English
 
+## Purpose
+
+Scaffold for project-design judgement layer (non-authoritative). Primary workflow surfaces: `reflective-handoff-retro` and `reflective-brief`. Pairs with `01-thinking/socratic-reviewer.md` and `01-thinking/falsifiability.md`.
+
+## Scope
+
+- In scope: governing principles, active direction, durable lessons, decision index, growth gate.
+- Out of scope: agent operating rules (belong in `AGENTS.md` or `SKILL.md`).
+
+## Acceptance Criteria
+
+- Each lesson and principle has evidence and a review or retirement trigger.
+- Milestones carry verifiable targets; done milestones reflow before retirement.
+
+## Falsifiability
+
+State what evidence would retire a recorded principle as stale.
+
+## Human Review
+
+Require human approval before promoting lessons into executable skills or overriding `AGENTS.md`.
+
+
 # Project Knowledge — [Project Name]
 
 > **NON-AUTHORITATIVE FILE.** This artifact records project-design judgement:

reflective-prompt-library/06-repo/codex-opencode.md

Lines changed: 18 additions & 0 deletions
@@ -1,5 +1,23 @@
 # Codex / OpenCode Task Prompt
 
+## Purpose
+
+Repo-aware coding task harness for inspection-first implementation. Primary workflow surfaces: `reflective-implement` and `reflective-dispatch`. Pairs with `01-thinking/why-what-how-done.md` and `01-thinking/falsifiability.md`.
+
+## Scope
+
+- In scope: repository inspection, brief plan, smallest safe edits, tests, final report.
+- Out of scope: changing task requirements or weakening tests without explicit approval.
+
+## Acceptance Criteria
+
+- Final report lists goal, files changed, acceptance criteria status, tests run, and risks.
+- Failures and skipped checks are reported honestly.
+
+## Falsifiability
+
+Name one completion claim that would be invalid without listing tests run.
+
 ```markdown
 You are a repo-aware coding agent.

reflective-prompt-library/06-repo/cursor-rules.md

Lines changed: 22 additions & 0 deletions
@@ -1,5 +1,27 @@
 # Cursor Rules
 
+## Purpose
+
+IDE-native editing harness for small, safe, reviewable changes. Primary workflow surfaces: `reflective-implement` and `reflective-review`. Pairs with `01-thinking/critical-thinking-check.md` and `01-thinking/falsifiability.md`.
+
+## Scope
+
+- In scope: goal restatement, minimal diffs, acceptance criteria traceability, post-edit test plan.
+- Out of scope: silent refactors outside stated scope (escalate to `reflective-minimality` when disputed).
+
+## Acceptance Criteria
+
+- Every edit maps to stated acceptance criteria with explicit risks and follow-up tests.
+- Credential, schema, billing, or destructive changes trigger a human review stop.
+
+## Falsifiability
+
+State what test or diff review would reject the change as out of scope.
+
+## Human Review
+
+Require human approval before editing credential handling, schema, billing, public API, or destructive operations.
+
 ```markdown
 You are working inside an IDE. Prioritize small, safe, reviewable edits.

reflective-prompt-library/PROJECT_KNOWLEDGE.md

Lines changed: 1 addition & 0 deletions
@@ -75,6 +75,7 @@ deferred promotions are recurrence-gated — see [panel backlog](plans/multi-age
 > Pointers to the causal trail — plans, reflections, tests, commits. Detail is
 > not duplicated here; this is a map, not an archive.
 
+- 2026-06-25 Round 76 panel — standardize `06-repo/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_repo_prompts_eval_harness.py` [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 75 panel — standardize `05-domain/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_domain_prompts_eval_harness.py` [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 74 panel — standardize `03-context/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_context_prompts_eval_harness.py` [record](plans/multi-agent-panel-consensus-2026-06-25.md)
 - 2026-06-25 Round 73 panel — standardize `04-agent/` prompt contracts (Purpose/Scope/Acceptance/Falsifiability) + thinking/workflow cross-links + `test_agent_prompts_eval_harness.py` [record](plans/multi-agent-panel-consensus-2026-06-25.md)

reflective-prompt-library/README.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Pick **Strictness L1–L6** first (`skills/reflective-dispatch/SKILL.md`, [GLOSS
 
 ## Governance Panel Record
 
-Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–75, options A–DT) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
+Multi-agent Socratic consensus on project goals and the nine skills (Rounds 1–76, options A–DW) is recorded in [plans/multi-agent-panel-consensus-2026-06-25.md](plans/multi-agent-panel-consensus-2026-06-25.md). Run `make all` before claiming routing or governance changes are verified.
 
 ## Directory Map

reflective-prompt-library/plans/QUALITY_GATES_SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ ROUTE-002 measures unseen phrasing separately from ROUTE-001. Round 7 (2026-06-2
 2. **ROUTE-001/002/003 in CI** — 128 + 102 + 53 paraphrases at 100% consistency (seeded fixtures); `validate_route_fixture.py` gates minimum coverage
 3. **Governance validators** — links, lint, governance metadata, PROJECT_KNOWLEDGE, benchmark fixture, skill examples
 4. **Harness policy docs** — CONTRIBUTING, AGENTS, SKILL_INSTALLATION, maintenance playbook
-5. **Doc anti-drift** `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py` (370+ pytest anti-drift suite in CI)
+5. **Doc anti-drift** `test_routing_contract.py`, cheatsheet parity tests, `test_readme_governance.py`, `test_thinking_prompts_eval_harness.py`, `test_engineering_prompts_eval_harness.py`, `test_prompt_cross_links.py`, `test_core_prompts_eval_harness.py`, `test_agent_prompts_eval_harness.py`, `test_context_prompts_eval_harness.py`, `test_domain_prompts_eval_harness.py`, `test_repo_prompts_eval_harness.py` (400+ pytest anti-drift suite in CI)
 
 ### Ongoing maintenance (not blockers)
 
@@ -384,4 +384,4 @@ Phase 1 quality-gate tooling and documentation are **complete**. Routing consist
 - ✅ Benchmark fixture gate plus optional manual benchmark runs
 - ✅ Research-backed design decisions
 
-The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–75; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.
+The project is positioned to grow sustainably with quality discipline built in from the start. **No open implementation blockers** remain from panel Rounds 1–76; work is recurrence-gated maintenance per playbook. The next measurable quality target is **holdout expansion before router tuning** and optional manual baseline-vs-skill benchmark runs — not shipping new core skills without promotion evidence.

reflective-prompt-library/plans/multi-agent-panel-consensus-2026-06-25.md

Lines changed: 58 additions & 1 deletion
@@ -1877,4 +1877,61 @@ User directive (repeat): review prompts, plans, skills, and Socratic/critical-th
 
 ## Panel status (updated)
 
-**Resealed 2026-06-25** after **Round 75** (options DR–DT). Domain-prompt contract pass complete; all composable prompt categories (`00-core` through `05-domain`) now standardized. `06-repo` template Purpose sweep remains recurrence-gated.
+## Round 76 — Repository template contract review (2026-06-25)
+
+User directive (repeat): review prompts, plans, skills, and Socratic/critical-thinking lenses in parallel until all roles agree, then implement.
+
+### DU: Standardize `06-repo/` prompt contracts + cross-links?
+
+| Lens | Position |
+| --- | --- |
+| Opus | **Agree** — host-install templates are the last unstructured prompt layer; AGENTS.md is canonical harness surface |
+| Codex | **Agree** — four files bounded; falsifiable via `test_repo_prompts_eval_harness.py` + cross-link pytest |
+| Gemini | **Agree** — IDE/Codex templates are cost-relevant entry points; defer governance pytest mirrors |
+| Composer | **Agree** — AGENTS harness policy already cited; contract headers close eval_harness gap |
+| Sakana | **Agree** — no tenth skill; repo templates support existing nine |
+| GLM | **Agree** — English contracts outside localized fences; Human Review on high-blast-radius templates |
+
+**Socratic Q:** Why `06-repo` after `05-domain`?
+**Answer:** Repo templates are host-install artifacts with distinct authority boundaries (`AGENTS.md` vs `PROJECT_KNOWLEDGE.md`); completing them finishes the prompt-library contract sweep.
+
+**Consensus:** **Agree** — Purpose/Scope/Acceptance Criteria/Falsifiability on all four `06-repo/` templates; thinking + workflow cross-links; `test_repo_prompts_eval_harness.py`; extend `test_prompt_cross_links.py`; preserve existing Harness Policy section in AGENTS.md.
+
+### DV: Governance pytest mirrors (`validate_links`, `validate_governance`, `lint_skills`) now?
+
+| Lens | Position |
+| --- | --- |
+| All six | **Reject** — recurrence-gated (option DH); `make validate` already covers these |
+
+### DW: Router / holdout / tenth skill?
+
+| Lens | Position |
+| --- | --- |
+| All six | **Reject** — ROUTE-001/002/003 at 100%; nine-skill freeze holds |
+
+### Round 76 verdict table
+
+| ID | Option | Verdict | Action |
+| --- | --- | --- | --- |
+| DU | Repo template contracts + cross-links | **Agree** | 4 files + pytest anti-drift |
+| DV | Governance pytest mirrors | **Reject** | backlog (DH) |
+| DW | Router/holdout/tenth skill | **Reject** | no change |
+
+**All roles agree.**
+
+## Implemented Changes (Round 76)
+
+- `06-repo/*.md`: Purpose, Scope, Acceptance Criteria, Falsifiability + workflow skill mapping; thinking lens links; Human Review where applicable
+- `plans/tests/test_repo_prompts_eval_harness.py`: structural + 80%+ score floor anti-drift; AGENTS harness-policy guard
+- `plans/tests/test_prompt_cross_links.py`: repo ↔ thinking ↔ skill cross-links
+- `QUALITY_GATES_SUMMARY.md`: repo prompt test mention; pytest floor 400+
+- `PROJECT_KNOWLEDGE.md`: Decision Index Round 76 entry
+- `README.md`, `reflective-prompt-library/README.md`, `test_readme_governance.py`: panel round 76 sync
+
+## Verification (Round 76)
+
+- `make all`: pytest + ROUTE-001/002/003 100%
+
+## Panel status (updated)
+
+**Resealed 2026-06-25** after **Round 76** (options DU–DW). Repository-template contract pass complete; full prompt-library contract sweep (`00-core` through `06-repo`) finished. Governance pytest mirrors remain recurrence-gated.
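
The record above cites `plans/tests/test_repo_prompts_eval_harness.py` as a "structural + 80%+ score floor" guard, but that file's contents are not part of this diff view. As a rough illustration of the structural half only, the sketch below checks that a template carries the four contract headings named in the consensus. The function names and the fence handling are assumptions made for this example, not the repository's implementation, and the scoring half is not sketched because the diff does not describe it.

```python
import re

# Contract headings named in the Round 76 consensus text.
REQUIRED_HEADINGS = ("Purpose", "Scope", "Acceptance Criteria", "Falsifiability")


def contract_headings(markdown: str) -> list[str]:
    """Return level-2 heading titles that appear outside fenced code blocks.

    Hypothetical helper: fences are tracked with a simple toggle on lines
    that start with three backticks.
    """
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        match = re.match(r"^## +(.+?)\s*$", line)
        if match and not in_fence:
            headings.append(match.group(1))
    return headings


def missing_contract_headings(markdown: str) -> list[str]:
    """List required contract headings that the document does not declare."""
    present = set(contract_headings(markdown))
    return [name for name in REQUIRED_HEADINGS if name not in present]
```

A test built on this could assert `missing_contract_headings(text) == []` for each file under `06-repo/`. Ignoring fenced content matters here because the templates embed a prompt body inside a `markdown` fence, and headings inside that body should not satisfy the contract.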

reflective-prompt-library/plans/tests/test_prompt_cross_links.py

Lines changed: 58 additions & 1 deletion
@@ -1,4 +1,4 @@
-"""Anti-drift: thinking lenses, engineering/agent/context/domain prompts, and workflow skills cross-link."""
+"""Anti-drift: thinking lenses, engineering/agent/context/domain/repo prompts, and workflow skills cross-link."""
 
 from pathlib import Path
 
@@ -10,6 +10,7 @@
 AGENT_DIR = LIBRARY_ROOT / "04-agent"
 CONTEXT_DIR = LIBRARY_ROOT / "03-context"
 DOMAIN_DIR = LIBRARY_ROOT / "05-domain"
+REPO_DIR = LIBRARY_ROOT / "06-repo"
 SKILLS_DIR = LIBRARY_ROOT / "skills"
 
 ENGINEERING_THINKING_LINKS: dict[str, tuple[str, ...]] = {
@@ -183,6 +184,39 @@
 
 DOMAIN_PROMPTS = tuple(sorted(DOMAIN_DIR.glob("*.md")))
 
+REPO_THINKING_LINKS: dict[str, tuple[str, ...]] = {
+    "AGENTS.md": (
+        "01-thinking/why-what-how-done.md",
+        "01-thinking/critical-thinking-check.md",
+        "01-thinking/socratic-reviewer.md",
+    ),
+    "cursor-rules.md": (
+        "01-thinking/critical-thinking-check.md",
+        "01-thinking/falsifiability.md",
+    ),
+    "codex-opencode.md": (
+        "01-thinking/why-what-how-done.md",
+        "01-thinking/falsifiability.md",
+    ),
+    "PROJECT_KNOWLEDGE.template.md": (
+        "01-thinking/socratic-reviewer.md",
+        "01-thinking/falsifiability.md",
+    ),
+}
+
+REPO_SKILL_LINKS: dict[str, tuple[str, ...]] = {
+    "AGENTS.md": (
+        "reflective-dispatch",
+        "reflective-implement",
+        "reflective-spec-plan",
+    ),
+    "cursor-rules.md": ("reflective-implement", "reflective-review"),
+    "codex-opencode.md": ("reflective-implement", "reflective-dispatch"),
+    "PROJECT_KNOWLEDGE.template.md": ("reflective-handoff-retro", "reflective-brief"),
+}
+
+REPO_PROMPTS = tuple(sorted(REPO_DIR.glob("*.md")))
+
 CONTEXT_PROMPTS = tuple(sorted(CONTEXT_DIR.glob("*.md")))
 
 
@@ -300,3 +334,26 @@ def test_thinking_lens_files_exist_for_domain_links():
     for ref in linked:
         assert (LIBRARY_ROOT / ref).is_file(), f"missing thinking lens file {ref}"
 
+@pytest.mark.parametrize("prompt_name,thinking_refs", REPO_THINKING_LINKS.items())
+def test_repo_prompt_links_thinking_lens(prompt_name: str, thinking_refs: tuple[str, ...]):
+    path = REPO_DIR / prompt_name
+    preamble = _preamble(path)
+    for ref in thinking_refs:
+        assert ref in preamble, f"{prompt_name} preamble should reference {ref}"
+
+
+def test_all_repo_prompts_have_thinking_cross_link():
+    assert set(REPO_THINKING_LINKS) == {p.name for p in REPO_PROMPTS}
+
+
+@pytest.mark.parametrize("prompt_name,skill_refs", REPO_SKILL_LINKS.items())
+def test_repo_prompt_maps_workflow_skill(prompt_name: str, skill_refs: tuple[str, ...]):
+    preamble = _preamble(REPO_DIR / prompt_name)
+    for skill in skill_refs:
+        assert skill in preamble, f"{prompt_name} preamble should reference {skill}"
+
+
+def test_thinking_lens_files_exist_for_repo_links():
+    linked = {ref for refs in REPO_THINKING_LINKS.values() for ref in refs}
+    for ref in linked:
+        assert (LIBRARY_ROOT / ref).is_file(), f"missing thinking lens file {ref}"
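
The new tests call a `_preamble` helper whose definition falls outside the hunks shown here. One plausible reading, given that each template places its contract headers ahead of a fenced prompt body, is that the preamble is everything before the first code fence. The sketch below illustrates that reading only. `preamble_text`, `preamble`, and `missing_refs` are hypothetical names, and the real helper may use a different boundary.

```python
from pathlib import Path


def preamble_text(markdown: str) -> str:
    """Return the text before the first fenced code block.

    Assumption for this sketch: a document with no fence is all preamble.
    """
    lines = []
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            break
        lines.append(line)
    return "\n".join(lines)


def preamble(path: Path) -> str:
    """File-based wrapper, mirroring how the tests pass a path."""
    return preamble_text(path.read_text(encoding="utf-8"))


def missing_refs(markdown: str, refs: tuple[str, ...]) -> list[str]:
    """List references that do not appear in the preamble of the document."""
    text = preamble_text(markdown)
    return [ref for ref in refs if ref not in text]
```

Under this reading, a thinking-lens path mentioned only inside the fenced prompt body would not count as a cross-link, which is the kind of drift the parametrized tests above appear designed to catch.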

reflective-prompt-library/plans/tests/test_readme_governance.py

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@
 METHODOLOGY_MAP_EN = Path(__file__).parent.parent.parent / "METHODOLOGY_MAP.md"
 SKILL_MAP = Path(__file__).parent.parent.parent / "skills" / "skill-map.md"
 
-CURRENT_PANEL_ROUND = "75"
-CURRENT_PANEL_OPTIONS = "A–DT"
+CURRENT_PANEL_ROUND = "76"
+CURRENT_PANEL_OPTIONS = "A–DW"
 
 
 @pytest.fixture(scope="module")
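
The two constants bumped above line up with the README sentence changed in this same commit ("Rounds 1–76, options A–DW"). The test bodies that consume them are not shown, so the following is only a sketch of how such a sync check could work. `panel_marker` and `readme_in_sync` are hypothetical names, and the exact phrase matched is an assumption drawn from the README line in this diff.

```python
CURRENT_PANEL_ROUND = "76"
CURRENT_PANEL_OPTIONS = "A–DW"  # en dash, as written in the README sentence


def panel_marker(round_id: str, options: str) -> str:
    """Build the phrase the README is expected to contain (assumed format)."""
    return f"Rounds 1–{round_id}, options {options}"


def readme_in_sync(readme_text: str) -> bool:
    """True when the README mentions the current panel round and option range."""
    return panel_marker(CURRENT_PANEL_ROUND, CURRENT_PANEL_OPTIONS) in readme_text
```

Keeping the round and option range in a single pair of constants means a stale README fails one assertion, rather than leaving the panel record and the docs to drift apart unnoticed.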
