
Commit 202ff40

tbitcs and oz-agent committed
feat(governance): YAML-native governance layer + strict validation + duplicate REQ cleanup
## Changes

### Data Fixes

- Remove 23 duplicate REQs (REQ-221..243 were exact duplicates of REQ-130..135 and REQ-161..179 added in a previous sprint). REQs: 280 → 257, all now covered.
- Sync `workitems.json`: 107 → 257 entries mirroring all 257 REQs

### YAML-native governance (Stage 2+3+4)

- New module: `src/specsmith/governance_yaml.py`
  - `load_yaml_requirements` / `load_yaml_tests`: read from `docs/requirements/*.yml`
  - `save_yaml_requirements` / `save_yaml_tests`: write back to grouped files
  - `generate_requirements_md` / `generate_tests_md`: YAML → MD generation
  - `is_yaml_mode`: checks `.specsmith/governance-mode` flag
  - `strict_validate`: 8 schema checks (dup IDs, orphan tests, missing fields, etc.)
- New: `docs/requirements/*.yml` and `docs/tests/*.yml` (7 files each, grouped by domain)
- New: `.specsmith/governance-mode` = `yaml` (YAML-first mode enabled)
- New: `scripts/migrate_governance_to_yaml.py` (idempotent migration script)

### Sync flipped to YAML-first (Stage 4)

- `src/specsmith/sync.py`: when `governance-mode=yaml`, reads from YAML files, updates JSON cache, and regenerates `docs/REQUIREMENTS.md` + `docs/TESTS.md` as derived artifacts. Markdown-primary mode still works for legacy projects.

### New CLI commands (Stage 2+3)

- `specsmith validate --strict [--json]`: schema enforcement (duplicate IDs, orphan tests, untested REQs, missing required fields, title duplicates, sync drift). Exits 1 on errors; warnings do not block.
- `specsmith generate docs [--check] [--json]`: YAML → MD regeneration. The YAML-first equivalent of `specsmith sync` for human-readable artifacts.

### CI gate (Stage 2)

- `.github/workflows/ci.yml`: new `validate-strict` job runs on every push/PR. Catches governance schema violations before they reach main.

### API surface

- `tests/fixtures/api_surface.json`: regenerated to include new commands.

Co-Authored-By: Oz <oz-agent@warp.dev>
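The message lists `generate_requirements_md` as the YAML → MD step but does not show the generated layout. The sketch below shows one way such rendering could work. It assumes the YAML has already been parsed into dicts carrying the `id`, `title`, `description`, `source`, and `status` keys seen in `.specsmith/requirements.json`. The name `render_requirements_md` and the heading layout are hypothetical, not specsmith's actual output.

```python
from typing import Iterable, Mapping


def render_requirements_md(reqs: Iterable[Mapping[str, str]]) -> str:
    """Render requirement records as Markdown (hypothetical layout).

    Each record is assumed to carry the keys used in
    .specsmith/requirements.json: id, title, description, source, status.
    """
    lines = ["# Requirements", ""]
    # Sort on the integer suffix so REQ-99 precedes REQ-100.
    for req in sorted(reqs, key=lambda r: int(r["id"].split("-")[1])):
        lines.append(f"## {req['id']}: {req['title']}")
        lines.append("")
        lines.append(req["description"])
        lines.append("")
        lines.append(f"- **Source:** {req.get('source', '')}")
        lines.append(f"- **Status:** {req.get('status', '')}")
        lines.append("")
    return "\n".join(lines)
```

Sorting on the numeric suffix keeps REQ-99 ahead of REQ-100, which a plain string sort would not. A deterministic order also means that regenerating from unchanged YAML yields byte-identical Markdown, which is what a drift check needs.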
1 parent f66c82d commit 202ff40

25 files changed

Lines changed: 6140 additions & 1451 deletions

.github/workflows/ci.yml

Lines changed: 17 additions & 0 deletions
@@ -87,6 +87,23 @@ jobs:
           SPECSMITH_PYPI_CHECKED: "1"
         run: python -m specsmith sync --check --project-dir .
 
+  validate-strict:
+    # YAML governance schema guard: duplicate IDs, orphan tests, missing fields.
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+          cache: pip
+      - run: python -m pip install --upgrade pip
+      - run: pip install -e ".[dev]"
+      - name: Strict governance schema validation
+        env:
+          SPECSMITH_NO_AUTO_UPDATE: "1"
+          SPECSMITH_PYPI_CHECKED: "1"
+        run: python -m specsmith validate --strict --json --project-dir .
+
   api-surface:
     # REQ-140 guard: regenerates the public CLI surface and fails the build
     # if the live output drifts from the committed fixture. Catches accidental
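The new job relies only on the exit status of `specsmith validate --strict --json`. Per the commit message, errors exit 1 and warnings do not block. The sketch below illustrates that gating rule, assuming each finding carries a severity. The `Finding` dataclass, the `gate` function, and the report keys are hypothetical and are not specsmith's real `--json` schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    """One strict-validation result (hypothetical shape)."""

    check: str  # e.g. "duplicate_id", "orphan_test"
    severity: str  # "error" or "warning"
    message: str


def gate(findings: list[Finding]) -> tuple[int, str]:
    """Return (exit_code, json_report): 1 if any error, else 0.

    Warnings are reported but never change the exit code, matching
    "Exits 1 on errors; warnings do not block."
    """
    errors = [f for f in findings if f.severity == "error"]
    warnings = [f for f in findings if f.severity == "warning"]
    report = json.dumps(
        {
            "ok": not errors,
            "errors": [asdict(f) for f in errors],
            "warnings": [asdict(f) for f in warnings],
        },
        indent=2,
    )
    return (1 if errors else 0), report
```

A CLI wrapper would print the report and call `sys.exit(code)`, so the GitHub Actions step fails only when `code` is 1.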

.specsmith/governance-mode

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+yaml
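The flag file holds a single word, `yaml`. The sketch below shows how a check like the commit's `is_yaml_mode` could read it, assuming a missing file means the legacy Markdown-primary mode. The signature, the whitespace stripping, and the case-insensitive comparison are assumptions, not specsmith's code.

```python
from pathlib import Path


def is_yaml_mode(project_dir: Path) -> bool:
    """Return True when .specsmith/governance-mode selects YAML-first mode.

    Hypothetical sketch: the flag is assumed to hold one word ("yaml");
    a missing file falls back to Markdown-primary mode.
    """
    flag = project_dir / ".specsmith" / "governance-mode"
    if not flag.is_file():
        return False
    # Tolerate a trailing newline and letter case in the flag file.
    return flag.read_text(encoding="utf-8").strip().lower() == "yaml"
```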

.specsmith/requirements.json

Lines changed: 0 additions & 161 deletions
@@ -1532,167 +1532,6 @@
     "source": "BTWS-2027 AI Governance Report [REG-015]",
     "status": "defined"
   },
-  {
-    "id": "REQ-221",
-    "title": "Instinct Persistence System",
-    "description": "specsmith MUST implement an instinct persistence system in src/specsmith/instinct.py storing patterns extracted from successful sessions.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-001]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-222",
-    "title": "Instinct Record Schema",
-    "description": "Each instinct record MUST contain: id, trigger_pattern, content, confidence, project_scope, created, last_used, use_count.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-002]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-223",
-    "title": "Session End Instinct Extraction",
-    "description": "The SESSION_END hook MUST extract candidate instincts from session history for user review before the session closes.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-003]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-224",
-    "title": "Learn Command",
-    "description": "The /learn command MUST promote a user-approved pattern to an instinct with an initial confidence score and persist it to the instinct store.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-004]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-225",
-    "title": "Instinct Confidence Updates",
-    "description": "Instinct confidence MUST be updated based on application success or rejection — increasing on accepted application and decreasing on rejection.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-005]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-226",
-    "title": "Instinct Import Export",
-    "description": "Instincts MUST be importable and exportable as .md files for cross-project and cross-team sharing.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-006]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-227",
-    "title": "Instinct Status Command",
-    "description": "/instinct-status MUST display all active instincts sorted by confidence descending, with use_count and last_used fields.",
-    "source": "PLANNED-REQUIREMENTS.md [LRN-007]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-228",
-    "title": "Eval Harness Module",
-    "description": "specsmith MUST implement an eval harness in src/specsmith/eval/ supporting eval-driven development workflows.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-001]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-229",
-    "title": "Eval Data Model",
-    "description": "The eval model MUST define: Task, Trial, Grader, Transcript, Outcome as core types.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-002]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-230",
-    "title": "Eval Task Storage",
-    "description": "Tasks MUST be stored as Markdown files at .specsmith/evals/{feature}.md with YAML frontmatter.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-003]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-231",
-    "title": "Eval Grader Types",
-    "description": "The eval harness MUST support CodeGrader, ModelGrader, and HumanFlag grader types for different validation strategies.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-004]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-232",
-    "title": "Eval Pass at K Metrics",
-    "description": "The eval harness MUST compute pass@k and pass^k metrics for measuring agent capability across multiple trials.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-005]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-233",
-    "title": "Git-Based Eval Grading",
-    "description": "Default grading MUST be git-based outcome grading (checking actual changes in VCS) rather than execution-path assertion.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-006]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-234",
-    "title": "Eval Run Command",
-    "description": "/eval run --trials k MUST run k independent trials and report pass@k results with per-trial transcripts.",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-007]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-235",
-    "title": "Capability vs Regression Evals",
-    "description": "The eval harness MUST distinguish capability evals (new functionality) from regression evals (existing functionality preservation).",
-    "source": "PLANNED-REQUIREMENTS.md [EDD-008]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-236",
-    "title": "Agent Memory Module",
-    "description": "specsmith MUST implement cross-session agent memory in src/specsmith/memory.py persisting patterns, facts, and history across sessions.",
-    "source": "PLANNED-REQUIREMENTS.md [MEM-001]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-237",
-    "title": "Agent Memory Schema",
-    "description": "Agent memory MUST be structured JSON containing: accumulated patterns, preferred approaches, known project facts, and failure history.",
-    "source": "PLANNED-REQUIREMENTS.md [MEM-002]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-238",
-    "title": "Session Start Memory Injection",
-    "description": "The SESSION_START hook MUST inject relevant memories into the system prompt, respecting the configured token budget to avoid context overrun.",
-    "source": "PLANNED-REQUIREMENTS.md [MEM-003]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-239",
-    "title": "Typed Execution Layer",
-    "description": "All tool handlers MUST use a typed ProjectOperations class for file, git/VCS, and search operations. Direct raw shell string assembly in tool handlers is prohibited.",
-    "source": "PLANNED-REQUIREMENTS.md [OPS-001]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-240",
-    "title": "ProjectOperations File Interface",
-    "description": "ProjectOperations MUST expose file operations (read_file, write_file, list_dir, glob, search) implemented via Python pathlib/stdlib with no subprocess calls.",
-    "source": "PLANNED-REQUIREMENTS.md [OPS-002]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-241",
-    "title": "ProjectOperations VCS Interface",
-    "description": "ProjectOperations MUST expose git/VCS operations (status, log, diff, add, commit, push, create_branch, create_pr) returning structured typed result objects.",
-    "source": "PLANNED-REQUIREMENTS.md [OPS-003]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-242",
-    "title": "ProjectOperations Result Schema",
-    "description": "All ProjectOperations methods MUST return a typed result containing at minimum: exit_code, stdout, stderr, and elapsed_ms.",
-    "source": "PLANNED-REQUIREMENTS.md [OPS-004]",
-    "status": "defined"
-  },
-  {
-    "id": "REQ-243",
-    "title": "Cross-Platform ProjectOperations",
-    "description": "ProjectOperations MUST be cross-platform (Windows, Linux, macOS) without platform-specific code branches at call sites.",
-    "source": "PLANNED-REQUIREMENTS.md [OPS-006]",
-    "status": "defined"
-  },
   {
     "id": "REQ-244",
     "title": "GPU-Aware Context Window Sizing",
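The commit message describes REQ-221..243 as exact duplicates of earlier requirements. The sketch below shows one way to find such content duplicates. It assumes "duplicate" means the same title and description after whitespace normalization and case-folding, with IDs and `source` ignored. The helper `find_content_duplicates` is hypothetical, not part of specsmith.

```python
from typing import Iterable, Mapping


def find_content_duplicates(
    reqs: Iterable[Mapping[str, str]],
) -> dict[str, str]:
    """Map each later duplicate ID to the first ID with the same content.

    Hypothetical rule: records match when their whitespace-normalized,
    case-folded title and description are equal; id, source and status
    are ignored.
    """
    first_seen: dict[tuple[str, str], str] = {}
    dupes: dict[str, str] = {}
    for req in reqs:
        key = (
            " ".join(req["title"].split()).casefold(),
            " ".join(req["description"].split()).casefold(),
        )
        if key in first_seen:
            dupes[req["id"]] = first_seen[key]  # keep the earliest ID
        else:
            first_seen[key] = req["id"]
    return dupes
```

Scanning in file order keeps the earliest ID as the canonical one, which matches the direction of this cleanup: the older IDs stay and the later ones are removed.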

.specsmith/testcases.json

Lines changed: 22 additions & 22 deletions
@@ -2628,28 +2628,6 @@
     "expected_behavior": {},
     "confidence": 1.0
   },
-  {
-    "id": "TEST-282",
-    "title": "HF Leaderboard Sync Persists Bucket Scores to JSON",
-    "description": "`sync_from_huggingface_blocking(force_static=True, scores_path=tmp_path/\"scores.json\")` creates the file at the given path, whose JSON root contains a `\"bucket_scores\"` dict. Each entry has `reasoning_score`, `conversational_score`, `longform_score`, and `model_name` keys.",
-    "requirement_id": "REQ-263",
-    "type": "unit",
-    "verification_method": "pytest",
-    "input": {},
-    "expected_behavior": {},
-    "confidence": 1.0
-  },
-  {
-    "id": "TEST-283",
-    "title": "HF Token Included in Request Headers When Set",
-    "description": "When `SPECSMITH_HF_TOKEN` is set to a non-empty string, `test_hf_connection()` returns `{\"token_set\": true}` and the rate_limit_tier includes \"authenticated\". The `_fetch_page` request (captured via mock) includes `Authorization: Bearer <token>` in its headers.",
-    "requirement_id": "REQ-265",
-    "type": "unit",
-    "verification_method": "pytest",
-    "input": {},
-    "expected_behavior": {},
-    "confidence": 1.0
-  },
   {
     "id": "TEST-263",
     "title": "HF Leaderboard Static Fallback Loads Without Network",
@@ -2858,5 +2836,27 @@
     "input": {},
     "expected_behavior": {},
     "confidence": 1.0
+  },
+  {
+    "id": "TEST-282",
+    "title": "HF Leaderboard Sync Persists Bucket Scores to JSON",
+    "description": "`sync_from_huggingface_blocking(force_static=True, scores_path=tmp_path/\"scores.json\")` creates the file at the given path, whose JSON root contains a `\"bucket_scores\"` dict. Each entry has `reasoning_score`, `conversational_score`, `longform_score`, and `model_name` keys.",
+    "requirement_id": "REQ-263",
+    "type": "unit",
+    "verification_method": "pytest",
+    "input": {},
+    "expected_behavior": {},
+    "confidence": 1.0
+  },
+  {
+    "id": "TEST-283",
+    "title": "HF Token Included in Request Headers When Set",
+    "description": "When `SPECSMITH_HF_TOKEN` is set to a non-empty string, `test_hf_connection()` returns `{\"token_set\": true}` and the rate_limit_tier includes \"authenticated\". The `_fetch_page` request (captured via mock) includes `Authorization: Bearer <token>` in its headers.",
+    "requirement_id": "REQ-265",
+    "type": "unit",
+    "verification_method": "pytest",
+    "input": {},
+    "expected_behavior": {},
+    "confidence": 1.0
   }
 ]
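Each test record links to its requirement through `requirement_id`, which is the field the strict checks for orphan tests and untested REQs have to inspect. The sketch below implements those two checks over the JSON records. The helper name and its return shape are hypothetical.

```python
from typing import Iterable, Mapping


def coverage_findings(
    reqs: Iterable[Mapping[str, str]],
    tests: Iterable[Mapping[str, str]],
) -> tuple[list[str], list[str]]:
    """Return (orphan_test_ids, untested_req_ids), both sorted.

    Hypothetical helper: an orphan test names a requirement_id that no
    requirement defines; an untested requirement has no test naming it.
    """
    req_ids = {r["id"] for r in reqs}
    test_list = list(tests)  # materialize so it can be iterated twice
    covered = {t["requirement_id"] for t in test_list}
    orphans = sorted(t["id"] for t in test_list if t["requirement_id"] not in req_ids)
    untested = sorted(req_ids - covered)
    return orphans, untested
```

Under this rule, any test still pointing at one of the deleted REQ-221..243 IDs would be reported as an orphan.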
