
test: phase 0 conformance harness — typed parser for all 68 fixtures#7

Merged: chris-colinsky merged 1 commit into `main` from `engine/phase-0-conformance-harness` on May 5, 2026

@chris-colinsky

Member

Summary

  • Phase 0 of the v0.4 → v0.8 implementation plan: every fixture under `openarmature-spec/spec/<capability>/conformance/` parses into a typed pydantic config. Phases 1+ add runtime interpretation under `harness/runtime/` without re-touching parsing.
  • Bumps the spec submodule to v0.8.0 (8 accepted proposals).
  • New `tests/conformance/harness/` package: `fixtures.py` (three top-level discriminated variants), `directives.py` (node directives + middleware), `expectations.py` (per-capability expected blocks), `loader.py` (auto-discovery), `skip.py` (structured skip reasons with phase mapping), `runtime/` (empty stub).
  • All 68 fixtures parse + round-trip (`test_fixture_parsing.py`) — Phase 0's exit criterion.
  • `test_conformance.py` skips fixtures that need runtime support from later phases (pair model, new node directives) with structured "needs phase X" messages.
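For readers unfamiliar with key-presence discrimination, here is a minimal sketch of how three top-level variants could be selected with a callable discriminator in Pydantic v2 (2.5 or later). The variant names and the `mock_provider` / `cases` keys come from this PR; every field shape, the `pick_variant` helper, and the `nodes` field are assumptions for illustration, not the code in `fixtures.py`.

```python
from typing import Annotated, Any, Union

from pydantic import BaseModel, ConfigDict, Discriminator, Tag, TypeAdapter


class _Strict(BaseModel):
    # Top-level fixture types are described as strict in this PR.
    model_config = ConfigDict(extra="forbid")


# Field shapes below are placeholders (assumptions), not the real models.
class LlmProviderFixture(_Strict):
    mock_provider: dict[str, Any]


class CasesFixture(_Strict):
    cases: list[dict[str, Any]]


class GraphFixture(_Strict):
    nodes: dict[str, Any]


def pick_variant(value: Any) -> str:
    """Return the tag of the variant to validate against (hypothetical helper)."""
    # Validation passes a raw dict; serialization passes a model instance.
    if isinstance(value, dict):
        has = lambda key: key in value
    else:
        has = lambda key: hasattr(value, key)
    if has("mock_provider"):
        return "llm_provider"
    if has("cases"):
        return "cases"
    return "graph"


Fixture = Annotated[
    Union[
        Annotated[LlmProviderFixture, Tag("llm_provider")],
        Annotated[CasesFixture, Tag("cases")],
        Annotated[GraphFixture, Tag("graph")],
    ],
    Discriminator(pick_variant),
]

fixture_adapter = TypeAdapter(Fixture)
```

Because the discriminator picks exactly one variant before validation, an error in a graph fixture is reported against `GraphFixture` alone rather than against all three union members.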

Strictness contract

  • Strict (`extra="forbid"`) at the structural skeleton: top-level fixture types, `StateSchema`, `NodeSpec` primary directive set, `ObserverSpec`, `MiddlewareConfig`. Catches new directives the spec adds.
  • Permissive (`extra="allow"`) for payload-shape models: LLM mock responses, middleware params, flaky/fan-out config, case shapes. These evolve frequently without restructuring the directive surface.
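A rough illustration of the two settings side by side, assuming Pydantic v2. `NodeSpec` and `calls_llm` are names from this PR; `MockResponse`, the `name` and `content` fields, and the helper function are hypothetical stand-ins, not the harness models.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class MockResponse(BaseModel):
    """Payload-shape model (hypothetical): unknown keys are kept, not rejected."""

    model_config = ConfigDict(extra="allow")
    content: str = ""


class NodeSpec(BaseModel):
    """Skeleton model: an unrecognised directive is a validation error."""

    model_config = ConfigDict(extra="forbid")
    name: str  # placeholder field, assumed for the example
    calls_llm: Optional[MockResponse] = None


# Extra keys inside the permissive payload survive in model_extra.
node = NodeSpec.model_validate(
    {"name": "n", "calls_llm": {"content": "hi", "finish_reason": "stop"}}
)


def unknown_directive_error() -> str:
    """Return the Pydantic error type raised for an unknown top-level key."""
    try:
        NodeSpec.model_validate({"name": "n", "brand_new_directive": {}})
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return "accepted"
```

Under these assumptions a directive added by a later spec version fails loudly at parse time, while a new field inside a mock response passes through untouched.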

Test plan

  • All 68 fixtures parse to a typed Fixture variant (`test_fixture_parses`)
  • All 68 fixtures round-trip stably (parse → model_dump → re-parse → equal)
  • Existing graph-engine runtime tests still pass (204 total, 7 skipped with phase-tagged reasons, 0 failed)
  • `uv run pyright src/ tests/` zero errors
  • `uv run ruff check . && uv run ruff format --check .` clean
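The round-trip check in the test plan can be sketched as follows. This is a simplified stand-in with a single hypothetical `GraphFixture` model whose fields are assumed; the real test runs against the full fixture union.

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict


class GraphFixture(BaseModel):
    # Hypothetical, cut-down fixture model used only for this sketch.
    model_config = ConfigDict(extra="forbid")
    description: Optional[str] = None
    nodes: dict[str, Any]


def round_trips(raw: dict[str, Any]) -> bool:
    """parse -> model_dump -> re-parse -> equal."""
    first = GraphFixture.model_validate(raw)
    dumped = first.model_dump()
    second = GraphFixture.model_validate(dumped)
    return first == second
```

A fixture that parses but does not round-trip usually points at a validator that mutates its input or a field whose dumped form is not accepted on the way back in.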

What's next

Phase 1 — graph-engine retrofit to the v0.6 pair model (started/completed events, attempt_index, fan_out_index, phase). The 7 currently-skipped graph-engine fixtures (012-018) turn back on as Phase 1 lands.

Phase 0 of the v0.4 → v0.8 implementation plan: the conformance test
harness loads every YAML fixture under
openarmature-spec/spec/<capability>/conformance/ into a typed
pydantic config. Phases 1+ add runtime interpretation under
harness/runtime/ without re-touching parsing.

Bumps the spec submodule to v0.8.0 (8 accepted proposals: foundation,
explicit subgraph mapping, observer hooks, llm-provider, middleware,
fan-out, OTel, checkpointing).

Harness package layout (tests/conformance/harness/):
  - fixtures.py      Three top-level discriminated variants
                     (LlmProviderFixture | CasesFixture |
                     GraphFixture). Discriminator picks by presence
                     of `mock_provider` / `cases:` keys.
  - directives.py    Node directive sub-models with mutual-exclusion
                     validator on the 13 primary directives.
                     Middleware discriminated union over the 5 types
                     in fixtures.
  - expectations.py  Per-capability expected-block discriminator
                     (graph_engine | llm_provider |
                     pipeline_utilities | observability) chosen by
                     which assertion keys appear.
  - loader.py        Auto-discovers NNN-*.yaml across all four
                     capability directories.
  - skip.py          Structured SkipReason with capability +
                     directive list + phase mapping.
  - runtime/         Empty stub package + README locking in the
                     Phase 0 boundary.
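As an illustration of the mutual-exclusion validator mentioned for `directives.py`, here is a minimal sketch assuming Pydantic v2. Only three of the 13 primary directives are modelled; their payload types, the `at_most_one_primary` validator name, and the `is_valid` helper are assumptions. Whether the real validator requires exactly one directive or allows none is not stated in this PR; the sketch allows none.

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, ValidationError, model_validator

# Stand-ins for the 13 primary directives; payload shapes are assumed.
PRIMARY_DIRECTIVES = ("calls_llm", "fan_out", "emits_log")


class NodeSpec(BaseModel):
    model_config = ConfigDict(extra="forbid")

    calls_llm: Optional[dict[str, Any]] = None
    fan_out: Optional[dict[str, Any]] = None
    emits_log: Optional[dict[str, Any]] = None

    @model_validator(mode="after")
    def at_most_one_primary(self) -> "NodeSpec":
        # Collect every primary directive that was actually set on this node.
        present = [n for n in PRIMARY_DIRECTIVES if getattr(self, n) is not None]
        if len(present) > 1:
            raise ValueError(f"primary directives are mutually exclusive: {present}")
        return self


def is_valid(raw: dict[str, Any]) -> bool:
    """Hypothetical helper: True when the node spec validates."""
    try:
        NodeSpec.model_validate(raw)
    except ValidationError:
        return False
    return True
```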

Strictness contract: strict (extra="forbid") at the structural
skeleton (top-level fixture, StateSchema, NodeSpec primary directive
set, ObserverSpec, MiddlewareConfig). Permissive (extra="allow") for
payload-shape models (LLM mock responses, middleware params, flaky
config, case shapes) — those evolve frequently in the spec without
restructuring the directive surface.

Tests added (test_fixture_parsing.py): every fixture parses (68)
AND round-trips (parse → model_dump → re-parse → equal). 137
assertions total. Phase 0 exit criterion.

test_conformance.py: existing graph-engine runtime tests still pass
for fixtures the legacy adapter can translate. Fixtures using the
v0.6 pair model or new node directives (fan_out, flaky variants,
calls_llm, update_pure*, emits_log, etc.) skip with phase-tagged
reasons.
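A phase-tagged skip reason could look roughly like the sketch below. The `SkipReason` name and its capability + directive list + phase mapping come from this PR; the field names, the `message` format, and the phase numbers in `DIRECTIVE_PHASE` are placeholders (only the Phase 1 pair-model retrofit is named here).

```python
from dataclasses import dataclass

# Placeholder mapping: only "pair model -> Phase 1" is stated in the PR;
# the other phase numbers are invented for the example.
DIRECTIVE_PHASE = {"pair_model": 1, "fan_out": 2, "calls_llm": 3}


@dataclass(frozen=True)
class SkipReason:
    capability: str
    directives: tuple[str, ...]

    @property
    def phase(self) -> int:
        # A fixture unblocks only once its latest-phase directive is supported.
        return max(DIRECTIVE_PHASE[d] for d in self.directives)

    def message(self) -> str:
        needed = ", ".join(sorted(self.directives))
        return f"{self.capability}: needs phase {self.phase} ({needed})"
```

In a test, the formatted string would typically be passed to `pytest.skip(reason.message())`, so the skip summary shows which phase re-enables each fixture.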

Result: 204 pass, 7 skip, 0 fail.
Copilot AI review requested due to automatic review settings May 5, 2026 05:53

Copilot AI left a comment


Pull request overview

Adds Phase 0 of the conformance harness by introducing typed (Pydantic v2) parsing for all spec fixtures and a parsing/round-trip stability test suite, while updating the existing graph-engine runtime conformance test to skip fixtures that require later-phase runtime support.

Changes:

  • Introduces tests/conformance/harness/ with discriminated fixture root models, directive/expected-block submodels, discovery/loading utilities, and structured skip-reason formatting.
  • Adds test_fixture_parsing.py to ensure every discovered spec fixture parses into a typed variant and round-trips via model_dump() stably.
  • Updates test_conformance.py (graph-engine runtime conformance) with Phase 0 skips for fixtures needing the pair-model or unsupported node directives.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `tests/conformance/test_fixture_parsing.py` | New Phase 0 tests for fixture discovery, parsing, and round-trip stability. |
| `tests/conformance/test_conformance.py` | Adds Phase 0 skip logic for runtime fixtures requiring later-phase support. |
| `tests/conformance/harness/__init__.py` | Exposes the public harness surface (discover/load/Fixture/SkipReason). |
| `tests/conformance/harness/loader.py` | Implements deterministic discovery and typed parsing via TypeAdapter. |
| `tests/conformance/harness/fixtures.py` | Defines the three top-level discriminated fixture variants. |
| `tests/conformance/harness/directives.py` | Adds typed models for fixture directives (nodes, edges, middleware, etc.). |
| `tests/conformance/harness/expectations.py` | Adds typed models for per-capability `expected:` blocks + discriminator. |
| `tests/conformance/harness/skip.py` | Adds structured skip reasons and directive→phase mapping for later phases. |
| `tests/conformance/harness/runtime/README.md` | Documents the Phase 0 boundary and future runtime home. |
| `tests/conformance/harness/runtime/__init__.py` | Introduces the runtime package stub (empty). |
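The "deterministic discovery" attributed to `loader.py` could be approximated with the standard library alone, as in this sketch. The `<capability>/conformance/` layout and the `NNN-*.yaml` naming come from the PR; the `discover` signature, the regex, and the sort key are assumptions, and the typed parsing step is left out.

```python
import re
from pathlib import Path

# Matches the NNN-*.yaml naming convention described in the PR.
FIXTURE_NAME = re.compile(r"^\d{3}-.+\.yaml$")


def discover(spec_root: Path) -> list[Path]:
    """Return fixture paths under <spec_root>/<capability>/conformance/, sorted."""
    found = [
        path
        for path in spec_root.glob("*/conformance/*.yaml")
        if FIXTURE_NAME.match(path.name)
    ]
    # Sort by (capability, filename) so parametrized test IDs stay stable
    # regardless of filesystem iteration order.
    return sorted(found, key=lambda p: (p.parent.parent.name, p.name))
```

Sorting explicitly matters because `Path.glob` makes no ordering guarantee, and unstable ordering would reshuffle test IDs between runs.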


Comment thread tests/conformance/test_conformance.py
Comment thread tests/conformance/harness/expectations.py
Comment thread tests/conformance/test_conformance.py
@chris-colinsky chris-colinsky merged commit aa07c88 into main May 5, 2026
8 checks passed
@chris-colinsky chris-colinsky deleted the engine/phase-0-conformance-harness branch May 5, 2026 06:21
2 participants