Commit b92c083

feat(checkpoint): state migration for checkpoints (proposal 0014) (#46)
* feat(checkpoint): state migration registry, types, errors, builder surface

  Implements pipeline-utilities spec §10.12 (proposal 0014).

  - New errors: CheckpointStateMigrationMissing, CheckpointStateMigrationFailed. Both non-transient per §10.10. The missing-chain error carries from_version / to_version / registered_migrations_count / registry_description for actionable operator diagnostics.
  - New types: StateMigration (frozen dataclass: from_version, to_version, migrate callable) and MigrationRegistry (BFS chain resolution + ambiguity detection per §10.12.2).
  - Multi-shortest-path detection: when BFS finds a shortest path AND a second distinct path of equal length exists, the registry raises ValueError per the spec's ambiguous-chain rule. Resume surfaces this as CheckpointStateMigrationMissing with the ambiguity description in the payload.
  - State.schema_version: ClassVar[str] = '' (per spec §10.2's per-language carve-out). Empty-string sentinel; the framework reads type(state).schema_version at save time.
  - Checkpointer Protocol: supports_state_migration: ClassVar[bool] marker per §10.12.1. InMemoryCheckpointer: False (typed in-memory references can't expose a class-independent intermediate). SQLiteCheckpointer: True in JSON mode, False in pickle mode (pickle holds class identity and round-trips to typed instances; can't bridge versions).
  - GraphBuilder.with_state_migration / with_state_migrations thread a populated MigrationRegistry into CompiledGraph at compile time.
  - Resume-path routing (compiled.py): version mismatch → unsupported-backend check → registry lookup → chain application (with per-migration failure wrap) → final deserialization. The post-migration deserialization failure still surfaces as CheckpointRecordInvalid per §10.12.4; pre-migration version mismatch routes through the new two categories. Order matters; documented inline so a future reader doesn't swap it back.
  - Parent-state migration: same chain applied to each entry of parent_states in lockstep with the outer state per §10.12.2. Code comment records the spec-mandated equivalence so future contributors don't add per-parent metadata without a follow-on proposal.
  - Drop the CHECKPOINT_SCHEMA_VERSION = '1' constant: per Q1 spec answer, the old backend-internal record-shape role had no spec slot anyway. SQLiteCheckpointer no longer rejects records with non-default versions on load; that routing is now the engine's concern at resume time. Existing records carrying schema_version='1' get reinterpreted as user-facing v1 identifiers (single-user dev, no compat shim needed per Chris's note).

* test(conformance): drive 0014 state-migration fixtures 039-046

  New tests/conformance/test_state_migration.py drives all 8 spec state-migration fixtures end-to-end against the real engine (SQLite JSON-mode backend, real graph compile, real resume path).

  Harness pieces:

  - Migration mock library: add_new_field_default, add_v2_field, add_v3_field, identity_passthrough, raises_keyerror, should_not_run, irrelevant. Each fixture's migrate: <name> resolves through the library.
  - _MigrationTrace wraps each mock to capture invocation order for the migrations_run / migration_count / single_migration_invocation / migration_order_matches_chain assertions. Consecutive duplicates collapse (fixture 043 runs each step once for outer + once for each parent under the lockstep ordering; the assertion is per-step, not per-entity).
  - _seed_record persists a checkpoint matching the fixture's seeded_record: block before invoke(resume_invocation=...) so the resume path has data to load.

  Harness/model adjustments:

  - StateSchema in tests/conformance/harness/directives.py gains optional schema_version (default '') and the required field knob (no-default Pydantic field, used by fixture 044's required_v2_field deserialization-failure case).
  - _DEFERRED_FIXTURES in test_fixture_parsing.py loses the 039-046 rows; the CasesFixture model parses them via permissive extras on CaseSpec.
  - Initial-state construction in the resume path uses state_cls.model_construct() so fixtures with required fields (044) can pass a placeholder past Pydantic's validator before the resume even starts; the engine loads state from the checkpoint, not from the placeholder.

  Protocol-attribute shape:

  - Checkpointer.supports_state_migration declared as bool = False (not ClassVar) so SQLiteCheckpointer can set the value per-instance in __init__ based on serialization mode. Backends with a static answer (InMemoryCheckpointer) override at the class level with bool = False; Pyright accepts either shape because Protocol attribute conformance ignores the ClassVar marker on subclasses.

* test(unit): state migration + docs(concepts) state migrations section + CHANGELOG

  tests/unit/test_state_migration.py (16 tests) covers gaps the conformance fixtures don't exercise directly: BFS edge cases on the registry, multi-shortest-path ambiguity detection, GraphBuilder ergonomics (singular + plural registration, duplicate-pair ValueError), and error attribute carriage including __cause__ preservation on CheckpointStateMigrationFailed.

  docs/concepts/checkpointing.md gains a State migrations section covering: the State.schema_version declaration, the registration surface (with_state_migration / with_state_migrations), BFS chain resolution and ambiguity cases (duplicate edges + multi-shortest-path), the two new error categories and how they relate to CheckpointRecordInvalid (§10.12.4), backend support and why SQLite-pickle / InMemory aren't migration-eligible, the lockstep parent_states migration rule, and the migrations-MUST-be-pure contract.

  CHANGELOG.md gains two new Added entries (state-migration surface + Checkpointer.supports_state_migration Protocol attribute) plus a Changed entry documenting the CheckpointRecord.schema_version semantic shift and the CHECKPOINT_SCHEMA_VERSION constant removal. Pre-1.0 breaking change covered by the consolidated-release flag.

* feat(checkpoint): chain-ambiguous category + spec v0.16.0 bump

  Spec proposal 0018 (chain-ambiguity category) landed in spec v0.16.0 between PR-4 push and the spec agent's code review. This commit adds the new error category and the required routing swaps; PR-4 stays open as the carrier.

  - Submodule pinned to v0.16.0; pyproject + __spec_version__ + test_smoke pin assertions follow.
  - New canonical error class CheckpointStateMigrationChainAmbiguous (and the matching category string) added to checkpoint/errors.py, re-exported from openarmature.checkpoint. Non-transient. Carries optional from_version / to_version when known (always set for the registration-time case; resume-time multi-shortest-path also populates).
  - MigrationRegistry.register raises CheckpointStateMigrationChainAmbiguous directly at registration time per spec §10.10 (proposal 0018), so the canonical category surfaces at the registration boundary without wrapping. The internal BFS keeps raising ValueError for multi-shortest-path detection; CompiledGraph._migrate_record's except branch now routes that to CheckpointStateMigrationChainAmbiguous at the resume boundary.
  - Routing precedence on resume per §10.10 (v0.16.0): chain-ambiguous → missing → failed → record-invalid. Code-comment in _migrate_record documents the ordering so a future reader doesn't swap it back.
  - Code comments record two design-choice notes the spec agent flagged: the load-bearing equal-depth-re-entry behavior in BFS cycle protection (tightening to <= breaks multi-shortest-path detection), and the registration-order ordering of describe() output.
  - Conformance harness gains the expected_chain_ambiguity_error primitive. Accepts the canonical category at EITHER the case top-level (registration-time detection: registration raises, resume block unreachable) OR inside resume (load-time detection during BFS). Spec §10.12.2's compile-time-SHOULD / load-time-acceptable carve-out lands in the driver as a single try/except wrap around both phases.
  - Fixture 047 (state-migration-chain-ambiguous) covered; all 9 state-migration fixtures (039-047) pass.
  - 3 new unit tests for the new category's class shape + attribute carriage; existing duplicate-pair tests updated to assert the canonical category instead of ValueError.
  - docs/concepts/checkpointing.md updated for the third category + the v0.16.0 routing precedence. CHANGELOG entry updated and Pinned-version line bumped to 0.10.0 → 0.16.0.

  OTel openarmature.checkpoint.migrate span emission deferred to a follow-on; documented inline in _migrate_record. Spec §6 cross-ref is SHOULD-level, and the emission needs the _InvocationContext that's currently created AFTER the migration path runs. The follow-on can either restructure resume to build the context first or use a side-channel observer dispatch.

* feat(observability): openarmature.checkpoint.migrate OTel span

  Lands the §6 cross-ref piece of proposal 0014 that the previous commit deferred. Versioned resumes whose migration chain runs now emit a zero-duration `openarmature.checkpoint.migrate` span on the OTel observer, parented under the invocation root.

  - New _MigrationSummary frozen dataclass in graph/compiled.py carries the chain metadata (from_version, to_version, chain_length) out of _migrate_record back to invoke().
  - _migrate_record return type becomes tuple[CheckpointRecord, _MigrationSummary]; invoke() captures the summary across the migration → context-creation gap.
  - After context + delivery-worker setup but before the main execution loop, invoke() dispatches a synthetic NodeEvent with phase='checkpoint_migrated' carrying the summary on pre_state. Mirrors the checkpoint_saved pattern (state-on-pre, post=None) so the OTel observer's existing routing shape picks it up cleanly.
  - New phase 'checkpoint_migrated' added to NodeEvent's Literal union and to KNOWN_PHASES (rejects typos in observer.phases subscriptions). NodeEvent.pre_state typed as Any so the permissive _MigrationSummary payload fits; the OTel observer narrows defensively via isinstance.
  - OTelObserver._emit_checkpoint_migrate_span emits the span with the three normative attributes. Parented under the invocation root span (opened lazily if not yet present; versioned resumes are the first event on a new invocation_id, before any node-span emission). Mirrors _emit_checkpoint_save_span.
  - Two OTel unit tests: one positive (versioned resume emits the span with the right attrs) and one negative (version-match fast path emits no span, per spec §10.12.3).

  CHANGELOG entry promoted from 'deferred to follow-on' to the landed shape. Phase gated off default subscriptions so legacy observers don't surface a new event without opt-in.

* docs(changelog): flag NodeEvent.pre_state Any widening for observer authors

* fix: CoPilot review pass on PR #46

  - compiled.py: align save-side schema_version read on self.state_cls (was type(post_state)); _maybe_save_checkpoint drops @staticmethod so it can access self. Symmetric with the resume-side check; subclass schema_versions no longer shadow the declared graph schema.
  - compiled.py _apply_migration_step: re-raise CheckpointError subclasses unchanged before the bare-Exception wrap, so a canonical-category raise from a user migration propagates without being squashed as CheckpointStateMigrationFailed. Trim the wrap-message to the from/to identity + 'raised while migrating <label>'; __cause__ already carries the underlying exception, and embedding type(exc).__name__: str(exc) duplicates info Python's traceback formatter shows anyway.
  - builder.py with_state_migrations: pre-validate the full input list (against existing registry + against earlier entries in the same call) before mutating. A duplicate-pair raise in the middle of the list no longer leaves earlier entries half-registered. Update the singular with_state_migration docstring to reflect the canonical raise type (CheckpointStateMigrationChainAmbiguous, not ValueError) + flag the empty-to_version ValueError.
  - migration.py register: reject empty to_version (un-declared sentinel is not a valid chain target per spec §10.2). Empty from_version stays valid for the documented bridging case.
  - migration.py resolve_chain: raise CheckpointStateMigrationChainAmbiguous directly on multi-shortest-path detection (was ValueError). The registry's ambiguity contract is now one type regardless of when it surfaces (register vs resolve). compiled.py's wrap removed.
  - protocol.py Checkpointer: clarify the supports_state_migration docstring on the Protocol-class-body vs runtime-attribute semantic: class-body '= False' is a typing-level signal, not a runtime guarantee. Engine uses getattr-with-default; backends SHOULD declare for Pyright honesty.
  - events.py NodeEvent: document the synthetic-phases conventions (step=-1, dotted node_name, non-State pre_state on checkpoint_migrated) in the class docstring so third-party observer authors don't need to read the engine source.
  - observer.py KNOWN_PHASES: expand the comment about the synthetic-phase opt-in and cross-reference the NodeEvent docstring.
  - tests/conformance/test_state_migration.py: range docstring updated from 039-046 to 039-047 with a one-paragraph note on the chain-ambiguity surface 047 covers.
  - tests/unit/test_state_migration.py: existing ambiguous-shortest-paths test asserts the canonical category (was ValueError); two new tests for the empty-to_version rejection + empty-from_version bridging acceptance; two new tests for with_state_migrations atomic pre-validation (no partial-state on internal-list and against-existing-registry duplicate cases).
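
The atomic pre-validation described for `with_state_migrations` can be illustrated with a two-phase registration. This is a minimal sketch, not the code in builder.py: `Registry`, `Migration`, `ChainAmbiguous`, `register_many`, and `identity` are hypothetical stand-ins, and only the validate-then-mutate ordering is taken from the commit message.

```python
from dataclasses import dataclass
from typing import Callable


class ChainAmbiguous(Exception):
    """Hypothetical stand-in for CheckpointStateMigrationChainAmbiguous."""


@dataclass(frozen=True)
class Migration:
    """Hypothetical stand-in for StateMigration."""

    from_version: str
    to_version: str
    migrate: Callable[[dict], dict]


class Registry:
    """Toy registry keyed by (from_version, to_version)."""

    def __init__(self) -> None:
        self.edges: dict[tuple[str, str], Migration] = {}

    def register_many(self, *migrations: Migration) -> None:
        # Phase 1: validate the whole batch without touching self.edges.
        seen: set[tuple[str, str]] = set()
        for m in migrations:
            if not m.to_version:
                raise ValueError("to_version must be a non-empty identifier")
            key = (m.from_version, m.to_version)
            if key in self.edges or key in seen:
                raise ChainAmbiguous(f"duplicate migration pair {key!r}")
            seen.add(key)
        # Phase 2: mutate only once every entry has passed validation.
        for m in migrations:
            self.edges[(m.from_version, m.to_version)] = m


def identity(state: dict) -> dict:
    """Trivial migration used only to populate the toy registry."""
    return dict(state)
```

Because the duplicate check runs against both the existing edges and the earlier entries of the same call, a rejected batch leaves the registry exactly as it was.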
1 parent ddc99f2 commit b92c083

24 files changed

Lines changed: 1892 additions & 70 deletions

CHANGELOG.md

Lines changed: 6 additions & 1 deletion
@@ -8,6 +8,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

 ### Added

+- **State migration for checkpointed graphs (proposal 0014, introduced in spec v0.15.0; refined by proposal 0018 in spec v0.16.0).** Saved checkpoints whose `schema_version` doesn't match the current state class now route through a registered migration chain instead of failing on resume. Surface: `State.schema_version: ClassVar[str] = ""` (declare a non-empty value to opt in), `GraphBuilder.with_state_migration(from_version, to_version, migrate)` and `with_state_migrations(*migrations)` for registration, `StateMigration` and `MigrationRegistry` types exported from `openarmature.checkpoint`. Chain resolution is BFS over the registered edges; the shortest path wins. Three new error categories: `CheckpointStateMigrationChainAmbiguous` (proposal 0018: duplicate `(from, to)` pair at registration time, or multiple distinct shortest paths between the saved and current versions at resume time), `CheckpointStateMigrationMissing` (no chain bridges the versions), and `CheckpointStateMigrationFailed` (a migration function raised). All non-transient. Post-migration deserialization failures still route to `CheckpointRecordInvalid` per §10.12.4. The same chain applies to each entry in `parent_states` in lockstep with the outer state per §10.12.2. Routing precedence per §10.10 (v0.16.0): chain-ambiguous → missing → failed → record-invalid.
+- **`Checkpointer.supports_state_migration` Protocol attribute.** Marks whether a backend can expose the structural intermediate form (a plain dict, JSON tree) the migration registry consumes. `SQLiteCheckpointer(serialization="json")` opts in; `SQLiteCheckpointer(serialization="pickle")` and `InMemoryCheckpointer` opt out. On version mismatch against a non-migration-eligible backend the engine raises `CheckpointRecordInvalid` per spec §10.12.1.
+- **`openarmature.checkpoint.migrate` OTel span (proposal 0014 §6 cross-ref).** Versioned resumes whose migration chain runs emit a zero-duration `openarmature.checkpoint.migrate` span on the OTel observer, parented under the invocation root span. Attributes: `openarmature.checkpoint.migrate.from_version`, `openarmature.checkpoint.migrate.to_version` (the final target), `openarmature.checkpoint.migrate.chain_length`. The §10.12.3 fast path (versions match, registry not consulted) emits no span. Engine-side: a synthetic `checkpoint_migrated` observer phase carries a `_MigrationSummary` payload from `_migrate_record` through to the OTel observer; the new phase is gated off default subscriptions (observers opt in explicitly via `phases={..., "checkpoint_migrated"}`).
- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
@@ -22,7 +25,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

 ### Changed

-- **Pinned spec version: 0.10.0 → 0.15.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.15.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017) in one bump. Only the surface introduced by proposal 0016 is implemented in this changelog entry; fixtures from 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as their respective PRs land.
+- **Pinned spec version: 0.10.0 → 0.16.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.16.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017, 0018) in one bump. Only the surfaces introduced by proposals 0014–0017 are implemented in the batch's release; fixtures from 0011 are deferred-skip in the conformance suite and unmark with PR-5.
+- **`CheckpointRecord.schema_version` semantic shift (proposal 0014).** Previously a backend-internal record-shape version (`CHECKPOINT_SCHEMA_VERSION = "1"` constant), now the user-facing state-schema version per spec §10.2. The framework reads `type(state).schema_version` at save time. Pre-PR-4 records carrying `"1"` are reinterpreted as user-facing v1 identifiers; users with such records either declare `schema_version="1"` on their state class or discard the pre-PR-4 records. `SQLiteCheckpointer` no longer rejects records with non-default `schema_version` at the backend boundary; version-mismatch routing is now an engine concern at resume time. The `CHECKPOINT_SCHEMA_VERSION` module constant is removed; future record-shape evolution can add backend-private metadata fields if needed.
+- **`NodeEvent.pre_state` typed `Any` (was `State`).** Required by the new `checkpoint_migrated` phase which carries a `_MigrationSummary` payload rather than a `State` instance. Observer authors who type-narrowed `pre_state` to `State` should treat it as `Any` and narrow per-phase (e.g., `if event.phase == "completed": ...`). The `checkpoint_saved` phase already carried a State-flavored shape (not necessarily a typed `State` subclass instance), so this widens the declared type to match runtime reality rather than introducing a new constraint.

 ### Notes

docs/concepts/checkpointing.md

Lines changed: 128 additions & 0 deletions
@@ -149,6 +149,134 @@ multi-process), S3 (cross-region durability). For event-sourced
 runtimes (Temporal, DBOS, Restate, Inngest) the Protocol is the
 adapter layer.

+## State migrations
+
+When a checkpoint was saved against an earlier version of your state
+schema and the code has since evolved, the engine consults a
+**migration registry** to bridge the saved record into the current
+shape. Without migrations, a schema change invalidates every prior
+checkpoint; with one short registration per change, you keep your
+saved records working across releases.
+
+The wire-up is two pieces: declare a version on your state class,
+and register one migration per version bump.
+
+```python
+from typing import ClassVar
+from openarmature.graph import State, GraphBuilder
+from openarmature.checkpoint import SQLiteCheckpointer
+
+
+class MyState(State):
+    schema_version: ClassVar[str] = "v2"
+    x: int = 0
+    new_field: str = "default"  # added in v2
+
+
+def add_new_field_default(state: dict) -> dict:
+    return {**state, "new_field": "default"}
+
+
+graph = (
+    GraphBuilder(MyState)
+    .add_node(...)
+    .with_checkpointer(SQLiteCheckpointer("ck.db", serialization="json"))
+    .with_state_migration("v1", "v2", add_new_field_default)
+    .compile()
+)
+```
+
+On resume, the engine reads the saved record's `schema_version`. If
+it equals `MyState.schema_version`, the record loads via the §10.4
+fast path (no migration consulted). If it differs, the engine
+resolves a chain through the registry (BFS for the shortest path),
+applies each migration in order to the record's state, then
+deserializes the result into your current state class.
+
+### Chain resolution
+
+Registered migrations form a directed graph. Each
+`with_state_migration(a, b, fn)` is an edge from `a` to `b`. Chain
+resolution finds the shortest path between the saved version and the
+current version. Branching is fine: a v1 record can have one
+migration leading to v2 and another leading to v2-experimental;
+chain resolution picks the path that ends at the current declared
+version.
+
+Two ambiguity cases are configuration errors. Both surface as
+`CheckpointStateMigrationChainAmbiguous`:
+
+- **Duplicate edges.** Registering two migrations with the same
+  `(from_version, to_version)` pair raises at registration time so
+  the configuration error surfaces before any resume attempt.
+  Either delete one or pick distinct version identifiers.
+- **Multiple shortest paths.** A diamond like
+  `v1 → v2 → v4` and `v1 → v3 → v4` is ambiguous: both paths have
+  length 2. The engine raises during resume so the user can
+  register fewer migrations or pick a single canonical route.
+
+### The three new error categories
+
+- **`CheckpointStateMigrationChainAmbiguous`**: the registered
+  migration graph is ambiguous (duplicate `(from, to)` pair at
+  registration time, OR multiple distinct shortest paths between
+  the saved and current versions at resume time). Surfaces before
+  any migration function runs. Carries `from_version` and
+  `to_version` when known.
+- **`CheckpointStateMigrationMissing`**: the saved version doesn't
+  match the current version, and no chain bridges them. Carries
+  `from_version`, `to_version`, a count of registered migrations,
+  and a human-readable `registry_description` so operators see what
+  IS available.
+- **`CheckpointStateMigrationFailed`**: a user-supplied migration
+  function raised. Subsequent migrations in the chain don't run;
+  the resume fails. The migration's exception rides `__cause__`.
+
+Routing precedence on resume: chain-ambiguous → missing → failed →
+record-invalid.
+
+A fourth category, `CheckpointRecordInvalid`, continues to cover the
+**post**-migration case: a migration ran cleanly but produced
+output that the current state class can't deserialize (missing a
+required field, wrong type, etc.). The four categories are
+mutually exclusive on any given resume.
+
+### Backend support
+
+Not every backend can migrate. Migration needs the backend to expose
+a **structural intermediate form** of the loaded state (a plain
+dict, JSON tree, or similar) that's independent of the current
+state class.
+
+- **`SQLiteCheckpointer(serialization="json")`** can. JSON-encoded
+  state loads to a dict; the migration function operates on the
+  dict directly.
+- **`SQLiteCheckpointer(serialization="pickle")`** can NOT. Pickle
+  holds class identity and round-trips back to typed instances.
+- **`InMemoryCheckpointer`** can NOT. It holds live typed-state
+  references by reference; there's no serialization step.
+
+On version mismatch against a non-migration-eligible backend, the
+engine raises `CheckpointRecordInvalid` (not
+`CheckpointStateMigrationMissing`): the registry has no chance to
+bridge.
+
+### Parent-state migration
+
+Subgraph saves carry a `parent_states` chain of the outer-graph
+state captured at the moment of the inner save. On resume, the same
+migration chain applies to each entry in `parent_states` in lockstep
+with the outer state. The spec treats `parent_states` as carrying
+the same `schema_version` as the outer record (no per-parent
+version metadata in v1).
+
+### Migrations MUST be pure
+
+A migration function MUST be deterministic, with no I/O, no implicit
+state, no random or wall-clock-derived output. The framework
+doesn't enforce purity, but violating it breaks determinism
+guarantees for resume.
+
 ## When NOT to use checkpointing

 - **Pure pipelines that complete in seconds.** Restart-from-entry is
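
The chain-resolution rules in the State migrations section (shortest path wins, equal-length alternatives are an error, no path is a separate error) can be sketched with a breadth-first search that also counts shortest paths. This is an illustrative variant, not necessarily how `MigrationRegistry` does it: `resolve_chain`, `ChainAmbiguous`, and `ChainMissing` are hypothetical names, and the sketch assumes duplicate edges were already rejected at registration.

```python
from collections import deque


class ChainAmbiguous(Exception):
    """Hypothetical stand-in for CheckpointStateMigrationChainAmbiguous."""


class ChainMissing(Exception):
    """Hypothetical stand-in for CheckpointStateMigrationMissing."""


def resolve_chain(edges, start, target):
    """Return the unique shortest list of (from, to) edges, or raise."""
    if start == target:
        return []  # versions match: nothing to migrate
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)

    dist = {start: 0}   # BFS depth of each reached version
    paths = {start: 1}  # number of distinct shortest paths to it
    parent = {}         # first-discovered predecessor
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = paths[u]
                parent[v] = u
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Re-entry at equal depth: another shortest route exists.
                paths[v] += paths[u]

    if target not in dist:
        raise ChainMissing(f"no chain from {start!r} to {target!r}")
    if paths[target] > 1:
        raise ChainAmbiguous(f"{paths[target]} shortest chains to {target!r}")

    chain = []
    node = target
    while node != start:
        chain.append((parent[node], node))
        node = parent[node]
    chain.reverse()
    return chain
```

A longer alternative route is not an ambiguity in this sketch; only a second route of the same minimal length is, which matches the diamond example in the section above.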

openarmature-spec

Submodule openarmature-spec updated 44 files

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ Repository = "https://github.com/LunarCommand/openarmature-python"
 Specification = "https://github.com/LunarCommand/openarmature-spec"

 [tool.openarmature]
-spec_version = "0.15.0"
+spec_version = "0.16.0"

 [dependency-groups]
 dev = [

src/openarmature/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
"""OpenArmature — workflow framework for LLM pipelines and tool-calling agents."""

 __version__ = "0.5.0"
-__spec_version__ = "0.15.0"
+__spec_version__ = "0.16.0"

src/openarmature/checkpoint/__init__.py

Lines changed: 9 additions & 2 deletions
@@ -26,9 +26,12 @@
     CheckpointNotFound,
     CheckpointRecordInvalid,
     CheckpointSaveFailed,
+    CheckpointStateMigrationChainAmbiguous,
+    CheckpointStateMigrationFailed,
+    CheckpointStateMigrationMissing,
 )
+from .migration import MigrationRegistry, StateMigration
 from .protocol import (
-    CHECKPOINT_SCHEMA_VERSION,
     Checkpointer,
     CheckpointFilter,
     CheckpointRecord,
@@ -37,17 +40,21 @@
 )

 __all__ = [
-    "CHECKPOINT_SCHEMA_VERSION",
     "CheckpointError",
     "CheckpointFilter",
     "CheckpointNotFound",
     "CheckpointRecord",
     "CheckpointRecordInvalid",
     "CheckpointSaveFailed",
+    "CheckpointStateMigrationChainAmbiguous",
+    "CheckpointStateMigrationFailed",
+    "CheckpointStateMigrationMissing",
     "CheckpointSummary",
     "Checkpointer",
     "InMemoryCheckpointer",
+    "MigrationRegistry",
     "NodePosition",
     "SQLiteCheckpointer",
     "SerializationMode",
+    "StateMigration",
 ]
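
How a failed step could be wrapped while canonical errors pass through, and how the same step is applied to the outer state and then to each `parent_states` entry, is sketched below. All names (`CheckpointErrorSketch`, `MigrationFailed`, `apply_step`, `migrate_record`) are hypothetical; the commit message describes `_apply_migration_step` and `_migrate_record` but does not show them, so the exact ordering here is an assumption based on its "once for outer + once for each parent" note.

```python
from typing import Callable

# One chain step: (from_version, to_version, migrate).
Step = tuple[str, str, Callable[[dict], dict]]


class CheckpointErrorSketch(Exception):
    """Hypothetical base class for canonical checkpoint errors."""


class MigrationFailed(CheckpointErrorSketch):
    """Hypothetical stand-in for CheckpointStateMigrationFailed."""

    def __init__(self, from_version: str, to_version: str, label: str) -> None:
        super().__init__(
            f"{from_version!r} -> {to_version!r} raised while migrating {label}"
        )
        self.from_version = from_version
        self.to_version = to_version


def apply_step(step: Step, state: dict, label: str) -> dict:
    from_version, to_version, migrate = step
    try:
        return migrate(state)
    except CheckpointErrorSketch:
        raise  # canonical categories propagate unchanged
    except Exception as exc:
        # `from exc` stores the original exception on __cause__.
        raise MigrationFailed(from_version, to_version, label) from exc


def migrate_record(
    chain: list[Step], state: dict, parent_states: list[dict]
) -> tuple[dict, list[dict]]:
    """Apply each step to the outer state, then to every parent, in lockstep."""
    for step in chain:
        state = apply_step(step, state, "state")
        parent_states = [
            apply_step(step, parent, f"parent_states[{i}]")
            for i, parent in enumerate(parent_states)
        ]
    return state, parent_states
```

Keeping the wrap message to the version pair and a label avoids repeating what the chained traceback already prints for `__cause__`.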

src/openarmature/checkpoint/backends/memory.py

Lines changed: 17 additions & 0 deletions
@@ -28,8 +28,25 @@ class InMemoryCheckpointer:
Pydantic state instance the engine produces is what comes back
from :meth:`load` — no serialization round-trip. (This is the
feature: tests can assert on the saved state's identity.)
+
+    **State-migration eligibility:** none. Per spec §10.12.1, a
+    backend supports migration only when it can expose a structural
+    intermediate form of the loaded state independent of the current
+    state class. This backend holds live typed instances by
+    reference, so a version mismatch on resume raises
+    ``CheckpointRecordInvalid`` rather than consulting the
+    migration registry.
     """

+    # Per spec §10.12.1: in-memory storage holds live typed-state
+    # references, so there's no class-independent intermediate form
+    # the migration registry could consume. Declared at the class
+    # level (not as a per-instance attribute) since the answer is
+    # constructor-independent; the Protocol declaration in
+    # ``protocol.py`` types this as ``bool`` (not ``ClassVar[bool]``)
+    # so Pyright accepts a class-attribute override here.
+    supports_state_migration: bool = False
+
     def __init__(self) -> None:
         self._records: dict[str, CheckpointRecord] = {}
         self._lock = asyncio.Lock()
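
The commit message says the engine reads this flag with getattr-with-default, so class-level and per-instance declarations behave the same and a backend that never declares the attribute counts as not eligible. A minimal sketch of that lookup follows; `JsonOrPickleBackend`, `MemoryBackend`, `UndeclaredBackend`, and `can_migrate` are hypothetical names, not the library's classes.

```python
class JsonOrPickleBackend:
    """Per-instance flag: the answer depends on a constructor argument."""

    def __init__(self, serialization: str = "json") -> None:
        self.supports_state_migration: bool = serialization == "json"


class MemoryBackend:
    """Class-level flag: the answer is the same for every instance."""

    supports_state_migration: bool = False


class UndeclaredBackend:
    """A backend written before the attribute existed."""


def can_migrate(backend: object) -> bool:
    # A missing attribute is treated as "not migration-eligible".
    return bool(getattr(backend, "supports_state_migration", False))
```

Under this lookup an undeclared backend and an explicitly opted-out backend are indistinguishable at runtime, which is why declaring the attribute matters mainly to the type checker.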

src/openarmature/checkpoint/backends/sqlite.py

Lines changed: 13 additions & 7 deletions
@@ -42,7 +42,6 @@

 from ..errors import CheckpointRecordInvalid
 from ..protocol import (
-    CHECKPOINT_SCHEMA_VERSION,
     CheckpointFilter,
     CheckpointRecord,
     CheckpointSummary,
@@ -109,6 +108,13 @@ def __init__(
         self._serialization: SerializationMode = serialization
         self._lock = asyncio.Lock()
         self._initialized = False
+        # Per spec §10.12.1, a backend supports state migration only
+        # when it can expose a structural intermediate form of the
+        # loaded state that is independent of the current state
+        # class. JSON serialization satisfies this (loads to dicts);
+        # pickle holds class identity and round-trips to typed
+        # instances, so it cannot bridge a schema-version mismatch.
+        self.supports_state_migration: bool = serialization == "json"

     def _connect(self) -> sqlite3.Connection:
         conn = sqlite3.connect(self._path)
@@ -230,12 +236,12 @@ def _do() -> tuple[Any, ...] | None:
             schema_version,
             recorded_serialization,
         ) = row
-        if schema_version != CHECKPOINT_SCHEMA_VERSION:
-            raise CheckpointRecordInvalid(
-                invocation_id,
-                f"persisted schema_version={schema_version!r} does not match "
-                f"current {CHECKPOINT_SCHEMA_VERSION!r}",
-            )
+        # Note: per spec §10.12 (proposal 0014), version mismatches
+        # are no longer rejected at the backend boundary. The engine
+        # routes mismatches through the migration registry on resume
+        # (CheckpointStateMigrationMissing if no chain, else applies
+        # the chain). The backend just round-trips the version
+        # identifier as opaque data.
         state = self._decode(state_blob, recorded_serialization, invocation_id)
         position_dicts = self._decode(positions_blob, recorded_serialization, invocation_id)
         parent_states = self._decode(parent_states_blob, recorded_serialization, invocation_id)
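
Why JSON mode is migration-eligible can be seen with the standard library alone: a JSON blob decodes to a plain dict that refers to no class, so a migration can reshape it before the current state class is built. The sketch below assumes a stored v1 blob of `{"x": 7}`; `StateV2` is a hypothetical dataclass standing in for a Pydantic state class, and the backend's real `_decode` is not shown in this diff.

```python
import json
from dataclasses import dataclass


@dataclass
class StateV2:
    """Hypothetical current state class (the real one would subclass State)."""

    x: int
    new_field: str


def add_new_field_default(state: dict) -> dict:
    """Pure v1 -> v2 migration: returns a new dict, no I/O, no clock."""
    return {**state, "new_field": "default"}


stored_blob = '{"x": 7}'                  # assumed shape of a v1 save
intermediate = json.loads(stored_blob)    # plain dict, class-independent
migrated = add_new_field_default(intermediate)
current = StateV2(**migrated)             # final deserialization step
```

A pickle blob would instead decode straight back to an instance of the class that was saved, leaving no class-independent form for the migration to work on.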
