Commit fa7b96e

test(unit): state migration + docs(concepts) state migrations section + CHANGELOG
tests/unit/test_state_migration.py (16 tests) covers gaps the conformance fixtures don't exercise directly: BFS edge cases on the registry, multi-shortest-path ambiguity detection, GraphBuilder ergonomics (singular + plural registration, duplicate-pair ValueError), and error attribute carriage including __cause__ preservation on CheckpointStateMigrationFailed.

docs/concepts/checkpointing.md gains a State migrations section covering: the State.schema_version declaration, the registration surface (with_state_migration / with_state_migrations), BFS chain resolution and ambiguity cases (duplicate edges + multi-shortest-path), the two new error categories and how they relate to CheckpointRecordInvalid (§10.12.4), backend support and why SQLite-pickle / InMemory aren't migration-eligible, the lockstep parent_states migration rule, and the migrations-MUST-be-pure contract.

CHANGELOG.md gains two new Added entries (state-migration surface + Checkpointer.supports_state_migration Protocol attribute) plus a Changed entry documenting the CheckpointRecord.schema_version semantic shift and the CHECKPOINT_SCHEMA_VERSION constant removal. Pre-1.0 breaking change covered by the consolidated-release flag.
1 parent 63a5819 commit fa7b96e

3 files changed

Lines changed: 334 additions & 0 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,8 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **State migration for checkpointed graphs (proposal 0014, introduced in spec v0.15.0).** Saved checkpoints whose `schema_version` doesn't match the current state class now route through a registered migration chain instead of failing on resume. Surface: `State.schema_version: ClassVar[str] = ""` (declare a non-empty value to opt in), `GraphBuilder.with_state_migration(from_version, to_version, migrate)` and `with_state_migrations(*migrations)` for registration, `StateMigration` and `MigrationRegistry` types exported from `openarmature.checkpoint`. Chain resolution is BFS over the registered edges; the shortest path wins. Multi-shortest-path ambiguity (e.g., a diamond `v1→v2→v4` + `v1→v3→v4`) surfaces as `CheckpointStateMigrationMissing` per spec §10.12.2's load-time-detection allowance. Two new error categories: `CheckpointStateMigrationMissing` (no chain bridges the versions, or chain ambiguous) and `CheckpointStateMigrationFailed` (a migration function raised). Both non-transient. Post-migration deserialization failures still route to `CheckpointRecordInvalid` per §10.12.4. The same chain applies to each entry in `parent_states` in lockstep with the outer state per §10.12.2.
- **`Checkpointer.supports_state_migration` Protocol attribute.** Marks whether a backend can expose the structural intermediate form (a plain dict, JSON tree) the migration registry consumes. `SQLiteCheckpointer(serialization="json")` opts in; `SQLiteCheckpointer(serialization="pickle")` and `InMemoryCheckpointer` opt out. On version mismatch against a non-migration-eligible backend the engine raises `CheckpointRecordInvalid` per spec §10.12.1.
- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
@@ -23,6 +25,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
### Changed

- **Pinned spec version: 0.10.0 → 0.15.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.15.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017) in one bump. Only the surface introduced by proposal 0016 is implemented in this changelog entry; fixtures from 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as their respective PRs land.
- **`CheckpointRecord.schema_version` semantic shift (proposal 0014).** Previously a backend-internal record-shape version (`CHECKPOINT_SCHEMA_VERSION = "1"` constant), now the user-facing state-schema version per spec §10.2. The framework reads `type(state).schema_version` at save time. Pre-PR-4 records carrying `"1"` are reinterpreted as user-facing v1 identifiers; users with such records either declare `schema_version="1"` on their state class or discard the pre-PR-4 records. `SQLiteCheckpointer` no longer rejects records with non-default `schema_version` at the backend boundary; version-mismatch routing is now an engine concern at resume time. The `CHECKPOINT_SCHEMA_VERSION` module constant is removed; future record-shape evolution can add backend-private metadata fields if needed.

### Notes

docs/concepts/checkpointing.md

Lines changed: 119 additions & 0 deletions
@@ -149,6 +149,125 @@ multi-process), S3 (cross-region durability). For event-sourced
runtimes (Temporal, DBOS, Restate, Inngest) the Protocol is the
adapter layer.

## State migrations

When a checkpoint was saved against an earlier version of your state
schema and the code has since evolved, the engine consults a
**migration registry** to bridge the saved record into the current
shape. Without migrations, a schema change invalidates every prior
checkpoint; with one short registration per change, you keep your
saved records working across releases.

The wire-up is two pieces: declare a version on your state class,
and register one migration per version bump.

```python
from typing import ClassVar
from openarmature.graph import State, GraphBuilder
from openarmature.checkpoint import SQLiteCheckpointer


class MyState(State):
    schema_version: ClassVar[str] = "v2"
    x: int = 0
    new_field: str = "default"  # added in v2


def add_new_field_default(state: dict) -> dict:
    return {**state, "new_field": "default"}


graph = (
    GraphBuilder(MyState)
    .add_node(...)
    .with_checkpointer(SQLiteCheckpointer("ck.db", serialization="json"))
    .with_state_migration("v1", "v2", add_new_field_default)
    .compile()
)
```

On resume, the engine reads the saved record's `schema_version`. If
it equals `MyState.schema_version`, the record loads via the §10.4
fast path (no migration consulted). If it differs, the engine
resolves a chain through the registry (BFS for the shortest path),
applies each migration in order to the record's state, then
deserializes the result into your current state class.
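
The resume decision described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the engine's code: `resume_state`, the `{"schema_version": ..., "state": ...}` record layout, and the already-resolved `chain` list are hypothetical stand-ins.

```python
from typing import Callable

# A migration takes the structural (dict) form of a state and returns a new dict.
Migration = Callable[[dict], dict]


def resume_state(record: dict, current_version: str, chain: list[Migration]) -> dict:
    """Return the state dict that would be deserialized into the current class.

    Hypothetical helper: `record` is assumed to look like
    {"schema_version": "...", "state": {...}}, and `chain` is assumed to be
    the already-resolved list of migration functions, in application order.
    """
    if record["schema_version"] == current_version:
        # Fast path: versions match, so no migration is consulted.
        return record["state"]
    state = record["state"]
    for migrate in chain:
        # Each migration consumes the previous migration's output.
        state = migrate(state)
    return state


def add_new_field_default(state: dict) -> dict:
    return {**state, "new_field": "default"}


saved = {"schema_version": "v1", "state": {"x": 3}}
migrated = resume_state(saved, "v2", [add_new_field_default])
```

Here the v1 record gains `new_field` before deserialization, while a record already at `"v2"` is returned untouched.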

### Chain resolution

Registered migrations form a directed graph. Each
`with_state_migration(a, b, fn)` is an edge from `a` to `b`. Chain
resolution finds the shortest path between the saved version and the
current version. Branching is fine: a v1 record can have one
migration leading to v2 and another leading to v2-experimental;
chain resolution picks the path that ends at the current declared
version.

Two ambiguity cases are configuration errors:

- **Duplicate edges.** Registering two migrations with the same
  `(from_version, to_version)` pair raises `ValueError` at
  registration. Either delete one or pick distinct version
  identifiers.
- **Multiple shortest paths.** A diamond like
  `v1 → v2 → v4` and `v1 → v3 → v4` is ambiguous: both paths have
  length 2. The engine surfaces this as
  `CheckpointStateMigrationMissing` on resume so you can remove a
  redundant migration and leave a single canonical route.
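
The resolution rules above (shortest path wins, equal-length alternatives are an error) can be sketched as a breadth-first search that also counts shortest paths. This is a standalone sketch, not the `MigrationRegistry` implementation: `resolve_chain` here is a hypothetical free function over `(from_version, to_version)` tuples, and it assumes duplicate edges were already rejected at registration. Its return conventions follow the unit tests in this commit (empty list for equal versions, `None` when no path exists, `ValueError` on ambiguity).

```python
from collections import deque


def resolve_chain(edges, start, goal):
    """Return the unique shortest list of edges from start to goal.

    Returns [] when start == goal and None when no path exists.
    Raises ValueError when two or more shortest paths tie.
    Assumes `edges` contains no duplicate (from, to) pairs.
    """
    if start == goal:
        return []
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    dist = {start: 0}        # BFS distance from start
    path_count = {start: 1}  # number of distinct shortest paths to each node
    parent = {}              # one predecessor on a shortest path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                path_count[nxt] = path_count[node]
                parent[nxt] = node
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way to reach nxt.
                path_count[nxt] += path_count[node]

    if goal not in dist:
        return None
    if path_count[goal] > 1:
        raise ValueError(f"ambiguous migration chain: {start} -> {goal}")

    # Walk predecessors back from goal, then reverse into application order.
    chain = []
    node = goal
    while node != start:
        chain.append((parent[node], node))
        node = parent[node]
    chain.reverse()
    return chain


linear = resolve_chain([("v2", "v3"), ("v1", "v2")], "v1", "v3")
diamond = [("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")]
```

Counting paths during the search is what distinguishes a harmless longer detour (ignored) from a true tie (rejected).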

### The two new error categories

- **`CheckpointStateMigrationMissing`**: the saved version doesn't
  match the current version, and no chain (or no unambiguous chain)
  bridges them. Carries `from_version`, `to_version`, a count of
  registered migrations, and a human-readable `registry_description`
  so operators see what IS available.
- **`CheckpointStateMigrationFailed`**: a user-supplied migration
  function raised. Subsequent migrations in the chain don't run;
  the resume fails. The migration's original exception is preserved
  on `__cause__`.

A third category, `CheckpointRecordInvalid`, continues to cover the
**post**-migration case: a migration ran cleanly but produced
output that the current state class can't deserialize (missing a
required field, wrong type, etc.). The three categories are
mutually exclusive on any given resume.
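
To illustrate how a failing migration stops the chain and keeps the original exception on `__cause__`, here is a small sketch. `MigrationFailed` and `apply_chain` are hypothetical stand-ins written for this example; they are not the library's `CheckpointStateMigrationFailed` or its engine code.

```python
class MigrationFailed(Exception):
    """Hypothetical stand-in for CheckpointStateMigrationFailed."""

    def __init__(self, message, *, from_version, to_version):
        super().__init__(message)
        self.from_version = from_version
        self.to_version = to_version


def apply_chain(state, chain):
    """Apply (from_version, to_version, migrate) steps in order.

    The first step that raises aborts the chain; `raise ... from exc`
    stores the original exception on __cause__.
    """
    for from_version, to_version, migrate in chain:
        try:
            state = migrate(state)
        except Exception as exc:
            raise MigrationFailed(
                f"migration {from_version} -> {to_version} raised",
                from_version=from_version,
                to_version=to_version,
            ) from exc
    return state


def rename_count(state):
    # Raises KeyError when the saved state has no "count" key.
    return {"total": state["count"]}


calls = []


def never_reached(state):
    calls.append("ran")
    return state


caught = None
try:
    apply_chain({"x": 1}, [("v1", "v2", rename_count), ("v2", "v3", never_reached)])
except MigrationFailed as failed:
    caught = failed
```

After this runs, `caught.__cause__` is the `KeyError` from `rename_count`, and `never_reached` was not called.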

### Backend support

Not every backend can migrate. Migration needs the backend to expose
a **structural intermediate form** of the loaded state (a plain
dict, JSON tree, or similar) that's independent of the current
state class.

- **`SQLiteCheckpointer(serialization="json")`** can. JSON-encoded
  state loads to a dict; the migration function operates on the
  dict directly.
- **`SQLiteCheckpointer(serialization="pickle")`** can NOT. Pickle
  holds class identity and round-trips back to typed instances.
- **`InMemoryCheckpointer`** can NOT. It holds live typed-state
  instances by reference; there's no serialization step.

On version mismatch against a non-migration-eligible backend, the
engine raises `CheckpointRecordInvalid` (not
`CheckpointStateMigrationMissing`): the registry has no chance to
bridge.
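
The routing described above can be summarized as a small decision function. This is a sketch under stated assumptions: `mismatch_outcome`, `JsonBackend`, and `PickleBackend` are hypothetical names for illustration, and the function returns the name of the outcome as a string rather than raising the real error types. Only the `supports_state_migration` attribute name comes from this document.

```python
from typing import Protocol


class HasMigrationFlag(Protocol):
    """Hypothetical minimal view of a checkpointer for this sketch."""

    supports_state_migration: bool


class JsonBackend:
    # Stand-in for a backend that exposes a dict / JSON tree.
    supports_state_migration = True


class PickleBackend:
    # Stand-in for a backend that only round-trips typed instances.
    supports_state_migration = False


def mismatch_outcome(
    backend: HasMigrationFlag,
    saved_version: str,
    current_version: str,
    chain_found: bool,
) -> str:
    """Name what happens on resume for the given inputs."""
    if saved_version == current_version:
        return "load"
    if not backend.supports_state_migration:
        # The registry is never consulted for this backend.
        return "CheckpointRecordInvalid"
    if not chain_found:
        return "CheckpointStateMigrationMissing"
    return "migrate"
```

Note that the backend check comes before the registry lookup, which is why a pickle-backed mismatch reports `CheckpointRecordInvalid` even when a suitable migration is registered.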

### Parent-state migration

Subgraph saves carry a `parent_states` chain of the outer-graph
state captured at the moment of the inner save. On resume, the same
migration chain applies to each entry in `parent_states` in lockstep
with the outer state. The spec treats `parent_states` as carrying
the same `schema_version` as the outer record (no per-parent
version metadata in v1).
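
A minimal sketch of the lockstep rule, assuming states are plain dicts and the chain is already resolved. `migrate_record` is a hypothetical helper for illustration, not part of the library.

```python
def migrate_record(state, parent_states, chain):
    """Apply the same migration chain to the state and to every parent entry.

    `chain` is assumed to be a list of dict -> dict functions in order.
    """

    def run(value):
        for migrate in chain:
            value = migrate(value)
        return value

    return run(state), [run(parent) for parent in parent_states]


def add_new_field_default(state):
    return {**state, "new_field": "default"}


new_state, new_parents = migrate_record(
    {"x": 1},
    [{"x": 10}, {"x": 20}],
    [add_new_field_default],
)
```

Every entry ends up at the same version because no entry is migrated by a different chain than the others.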

### Migrations MUST be pure

A migration function MUST be deterministic, with no I/O, no implicit
state, and no random or wall-clock-derived output. The framework
doesn't enforce purity, but violating it breaks determinism
guarantees for resume.
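
As an illustration of the contract, the sketch below contrasts a pure migration with an impure one. Both functions and their field names (`old_name`, `new_name`, `migrated_at`) are hypothetical examples.

```python
import time


def pure_rename(state: dict) -> dict:
    """Pure: the output depends only on the input, and the input is not mutated."""
    new_state = {key: value for key, value in state.items() if key != "old_name"}
    new_state["new_name"] = state.get("old_name", "")
    return new_state


def impure_stamp(state: dict) -> dict:
    """Impure: wall-clock output differs between resumes. Avoid this pattern."""
    return {**state, "migrated_at": time.time()}


saved = {"old_name": "alice", "x": 1}
first = pure_rename(saved)
second = pure_rename(saved)
```

Resuming the same record twice through `pure_rename` yields identical states; `impure_stamp` would not offer that guarantee.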

## When NOT to use checkpointing

- **Pure pipelines that complete in seconds.** Restart-from-entry is

tests/unit/test_state_migration.py

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
"""Focused unit tests for the state-migration surface.

The conformance suite (``tests/conformance/test_state_migration.py``)
covers the spec's behavioral surface end-to-end against fixtures
039-046. These unit tests fill gaps the fixtures don't exercise
directly: BFS edge cases on the registry, multi-shortest-path
ambiguity detection, GraphBuilder ergonomics, and the error
attribute carriage shape.
"""

from __future__ import annotations

import pytest

from openarmature.checkpoint import (
    CheckpointStateMigrationFailed,
    CheckpointStateMigrationMissing,
    MigrationRegistry,
    StateMigration,
)
from openarmature.graph import END, GraphBuilder, State

# ---------------------------------------------------------------------------
# MigrationRegistry: basic registration + iteration + describe
# ---------------------------------------------------------------------------


def _id(x: int) -> int:
    return x


def test_registry_empty_describes_with_sentinel() -> None:
    registry = MigrationRegistry()
    assert len(registry) == 0
    assert registry.describe() == "<no migrations registered>"


def test_registry_lists_registered_in_order() -> None:
    registry = MigrationRegistry()
    registry.register(StateMigration(from_version="v1", to_version="v2", migrate=_id))
    registry.register(StateMigration(from_version="v2", to_version="v3", migrate=_id))
    assert len(registry) == 2
    assert "v1 → v2" in registry.describe()
    assert "v2 → v3" in registry.describe()


def test_registry_rejects_duplicate_edge() -> None:
    registry = MigrationRegistry()
    registry.register(StateMigration(from_version="v1", to_version="v2", migrate=_id))
    with pytest.raises(ValueError, match="duplicate state migration"):
        registry.register(StateMigration(from_version="v1", to_version="v2", migrate=_id))


# ---------------------------------------------------------------------------
# Chain resolution: empty, identity, single hop, multi hop
# ---------------------------------------------------------------------------


def test_resolve_chain_same_version_returns_empty_chain() -> None:
    registry = MigrationRegistry()
    chain = registry.resolve_chain("v1", "v1")
    assert chain == []


def test_resolve_chain_empty_registry_returns_none() -> None:
    registry = MigrationRegistry()
    assert registry.resolve_chain("v1", "v2") is None


def test_resolve_chain_unrelated_registry_returns_none() -> None:
    registry = MigrationRegistry()
    registry.register(StateMigration(from_version="v3", to_version="v4", migrate=_id))
    assert registry.resolve_chain("v1", "v2") is None


def test_resolve_chain_single_hop() -> None:
    registry = MigrationRegistry()
    a_to_b = StateMigration(from_version="a", to_version="b", migrate=_id)
    registry.register(a_to_b)
    chain = registry.resolve_chain("a", "b")
    assert chain == [a_to_b]


def test_resolve_chain_multi_hop_in_order() -> None:
    registry = MigrationRegistry()
    a_to_b = StateMigration(from_version="a", to_version="b", migrate=_id)
    b_to_c = StateMigration(from_version="b", to_version="c", migrate=_id)
    c_to_d = StateMigration(from_version="c", to_version="d", migrate=_id)
    # Register out of natural order to verify BFS doesn't depend on
    # registration order.
    registry.register(c_to_d)
    registry.register(a_to_b)
    registry.register(b_to_c)
    chain = registry.resolve_chain("a", "d")
    assert chain == [a_to_b, b_to_c, c_to_d]


def test_resolve_chain_picks_shortest_when_unique() -> None:
    """A short path exists alongside a longer one; BFS picks the short."""
    registry = MigrationRegistry()
    # Diamond with an extra step on one side.
    registry.register(StateMigration(from_version="v1", to_version="v2", migrate=_id))
    registry.register(StateMigration(from_version="v2", to_version="v3", migrate=_id))
    # Long detour: v1 -> v1a -> v1b -> v3.
    registry.register(StateMigration(from_version="v1", to_version="v1a", migrate=_id))
    registry.register(StateMigration(from_version="v1a", to_version="v1b", migrate=_id))
    registry.register(StateMigration(from_version="v1b", to_version="v3", migrate=_id))
    chain = registry.resolve_chain("v1", "v3")
    assert chain is not None
    assert [(m.from_version, m.to_version) for m in chain] == [
        ("v1", "v2"),
        ("v2", "v3"),
    ]


def test_resolve_chain_ambiguous_shortest_paths_raises() -> None:
    """Diamond with two distinct same-length paths is ambiguous per spec §10.12.2."""
    registry = MigrationRegistry()
    registry.register(StateMigration(from_version="v1", to_version="v2", migrate=_id))
    registry.register(StateMigration(from_version="v1", to_version="v3", migrate=_id))
    registry.register(StateMigration(from_version="v2", to_version="v4", migrate=_id))
    registry.register(StateMigration(from_version="v3", to_version="v4", migrate=_id))
    with pytest.raises(ValueError, match="ambiguous migration chain"):
        registry.resolve_chain("v1", "v4")


# ---------------------------------------------------------------------------
# GraphBuilder ergonomics: singular + plural registration
# ---------------------------------------------------------------------------


class _Sv1(State):
    schema_version = "v1"
    x: int = 0


def _build_minimal_graph() -> GraphBuilder[_Sv1]:
    async def _noop(_s: _Sv1) -> dict[str, int]:
        return {}

    return GraphBuilder(_Sv1).add_node("noop", _noop).add_edge("noop", END).set_entry("noop")


def test_builder_with_state_migration_singular() -> None:
    builder = _build_minimal_graph()
    builder.with_state_migration("v0", "v1", _id)
    compiled = builder.compile()
    assert len(compiled.migration_registry) == 1


def test_builder_with_state_migrations_plural() -> None:
    builder = _build_minimal_graph()
    builder.with_state_migrations(
        StateMigration(from_version="v0", to_version="v1", migrate=_id),
        StateMigration(from_version="v1", to_version="v2", migrate=_id),
    )
    compiled = builder.compile()
    assert len(compiled.migration_registry) == 2


def test_builder_duplicate_registration_raises() -> None:
    builder = _build_minimal_graph()
    builder.with_state_migration("v0", "v1", _id)
    with pytest.raises(ValueError, match="duplicate state migration"):
        builder.with_state_migration("v0", "v1", _id)


# ---------------------------------------------------------------------------
# Error attribute carriage
# ---------------------------------------------------------------------------


def test_migration_missing_carries_identity() -> None:
    exc = CheckpointStateMigrationMissing(
        "no chain",
        from_version="v1",
        to_version="v2",
        registered_migrations_count=3,
        registry_description="v3 → v4\nv5 → v6\nv7 → v8",
    )
    assert exc.from_version == "v1"
    assert exc.to_version == "v2"
    assert exc.registered_migrations_count == 3
    assert "v3 → v4" in exc.registry_description


def test_migration_failed_carries_identity() -> None:
    exc = CheckpointStateMigrationFailed(
        "boom",
        from_version="v1",
        to_version="v2",
    )
    assert exc.from_version == "v1"
    assert exc.to_version == "v2"


def test_migration_failed_preserves_cause_when_raised_from() -> None:
    """The error's ``__cause__`` carries the original migration
    exception when raised from a try/except. This mirrors the engine's
    chain-application wrap."""
    underlying = KeyError("missing field")
    try:
        try:
            raise underlying
        except KeyError as exc:
            raise CheckpointStateMigrationFailed(
                "boom",
                from_version="v1",
                to_version="v2",
            ) from exc
    except CheckpointStateMigrationFailed as final:
        assert final.__cause__ is underlying
