Commit b92c083
feat(checkpoint): state migration for checkpoints (proposal 0014) (#46)
* feat(checkpoint): state migration registry, types, errors, builder surface
Implements pipeline-utilities spec §10.12 (proposal 0014).
- New errors: CheckpointStateMigrationMissing,
CheckpointStateMigrationFailed. Both non-transient per §10.10.
The missing-chain error carries from_version / to_version /
registered_migrations_count / registry_description for
actionable operator diagnostics.
- New types: StateMigration (frozen dataclass — from_version,
to_version, migrate callable) and MigrationRegistry (BFS
chain resolution + ambiguity detection per §10.12.2).
- Multi-shortest-path detection: when BFS finds a shortest
path AND a second distinct path of equal length exists,
the registry raises ValueError per the spec's ambiguous-chain
rule. Resume surfaces this as CheckpointStateMigrationMissing
with the ambiguity description in the payload.
- State.schema_version: ClassVar[str] = '' (per spec §10.2's
per-language carve-out). Empty-string sentinel; the framework
reads type(state).schema_version at save time.
- Checkpointer Protocol: supports_state_migration: ClassVar[bool]
marker per §10.12.1. InMemoryCheckpointer: False (typed in-
memory references can't expose a class-independent
intermediate). SQLiteCheckpointer: True in JSON mode, False
in pickle mode (pickle holds class identity and round-trips
to typed instances; can't bridge versions).
- GraphBuilder.with_state_migration / with_state_migrations
thread a populated MigrationRegistry into CompiledGraph at
compile time.
- Resume-path routing (compiled.py): version mismatch →
unsupported-backend check → registry lookup → chain
application (with per-migration failure wrap) → final
deserialization. The post-migration deserialization failure
still surfaces as CheckpointRecordInvalid per §10.12.4;
pre-migration version mismatch routes through the two new
categories. Order matters; documented inline so a future
reader doesn't swap it back.
- Parent-state migration: same chain applied to each entry of
parent_states in lockstep with the outer state per §10.12.2.
Code comment records the spec-mandated equivalence so
future contributors don't add per-parent metadata without a
follow-on proposal.
- Drop the CHECKPOINT_SCHEMA_VERSION = '1' constant: per Q1
spec answer, the old backend-internal record-shape role had
no spec slot anyway. SQLiteCheckpointer no longer rejects
records with non-default versions on load — that routing
is now the engine's concern at resume time. Existing
records carrying schema_version='1' get reinterpreted as
user-facing v1 identifiers (single-user dev, no compat
shim needed per Chris's note).
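A minimal sketch of the BFS chain resolution and multi-shortest-path detection described above, not the code in migration.py. It assumes migrations take and return plain dicts; `resolve_chain` as a free function, the `path_count` bookkeeping, and `LookupError` as a stand-in for CheckpointStateMigrationMissing are all illustrative choices.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class StateMigration:
    from_version: str
    to_version: str
    migrate: Callable[[dict[str, Any]], dict[str, Any]]


def resolve_chain(
    migrations: list[StateMigration], start: str, goal: str
) -> list[StateMigration]:
    """Return the unique shortest chain from start to goal, or raise."""
    if start == goal:
        return []
    edges: dict[str, list[StateMigration]] = {}
    for m in migrations:
        edges.setdefault(m.from_version, []).append(m)

    depth = {start: 0}
    path_count = {start: 1}  # number of shortest paths reaching each version
    parent: dict[str, StateMigration] = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for m in edges.get(node, []):
            nxt = m.to_version
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                path_count[nxt] = path_count[node]
                parent[nxt] = m
                queue.append(nxt)
            elif depth[nxt] == depth[node] + 1:
                # Equal-depth re-entry: a second shortest route into nxt.
                # Skipping this branch would hide the ambiguity.
                path_count[nxt] += path_count[node]

    if goal not in depth:
        # Stand-in for CheckpointStateMigrationMissing.
        raise LookupError(f"no migration chain from {start!r} to {goal!r}")
    if path_count[goal] > 1:
        raise ValueError(f"ambiguous migration chain from {start!r} to {goal!r}")

    # Walk parent pointers back from the goal to rebuild the chain in order.
    chain: list[StateMigration] = []
    node = goal
    while node != start:
        step = parent[node]
        chain.append(step)
        node = step.from_version
    chain.reverse()
    return chain
```

Counting paths during the traversal keeps detection in a single BFS pass; back-edges to shallower versions fail the depth check and are ignored, so cycles do not loop.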
* test(conformance): drive 0014 state-migration fixtures 039-046
New tests/conformance/test_state_migration.py drives all 8 spec
state-migration fixtures end-to-end against the real engine
(SQLite JSON-mode backend, real graph compile, real resume path).
Harness pieces:
- Migration mock library: add_new_field_default, add_v2_field,
add_v3_field, identity_passthrough, raises_keyerror,
should_not_run, irrelevant. Each fixture's migrate: <name>
resolves through the library.
- _MigrationTrace wraps each mock to capture invocation order
for the migrations_run / migration_count /
single_migration_invocation / migration_order_matches_chain
assertions. Consecutive duplicates collapse (fixture 043 runs
each step once for outer + once for each parent under the
lockstep ordering; the assertion is per-step, not per-entity).
- _seed_record persists a checkpoint matching the fixture's
seeded_record: block before invoke(resume_invocation=...) so
the resume path has data to load.
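The consecutive-duplicate collapse in the trace wrapper can be sketched as follows. This is an illustrative reconstruction, not the harness code: the class name `MigrationTrace`, its `wrap` method, and the `calls` list are assumptions.

```python
from typing import Any, Callable

State = dict[str, Any]


class MigrationTrace:
    """Records migration invocation order, collapsing consecutive repeats."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def wrap(self, name: str, fn: Callable[[State], State]) -> Callable[[State], State]:
        def traced(state: State) -> State:
            # Under lockstep ordering one step runs for the outer state and
            # then for each parent, so repeats of the same name are adjacent.
            if not self.calls or self.calls[-1] != name:
                self.calls.append(name)
            return fn(state)

        return traced
```

With one outer state and two parents, a two-step chain invokes each wrapped mock three times, yet `calls` ends up holding each step name once, in chain order.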
Harness/model adjustments:
- StateSchema in tests/conformance/harness/directives.py gains
optional schema_version (default '') and the required field
knob (no-default Pydantic field, used by fixture 044's
required_v2_field deserialization-failure case).
- _DEFERRED_FIXTURES in test_fixture_parsing.py loses the
039-046 rows; the CasesFixture model parses them via permissive
extras on CaseSpec.
- Initial-state construction in the resume path uses
state_cls.model_construct() so fixtures with required fields
(044) can pass a placeholder past Pydantic's validator before
the resume even starts; the engine loads state from the
checkpoint, not from the placeholder.
Protocol-attribute shape:
- Checkpointer.supports_state_migration declared as
bool = False (not ClassVar) so SQLiteCheckpointer can set
the value per-instance in __init__ based on serialization mode.
Backends with a static answer (InMemoryCheckpointer) override
at the class level with bool = False — Pyright accepts
either shape because Protocol attribute conformance ignores
the ClassVar marker on subclasses.
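A rough sketch of the two attribute shapes, assuming a hypothetical `serialization` constructor parameter on the SQLite backend (the real parameter name is not stated here). The `backend_can_migrate` helper is also hypothetical and shows a getattr-with-default read on the engine side.

```python
from typing import Protocol


class Checkpointer(Protocol):
    # Plain attribute rather than ClassVar, so implementations may set it
    # either on the class or per instance.
    supports_state_migration: bool = False


class InMemoryCheckpointer:
    # Static answer, declared at class level.
    supports_state_migration: bool = False


class SQLiteCheckpointer:
    def __init__(self, serialization: str = "json") -> None:
        # Per-instance answer: only JSON mode exposes a class-independent
        # intermediate that a migration can operate on.
        self.supports_state_migration: bool = serialization == "json"


def backend_can_migrate(checkpointer: object) -> bool:
    # Defensive read: a backend that never declares the attribute is
    # treated as not supporting migration.
    return bool(getattr(checkpointer, "supports_state_migration", False))
```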
* test(unit): state migration + docs(concepts) state migrations section + CHANGELOG
tests/unit/test_state_migration.py (16 tests) covers gaps the
conformance fixtures don't exercise directly: BFS edge cases on
the registry, multi-shortest-path ambiguity detection, GraphBuilder
ergonomics (singular + plural registration, duplicate-pair
ValueError), and error attribute carriage including __cause__
preservation on CheckpointStateMigrationFailed.
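The `__cause__` preservation these tests check follows from Python's `raise ... from exc`. A minimal sketch, assuming a simplified error class whose constructor signature and the `apply_step` helper are illustrative only:

```python
from typing import Any, Callable

State = dict[str, Any]


class CheckpointStateMigrationFailed(Exception):
    """Simplified stand-in; the real class's attributes may differ."""

    def __init__(self, message: str, *, from_version: str, to_version: str) -> None:
        super().__init__(message)
        self.from_version = from_version
        self.to_version = to_version


def apply_step(
    state: State,
    from_version: str,
    to_version: str,
    migrate: Callable[[State], State],
) -> State:
    try:
        return migrate(state)
    except Exception as exc:
        # "from exc" sets __cause__, so the original exception and its
        # traceback stay attached to the wrapping error.
        raise CheckpointStateMigrationFailed(
            f"migration {from_version!r} -> {to_version!r} raised",
            from_version=from_version,
            to_version=to_version,
        ) from exc
```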
docs/concepts/checkpointing.md gains a State migrations section
covering: the State.schema_version declaration, the registration
surface (with_state_migration / with_state_migrations), BFS chain
resolution and ambiguity cases (duplicate edges + multi-shortest-
path), the two new error categories and how they relate to
CheckpointRecordInvalid (§10.12.4), backend support and why
SQLite-pickle / InMemory aren't migration-eligible, the lockstep
parent_states migration rule, and the migrations-MUST-be-pure
contract.
CHANGELOG.md gains two new Added entries (state-migration
surface + Checkpointer.supports_state_migration Protocol
attribute) plus a Changed entry documenting the
CheckpointRecord.schema_version semantic shift and the
CHECKPOINT_SCHEMA_VERSION constant removal. Pre-1.0 breaking
change covered by the consolidated-release flag.
* feat(checkpoint): chain-ambiguous category + spec v0.16.0 bump
Spec proposal 0018 (chain-ambiguity category) landed in spec
v0.16.0 between PR-4 push and the spec agent's code review.
This commit adds the new error category and the required
routing swaps; PR-4 stays open as the carrier.
- Submodule pinned to v0.16.0; pyproject + __spec_version__ +
test_smoke pin assertions follow.
- New canonical error class CheckpointStateMigrationChainAmbiguous
(and the matching category string) added to checkpoint/errors.py,
re-exported from openarmature.checkpoint. Non-transient. Carries
optional from_version / to_version when known (always set for
the registration-time case; resume-time multi-shortest-path also
populates).
- MigrationRegistry.register raises
CheckpointStateMigrationChainAmbiguous directly at registration
time per spec §10.10 (proposal 0018), so the canonical category
surfaces at the registration boundary without wrapping. The
internal BFS keeps raising ValueError for multi-shortest-path
detection; CompiledGraph._migrate_record's except branch now
routes that to CheckpointStateMigrationChainAmbiguous at the
resume boundary.
- Routing precedence on resume per §10.10 (v0.16.0): chain-ambiguous
→ missing → failed → record-invalid. Code-comment in
_migrate_record documents the ordering so a future reader
doesn't swap it back.
- Code comments record two design-choice notes the spec agent
flagged: the load-bearing equal-depth-re-entry behavior in BFS
cycle protection (tightening to <= breaks multi-shortest-path
detection), and the registration-order ordering of describe()
output.
- Conformance harness gains the expected_chain_ambiguity_error
primitive. Accepts the canonical category at EITHER the case
top-level (registration-time detection: registration raises,
resume block unreachable) OR inside resume (load-time detection
during BFS). Spec §10.12.2's compile-time-SHOULD /
load-time-acceptable carve-out lands in the driver as a
single try/except wrap around both phases.
- Fixture 047 (state-migration-chain-ambiguous) covered; all 9
state-migration fixtures (039-047) pass.
- 3 new unit tests for the new category's class shape +
attribute carriage; existing duplicate-pair tests updated to
assert the canonical category instead of ValueError.
- docs/concepts/checkpointing.md updated for the third category +
the v0.16.0 routing precedence. CHANGELOG entry updated and
Pinned-version line bumped from 0.10.0 to 0.16.0.
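The registration-time raise and the driver's single try/except around both phases might look roughly like this. The sketch uses simplified stand-ins: the real error class derives from the checkpoint error hierarchy and the real registry stores full StateMigration objects; `run_case` is a hypothetical name for the driver step.

```python
from typing import Callable, Optional


class CheckpointStateMigrationChainAmbiguous(Exception):
    """Simplified stand-in for the canonical error class."""

    def __init__(
        self,
        message: str,
        *,
        from_version: Optional[str] = None,
        to_version: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.from_version = from_version
        self.to_version = to_version


class MigrationRegistry:
    def __init__(self) -> None:
        self._pairs: set[tuple[str, str]] = set()

    def register(self, from_version: str, to_version: str) -> None:
        pair = (from_version, to_version)
        if pair in self._pairs:
            # A duplicate edge is ambiguous as soon as it is registered.
            raise CheckpointStateMigrationChainAmbiguous(
                f"duplicate migration {from_version!r} -> {to_version!r}",
                from_version=from_version,
                to_version=to_version,
            )
        self._pairs.add(pair)


def run_case(
    register_phase: Callable[[], None], resume_phase: Callable[[], None]
) -> Optional[CheckpointStateMigrationChainAmbiguous]:
    # One wrap around both phases: the category is accepted whether it
    # surfaces at registration or during resume.
    try:
        register_phase()
        resume_phase()
    except CheckpointStateMigrationChainAmbiguous as exc:
        return exc
    return None
```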
OTel openarmature.checkpoint.migrate span emission deferred to
a follow-on; documented inline in _migrate_record. Spec §6
cross-ref is SHOULD-level, and the emission needs the
_InvocationContext that's currently created AFTER the migration
path runs. The follow-on can either restructure resume to build
the context first or use a side-channel observer dispatch.
* feat(observability): openarmature.checkpoint.migrate OTel span
Lands the §6 cross-ref piece of proposal 0014 that the previous
commit deferred. Versioned resumes whose migration chain runs
now emit a zero-duration `openarmature.checkpoint.migrate` span
on the OTel observer, parented under the invocation root.
- New _MigrationSummary frozen dataclass in graph/compiled.py
carries the chain metadata (from_version, to_version,
chain_length) out of _migrate_record back to invoke().
- _migrate_record return type becomes
tuple[CheckpointRecord, _MigrationSummary]; invoke() captures
the summary across the migration → context-creation gap.
- After context + delivery-worker setup but before the main
execution loop, invoke() dispatches a synthetic NodeEvent
with phase='checkpoint_migrated' carrying the summary on
pre_state. Mirrors the checkpoint_saved pattern (state-on-pre,
post=None) so the OTel observer's existing routing shape
picks it up cleanly.
- New phase 'checkpoint_migrated' added to NodeEvent's Literal
union and to KNOWN_PHASES (rejects typos in observer.phases
subscriptions). NodeEvent.pre_state typed as Any so the
permissive _MigrationSummary payload fits; the OTel observer
narrows defensively via isinstance.
- OTelObserver._emit_checkpoint_migrate_span emits the span
with the three normative attributes. Parented under the
invocation root span (opened lazily if not yet present —
versioned resumes are the first event on a new invocation_id,
before any node-span emission). Mirrors _emit_checkpoint_save_span.
- Two OTel unit tests: one positive (versioned resume emits the
span with the right attrs) and one negative (version-match
fast path emits no span, per spec §10.12.3).
CHANGELOG entry promoted from 'deferred to follow-on' to the
landed shape. Phase gated off default subscriptions so legacy
observers don't surface a new event without opt-in.
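A simplified sketch of the synthetic event and the observer-side narrowing, with no OpenTelemetry dependency. The `NodeEvent` here is a cut-down stand-in, and the attribute keys and the `node_name` value used in the usage note are hypothetical; only the three summary fields, the phase name, and the state-on-pre / post=None shape come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class MigrationSummary:
    from_version: str
    to_version: str
    chain_length: int


@dataclass(frozen=True)
class NodeEvent:
    # Cut-down stand-in for the engine's event type.
    phase: str
    node_name: str
    step: int
    pre_state: Any
    post_state: Any = None


def migrate_span_attributes(event: NodeEvent) -> Optional[dict[str, Any]]:
    """Return span attributes for a checkpoint_migrated event, else None."""
    if event.phase != "checkpoint_migrated":
        return None
    summary = event.pre_state
    # pre_state is typed Any, so narrow defensively before reading fields.
    if not isinstance(summary, MigrationSummary):
        return None
    return {
        "from_version": summary.from_version,  # hypothetical attribute keys
        "to_version": summary.to_version,
        "chain_length": summary.chain_length,
    }
```

For example, an event built as `NodeEvent("checkpoint_migrated", "openarmature.checkpoint.migrate", -1, MigrationSummary("1", "3", 2))` yields a three-key dict, while any other phase or payload yields `None`.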
* docs(changelog): flag NodeEvent.pre_state Any widening for observer authors
* fix: Copilot review pass on PR #46
- compiled.py: align save-side schema_version read on
self.state_cls (was type(post_state)); _maybe_save_checkpoint
drops @staticmethod so it can access self. Symmetric with the
resume-side check; subclass schema_versions no longer shadow
the declared graph schema.
- compiled.py _apply_migration_step: re-raise CheckpointError
subclasses unchanged before the bare-Exception wrap, so a
canonical-category raise from a user migration propagates
without being squashed as CheckpointStateMigrationFailed.
Trim the wrap-message to the from/to identity + 'raised while
migrating <label>' — __cause__ already carries the underlying
exception; embedding type(exc).__name__: str(exc) duplicates
info Python's traceback formatter shows anyway.
- builder.py with_state_migrations: pre-validate the full input
list (against existing registry + against earlier entries in
the same call) before mutating. A duplicate-pair raise in the
middle of the list no longer leaves earlier entries
half-registered. Update the singular with_state_migration
docstring to reflect the canonical raise type
(CheckpointStateMigrationChainAmbiguous, not ValueError) +
flag the empty-to_version ValueError.
- migration.py register: reject empty to_version (un-declared
sentinel is not a valid chain target per spec §10.2). Empty
from_version stays valid for the documented bridging case.
- migration.py resolve_chain: raise
CheckpointStateMigrationChainAmbiguous directly on
multi-shortest-path detection (was ValueError). The registry's
ambiguity contract is now one type regardless of when it
surfaces (register vs resolve). compiled.py's wrap removed.
- protocol.py Checkpointer: clarify the supports_state_migration
docstring on the Protocol-class-body vs runtime-attribute
semantic — class-body '= False' is a typing-level signal, not
a runtime guarantee. Engine uses getattr-with-default; backends
SHOULD declare for Pyright honesty.
- events.py NodeEvent: document the synthetic-phases conventions
(step=-1, dotted node_name, non-State pre_state on
checkpoint_migrated) in the class docstring so third-party
observer authors don't need to read the engine source.
- observer.py KNOWN_PHASES: expand the comment about the
synthetic-phase opt-in and cross-reference the NodeEvent
docstring.
- tests/conformance/test_state_migration.py: range docstring
updated from 039-046 to 039-047 with a one-paragraph note on
the chain-ambiguity surface 047 covers.
- tests/unit/test_state_migration.py: existing
ambiguous-shortest-paths test asserts the canonical category
(was ValueError); two new tests for the empty-to_version
rejection + empty-from_version bridging acceptance; two new
tests for with_state_migrations atomic pre-validation
(no partial-state on internal-list and
against-existing-registry duplicate cases).
1 parent ddc99f2 commit b92c083
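The atomic pre-validation described for with_state_migrations in the review-pass notes can be sketched as validate-everything-then-mutate. This is an assumption-laden illustration: `register_all` is a hypothetical helper operating on bare `(from_version, to_version)` pairs, and `ValueError` stands in for CheckpointStateMigrationChainAmbiguous on the duplicate case.

```python
def register_all(
    registered: set[tuple[str, str]], new_pairs: list[tuple[str, str]]
) -> None:
    """Add all pairs or none: validate the whole batch before mutating."""
    seen = set(registered)  # working copy; the caller's set is untouched so far
    for from_version, to_version in new_pairs:
        if not to_version:
            # The empty string is the undeclared sentinel, not a valid target.
            raise ValueError("to_version must not be empty")
        if (from_version, to_version) in seen:
            # Duplicate against the existing registry or an earlier entry in
            # this same call (stand-in for the canonical ambiguity error).
            raise ValueError(
                f"duplicate migration {from_version!r} -> {to_version!r}"
            )
        seen.add((from_version, to_version))
    # Every entry passed validation, so the mutation cannot fail halfway.
    registered.update(new_pairs)
```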
24 files changed
Lines changed: 1892 additions & 70 deletions
File tree
- docs/concepts
- src/openarmature
- checkpoint
- backends
- graph
- observability/otel
- tests
- conformance
- harness
- unit
Submodule openarmature-spec updated 44 files
- .github/workflows/docs.yml +76
- .gitignore +8 -1
- CHANGELOG.md +53 -37
- README.md +2 -2
- docs/capabilities/graph-engine.md +1
- docs/capabilities/llm-provider.md +1
- docs/capabilities/observability.md +1
- docs/capabilities/pipeline-utilities.md +1
- docs/capabilities/prompt-management.md +1
- docs/changelog.md +1
- docs/governance.md +1
- docs/index.md +157
- docs/javascripts/header-link.js +17
- docs/javascripts/tablesort.js +6
- docs/javascripts/tablesort.min.js +6
- docs/openarmature.md -34
- docs/proposals.md +29
- docs/proposals/0001-graph-engine-foundation.md +1
- docs/proposals/0002-subgraph-explicit-mapping.md +1
- docs/proposals/0003-node-boundary-observer-hooks.md +1
- docs/proposals/0004-pipeline-utilities-middleware.md +1
- docs/proposals/0005-pipeline-utilities-parallel-fan-out.md +1
- docs/proposals/0006-llm-provider-core.md +1
- docs/proposals/0007-observability-otel-span-mapping.md +1
- docs/proposals/0008-pipeline-utilities-checkpointing.md +1
- docs/proposals/0009-pipeline-utilities-per-instance-fan-out-resume.md +1
- docs/proposals/0010-drain-timeout.md +1
- docs/proposals/0011-pipeline-utilities-parallel-branches.md +1
- docs/proposals/0012-graph-engine-completed-event-after-edges.md +1
- docs/proposals/0013-fan-out-config-on-node-event.md +1
- docs/proposals/0014-pipeline-utilities-state-migration.md +1
- docs/proposals/0015-llm-provider-multimodal-images.md +1
- docs/proposals/0016-llm-provider-structured-output.md +1
- docs/proposals/0017-prompt-management-core.md +1
- docs/proposals/0018-state-migration-chain-ambiguity.md +1
- docs/stylesheets/extra.css +353
- mkdocs.yml +132
- mkdocs_hooks.py +42
- proposals/0018-state-migration-chain-ambiguity.md +231
- pyproject.toml +18
- spec/pipeline-utilities/conformance/047-state-migration-chain-ambiguous.md +65
- spec/pipeline-utilities/conformance/047-state-migration-chain-ambiguous.yaml +96
- spec/pipeline-utilities/spec.md +39 -16
- uv.lock +739