Skip to content

Commit ca15567

Browse files
fix: CoPilot review pass on PR #46
- compiled.py: align save-side schema_version read on self.state_cls (was type(post_state)); _maybe_save_checkpoint drops @staticmethod so it can access self. Symmetric with the resume-side check; subclass schema_versions no longer shadow the declared graph schema. - compiled.py _apply_migration_step: re-raise CheckpointError subclasses unchanged before the bare-Exception wrap, so a canonical-category raise from a user migration propagates without being squashed as CheckpointStateMigrationFailed. Trim the wrap-message to the from/to identity + 'raised while migrating <label>' — __cause__ already carries the underlying exception; embedding type(exc).__name__: str(exc) duplicates info Python's traceback formatter shows anyway. - builder.py with_state_migrations: pre-validate the full input list (against existing registry + against earlier entries in the same call) before mutating. A duplicate-pair raise in the middle of the list no longer leaves earlier entries half-registered. Update the singular with_state_migration docstring to reflect the canonical raise type (CheckpointStateMigrationChainAmbiguous, not ValueError) + flag the empty-to_version ValueError. - migration.py register: reject empty to_version (un-declared sentinel is not a valid chain target per spec §10.2). Empty from_version stays valid for the documented bridging case. - migration.py resolve_chain: raise CheckpointStateMigrationChainAmbiguous directly on multi-shortest-path detection (was ValueError). The registry's ambiguity contract is now one type regardless of when it surfaces (register vs resolve). compiled.py's wrap removed. - protocol.py Checkpointer: clarify the supports_state_migration docstring on the Protocol-class-body vs runtime-attribute semantic — class-body '= False' is a typing-level signal, not a runtime guarantee. Engine uses getattr-with-default; backends SHOULD declare for Pyright honesty. - events.py NodeEvent: document the synthetic-phases conventions (step=-1, dotted node_name, non-State pre_state on checkpoint_migrated) in the class docstring so third-party observer authors don't need to read the engine source. - observer.py KNOWN_PHASES: expand the comment about the synthetic-phase opt-in and cross-reference the NodeEvent docstring. - tests/conformance/test_state_migration.py: range docstring updated from 039-046 to 039-047 with a one-paragraph note on the chain-ambiguity surface 047 covers. - tests/unit/test_state_migration.py: existing ambiguous-shortest-paths test asserts the canonical category (was ValueError); two new tests for the empty-to_version rejection + empty-from_version bridging acceptance; two new tests for with_state_migrations atomic pre-validation (no partial-state on internal-list and against-existing-registry duplicate cases).
1 parent 3f61a25 commit ca15567

8 files changed

Lines changed: 247 additions & 50 deletions

File tree

src/openarmature/checkpoint/migration.py

Lines changed: 31 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,21 @@ def __init__(self) -> None:
7676
self._edges: dict[str, list[StateMigration]] = {}
7777

7878
def register(self, migration: StateMigration) -> None:
79+
# Per spec §10.2 / proposal 0014: empty ``to_version`` would
80+
# route migrations TO the "not declared" sentinel, which
81+
# the spec calls out as not migration-eligible — incoherent
82+
# as a chain target. Empty ``from_version`` stays valid:
83+
# it's the documented bridging path for pre-declaration
84+
# records (current class declares ``"v1"``; users register
85+
# ``with_state_migration("", "v1", fn)`` to migrate records
86+
# saved before the class declared a version). See Q4 in the
87+
# proposal-0014 coord-thread plan for the matrix.
88+
if not migration.to_version:
89+
raise ValueError(
90+
f"state migration to_version MUST be non-empty "
91+
f"(got from_version={migration.from_version!r}, "
92+
f"to_version={migration.to_version!r})"
93+
)
7994
key = (migration.from_version, migration.to_version)
8095
if key in self._migrations:
8196
# Per spec §10.10 / §10.12.1 (proposal 0018, spec v0.16.0):
@@ -106,8 +121,13 @@ def resolve_chain(
106121
"""Return an ordered chain of migrations bridging the two
107122
versions, or ``None`` if no chain exists.
108123
109-
Raises ``ValueError`` if multiple distinct shortest paths
110-
exist (ambiguous chain per §10.12.2).
124+
Raises ``CheckpointStateMigrationChainAmbiguous`` if
125+
multiple distinct shortest paths exist between
126+
``from_version`` and ``to_version`` (ambiguous chain per
127+
spec §10.10 / §10.12.2 — proposal 0018 / spec v0.16.0).
128+
Same canonical category as the duplicate-pair detection
129+
in ``register``; one type for chain ambiguity regardless
130+
of when it surfaces.
111131
"""
112132
if from_version == to_version:
113133
return []
@@ -162,11 +182,18 @@ def resolve_chain(
162182
return None
163183
if len(shortest_paths) > 1:
164184
descriptions = [" → ".join([from_version, *(e.to_version for e in p)]) for p in shortest_paths]
165-
raise ValueError(
185+
# Per spec §10.10 / §10.12.2 (proposal 0018, v0.16.0):
186+
# raise the canonical category directly so the
187+
# registry's ambiguity contract is one type regardless
188+
# of when it surfaces (duplicate-pair at register time
189+
# vs multi-shortest-path at resolve time).
190+
raise CheckpointStateMigrationChainAmbiguous(
166191
f"ambiguous migration chain from {from_version!r} to "
167192
f"{to_version!r}: multiple distinct shortest paths exist "
168193
f"({descriptions}); register fewer migrations or pick a "
169-
f"single canonical route"
194+
f"single canonical route",
195+
from_version=from_version,
196+
to_version=to_version,
170197
)
171198
return shortest_paths[0]
172199

src/openarmature/checkpoint/protocol.py

Lines changed: 20 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -160,6 +160,21 @@ class Checkpointer(Protocol):
160160
backends that cannot expose the intermediate MUST raise
161161
``CheckpointRecordInvalid`` on version mismatch even when
162162
migrations are registered — the registry has no chance to bridge.
163+
164+
**Attribute-presence contract.** The class-body ``= False``
165+
below is a typing-level signal, not a runtime guarantee:
166+
``typing.Protocol`` does not create an instance attribute on
167+
a conforming class that doesn't declare it itself. Concrete
168+
backends SHOULD declare ``supports_state_migration`` (either
169+
at the class level like ``InMemoryCheckpointer`` does, or as
170+
an ``__init__``-set instance attribute like
171+
``SQLiteCheckpointer`` does for the mode-dependent case) so
172+
Pyright accepts the structural conformance and ``getattr``
173+
sees the value. The engine's resume path reads the attribute
174+
via ``getattr(checkpointer, "supports_state_migration",
175+
False)``, so a third-party backend that omits the attribute
176+
entirely is treated as non-migration-eligible without
177+
raising — that's the runtime default the engine guarantees.
163178
"""
164179

165180
# Declared as an instance attribute (not ``ClassVar``) so backends
@@ -168,9 +183,11 @@ class Checkpointer(Protocol):
168183
# JSON-mode supports migration, pickle-mode doesn't, and the mode
169184
# is a per-instance constructor arg. Backends with a static answer
170185
# (InMemoryCheckpointer is always False) override at the class
171-
# level with ``ClassVar[bool] = False``; pyright is happy with
172-
# either shape because Protocol attribute conformance ignores the
173-
# ClassVar marker on subclasses.
186+
# level. Pyright accepts either shape because Protocol attribute
187+
# conformance ignores the ClassVar marker on subclasses. The
188+
# class-body default below is for typing only; see the
189+
# docstring's "Attribute-presence contract" section for the
190+
# runtime ``getattr``-based safety net.
174191
supports_state_migration: bool = False
175192

176193
async def save(self, invocation_id: str, record: CheckpointRecord) -> None:

src/openarmature/graph/builder.py

Lines changed: 39 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
from types import GenericAlias, UnionType
1515
from typing import Any, Self, cast, get_args, get_origin
1616

17+
from openarmature.checkpoint.errors import CheckpointStateMigrationChainAmbiguous
1718
from openarmature.checkpoint.migration import MigrationRegistry, StateMigration
1819
from openarmature.checkpoint.protocol import Checkpointer
1920

@@ -274,9 +275,12 @@ def with_state_migration(
274275
state. The framework does not police purity (per §10.12.2),
275276
but violating it risks non-deterministic resume.
276277
277-
Raises ``ValueError`` at registration if the
278-
``(from_version, to_version)`` pair is already registered
279-
(per §10.12.1 chain-ambiguity rule).
278+
Raises ``CheckpointStateMigrationChainAmbiguous`` at
279+
registration if the ``(from_version, to_version)`` pair is
280+
already registered (per spec §10.10 / §10.12.1 — proposal
281+
0018 / spec v0.16.0). Also raises ``ValueError`` if
282+
``to_version`` is the empty-string sentinel (the un-declared
283+
marker per §10.2 is not a valid chain target).
280284
"""
281285
self._migration_registry.register(
282286
StateMigration(
@@ -290,7 +294,38 @@ def with_state_migration(
290294
def with_state_migrations(self, *migrations: StateMigration) -> Self:
291295
"""Register multiple migrations in one call. Convenience over
292296
``with_state_migration``; each entry is registered through the
293-
same path and obeys the same ambiguity rule."""
297+
same path and obeys the same ambiguity rule.
298+
299+
Pre-validates the full input list against the existing
300+
registry + against earlier entries in the same call before
301+
mutating, so a duplicate in the third entry cannot leave
302+
the first two half-registered. If any duplicate ``(from,
303+
to)`` pair is detected the call raises
304+
``CheckpointStateMigrationChainAmbiguous`` without mutating
305+
the registry; otherwise all entries register atomically.
306+
"""
307+
# Pre-validation pass: collect every (from, to) we're about
308+
# to add, check both against the existing registry and
309+
# against earlier entries in the input. Raise before
310+
# mutating if anything collides.
311+
seen_in_call: set[tuple[str, str]] = set()
312+
for m in migrations:
313+
key = (m.from_version, m.to_version)
314+
if key in self._migration_registry._migrations: # noqa: SLF001
315+
raise CheckpointStateMigrationChainAmbiguous(
316+
f"duplicate state migration {m.from_version!r}{m.to_version!r} already registered",
317+
from_version=m.from_version,
318+
to_version=m.to_version,
319+
)
320+
if key in seen_in_call:
321+
raise CheckpointStateMigrationChainAmbiguous(
322+
f"duplicate state migration {m.from_version!r}→"
323+
f"{m.to_version!r} repeated in with_state_migrations call",
324+
from_version=m.from_version,
325+
to_version=m.to_version,
326+
)
327+
seen_in_call.add(key)
328+
# Validation passed — commit them all.
294329
for migration in migrations:
295330
self._migration_registry.register(migration)
296331
return self

src/openarmature/graph/compiled.py

Lines changed: 42 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -46,10 +46,10 @@
4646
from pydantic import ValidationError
4747

4848
from openarmature.checkpoint.errors import (
49+
CheckpointError,
4950
CheckpointNotFound,
5051
CheckpointRecordInvalid,
5152
CheckpointSaveFailed,
52-
CheckpointStateMigrationChainAmbiguous,
5353
CheckpointStateMigrationFailed,
5454
CheckpointStateMigrationMissing,
5555
)
@@ -274,10 +274,22 @@ def _apply_migration_step(
274274
"""
275275
try:
276276
return migration.migrate(value)
277+
except CheckpointError:
278+
# Preserve canonical category — if a migration raises a
279+
# CheckpointError subclass itself (rare; migrations are
280+
# spec-mandated pure per §10.12.2), propagate the original
281+
# category rather than wrapping it as
282+
# CheckpointStateMigrationFailed.
283+
raise
277284
except Exception as exc:
285+
# Concise wrap-message intentionally. ``raise ... from exc``
286+
# preserves the original exception on ``__cause__``;
287+
# Python's traceback formatter surfaces it, so embedding the
288+
# underlying ``type/str`` in this message would just
289+
# duplicate information (and confuse the output when the
290+
# underlying ``__str__`` is multi-line).
278291
raise CheckpointStateMigrationFailed(
279-
f"migration {migration.from_version!r}{migration.to_version!r} "
280-
f"raised while migrating {label}: {type(exc).__name__}: {exc}",
292+
f"migration {migration.from_version!r}{migration.to_version!r} raised while migrating {label}",
281293
from_version=migration.from_version,
282294
to_version=migration.to_version,
283295
) from exc
@@ -419,24 +431,16 @@ async def _migrate_record(
419431
f"support state migration",
420432
)
421433

422-
try:
423-
chain = self.migration_registry.resolve_chain(
424-
record.schema_version,
425-
current_schema_version,
426-
)
427-
except ValueError as exc:
428-
# MigrationRegistry signals multi-shortest-path ambiguity
429-
# via ValueError. Per spec §10.10 / §10.12.2 (proposal 0018,
430-
# spec v0.16.0), this routes to the canonical
431-
# CheckpointStateMigrationChainAmbiguous category. Spec
432-
# accepts load-time detection (this is the resume-side
433-
# path); the duplicate-pair case raises the same category
434-
# directly from MigrationRegistry.register at build time.
435-
raise CheckpointStateMigrationChainAmbiguous(
436-
str(exc),
437-
from_version=record.schema_version,
438-
to_version=current_schema_version,
439-
) from exc
434+
# resolve_chain raises CheckpointStateMigrationChainAmbiguous
435+
# directly on multi-shortest-path detection per spec §10.10
436+
# / §10.12.2 (proposal 0018, spec v0.16.0). No except-wrap
437+
# needed here — the canonical category propagates straight
438+
# through and the registry's exception contract is one type
439+
# regardless of when ambiguity surfaces (register vs resolve).
440+
chain = self.migration_registry.resolve_chain(
441+
record.schema_version,
442+
current_schema_version,
443+
)
440444

441445
if chain is None:
442446
raise CheckpointStateMigrationMissing(
@@ -1505,8 +1509,15 @@ def _dispatch_completed(
15051509
),
15061510
)
15071511

1508-
@staticmethod
1512+
# Instance method (not @staticmethod) so the save-time
1513+
# schema_version read goes through ``self.state_cls`` — matches
1514+
# the resume-side check, per spec §10.2's "framework reads
1515+
# schema_version from the state definition at save time"
1516+
# wording. Reading from ``type(post_state)`` would let a State
1517+
# subclass instance shadow the declared graph schema and
1518+
# trigger spurious migrations on resume.
15091519
async def _maybe_save_checkpoint(
1520+
self,
15101521
context: _InvocationContext,
15111522
*,
15121523
node_name: str,
@@ -1574,13 +1585,15 @@ async def _maybe_save_checkpoint(
15741585
# within-invocation order.
15751586
last_saved_at=time.time(),
15761587
# Per spec §10.2 (proposal 0014): read the user's
1577-
# state-schema version off the state class at save time.
1578-
# Empty-string sentinel when the user hasn't declared
1579-
# one — those records are not migration-eligible until
1580-
# they declare a non-empty version (per §10.2). The
1581-
# runtime type of ``post_state`` is the authoritative
1582-
# source (subclasses MAY override the ClassVar).
1583-
schema_version=cast("type[State]", type(post_state)).schema_version,
1588+
# state-schema version off the declared state class at
1589+
# save time. Empty-string sentinel when the user hasn't
1590+
# declared one — those records are not migration-eligible
1591+
# until they declare a non-empty version (per §10.2).
1592+
# ``self.state_cls`` is the authoritative source so the
1593+
# save-side read symmetrizes with the resume-side check
1594+
# (subclass schema_versions don't shadow the declared
1595+
# graph schema).
1596+
schema_version=self.state_cls.schema_version,
15841597
)
15851598
try:
15861599
await checkpointer.save(context.invocation_id, record)

src/openarmature/graph/events.py

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,32 @@ class NodeEvent:
122122
be ``None``.
123123
- On ``completed`` events, exactly one of ``post_state`` and
124124
``error`` is populated.
125+
126+
**Synthetic phases.** ``"checkpoint_saved"`` (pipeline-utilities
127+
§10.8) and ``"checkpoint_migrated"`` (proposal 0014 §6
128+
cross-ref) repurpose this dataclass for non-node events. Both
129+
are opt-in via ``phases={...}`` on observer registration —
130+
default subscriptions are ``{"started", "completed"}`` only, so
131+
legacy observers never see them. Conventions on synthetic
132+
events:
133+
134+
- ``checkpoint_saved``: ``pre_state`` carries the saved
135+
post-merge state (still a real ``State`` instance for this
136+
phase), ``post_state`` is ``None``. ``step`` matches the
137+
saving node's step.
138+
- ``checkpoint_migrated``: ``step=-1`` (no graph-step
139+
sequencing — migrations run before any node fires).
140+
``node_name="openarmature.checkpoint.migrate"`` and
141+
``namespace=("openarmature.checkpoint.migrate",)`` are
142+
dotted-pseudo identifiers, not real node names. ``pre_state``
143+
carries a private ``_MigrationSummary`` dataclass with
144+
``from_version`` / ``to_version`` / ``chain_length``, NOT a
145+
``State`` instance. ``parent_states`` is the empty tuple.
146+
147+
Because ``pre_state`` is no longer guaranteed to be a ``State``
148+
on the synthetic phases, its type is declared as ``Any`` and
149+
observer authors who subscribe to those phases MUST narrow
150+
per-phase before reading ``pre_state``.
125151
"""
126152

127153
node_name: str

src/openarmature/graph/observer.py

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -95,8 +95,16 @@ async def __call__(self, event: NodeEvent, /) -> None: ...
9595
ALL_PHASES: frozenset[str] = frozenset({"started", "completed"})
9696

9797
# All phase values the engine produces (per spec graph-engine §6 +
98-
# pipeline-utilities §10.8). Used by the registration-time validator
99-
# to reject typos like ``phases={"complete"}``.
98+
# pipeline-utilities §10.8 + proposal 0014 §6 cross-ref). Used by
99+
# the registration-time validator to reject typos like
100+
# ``phases={"complete"}``.
101+
#
102+
# The two synthetic phases (``checkpoint_saved`` and
103+
# ``checkpoint_migrated``) repurpose the ``NodeEvent`` shape for
104+
# non-node events — see the ``NodeEvent`` docstring for conventions.
105+
# Both are opt-in via explicit ``phases={...}``; the default
106+
# subscription ``ALL_PHASES`` above is ``{"started", "completed"}``
107+
# only, so legacy observers never receive them.
100108
KNOWN_PHASES: frozenset[str] = frozenset({"started", "completed", "checkpoint_saved", "checkpoint_migrated"})
101109

102110

tests/conformance/test_state_migration.py

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
"""Run every spec state-migration conformance fixture (039-046) end-to-end.
1+
"""Run every spec state-migration conformance fixture (039-047) end-to-end.
22
33
The fixtures live under
44
``spec/pipeline-utilities/conformance/`` as ``cases`` shapes; each
@@ -9,6 +9,14 @@
99
specifying either an ``expected`` happy-path or an
1010
``expected_error`` raise.
1111
12+
Fixture 047 (``state-migration-chain-ambiguous``, added in spec
13+
v0.16.0 / proposal 0018) exercises the
14+
``expected_chain_ambiguity_error`` harness primitive that accepts
15+
the canonical ``checkpoint_state_migration_chain_ambiguous``
16+
category at EITHER build time (duplicate-pair registration) or
17+
resume time (multi-shortest-path detection) per spec §10.12.2's
18+
compile-time-SHOULD / load-time-acceptable carve-out.
19+
1220
The driver:
1321
1422
1. Builds a State subclass via ``adapter.build_state_cls``, then
@@ -26,7 +34,8 @@
2634
5. Calls ``invoke(resume_invocation=<seeded id>)`` and asserts
2735
against ``resume.expected`` (final state, migrations_run,
2836
invariants) OR ``resume.expected_error`` (category, carries,
29-
cause).
37+
cause) OR ``expected_chain_ambiguity_error`` at either case
38+
top-level (build-time) or inside ``resume:`` (load-time).
3039
"""
3140

3241
from __future__ import annotations

0 commit comments

Comments
 (0)