
Commit 316bc52

feat(checkpoint): chain-ambiguous category + spec v0.16.0 bump
Spec proposal 0018 (chain-ambiguity category) landed in spec v0.16.0 between PR-4 push and the spec agent's code review. This commit adds the new error category and the required routing swaps; PR-4 stays open as the carrier.

- Submodule pinned to v0.16.0; pyproject + __spec_version__ + test_smoke pin assertions follow.
- New canonical error class CheckpointStateMigrationChainAmbiguous (and the matching category string) added to checkpoint/errors.py, re-exported from openarmature.checkpoint. Non-transient. Carries optional from_version / to_version when known (always set for the registration-time case; resume-time multi-shortest-path also populates).
- MigrationRegistry.register raises CheckpointStateMigrationChainAmbiguous directly at registration time per spec §10.10 (proposal 0018), so the canonical category surfaces at the registration boundary without wrapping. The internal BFS keeps raising ValueError for multi-shortest-path detection; CompiledGraph._migrate_record's except branch now routes that to CheckpointStateMigrationChainAmbiguous at the resume boundary.
- Routing precedence on resume per §10.10 (v0.16.0): chain-ambiguous → missing → failed → record-invalid. Code-comment in _migrate_record documents the ordering so a future reader doesn't swap it back.
- Code comments record two design-choice notes the spec agent flagged: the load-bearing equal-depth-re-entry behavior in BFS cycle protection (tightening to <= breaks multi-shortest-path detection), and the registration-order ordering of describe() output.
- Conformance harness gains the expected_chain_ambiguity_error primitive. Accepts the canonical category at EITHER the case top-level (registration-time detection: registration raises, resume block unreachable) OR inside resume (load-time detection during BFS). Spec §10.12.2's compile-time-SHOULD / load-time-acceptable carve-out lands in the driver as a single try/except wrap around both phases.
- Fixture 047 (state-migration-chain-ambiguous) covered; all 9 state-migration fixtures (039-047) pass.
- 3 new unit tests for the new category's class shape + attribute carriage; existing duplicate-pair tests updated to assert the canonical category instead of ValueError.
- docs/concepts/checkpointing.md updated for the third category + the v0.16.0 routing precedence. CHANGELOG entry updated and Pinned-version line bumped to 0.10.0 → 0.16.0.

OTel openarmature.checkpoint.migrate span emission deferred to a follow-on; documented inline in _migrate_record. Spec §6 cross-ref is SHOULD-level, and the emission needs the _InvocationContext that's currently created AFTER the migration path runs. The follow-on can either restructure resume to build the context first or use a side-channel observer dispatch.
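The "single try/except wrap around both phases" described for the conformance driver could look roughly like the sketch below. The harness code is not part of this diff, so `observed_category`, `build`, and `resume` are hypothetical names used only for illustration; only the category string comes from the commit.

```python
from __future__ import annotations

from typing import Callable

# Category string added in this commit (checkpoint/errors.py).
CHAIN_AMBIGUOUS = "checkpoint_state_migration_chain_ambiguous"


def observed_category(
    build: Callable[[], None],
    resume: Callable[[], None],
) -> str | None:
    """Hypothetical driver helper: run both phases under one try/except.

    Returns the ``category`` attribute of whatever exception was raised,
    or None when both phases complete without raising.
    """
    try:
        build()   # registration-time detection raises here; resume is never reached
        resume()  # load-time detection raises here, during chain resolution
    except Exception as exc:
        return getattr(exc, "category", None)
    return None
```

A fixture expecting chain ambiguity would then pass when `observed_category(...) == CHAIN_AMBIGUOUS`, regardless of which phase raised.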
1 parent fa7b96e commit 316bc52

12 files changed

Lines changed: 228 additions & 51 deletions


CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

 ### Added

-- **State migration for checkpointed graphs (proposal 0014, introduced in spec v0.15.0).** Saved checkpoints whose `schema_version` doesn't match the current state class now route through a registered migration chain instead of failing on resume. Surface: `State.schema_version: ClassVar[str] = ""` (declare a non-empty value to opt in), `GraphBuilder.with_state_migration(from_version, to_version, migrate)` and `with_state_migrations(*migrations)` for registration, `StateMigration` and `MigrationRegistry` types exported from `openarmature.checkpoint`. Chain resolution is BFS over the registered edges; the shortest path wins. Multi-shortest-path ambiguity (e.g., a diamond `v1→v2→v4` + `v1→v3→v4`) surfaces as `CheckpointStateMigrationMissing` per spec §10.12.2's load-time-detection allowance. Two new error categories: `CheckpointStateMigrationMissing` (no chain bridges the versions, or chain ambiguous) and `CheckpointStateMigrationFailed` (a migration function raised). Both non-transient. Post-migration deserialization failures still route to `CheckpointRecordInvalid` per §10.12.4. The same chain applies to each entry in `parent_states` in lockstep with the outer state per §10.12.2.
+- **State migration for checkpointed graphs (proposal 0014, introduced in spec v0.15.0; refined by proposal 0018 in spec v0.16.0).** Saved checkpoints whose `schema_version` doesn't match the current state class now route through a registered migration chain instead of failing on resume. Surface: `State.schema_version: ClassVar[str] = ""` (declare a non-empty value to opt in), `GraphBuilder.with_state_migration(from_version, to_version, migrate)` and `with_state_migrations(*migrations)` for registration, `StateMigration` and `MigrationRegistry` types exported from `openarmature.checkpoint`. Chain resolution is BFS over the registered edges; the shortest path wins. Three new error categories: `CheckpointStateMigrationChainAmbiguous` (proposal 0018: duplicate `(from, to)` pair at registration time, or multiple distinct shortest paths between the saved and current versions at resume time), `CheckpointStateMigrationMissing` (no chain bridges the versions), and `CheckpointStateMigrationFailed` (a migration function raised). All non-transient. Post-migration deserialization failures still route to `CheckpointRecordInvalid` per §10.12.4. The same chain applies to each entry in `parent_states` in lockstep with the outer state per §10.12.2. Routing precedence per §10.10 (v0.16.0): chain-ambiguous → missing → failed → record-invalid.
 - **`Checkpointer.supports_state_migration` Protocol attribute.** Marks whether a backend can expose the structural intermediate form (a plain dict, JSON tree) the migration registry consumes. `SQLiteCheckpointer(serialization="json")` opts in; `SQLiteCheckpointer(serialization="pickle")` and `InMemoryCheckpointer` opt out. On version mismatch against a non-migration-eligible backend the engine raises `CheckpointRecordInvalid` per spec §10.12.1.
 - **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
 - **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
@@ -24,7 +24,7 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

 ### Changed

-- **Pinned spec version: 0.10.0 → 0.15.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.15.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017) in one bump. Only the surface introduced by proposal 0016 is implemented in this changelog entry; fixtures from 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as their respective PRs land.
+- **Pinned spec version: 0.10.0 → 0.16.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.16.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017, 0018) in one bump. Only the surfaces introduced by proposals 0014–0017 are implemented in the batch's release; fixtures from 0011 are deferred-skip in the conformance suite and unmark with PR-5.
 - **`CheckpointRecord.schema_version` semantic shift (proposal 0014).** Previously a backend-internal record-shape version (`CHECKPOINT_SCHEMA_VERSION = "1"` constant), now the user-facing state-schema version per spec §10.2. The framework reads `type(state).schema_version` at save time. Pre-PR-4 records carrying `"1"` are reinterpreted as user-facing v1 identifiers; users with such records either declare `schema_version="1"` on their state class or discard the pre-PR-4 records. `SQLiteCheckpointer` no longer rejects records with non-default `schema_version` at the backend boundary; version-mismatch routing is now an engine concern at resume time. The `CHECKPOINT_SCHEMA_VERSION` module constant is removed; future record-shape evolution can add backend-private metadata fields if needed.

 ### Notes

docs/concepts/checkpointing.md

Lines changed: 20 additions & 11 deletions
@@ -203,29 +203,38 @@ migration leading to v2 and another leading to v2-experimental;
 chain resolution picks the path that ends at the current declared
 version.

-Two ambiguity cases are configuration errors:
+Two ambiguity cases are configuration errors. Both surface as
+`CheckpointStateMigrationChainAmbiguous`:

 - **Duplicate edges.** Registering two migrations with the same
-  `(from_version, to_version)` pair raises `ValueError` at
-  registration. Either delete one or pick distinct version
-  identifiers.
+  `(from_version, to_version)` pair raises at registration time so
+  the configuration error surfaces before any resume attempt.
+  Either delete one or pick distinct version identifiers.
 - **Multiple shortest paths.** A diamond like
   `v1 → v2 → v4` and `v1 → v3 → v4` is ambiguous: both paths have
-  length 2. The engine surfaces this as
-  `CheckpointStateMigrationMissing` on resume so the user can
+  length 2. The engine raises during resume so the user can
   register fewer migrations or pick a single canonical route.

-### The two new error categories
+### The three new error categories

+- **`CheckpointStateMigrationChainAmbiguous`**: the registered
+  migration graph is ambiguous (duplicate `(from, to)` pair at
+  registration time, OR multiple distinct shortest paths between
+  the saved and current versions at resume time). Surfaces before
+  any migration function runs. Carries `from_version` and
+  `to_version` when known.
 - **`CheckpointStateMigrationMissing`**: the saved version doesn't
-  match the current version, and no chain (or no unambiguous chain)
-  bridges them. Carries `from_version`, `to_version`, a count of
-  registered migrations, and a human-readable `registry_description`
-  so operators see what IS available.
+  match the current version, and no chain bridges them. Carries
+  `from_version`, `to_version`, a count of registered migrations,
+  and a human-readable `registry_description` so operators see what
+  IS available.
 - **`CheckpointStateMigrationFailed`**: a user-supplied migration
   function raised. Subsequent migrations in the chain don't run;
   the resume fails. The migration's exception rides `__cause__`.

+Routing precedence on resume: chain-ambiguous → missing → failed →
+record-invalid.
+
 A third category, `CheckpointRecordInvalid`, continues to cover the
 **post**-migration case: a migration ran cleanly but produced
 output that the current state class can't deserialize (missing a

openarmature-spec

Submodule openarmature-spec updated 44 files

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ Repository = "https://github.com/LunarCommand/openarmature-python"
 Specification = "https://github.com/LunarCommand/openarmature-spec"

 [tool.openarmature]
-spec_version = "0.15.0"
+spec_version = "0.16.0"

 [dependency-groups]
 dev = [

src/openarmature/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 """OpenArmature — workflow framework for LLM pipelines and tool-calling agents."""

 __version__ = "0.5.0"
-__spec_version__ = "0.15.0"
+__spec_version__ = "0.16.0"

src/openarmature/checkpoint/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,7 @@
     CheckpointNotFound,
     CheckpointRecordInvalid,
     CheckpointSaveFailed,
+    CheckpointStateMigrationChainAmbiguous,
     CheckpointStateMigrationFailed,
     CheckpointStateMigrationMissing,
 )
@@ -45,6 +46,7 @@
     "CheckpointRecord",
     "CheckpointRecordInvalid",
     "CheckpointSaveFailed",
+    "CheckpointStateMigrationChainAmbiguous",
     "CheckpointStateMigrationFailed",
     "CheckpointStateMigrationMissing",
     "CheckpointSummary",

src/openarmature/checkpoint/errors.py

Lines changed: 37 additions & 0 deletions
@@ -111,6 +111,42 @@ def __init__(
         self.registry_description = registry_description


+class CheckpointStateMigrationChainAmbiguous(CheckpointError):
+    """Raised when the registered migration graph is ambiguous per
+    spec §10.10 / §10.12 (proposal 0018, spec v0.16.0):
+
+    - Duplicate-pair case (§10.12.1): two migrations register with the
+      same ``(from_version, to_version)`` pair. Raised at registration
+      time so the user sees the ambiguity before any resume attempt.
+    - Multi-shortest-path case (§10.12.2): the registered migration
+      graph has multiple distinct shortest paths between the saved
+      and current versions (e.g., a diamond ``v1→v2→v4`` + ``v1→v3→v4``).
+      Spec accepts either compile-time detection (recommended) or
+      load-time detection (this impl runs the check inside BFS at
+      resume time).
+
+    Non-transient: retrying without changing the migration graph
+    will not succeed. Carries ``from_version`` / ``to_version`` when
+    known (always set for the duplicate-pair case, set on the resume
+    side too for multi-shortest-path detection).
+    """
+
+    category = "checkpoint_state_migration_chain_ambiguous"
+
+    from_version: str | None
+    to_version: str | None
+
+    def __init__(
+        self,
+        *args: Any,
+        from_version: str | None = None,
+        to_version: str | None = None,
+    ) -> None:
+        super().__init__(*args)
+        self.from_version = from_version
+        self.to_version = to_version
+
+
 class CheckpointStateMigrationFailed(CheckpointError):
     """Raised on resume when a registered migration function raises
     during chain application (per spec §10.12.2). The migration's
@@ -140,6 +176,7 @@ def __init__(
     "CheckpointNotFound",
     "CheckpointRecordInvalid",
     "CheckpointSaveFailed",
+    "CheckpointStateMigrationChainAmbiguous",
     "CheckpointStateMigrationFailed",
     "CheckpointStateMigrationMissing",
 ]
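As a rough illustration of how calling code might consume the new category, the sketch below mirrors the class shape from the hunk above. `CheckpointError` here is a stand-in base (the real one is not shown in this diff), and `describe_ambiguity` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from typing import Any


class CheckpointError(Exception):
    """Stand-in base class (assumption); the real base is not shown in this diff."""


class CheckpointStateMigrationChainAmbiguous(CheckpointError):
    """Mirrors the class shape added in this commit."""

    category = "checkpoint_state_migration_chain_ambiguous"

    def __init__(
        self,
        *args: Any,
        from_version: str | None = None,
        to_version: str | None = None,
    ) -> None:
        super().__init__(*args)
        self.from_version = from_version
        self.to_version = to_version


def describe_ambiguity(exc: CheckpointStateMigrationChainAmbiguous) -> str:
    """Hypothetical helper: format the error for an operator-facing log line."""
    src = exc.from_version if exc.from_version is not None else "<unknown>"
    dst = exc.to_version if exc.to_version is not None else "<unknown>"
    return f"[{exc.category}] {src} -> {dst}: {exc}"


try:
    raise CheckpointStateMigrationChainAmbiguous(
        "duplicate state migration 'v1'→'v2' registered",
        from_version="v1",
        to_version="v2",
    )
except CheckpointStateMigrationChainAmbiguous as exc:
    message = describe_ambiguity(exc)
```

Because both version attributes are optional, a handler should tolerate `None` rather than assume they are always populated.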

src/openarmature/checkpoint/migration.py

Lines changed: 35 additions & 8 deletions
@@ -15,6 +15,8 @@
 from dataclasses import dataclass
 from typing import Any

+from .errors import CheckpointStateMigrationChainAmbiguous
+

 @dataclass(frozen=True)
 class StateMigration:
@@ -43,11 +45,14 @@ class MigrationRegistry:
     Registration-time invariants:

     - Two migrations with the same ``from_version`` AND
-      ``to_version`` raise ``ValueError`` (chain ambiguity per
-      §10.12.1).
+      ``to_version`` raise ``CheckpointStateMigrationChainAmbiguous``
+      directly per spec §10.10 (proposal 0018) so the canonical
+      category surfaces at the registration boundary without any
+      wrapping by the builder.
     - Two migrations with the same ``from_version`` and different
       ``to_version`` are permitted (branched migration graph;
-      chain resolution picks a path).
+      chain resolution picks a path or raises ambiguity if multiple
+      shortest paths exist).

     Resolution-time semantics (per §10.12.2):

@@ -59,9 +64,11 @@ class MigrationRegistry:
     - Non-empty registry with no connecting path → same.
     - Found a unique shortest path → return ordered list.
     - Found multiple distinct shortest paths (same edge count,
-      different edge sequences) → raise ``ValueError`` per
-      §10.12.2's ambiguous-chain rule. Spec accepts load-time
-      detection.
+      different edge sequences) → raise ``ValueError`` internally;
+      ``CompiledGraph._migrate_record`` wraps the ``ValueError`` as
+      ``CheckpointStateMigrationChainAmbiguous`` at the resume
+      boundary. The internal ``ValueError`` keeps the registry
+      module dependency-light (no canonical-error import cycle).
     """

     def __init__(self) -> None:
@@ -71,9 +78,16 @@ def __init__(self) -> None:
     def register(self, migration: StateMigration) -> None:
         key = (migration.from_version, migration.to_version)
         if key in self._migrations:
-            raise ValueError(
+            # Per spec §10.10 / §10.12.1 (proposal 0018, spec v0.16.0):
+            # duplicate-pair detection raises the canonical category
+            # directly at registration time. The category surfaces
+            # before any resume attempt — neither the builder nor the
+            # caller needs to wrap.
+            raise CheckpointStateMigrationChainAmbiguous(
                 f"duplicate state migration {migration.from_version!r}→"
-                f"{migration.to_version!r} registered; chain would be ambiguous"
+                f"{migration.to_version!r} registered; chain would be ambiguous",
+                from_version=migration.from_version,
+                to_version=migration.to_version,
             )
         self._migrations[key] = migration
         self._edges.setdefault(migration.from_version, []).append(migration)
@@ -132,6 +146,12 @@ def resolve_chain(
                     # shortest path. Allow re-entry only when the new
                     # arrival is at the same layer as the first arrival
                     # (distinct shortest paths through the same node).
+                    # NOTE: the strict-less-than comparison is load-
+                    # bearing for multi-shortest-path detection — a
+                    # diamond v1→v2→v4 + v1→v3→v4 lets BFS reach v4 via
+                    # both v2 and v3 at layer 2, and both paths land in
+                    # ``shortest_paths``. Tightening this to ``<=`` would
+                    # break the ambiguity check.
                     prior_depth = distances.get(next_version)
                     if prior_depth is not None and prior_depth < depth + 1:
                         continue
@@ -154,6 +174,13 @@ def describe(self) -> str:
         """Human-readable description of the registered set, used
         in the ``CheckpointStateMigrationMissing`` error payload.
         Empty registry returns ``"<no migrations registered>"``.
+
+        Output is registration-order (Python's dict preserves
+        insertion order). Diff-friendly test assertions should
+        not depend on the order across distinct registration
+        sequences; if cross-language conformance ever needs a
+        canonical order, a future change can sort by
+        ``(from_version, to_version)``.
         """
         if not self._migrations:
             return "<no migrations registered>"
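The body of `resolve_chain` is mostly outside the hunks above, so the following is only a simplified sketch of the technique the NOTE describes: BFS that permits equal-depth re-entry so that distinct shortest paths are all collected. It models versions as plain strings in an adjacency dict rather than `StateMigration` objects, and the function signature is an assumption for illustration.

```python
from __future__ import annotations

from collections import deque


def resolve_chain(
    edges: dict[str, list[str]], start: str, goal: str
) -> list[str] | None:
    """Return the unique shortest version path, None if no path exists.

    Raises ValueError when two or more distinct shortest paths exist.
    """
    if start == goal:
        return [start]
    distances = {start: 0}
    queue: deque[tuple[str, list[str]]] = deque([(start, [start])])
    shortest_paths: list[list[str]] = []
    while queue:
        version, path = queue.popleft()
        depth = len(path) - 1
        # Once a shortest path is known, stop expanding nodes whose
        # successors would be deeper than it.
        if shortest_paths and depth + 1 > len(shortest_paths[0]) - 1:
            break
        for next_version in edges.get(version, []):
            prior_depth = distances.get(next_version)
            # Strict less-than: skip only arrivals deeper than the first
            # one. Equal-depth arrivals are kept so both legs of a
            # diamond are recorded.
            if prior_depth is not None and prior_depth < depth + 1:
                continue
            distances[next_version] = depth + 1
            if next_version == goal:
                shortest_paths.append(path + [next_version])
            else:
                queue.append((next_version, path + [next_version]))
    if not shortest_paths:
        return None
    if len(shortest_paths) > 1:
        raise ValueError(
            f"ambiguous chain {start!r}→{goal!r}: "
            f"{len(shortest_paths)} distinct shortest paths"
        )
    return shortest_paths[0]
```

With the diamond `{"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"]}`, the second arrival at `v4` has `prior_depth == depth + 1`, so it is not skipped and the ambiguity is reported. Changing the comparison to `<=` would discard that arrival and silently return the first path.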

src/openarmature/graph/compiled.py

Lines changed: 20 additions & 7 deletions
@@ -49,6 +49,7 @@
     CheckpointNotFound,
     CheckpointRecordInvalid,
     CheckpointSaveFailed,
+    CheckpointStateMigrationChainAmbiguous,
     CheckpointStateMigrationFailed,
     CheckpointStateMigrationMissing,
 )
@@ -403,16 +404,17 @@ async def _migrate_record(
                 current_schema_version,
             )
         except ValueError as exc:
-            # MigrationRegistry signals ambiguous chains (multiple
-            # distinct shortest paths) via ValueError. Spec §10.12.2
-            # treats this as a configuration error — surface it
-            # promptly during the resume attempt.
-            raise CheckpointStateMigrationMissing(
+            # MigrationRegistry signals multi-shortest-path ambiguity
+            # via ValueError. Per spec §10.10 / §10.12.2 (proposal 0018,
+            # spec v0.16.0), this routes to the canonical
+            # CheckpointStateMigrationChainAmbiguous category. Spec
+            # accepts load-time detection (this is the resume-side
+            # path); the duplicate-pair case raises the same category
+            # directly from MigrationRegistry.register at build time.
+            raise CheckpointStateMigrationChainAmbiguous(
                 str(exc),
                 from_version=record.schema_version,
                 to_version=current_schema_version,
-                registered_migrations_count=len(self.migration_registry),
-                registry_description=self.migration_registry.describe(),
             ) from exc

         if chain is None:
@@ -431,6 +433,17 @@ async def _migrate_record(
             for i, parent in enumerate(migrated_parents):
                 migrated_parents[i] = _apply_migration_step(migration, parent, f"parent_states[{i}]")

+        # TODO(observability): emit an ``openarmature.checkpoint.migrate``
+        # span per spec §6 cross-ref. Deferred to a follow-on because
+        # ``_migrate_record`` runs before the invocation's
+        # ``_InvocationContext`` is created (the engine needs the
+        # migrated state shape to build the context), so the existing
+        # ``_dispatch``-based observer pathway is not yet available
+        # here. A natural fix is to dispatch a synthetic
+        # ``checkpoint_migrated`` event as the first event of the
+        # invocation after context creation. The chain metadata
+        # captured for that event: ``from_version``, ``to_version``
+        # (final target), ``chain_length = len(chain)``.
         return dataclass_replace(
             record,
             state=migrated_state,
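The routing precedence on resume (chain-ambiguous → missing → failed → record-invalid) falls out of the order in which the checks run. The sketch below illustrates that ordering with stand-in exception classes and a hypothetical `migrate_on_resume` function; it is not the body of `_migrate_record`, most of which lies outside the hunks shown.

```python
from __future__ import annotations

from typing import Any, Callable


# Stand-in exception classes (assumptions) named after the four categories.
class ChainAmbiguous(Exception): ...
class MigrationMissing(Exception): ...
class MigrationFailed(Exception): ...
class RecordInvalid(Exception): ...


Migration = Callable[[dict[str, Any]], dict[str, Any]]


def migrate_on_resume(
    state: dict[str, Any],
    resolve_chain: Callable[[], list[Migration] | None],
    deserialize: Callable[[dict[str, Any]], Any],
) -> Any:
    """Hypothetical resume path showing the four checks in precedence order."""
    # 1. chain-ambiguous: resolution itself signals ambiguity via ValueError.
    try:
        chain = resolve_chain()
    except ValueError as exc:
        raise ChainAmbiguous(str(exc)) from exc
    # 2. missing: resolution succeeded but found no connecting chain.
    if chain is None:
        raise MigrationMissing("no chain bridges the versions")
    # 3. failed: a migration function raised; later steps do not run.
    for migration in chain:
        try:
            state = migration(state)
        except Exception as exc:
            raise MigrationFailed("migration function raised") from exc
    # 4. record-invalid: migrations ran, but the output does not deserialize.
    try:
        return deserialize(state)
    except Exception as exc:
        raise RecordInvalid("migrated state does not deserialize") from exc


def add_retries(state: dict[str, Any]) -> dict[str, Any]:
    """Example migration: add a field with a default value."""
    return {**state, "retries": 0}


result = migrate_on_resume(
    {"name": "job"},
    lambda: [add_retries],
    lambda s: (s["name"], s["retries"]),
)
```

Each stage can only be reached when every earlier stage passed, which is why swapping the first two checks would change which category a diamond-shaped registry reports.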
