Skip to content

Fix custom serialization gaps from #154#161

Merged
berndverst merged 9 commits into
microsoft:mainfrom
andystaples:andystaples-fix-serialization-gaps
Jun 26, 2026
Merged

Fix custom serialization gaps from #154#161
berndverst merged 9 commits into
microsoft:mainfrom
andystaples:andystaples-fix-serialization-gaps

Conversation

@andystaples

@andystaples andystaples commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

PR #154 introduced type-aware custom object serialization but left several round-tripping gaps. This PR closes them — and fixes a couple introduced by #154

Builds on / follows up: #154.

Major fixes

1. to_json() hook precedence on the serialize side

The encoder checked the dataclass / SimpleNamespace branches before the to_json() hook, so those types could never use a custom to_json() — and a dataclass with a field that is not JSON-serializable on its own would raise even when it provided a to_json() hook to handle that field. The serialize side now prefers to_json(), mirroring the decode side which already prefers from_json().

2. Nested to_json() honored (drop dataclasses.asdict)

Dataclasses were encoded with dataclasses.asdict(), which recursively flattens nested dataclasses to plain dicts (bypassing their to_json()) and deep-copies every leaf. Dataclasses are now encoded via a shallow field mapping so json.dumps recurses through the same hook, honoring nested to_json() and avoiding the deep copy.

3. from_json / type-directed recursion symmetry

to_json is recursive but reconstruction only recursed through list / Optional / Union / dataclass fields. It now also recurses into dict/Mapping values and tuple elements. For cases the built-in recursion can't cover, a from_json classmethod may now optionally accept the active converter — from_json(cls, value, converter) — and call converter.coerce(...) / converter.deserialize(...) to rebuild nested typed values. The single-argument form is unchanged.

Fixed-length tuple[T1, T2, ...] coercion now fails fast: a JSON array whose length does not match the tuple arity raises a TypeError rather than silently dropping trailing elements (or producing a short tuple) via zip.

4. Entity state: serialized-at-rest representation

StateShim eagerly deserialized the entity's wire state in its constructor, before the caller could supply a target type. It now holds the entity state in its serialized JSON string form at all times — the raw wire payload is stored verbatim, and a live value supplied via set_state is serialized immediately. This single, always-serialized representation has three benefits:

  • Deferred, type-directed deserialization. get_state(intended_type) hands the raw string and the requested type to the converter together, so a custom converter can deserialize directly into the target type.
  • No double-encoding on persist. An unmodified payload is handed straight back to the wire unchanged.
  • Earlier error isolation. A value that can't be serialized raises at set_state time — inside the failing operation, which rolls back — instead of after the whole batch has run.

A redundant serialize→deserialize round-trip in the legacy entity event path was replaced with converter.coerce(...).

Note

Behavior change: because the held state is always the serialized form, get_state() now returns a freshly reconstructed value on every call rather than a reference to a single cached object. Mutating a value returned by get_state() in place no longer affects the persisted state — write it back with set_state() to persist. This is documented in the changelog under CHANGED.

5. enum.Enum round-tripping

Non-int enums failed to serialize (the decode side already reconstructed them). Enums now serialize to their underlying .value and reconstruct to the member when a target type is supplied — standalone, as dataclass fields, and inside list / dict / tuple. (IntEnum / IntFlag already serialized as integers.)

6. Forward-reference resolution (Python 3.10 parity)

On Python 3.10, typing.get_type_hints() does not deep-resolve forward references nested inside container args (e.g. the element type of list["TreeNode"] on a self-referential dataclass), leaving a bare string / ForwardRef. Type-directed reconstruction then skipped nested coercion and returned raw dicts — so a recursive / self-referential dataclass would not reconstruct its nested values on 3.10 (it worked on 3.11+). _build_dataclass now resolves forward references in each field hint against the dataclass's defining module before coercing, restoring 3.11+ behavior. Unresolvable names are left untouched (harmless fall back to parsed JSON), and the destination type is still entirely caller-supplied, so the security model is unchanged.

7. Module consolidation + deprecation

The internal json_codec module (added in #154, unreleased) is merged into durabletask.serialization and its functions are made private; the supported, stable surface is the pluggable DataConverter. durabletask.internal.shared.to_json / from_json are deprecated with a DeprecationWarning and continue to work for backwards compatibility.

Tests

  • New tests/durabletask/test_data_converter_roundtrip.py — comprehensive JsonDataConverter round-trip matrix (plain/nested/recursive dataclasses, to_json/from_json hooks incl. nested, custom classes, enums, containers, Optional/Union, primitives).
  • New tests/durabletask/test_shared.py — deprecation-warning coverage for the legacy shim.
  • Extended serialization + entity-executor tests for each fix: hook precedence, nested-hook recursion, dict/tuple recursion, fixed-length-tuple length-mismatch, serialized-at-rest StateShim (deferred deserialization, no-double-encode passthrough, eager-serialize-on-set_state, commit/rollback), converter-aware from_json, and version-independent forward-reference resolution.
  • flake8 and pyright --strict clean on changed files.

Intentional limitations (documented via tests)

  • Multi-member Union[A, B] is left as parsed JSON rather than guessing a member (security tradeoff; Optional[T] works).
  • datetime, Decimal, set, bytes, and arbitrary plain objects still require a to_json hook or a custom DataConverter; they raise a clear TypeError naming the type.

Compatibility

No new breaking changes vs 1.5.0 — all fixes are additive or warnings-only, except the entity get_state() reference-identity behavior change noted in §4 (documented in the changelog). Payloads produced by older SDK versions still deserialize (including into SimpleNamespace when no type is supplied). durabletask.azuremanaged (which subclasses the core worker/client) continues to work.

Close several round-tripping gaps left by the type-aware custom
serialization work in microsoft#154, without introducing new breaking changes
versus 1.5.0 or any serialization-related security concerns.

Serialize side:
- Prefer a to_json() hook over the built-in dataclass / SimpleNamespace
  handling so a dataclass (or namespace) with a non-serializable field can
  opt in, mirroring the decode side which already prefers from_json().
- Encode dataclasses via a shallow field mapping instead of
  dataclasses.asdict(), so nested to_json() hooks are honored and leaf
  values are not deep-copied.
- Serialize enum.Enum values to their underlying .value so non-int enums
  round-trip (IntEnum already serialized as integers).

Deserialize side:
- Recurse type-directed reconstruction into dict/Mapping values and tuple
  elements, in addition to the existing list / Optional / Union / dataclass
  recursion.
- Optionally pass the active DataConverter to a from_json(cls, value,
  converter) hook so it can rebuild nested typed values the built-in
  recursion does not cover.

Entity state:
- Defer deserialization of an entity's wire state until get_state() is
  called, so the caller's requested type reaches the converter together
  with the raw payload. Track whether the held value is still the raw
  serialized string and pass it back through unchanged on persist to avoid
  double-encoding.
- Replace a redundant serialize/deserialize round-trip in the legacy entity
  event path with converter.coerce().

Module structure / deprecation:
- Merge the internal json_codec module into durabletask.serialization and
  make the codec functions private; the supported surface is the pluggable
  DataConverter.
- Deprecate durabletask.internal.shared.to_json / from_json with a
  DeprecationWarning; they continue to work for backwards compatibility.

Adds a comprehensive JsonDataConverter round-trip test suite plus targeted
tests for each fix, and documents intentional limitations (multi-member
Union, types needing a custom converter such as datetime/Decimal/set).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 26, 2026 20:16

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a follow-up to #154 that closes several round-tripping gaps in the SDK’s type-directed, secure JSON serialization model by consolidating the internal codec into durabletask.serialization, expanding typed recursion support (dict/tuple/enums), and fixing entity-state handling to avoid eager deserialization and double-encoding.

Changes:

  • Prefer to_json() hooks over dataclass/SimpleNamespace branches during encoding and avoid dataclasses.asdict() to preserve nested hook behavior.
  • Expand type-directed reconstruction to handle dict/Mapping values, tuple elements, and enum.Enum round-tripping; allow optional converter-aware from_json(cls, value, converter).
  • Defer entity state deserialization in StateShim, pass through raw serialized payloads unchanged when unmodified, and remove redundant serialize→deserialize round-trips.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/durabletask/test_shared.py Adds coverage for deprecated durabletask.internal.shared serialization shims.
tests/durabletask/test_serialization.py Updates tests to target private codec in durabletask.serialization and adds regression coverage.
tests/durabletask/test_entity_executor.py Adds tests for deferred entity-state deserialization and double-encode prevention.
tests/durabletask/test_data_converter_roundtrip.py Adds a comprehensive JsonDataConverter round-trip matrix across supported/unsupported shapes.
durabletask/worker.py Switches entity-state handling to use deferred deserialization and removes redundant serialization round-trips.
durabletask/serialization.py Consolidates the JSON codec, adds enum handling, dict/tuple recursion, and converter-aware from_json.
durabletask/internal/shared.py Deprecates legacy to_json/from_json shims while keeping back-compat wrappers.
durabletask/internal/json_codec.py Removes the old internal codec module after consolidation.
durabletask/internal/entity_state_shim.py Implements deferred deserialization, raw-payload passthrough, and encode_state().
CHANGELOG.md Documents user-facing fixes and deprecations under ## Unreleased.

Comment thread durabletask/serialization.py Outdated
Comment thread durabletask/internal/entity_state_shim.py
@andystaples andystaples marked this pull request as draft June 26, 2026 21:31
- Rename is_reconstructible to can_reconstruct
- Correct ownership of _can_reconstruct
- Required DataConverter for internal classes
@andystaples andystaples marked this pull request as ready for review June 26, 2026 22:10
@andystaples

Copy link
Copy Markdown
Contributor Author

Follow-up changes (this round)

A few additions and refinements on top of the original PR, captured here to keep the summary from growing unbounded.

1. New example: pydantic-backed DataConverter

Added examples/custom_data_converter/ — a self-contained subproject showing how to plug a third-party serialization/validation library into the SDK via a custom DataConverter:

  • PydanticDataConverter serializes pydantic.BaseModel subclasses with model_dump_json() and reconstructs (with validation) via model_validate_json() / model_validate(), delegating everything else to the default JsonDataConverter.
  • In-process tests against the in-memory backend prove the round-trip end-to-end (typed input arrives validated, typed Receipt read back via state.get_output(...), constraint violations fail the orchestration) — no sidecar/emulator/Azure required.

2. New extension point: DataConverter.can_reconstruct(target_type)

Building the example surfaced a real gap: inbound type-discovery hard-coded the default converter's notion of what's reconstructable (dataclasses / from_json), so a custom converter could serialize a type but never get that type back on the inbound path — a pydantic input annotated order: Order arrived as a plain dict.

Fixed this properly rather than working around it:

  • DataConverter now exposes an overridable can_reconstruct(target_type). type_discovery consults the active converter instead of a hard-coded gate, and the worker threads its converter into all four discovery call sites (activity in/out, orchestrator in, entity in).
  • On the abstract/concrete split: can_reconstruct is not abstract — it's an optional inbound-discovery hook, and forcing it on every converter would break converters used only for entity state / custom status (and minimal test doubles) that never touch type discovery. The base returns a conservative False (a converter claims nothing unless it opts in); the dataclass/from_json policy moved into JsonDataConverter where it belongs.
  • The pydantic example overrides can_reconstruct to recognize models and defers the rest to its JsonDataConverter fallback.

3. Internal classes: data_converter is now required

_RuntimeOrchestrationContext, _OrchestrationExecutor, _ActivityExecutor, and _EntityExecutor took data_converter: DataConverter | None = None and silently fell back to JsonDataConverter(). Since these are private and always constructed by the worker with its already-resolved converter, the fallback could mask a mismatched-converter bug (it would "work" but with the wrong serialization). Made the parameter required and dropped the fallback so any missed wiring fails loudly at construction. (No public-API impact; unreleased.)

Misc

  • Renamed the new hook is_reconstructablecan_reconstruct (more idiomatic; unreleased, so no compat concern).
  • Fixed a Python 3.10 reconstruction gap: forward references nested in container args (e.g. list["TreeNode"] on a self-referential dataclass) weren't deep-resolved by get_type_hints on 3.10, so nested values came back as raw dicts. Now resolved against the class's defining module, matching 3.11+.
  • Fixed-length tuple[...] coercion now raises on a length mismatch instead of silently truncating via zip; StateShim error message now names the reconstructed value's type rather than always str.

All tests/durabletask unit tests + the new example tests pass; flake8 clean and pyright clean on changed source.

@berndverst berndverst left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at head 8db4510. The seven documented fixes are real and correctly implemented, the can_reconstruct extension design is sound, and the CHANGELOG is honest and well-consolidated. Two things to flag — one that needs a fix before the feature is actually usable, plus a changelog suggestion — and a few smaller inline notes.

Note

Baseline for this review is v1.6.0 — both #154 and #161 landed after the v1.6.0 changelog entry and are still unreleased, so the whole ## Unreleased block is the v1.6.0 -> next delta.

Most important: the custom DataConverter isn't wired through the Azure-managed (DTS) worker/client

data_converter appears nowhere in durabletask-azuremanaged. DurableTaskSchedulerWorker / DurableTaskSchedulerClient (sync + async) don't accept it, have no **kwargs, and don't forward it to super().__init__(). Consequences:

  • This PR's runnable example (examples/custom_data_converter/src/app.py) calls DurableTaskSchedulerWorker(data_converter=...) / DurableTaskSchedulerClient(data_converter=...) and will raise TypeError at runtime (inline comments below). The example's tests pass only because they bypass app.py and use the core in-memory TaskHubGrpcWorker/TaskHubGrpcClient; example tests aren't run in CI (testpaths = ["tests"]), so nothing catches this.
  • More broadly, the headline feature of #154/#161 is currently unreachable from DTS — the package most production users run.

Fix is small: add data_converter: DataConverter | None = None to the three azuremanaged __init__s and forward it to super().__init__(...). Then the example runs as written.

Smaller items (see inline)

  • Entity get_state() reference-identity change is a real, silent behavior change vs v1.6.0 for entities holding mutable state. No shipped example breaks (they all read-modify-set_state on immutable int), and it's documented under CHANGED — but consider making it a prominent migration note.
  • Pydantic example only intercepts top-level model types; list[Model] / dict[str, Model] fall through to the fallback (converter.py note).
  • _hook_accepts_converter arity heuristic can misfire on a from_json(cls, value, unrelated=default) hook (serialization.py note).
  • type_discovery.py docstring references a non-existent DataConverter.is_reconstructable — should be can_reconstruct (suggestion inline).

Security and back-compat look solid: destination types are always caller-supplied (never read from the payload), eval in forward-ref resolution runs only on caller-authored annotation strings (never wire data), multi-member Union is left untouched rather than guessed, the lru_caches are bounded, old _AUTO_SERIALIZED payloads still replay, and the deprecated shared.to_json / shared.from_json still work.

Changelog suggestion

Since #154 and #161 both landed after v1.6.0 and are unreleased, the ## Unreleased section should read as a single v1.6.0 -> next delta — which it now does well. Nothing from #154 is left dangling as a separate "fixed the unreleased thing" entry, and the falsy-state FIXED bullet is correctly a real fix vs v1.6.0 (v1.6.0 still used a truthiness check entityState=...to_json(entity_state._current_state) if entity_state._current_state else None that wrote 0/""/[]/{} as None). Two suggestions:

1. durabletask-azuremanaged/CHANGELOG.md is currently N/A under ## Unreleased. Once the worker/client forward data_converter (above), document the capability for the package that ships DTS support:

## Unreleased

ADDED

- `DurableTaskSchedulerWorker`, `DurableTaskSchedulerClient`, and the async
  client now accept a `data_converter` argument and forward it to the base
  worker/client, so a custom `durabletask.serialization.DataConverter` (for
  example a pydantic-backed one) can be used with the Durable Task Scheduler.

2. Optional: make the entity get_state() CHANGED bullet state that it's a behavior change relative to the last release (v1.6.0) for entities that mutate state in place:

EntityContext.get_state() / DurableEntity.get_state() now return a freshly reconstructed value on every call rather than a reference to a single cached object. This changes v1.6.0 behavior: mutating the returned value in place no longer affects persisted state — write it back with set_state(). State is also serialized eagerly at set_state() time, so a non-serializable value fails inside the operation (which rolls back) instead of after the batch has run.

Otherwise the folding is solid.

Comment thread examples/custom_data_converter/src/app.py
Comment thread examples/custom_data_converter/src/app.py
def deserialize(self, data: str | None, target_type: type | None = None) -> Any:
if data is None or data == "":
return None
if _is_model_type(target_type):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice "handle my types, delegate the rest" shape. One limitation worth documenting (or handling): this only intercepts when the model is the top-level target_type. For list[Model], dict[str, Model], or a dataclass field typed as a model, _is_model_type(...) is False, so it falls to the JsonDataConverter fallback, which reconstructs elements via type_ctor(value) -> Model(dict). Pydantic models reject a positional dict, so e.g. call_activity(..., return_type=list[OrderItem]) would raise TypeError, and inbound discovery of such an annotation silently passes raw dicts instead.

The example's own shapes avoid this (nested list[OrderItem] lives inside Order, handled by pydantic's own recursion), so it works — but since the README calls this "the recommended pattern for a real custom converter," it'd help to either note the top-level-generic limitation or recurse into generics that wrap a model.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — documented the limitation directly on PydanticDataConverter (a > [!NOTE] in the class docstring) and in the example README, and softened the "recommended pattern" wording to "a good starting point." The note explains that a model nested inside another model (like Order.items) round-trips via pydantic's own recursion (why the example works), but a model nested in a top-level generic the SDK rebuilds directly — return_type=list[OrderItem], an input annotated dict[str, OrderItem] — is not intercepted and falls to the default codec, and that a production converter would recurse into such generics (e.g. via pydantic.TypeAdapter). I kept the example's surface minimal on purpose so it stays focused on the seam rather than turning into a full pydantic integration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an interesting follow-up: turning these kinds of shapes into first-class recursion scenarios within DataConverter/JsonDataConverter instead of a recursive external function. But, probably outside the scope for now.

Comment thread durabletask/serialization.py
Comment thread durabletask/internal/type_discovery.py Outdated
@berndverst berndverst merged commit f6bfd44 into microsoft:main Jun 26, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants