Commit 532479d

Add type-aware custom object serialization (#154)
* Add secure, type-aware custom object serialization

Rewrite the JSON codec in shared.py to emit plain JSON (no internal type marker) and add type-directed deserialization via an optional expected_type. Custom objects round-trip everywhere:

- call_activity/call_sub_orchestrator/call_entity gain return_type; wait_for_external_event gains data_type; these also refine the returned task's static type via overloads.
- Inbound payloads (orchestrator/activity/entity inputs) and call_activity results are reconstructed from function type annotations (new internal type_discovery module), best-effort and conservative.
- Entity get_state and new client OrchestrationState.get_input/get_output/get_custom_status accessors route through the shared codec.
- Fix nested-dataclass round-trip bug; chain serialization errors with the original cause.

Legacy AUTO_SERIALIZED payloads still deserialize for in-flight replay.

* PR Feedback (DataConverter pattern, small fixes)
* Standardize codec, entity fixes
* Fix history export serialization quirk
* Best-effort breaking change fixes, update changelog
* Update example and docs for type-directed deserialization
* Address PR feedback
1 parent c115409 commit 532479d
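
The serialization side of the change described in the commit message can be illustrated with a small standalone sketch. It does not use the SDK's codec: `encode`, `_default`, and the sample types `Order`, `Point`, and `Money` are hypothetical names, and only the standard `json` and `dataclasses` modules are assumed. It shows custom objects written as plain JSON with no type marker, and a failure re-raised as a `TypeError` that chains the original error.

```python
import json
from collections import namedtuple
from dataclasses import asdict, dataclass, is_dataclass
from types import SimpleNamespace


def _default(obj):
    """Fallback used by json.dumps for objects it cannot encode natively."""
    to_json = getattr(obj, "to_json", None)
    if callable(to_json):
        # Opt-in hook: the object decides its own JSON shape.
        return to_json()
    if is_dataclass(obj) and not isinstance(obj, type):
        # asdict() recurses into nested dataclass fields.
        return asdict(obj)
    if isinstance(obj, SimpleNamespace):
        return vars(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def encode(value):
    """Serialize to plain JSON; chain the original error on failure."""
    try:
        return json.dumps(value, default=_default)
    except (TypeError, ValueError) as exc:
        # "from exc" sets __cause__, so the root error stays visible.
        raise TypeError(f"Failed to serialize payload: {exc}") from exc


# Hypothetical sample types, for illustration only.
@dataclass
class Order:
    item: str
    quantity: int


Point = namedtuple("Point", "x y")


class Money:
    def __init__(self, cents):
        self.cents = cents

    def to_json(self):
        return {"cents": self.cents}


print(encode(Order("widget", 2)))
print(encode(Point(1, 2)))  # json treats a namedtuple as a plain array
print(encode(Money(500)))
```

Because the output carries no type information, the receiving side has to be told which type to rebuild, which is what the new `return_type` / `data_type` arguments are for.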

24 files changed

Lines changed: 2109 additions & 239 deletions

CHANGELOG.md

Lines changed: 74 additions & 1 deletion
@@ -7,7 +7,80 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## Unreleased

-N/A
+ADDED
+
+- Added a pluggable `DataConverter` (`durabletask.serialization`) accepted by
+  `TaskHubGrpcWorker`, `TaskHubGrpcClient`, and `AsyncTaskHubGrpcClient` via a
+  `data_converter` argument. Every payload boundary (inputs, outputs, events,
+  custom status, entity state) routes through it. The default
+  `JsonDataConverter` preserves existing behavior, so a custom converter (for
+  example one backed by pydantic) is opt-in. Custom objects can opt in via a
+  `to_json()` hook and a `from_json(value)` classmethod.
+- `OrchestrationContext.call_activity`, `call_sub_orchestrator`, and
+  `call_entity` accept an optional `return_type`, and `wait_for_external_event`
+  accepts an optional `data_type`. When provided, the result/event payload is
+  reconstructed as that type (dataclasses — including nested dataclass,
+  `Optional`, and `list` fields — and `from_json()`-capable types) and the
+  returned task is typed accordingly (e.g. `call_activity(..., return_type=Foo)`
+  yields `CompletableTask[Foo]`). When omitted, the raw deserialized JSON is
+  returned as before.
+- Inbound payloads are reconstructed from function type annotations. When an
+  orchestrator, activity, or entity operation annotates its input parameter (or
+  an activity its return value) with a dataclass or `from_json()`-capable type,
+  the payload is reconstructed as that type. Builtins and unannotated/unknown
+  types are passed through unchanged. An explicit `return_type` takes precedence
+  over a discovered annotation.
+- Added typed accessors to `client.OrchestrationState`: `get_input()`,
+  `get_output()`, and `get_custom_status()` each accept an optional
+  `expected_type` and deserialize the corresponding payload, reconstructing
+  dataclasses and `from_json()`-capable types. The raw `serialized_*` fields are
+  retained.
+- Objects exposing a `to_json()` method are now JSON-serializable when passed as
+  activity/orchestrator inputs or outputs.
+- Added `EntityMetadata.get_typed_state(intended_type=...)`, which deserializes
+  the entity's persisted state and reconstructs dataclasses and
+  `from_json()`-capable types. The existing `get_state()` is unchanged: with no
+  argument it returns the raw serialized JSON payload, and `get_state(some_type)`
+  applies constructor-based coercion (`some_type(raw)`).
+- Entity runtime state retrieval (`EntityContext.get_state(intended_type=...)` /
+  `DurableEntity.get_state(...)`) now also reconstructs dataclasses and
+  `from_json()`-capable types, in addition to the existing constructor-based
+  coercion.
+
+CHANGED
+
+- Custom objects (dataclasses, `SimpleNamespace`, namedtuples) are now
+  serialized as plain JSON. Decoding such a payload *without* a type hint now
+  yields a plain `dict` (previously a `SimpleNamespace`; a namedtuple now
+  round-trips as a JSON array). To get the original type back, pass the new
+  `return_type` / `data_type` arguments, annotate the consuming function's
+  parameter or return type, or use the typed client accessors. Payloads produced
+  by older SDK versions still deserialize — including into a `SimpleNamespace`
+  when no type is supplied — so in-flight orchestrations continue to replay
+  across an upgrade.
+- JSON serialization failures now raise a `TypeError` that chains the original
+  error (`__cause__`) and names the offending type.
+
+FIXED
+
+- Falsy entity states (`0`, `""`, `[]`, `{}`) are no longer dropped when an
+  entity batch is persisted. Previously a falsy current state was treated as
+  "no state" and written as `None`, effectively deleting it; only an actual
+  `None` state now clears the persisted entity state.
+
+BREAKING CHANGES (type-level only — no runtime impact for typical users)
+
+These changes do not alter runtime behavior, but because the package ships
+`py.typed`, consumers running strict type checkers (pyright/mypy) — or
+subclassing the public abstract types — may need to update their code:
+
+- `OrchestrationContext.call_activity`, `call_sub_orchestrator`, `call_entity`,
+  and `wait_for_external_event` gained new keyword-only parameters
+  (`return_type` / `data_type`). Subclasses overriding these methods should add
+  the parameter to match the base signature.
+- `client.OrchestrationState` gained a non-public `_data_converter` field
+  (excluded from equality and `repr`). Code constructing `OrchestrationState`
+  positionally should pass it via the new field or rely on its default.

 ## v1.6.0
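
The falsy-state fix listed under FIXED comes down to the difference between a truthiness check and an explicit `None` check. The sketch below is a minimal illustration under that assumption: `persist_buggy` and `persist_fixed` are hypothetical helpers, not the SDK's entity code.

```python
import json
from typing import Any, Optional


def persist_buggy(state: Any) -> Optional[str]:
    # Truthiness check: 0, "", [] and {} are all falsy, so they are
    # mistaken for "no state" and written as None.
    return json.dumps(state) if state else None


def persist_fixed(state: Any) -> Optional[str]:
    # Only an actual None clears the persisted state.
    return json.dumps(state) if state is not None else None


for state in (0, "", [], {}, None):
    print(repr(state), persist_buggy(state), persist_fixed(state))
```

A counter entity whose value has returned to `0` is the typical case: with the truthiness check its state would be written as `None`, which deletes it.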

docs/supported-patterns.md

Lines changed: 7 additions & 4 deletions
@@ -68,7 +68,8 @@ def purchase_order_workflow(ctx: task.OrchestrationContext, order: Order):
     yield ctx.call_activity(send_approval_request, input=order)

     # Approvals must be received within 24 hours or they will be cancelled.
-    approval_event = ctx.wait_for_external_event("approval_received")
+    # Passing ``data_type`` reconstructs the event payload as an ``Approval``.
+    approval_event = ctx.wait_for_external_event("approval_received", data_type=Approval)
     timeout_event = ctx.create_timer(timedelta(hours=24))
     winner = yield task.when_any([approval_event, timeout_event])
     if winner == timeout_event:
@@ -81,9 +82,11 @@ def purchase_order_workflow(ctx: task.OrchestrationContext, order: Order):
 ```

 As an aside, you'll also notice that the example orchestration above works with custom business
-objects. Support for custom business objects includes support for custom classes, custom data
-classes, and named tuples. Serialization and deserialization of these objects is handled
-automatically by the SDK.
+objects. Custom classes, data classes, and named tuples are serialized to plain JSON automatically.
+To reconstruct the original type on the receiving side, supply the type — for example via the
+`data_type` argument to `wait_for_external_event` (shown above), the `return_type` argument to
+`call_activity` / `call_sub_orchestrator` / `call_entity`, or by annotating the consuming function's
+input parameter. Without a type, the payload is returned as plain JSON (a `dict` or `list`).

 See the full [human interaction sample](../examples/human_interaction.py).
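
What "supplying the type" buys on the receiving side can be sketched without the SDK. The `reconstruct` function below is a hypothetical stand-in for type-directed deserialization, not the library's implementation, and the fields given to `Approval` and `Approver` are invented for illustration. It assumes only the standard `dataclasses` and `typing` modules and handles nested dataclasses, `Optional`, and `list` fields.

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass
from typing import List, Optional, Union, get_args, get_origin, get_type_hints


def reconstruct(value, expected_type=None):
    """Rebuild `value` (already-parsed JSON) as `expected_type`, best-effort."""
    if expected_type is None or value is None:
        return value  # no type supplied: hand back plain JSON
    origin = get_origin(expected_type)
    if origin is Union:
        # Optional[X] is Union[X, None]; unwrap when exactly one real type remains.
        args = [a for a in get_args(expected_type) if a is not type(None)]
        return reconstruct(value, args[0]) if len(args) == 1 else value
    if origin is list and isinstance(value, list):
        (item_type,) = get_args(expected_type) or (None,)
        return [reconstruct(item, item_type) for item in value]
    from_json = getattr(expected_type, "from_json", None)
    if callable(from_json):
        return from_json(value)
    if is_dataclass(expected_type) and isinstance(expected_type, type) and isinstance(value, dict):
        hints = get_type_hints(expected_type)
        kwargs = {
            f.name: reconstruct(value[f.name], hints[f.name])
            for f in fields(expected_type)
            if f.name in value
        }
        return expected_type(**kwargs)
    return value  # builtins and unknown types pass through unchanged


# Hypothetical payload types; the real Approval class is not shown in this diff.
@dataclass
class Approver:
    name: str


@dataclass
class Approval:
    approved: bool
    approver: Approver
    notes: Optional[str] = None
    reviewers: List[Approver] = field(default_factory=list)


payload = json.loads(
    '{"approved": true, "approver": {"name": "Ada"},'
    ' "notes": null, "reviewers": [{"name": "Lin"}]}'
)
print(reconstruct(payload))            # stays a plain dict
print(reconstruct(payload, Approval))  # rebuilt as an Approval instance
```

Falling through to `return value` for anything unrecognized mirrors the conservative behavior described in the changelog, where builtins and unknown types are passed through unchanged.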
