Status: Draft
Author: (Durable Functions team)
Target repo: Azure/azure-functions-python-library
Target files: azure/functions/_durable_functions.py, azure/functions/durable_functions.py
Upstream consumer: Azure/azure-functions-durable-python (will drop its local df_serialization.py and import from here)
The _serialize_custom_object / _deserialize_custom_object pair in
azure/functions/_durable_functions.py is the canonical codec for Durable
Functions custom-object payloads. Today every call site (the Durable SDK's
action classes, context objects, entity state, and the activity trigger
converters in azure/functions/durable_functions.py) independently calls
json.dumps(..., default=_serialize_custom_object) /
json.loads(..., object_hook=_deserialize_custom_object).
Problems:
- No type validation. _deserialize_custom_object unconditionally calls importlib.import_module(module_name) on whatever __module__ string is in the payload; there is no check that the declared class matches what the caller expects.
- Security concern. An attacker-controlled payload can embed arbitrary __module__ / __class__ values, causing import_module to load any installed module and call from_json on the class the payload names. This is especially relevant for ActivityTriggerConverter.decode and LegacyActivityTriggerConverter.decode, which deserialize inbound trigger data with no gating.
- No single entry point. Serialization logic is scattered; adding cross-cutting behavior (logging, validation, strict mode) requires touching every call site.
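To make the security concern concrete, here is a minimal sketch. `legacy_hook` is a simplified stand-in written from the behavior described above, not the library source, and `some_installed_pkg` / `Unexpected` are hypothetical names registered only so the example is self-contained.

```python
import json
import sys
import types
from importlib import import_module

MARKERS = {"__class__", "__module__", "__data__"}


def legacy_hook(obj: dict):
    """Simplified stand-in for the legacy object_hook (assumption, not library code)."""
    if MARKERS.issubset(obj):
        module = import_module(obj["__module__"])  # the payload picks the module
        cls = getattr(module, obj["__class__"])    # the payload picks the class
        return cls.from_json(obj["__data__"])
    return obj


class Unexpected:
    """A class the caller never asked for (hypothetical)."""
    seen = []

    @staticmethod
    def from_json(data):
        Unexpected.seen.append(data)  # side effect triggered by the payload
        return Unexpected()


# Register a throwaway module so import_module can resolve it.
fake_module = types.ModuleType("some_installed_pkg")
fake_module.Unexpected = Unexpected
sys.modules["some_installed_pkg"] = fake_module

payload = json.dumps({
    "__class__": "Unexpected",
    "__module__": "some_installed_pkg",
    "__data__": {"arg": "attacker-chosen"},
})

# The caller states no expectation, yet the payload decides what gets built.
result = json.loads(payload, object_hook=legacy_hook)
print(type(result).__name__)  # Unexpected
```

Nothing in the call site names `Unexpected`; the choice of module and class comes entirely from the inbound JSON.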
Add two public functions to azure/functions/_durable_functions.py:
from typing import Any, Optional


def df_dumps(value: Any) -> str:
    """Serialize *value* to a JSON string using the Durable Functions convention.

    In loose mode (default), equivalent to
    ``json.dumps(value, default=_serialize_custom_object)``.

    In strict mode, the top-level custom object is wrapped in the
    legacy envelope but its ``to_json()`` output is serialized as
    plain JSON (no ``default=`` hook). Primitives and plain
    containers are also serialized without ``default=``.
    A ``TypeError`` is raised if any nested value is not natively
    JSON-serializable.
    """


def df_loads(
    s: str,
    expected_type: Optional[type] = None,
) -> Any:
    """Deserialize a JSON string, optionally validating against *expected_type*.

    When *expected_type* is None, loose mode behaves identically to
    ``json.loads(s, object_hook=_deserialize_custom_object)``; strict
    mode parses without ``object_hook`` and raises ``TypeError`` for a
    top-level custom-object payload.

    When *expected_type* is provided, the raw JSON is parsed first
    (without ``object_hook``) so the payload can be inspected before
    ``import_module`` fires. Behavior then depends on the typing mode:

    * **Loose mode** (default): logs a warning on type mismatch, then
      falls through to the legacy ``object_hook`` path.
    * **Strict mode**: raises ``TypeError`` on mismatch. For custom-
      object payloads, calls ``expected_type.from_json(raw["__data__"])``
      directly without ``import_module``. Opted in via the
      ``AZURE_FUNCTIONS_DURABLE_STRICT_TYPING`` environment variable.
    """

df_dumps in loose mode (default):
No change. df_dumps produces the same JSON that
json.dumps(value, default=_serialize_custom_object) produces today:
- Builtins → plain JSON
- Custom objects with to_json → {"__class__": ..., "__module__": ..., "__data__": ...}
- Nested custom objects are recursively wrapped via the default= hook
df_dumps in strict mode:
- Top-level custom objects with to_json → same {"__class__", "__module__", "__data__"} envelope
- But __data__ is the plain JSON output of to_json(), serialized without default=_serialize_custom_object
- Primitives and plain containers → serialized without default=
- Any nested value that is not natively JSON-serializable → TypeError at encode time
This means to_json() must produce a value that json.dumps can handle
natively. Nested custom objects must be serialized explicitly inside
to_json() (e.g., call Hat.to_json(self.hat) rather than returning
self.hat directly). This is a deliberate breaking change for strict
mode — it ensures no __module__ strings reach storage at nested levels,
eliminating the import_module attack surface entirely.
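The difference between the two encode modes can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: `dumps_sketch` takes a hypothetical `strict` flag instead of reading the environment variable, and `_serialize_default` is a stand-in for the legacy `default=` hook.

```python
import json
from typing import Any


def _envelope(obj: Any) -> dict:
    cls = type(obj)
    return {
        "__class__": cls.__name__,
        "__module__": cls.__module__,
        "__data__": cls.to_json(obj),
    }


def _serialize_default(obj: Any) -> dict:
    """Stand-in for the legacy default= hook (assumption)."""
    if hasattr(obj, "to_json"):
        return _envelope(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def dumps_sketch(value: Any, *, strict: bool) -> str:
    if not strict:
        return json.dumps(value, default=_serialize_default)
    if hasattr(value, "to_json"):
        value = _envelope(value)  # wrap the top level only
    return json.dumps(value)      # no default= hook, so nested objects raise


class Hat:
    def __init__(self, color):
        self.color = color

    @staticmethod
    def to_json(obj):
        return {"color": obj.color}


class LazyOrder:
    """Returns the nested Hat as-is and relies on the default= hook."""
    def __init__(self, hat):
        self.hat = hat

    @staticmethod
    def to_json(obj):
        return {"hat": obj.hat}


class ExplicitOrder(LazyOrder):
    @staticmethod
    def to_json(obj):
        return {"hat": Hat.to_json(obj.hat)}  # nested object serialized explicitly


# Loose mode: the nested Hat becomes its own envelope with a __module__ string.
loose = json.loads(dumps_sketch(LazyOrder(Hat("red")), strict=False))

# Strict mode: the same to_json() now fails at encode time.
try:
    dumps_sketch(LazyOrder(Hat("red")), strict=True)
    strict_error = None
except TypeError as exc:
    strict_error = exc

# Strict mode with an explicit to_json(): __data__ is plain JSON.
strict = json.loads(dumps_sketch(ExplicitOrder(Hat("red")), strict=True))
```

Only the top-level envelope carries `__module__` in the strict output; the nested hat is an ordinary dict.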
df_loads without expected_type, loose mode (default):
Behaves identically to json.loads(s, object_hook=_deserialize_custom_object).
This is the backward-compatible path for call sites that have no type info.
df_loads without expected_type, strict mode:
Parses without object_hook so import_module is never called. If the
top-level value is a legacy custom-object dict, raises TypeError; the caller
must supply an expected_type to deserialize custom objects in strict mode.
Primitive / plain-JSON payloads are returned as-is (no security risk, no
import_module involved).
This ensures that enabling strict mode surfaces every untyped call site as a
loud failure rather than silently falling through to import_module.
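A minimal sketch of the untyped path in both modes, assuming a hypothetical `strict` flag and an injected `legacy_hook` in place of the real `_deserialize_custom_object`, so that the example can show when the hook runs:

```python
import json
from typing import Any, Callable

MARKERS = {"__class__", "__module__", "__data__"}


def is_legacy_custom_dict(raw: Any) -> bool:
    return isinstance(raw, dict) and MARKERS.issubset(raw)


def loads_untyped(s: str, *, strict: bool,
                  legacy_hook: Callable[[dict], Any]) -> Any:
    if not strict:
        # Loose mode: unchanged legacy behavior.
        return json.loads(s, object_hook=legacy_hook)
    raw = json.loads(s)  # no object_hook, so nothing is imported
    if is_legacy_custom_dict(raw):
        raise TypeError(
            "strict mode: pass expected_type to deserialize a custom object")
    return raw


hook_calls = []


def recording_hook(obj: dict) -> dict:
    """Records each dict it sees instead of importing anything."""
    hook_calls.append(obj)
    return obj


envelope = ('{"__class__": "Hat", "__module__": "shop", '
            '"__data__": {"color": "red"}}')

# Strict mode, plain JSON: returned as-is.
plain = loads_untyped('{"n": [1, 2, 3]}', strict=True,
                      legacy_hook=recording_hook)

# Strict mode, custom-object payload: loud failure, hook never invoked.
try:
    loads_untyped(envelope, strict=True, legacy_hook=recording_hook)
    untyped_error = None
except TypeError as exc:
    untyped_error = exc
calls_in_strict = len(hook_calls)

# Loose mode: the hook runs as it does today.
loads_untyped(envelope, strict=False, legacy_hook=recording_hook)
```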
df_loads with expected_type, loose mode (default):
- Parse s with plain json.loads(s) (no object_hook) → raw.
- If raw is a legacy custom-object dict ({"__class__", "__module__", "__data__"} ⊆ keys):
  a. Compare raw["__class__"] / raw["__module__"] against expected_type.__name__ / expected_type.__module__.
  b. On mismatch → logger.warning(...).
  c. Fall through to json.loads(s, object_hook=_deserialize_custom_object) (legacy behavior preserved).
- If raw is a primitive/plain-JSON value:
  a. Best-effort isinstance(raw, expected_type) check (tolerate TypeError for typing generics).
  b. On mismatch → logger.warning(...).
  c. Fall through to json.loads(s, object_hook=_deserialize_custom_object) so nested custom objects in dicts/lists are still reconstructed.
df_loads with expected_type, strict mode:
Opted in by setting AZURE_FUNCTIONS_DURABLE_STRICT_TYPING to 1, true, or yes.
- Parse s with plain json.loads(s) (no object_hook) → raw.
- If raw is a legacy custom-object dict:
  a. Compare class/module as above.
  b. On mismatch → raise TypeError(...).
  c. Verify expected_type has callable from_json; if not → raise TypeError(...).
  d. Return expected_type.from_json(raw["__data__"]); import_module is never called.
  e. Because df_dumps in strict mode produces plain-JSON __data__ (no nested envelopes), from_json receives clean data. If consuming legacy (loose-encoded) payloads, nested {"__class__", ...} dicts may still appear, so from_json should handle both shapes.
- If raw is a primitive/plain-JSON value:
  a. isinstance check as above.
  b. On mismatch → raise TypeError(...).
  c. Return raw directly (no object_hook pass).
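The typed path for both modes can be sketched in one function. The names here are assumptions for illustration: `loads_typed`, its `strict` flag, and the injected `legacy_hook` are hypothetical, and treating an un-checkable `typing` generic as compatible is one possible reading of "tolerate TypeError".

```python
import json
import logging
from typing import Any, Callable, List

logger = logging.getLogger("azure.functions._durable_functions")
MARKERS = {"__class__", "__module__", "__data__"}


def _is_compatible(raw: Any, expected_type: Any) -> bool:
    try:
        return isinstance(raw, expected_type)
    except TypeError:
        return True  # e.g. List[int] cannot be used with isinstance


def loads_typed(s: str, expected_type: Any, *, strict: bool,
                legacy_hook: Callable[[dict], Any]) -> Any:
    raw = json.loads(s)  # inspect first, without object_hook
    is_envelope = isinstance(raw, dict) and MARKERS.issubset(raw)
    if is_envelope:
        declared = (raw["__class__"], raw["__module__"])
        expected = (getattr(expected_type, "__name__", None),
                    getattr(expected_type, "__module__", None))
        ok = declared == expected
    else:
        ok = _is_compatible(raw, expected_type)
    if not ok:
        message = f"payload does not match expected type {expected_type!r}"
        if strict:
            raise TypeError(message)
        logger.warning(message)
    if not strict:
        return json.loads(s, object_hook=legacy_hook)  # legacy path preserved
    if is_envelope:
        from_json = getattr(expected_type, "from_json", None)
        if not callable(from_json):
            raise TypeError(f"{expected_type!r} has no callable from_json")
        return from_json(raw["__data__"])  # no import_module involved
    return raw


class Hat:
    def __init__(self, color):
        self.color = color

    @staticmethod
    def from_json(data):
        return Hat(data["color"])


class Shoe:
    @staticmethod
    def from_json(data):
        return Shoe()


hook_calls = []


def recording_hook(obj: dict) -> dict:
    hook_calls.append(obj)
    return obj


payload = json.dumps({"__class__": "Hat", "__module__": Hat.__module__,
                      "__data__": {"color": "red"}})

hat = loads_typed(payload, Hat, strict=True, legacy_hook=recording_hook)
numbers = loads_typed("[1, 2]", List[int], strict=True,
                      legacy_hook=recording_hook)
try:
    loads_typed(payload, Shoe, strict=True, legacy_hook=recording_hook)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc
calls_in_strict = len(hook_calls)

# Loose mode with the wrong type: a warning is logged, the legacy hook still runs.
loads_typed(payload, Shoe, strict=False, legacy_hook=recording_hook)
```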
| Variable | Values | Default |
|---|---|---|
| AZURE_FUNCTIONS_DURABLE_STRICT_TYPING | 1, true, yes (case-insensitive, stripped) | unset = loose mode |
This is the same env var already used by the Durable SDK's interim implementation.
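One way to read the table is the following sketch. The `environ` parameter is a hypothetical addition that keeps the check testable, and treating any value outside the three listed ones as loose mode is an assumption; the table only specifies the unset case.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AZURE_FUNCTIONS_DURABLE_STRICT_TYPING"
TRUTHY = frozenset({"1", "true", "yes"})


def is_strict_mode(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the variable holds 1, true, or yes.

    Whitespace is stripped and case is ignored. Unset, or any other
    value (an assumption), means loose mode.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() in TRUTHY


print(is_strict_mode({ENV_VAR: " TRUE "}))  # True
print(is_strict_mode({ENV_VAR: "0"}))       # False
print(is_strict_mode({}))                   # False
```

Reading the variable on every call, rather than caching it at import time, lets tests toggle the mode without reloading the module.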
The Durable SDK currently imports _serialize_custom_object directly for
OrchestratorState.to_json_string (which builds its own json.dumps call).
To avoid reaching into private names, also expose:
from typing import Callable


def _get_serialize_default() -> Callable:
    """Return the ``default`` callback for ``json.dumps``.

    For use in call sites that build their own ``json.dumps`` invocation
    (e.g. ``OrchestratorState.to_json_string``).
    """
    return _serialize_custom_object

Alternatively, if the preference is to keep the public surface minimal, the
Durable SDK can continue importing _serialize_custom_object directly, as it
already does today.
Both converters' decode methods currently do:

callback = _durable_functions._deserialize_custom_object
result = json.loads(data.value, object_hook=callback)

Change to:

result = _durable_functions.df_loads(data.value)

This is behavior-identical in loose mode (the default). In strict mode,
because no expected_type is passed, a top-level custom-object payload raises
TypeError instead of being reconstructed. A future
enhancement could pass the activity function's input type annotation as
expected_type if the converter framework makes it available (see §6).
Both converters' encode methods currently do:

callback = _durable_functions._serialize_custom_object
result = json.dumps(obj, default=callback)

Change to:

result = _durable_functions.df_dumps(obj)

The converters' existing try/except json.JSONDecodeError and
try/except TypeError blocks remain unchanged; they wrap the df_loads /
df_dumps calls exactly as they wrap the current json.loads / json.dumps
calls.
All new code goes in azure/functions/_durable_functions.py, next to the
existing _serialize_custom_object / _deserialize_custom_object functions.
The existing functions remain for backward compatibility (they are still called
internally by df_loads in loose mode).
Use logging.getLogger("azure.functions._durable_functions").
The Durable SDK's interim implementation is at:
azure-functions-durable-python/azure/durable_functions/models/utils/df_serialization.py
(branch: the PR that adds this spec)
That file is ~180 lines and contains the complete logic for df_dumps,
df_loads, _loads_with_expected_type, _is_strict_mode, _is_legacy_custom_dict,
_has_json_protocol, and _is_compatible. The implementation should be
moved here essentially verbatim, with the only difference being that
_serialize_custom_object and _deserialize_custom_object are local
rather than imported.
In strict mode, the to_json / from_json contract is symmetric:
- to_json() must return a value that is natively JSON-serializable: dicts, lists, strings, numbers, bools, None. Nested custom objects must be serialized explicitly (e.g., call Hat.to_json(self.hat)).
- from_json(data) receives exactly what to_json() produced: plain JSON data with no {"__class__", "__module__", "__data__"} markers at any nesting level. Reconstruct nested objects using their from_json.

Example:

class Order:
    @staticmethod
    def to_json(obj):
        return {
            "item": obj.item,
            "hat": Hat.to_json(obj.hat),  # explicit, not obj.hat
        }

    @staticmethod
    def from_json(data):
        return Order(
            item=data["item"],
            hat=Hat.from_json(data["hat"]),  # symmetric
        )

If the application may also receive legacy (loose-encoded) payloads during
a rollout, from_json can check for both shapes:

@staticmethod
def from_json(data):
    hat_data = data["hat"]
    if isinstance(hat_data, Hat):
        hat = hat_data  # loose mode: object_hook already fired
    else:
        hat = Hat.from_json(hat_data)  # strict mode: plain dict
    return Order(item=data["item"], hat=hat)
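The rollout variant above distinguishes two shapes. The strict-mode decoding steps also mention a third: a legacy-encoded payload read in strict mode, where the nested value arrives as a raw {"__class__", ...} dict because no object_hook ran. A hedged sketch of a helper that accepts all three follows; `hat_from_any` is a hypothetical name, and whether applications need this third branch depends on how long legacy payloads remain in storage.

```python
MARKERS = {"__class__", "__module__", "__data__"}


class Hat:
    def __init__(self, color):
        self.color = color

    @staticmethod
    def to_json(obj):
        return {"color": obj.color}

    @staticmethod
    def from_json(data):
        return Hat(data["color"])


def hat_from_any(value):
    """Rebuild a Hat from any of the three shapes (hypothetical helper)."""
    if isinstance(value, Hat):
        # Loose decode: object_hook already rebuilt the object.
        return value
    if isinstance(value, dict) and MARKERS.issubset(value):
        # Legacy-encoded payload decoded in strict mode: unwrap the
        # envelope by hand, checking the class name, without import_module.
        if value["__class__"] != Hat.__name__:
            raise TypeError(f"expected Hat, got {value['__class__']!r}")
        return Hat.from_json(value["__data__"])
    # Strict-encoded payload: plain dict produced by Hat.to_json.
    return Hat.from_json(value)


already_built = hat_from_any(Hat("red"))
from_envelope = hat_from_any({
    "__class__": "Hat",
    "__module__": "shop",  # hypothetical module name; not used for importing
    "__data__": {"color": "red"},
})
from_plain = hat_from_any({"color": "red"})
```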
- Pass expected_type into converter decode: The InConverter.decode interface currently only receives data and trigger_metadata. Adding an optional expected_type kwarg (sourced from the function's parameter type annotation) would let ActivityTriggerConverter.decode call df_loads(data.value, expected_type=pytype), closing the last unprotected import_module path. This requires a change to the worker's converter dispatch in azure-functions-python-worker.
- Deprecate direct use of _serialize_custom_object / _deserialize_custom_object: Once df_dumps / df_loads are available, the underscore-prefixed functions become internal implementation details.
- df_dumps round-trips: primitives, custom objects with to_json, nested structures.
- df_loads without expected_type (loose): identical to legacy json.loads(s, object_hook=...).
- df_loads without expected_type (strict), primitive payload: returns raw value, no import_module.
- df_loads without expected_type (strict), custom-object payload: raises TypeError.
- df_loads loose mode with matching type: no warning, correct object returned.
- df_loads loose mode with mismatched type: warning logged, legacy path still runs.
- df_loads strict mode with matching type: from_json called directly, import_module never called.
- df_loads strict mode with mismatched type: TypeError raised.
- df_loads strict mode, type lacks from_json: TypeError raised.
- df_loads with primitive payload and expected_type: isinstance validation.
- df_loads with typing generics as expected_type: no crash (isinstance tolerance).
- Converter integration: ActivityTriggerConverter.decode / .encode use df_loads / df_dumps.
The Durable SDK has 101 round-trip / validation tests in
tests/utils/test_df_serialization.py that exercise the exact same logic.
These can serve as a reference / be ported.
- This PR (functions SDK): Add df_dumps, df_loads to _durable_functions.py. Update the four converter methods to use them. Ship as a new patch or minor release.
- Same PR (Durable SDK), updated after functions SDK ships: Remove azure/durable_functions/models/utils/df_serialization.py. Change all imports from .utils.df_serialization to azure.functions._durable_functions. Add azure-functions>=<new_version> to install_requires. This is the same in-progress Durable SDK PR; it will be rebased to consume the functions SDK's definitive implementation before merging.
- Future (worker): Extend InConverter.decode to accept expected_type, enabling ActivityTriggerConverter to pass the function's input annotation through to df_loads.