
Commit e038192

GitHub #243: Sample-App Phase 3: Polyglot Reality (#520)
1 parent 0b10896 commit e038192

1 file changed

Lines changed: 137 additions & 0 deletions

# Polyglot Codec Round-Trip Contract

This document is the language-neutral contract for which payload values
round-trip cleanly across the PHP and Python SDKs and which require an
explicit codec adapter at the call site. It sits downstream of the SDK
neutrality contract (`docs/architecture/sdk-neutrality.md`) and the
codec-name advertisement rule (`codec_neutrality`), and is enforced by
the platform conformance suite.

The `payload_codec` envelope tag on every wire payload identifies the
codec used to encode the blob. The language-neutral v2 surface advertises
one universal codec:

| Codec | Use |
| --- | --- |
| `avro` | Default for new v2 workflows and activities. The blob is a base64-encoded Avro generic-wrapper around a JSON document. |

Legacy PHP history can still name PHP-engine-specific codecs. Those
codecs are exposed under `payload_codecs_engine_specific["php"]`, not
in the universal `payload_codecs` list, so non-PHP workers are not
required to decode PHP serializer payloads:

| Engine | Codec | Use |
| --- | --- | --- |
| `php` | `workflow-serializer-y` | Legacy PHP SerializableClosure payloads with byte-escape encoding. |
| `php` | `workflow-serializer-base64` | Legacy PHP SerializableClosure payloads with base64 encoding. |

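To make the split between universal and engine-specific codecs concrete, the Python below sketches how a worker might decide whether it is expected to decode a payload, given its engine and a codec advertisement. The advertisement dict and the `must_decode` helper are hypothetical; only the field names `payload_codec`, `payload_codecs`, and `payload_codecs_engine_specific` come from this contract.

```python
# Hypothetical codec advertisement. Only the field names
# `payload_codecs` and `payload_codecs_engine_specific` come from the contract.
ADVERTISEMENT: dict = {
    "payload_codecs": ["avro"],
    "payload_codecs_engine_specific": {
        "php": ["workflow-serializer-y", "workflow-serializer-base64"],
    },
}


def must_decode(payload_codec: str, engine: str, advertisement: dict) -> bool:
    """Return True if a worker on `engine` is expected to decode a payload
    whose envelope carries this `payload_codec` tag (hypothetical helper)."""
    # Universal codecs bind every SDK, whatever its language.
    if payload_codec in advertisement.get("payload_codecs", []):
        return True
    # Engine-specific codecs bind only workers running that engine.
    engine_specific = advertisement.get("payload_codecs_engine_specific", {})
    return payload_codec in engine_specific.get(engine, [])


# A Python worker must handle `avro` but not the legacy PHP serializer codecs.
print(must_decode("avro", "python", ADVERTISEMENT))                   # True
print(must_decode("workflow-serializer-y", "python", ADVERTISEMENT))  # False
print(must_decode("workflow-serializer-y", "php", ADVERTISEMENT))     # True
```
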
## Round-trip categories

The contract sorts every value that can appear on the wire into one of
three categories. The category determines whether the value crosses the
boundary unchanged, crosses with a documented loss, or requires an
explicit adapter the workflow author writes before encode.

### Clean round-trip

These values are JSON-native in both languages and round-trip with
identical observable behaviour:

| Wire shape | PHP type | Python type |
| --- | --- | --- |
| `null` | `null` | `None` |
| `boolean` | `bool` | `bool` |
| `integer` | `int` | `int` |
| `number` | `float` | `float` |
| `string` | `string` | `str` |
| `array` | indexed `array<int, mixed>` | `list[Any]` |
| `object` | associative `array<string, mixed>` | `dict[str, Any]` |

The Avro generic-wrapper accepts any JSON document built from this set.
Both languages read and write the same `payload_codec: "avro"` envelope
without further configuration.

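Membership in the clean set can be checked mechanically on the Python side. The sketch below is an illustration, not the SDK's encoder: `is_clean` and `round_trips` are hypothetical helpers, and stdlib `json` stands in for the Avro generic-wrapper, which this contract describes as wrapping a JSON document.

```python
import json
import math
from typing import Any


def is_clean(value: Any) -> bool:
    """Hypothetical check: is `value` built only from the clean round-trip set?"""
    # Exact type checks keep subclasses such as IntEnum and StrEnum out of
    # the clean set; the contract lists them under documented coercion.
    if value is None or type(value) in (bool, int, str):
        return True
    if type(value) is float:
        # JSON (RFC 8259) has no literal for NaN or Infinity.
        return math.isfinite(value)
    if type(value) is list:
        return all(is_clean(item) for item in value)
    if type(value) is dict:
        return all(type(key) is str and is_clean(item) for key, item in value.items())
    # Tuples, sets, bytes, Decimal, datetime, UUID, objects, ...
    return False


def round_trips(value: Any) -> bool:
    """Encode and decode with stdlib json, standing in for the Avro wrapper."""
    return json.loads(json.dumps(value)) == value


doc = {"id": 7, "ok": True, "ratio": 0.5, "tags": ["a", "b"], "parent": None}
print(is_clean(doc), round_trips(doc))  # True True
print(is_clean({"ids": {1, 2}}))         # False: a set needs an adapter
```

The exact-type checks are a deliberate choice: `isinstance` would accept `IntEnum` members as plain integers and blur the line between the clean and coerced categories.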
### Round-trip with documented coercion

These values decode in both languages, but to a different concrete type
on the receiving side. Workflows that need the original concrete type
must adapt the value back at the consumer.

| Producer | Wire shape | Consumer | Coercion |
| --- | --- | --- | --- |
| PHP `int` outside the JS-safe range (magnitude above 2^53 - 1) | JSON `number` | Python `int` | No loss in Python: PHP to Python preserves full precision because Avro carries the integer in its wrapper, so the value arrives as an exact `int`. |
| Python `IntEnum` / `StrEnum` | JSON scalar | PHP `int` / `string` | The receiver sees the raw scalar. Re-attach the enum class on the consumer side if it is significant. |
| Python `Decimal` | JSON `string` (via `to_avro_payload_value`) | PHP `string` | The receiver must re-parse the string into its money or fixed-point type. |
| Python `datetime` / `date` / `time` | ISO 8601 `string` | PHP `string` (parse with `Carbon` / `DateTimeImmutable`) | The UTC offset (not the IANA zone name) is preserved when the producer emits a tz-aware `datetime`; naive datetimes are wire-ambiguous and SHOULD be avoided. |
| Python `UUID` | JSON `string` | PHP `string` | Parse on the consumer with `Ramsey\Uuid\Uuid::fromString()` or equivalent. |
| Empty PHP `array` `[]` | JSON `[]` (always) | Python `list` | The PHP encoder always tags an empty `array` as a JSON list. Producers that need an empty mapping must encode `(object)[]` (`stdClass`), or pass the value through an explicit adapter that knows the field is an `array<string, mixed>` mapping. |

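The producer and consumer halves of these coercions can be sketched in Python, with Python playing both sides for brevity (a PHP consumer would use `Carbon`, `DateTimeImmutable`, or `Ramsey\Uuid\Uuid` as the table says). `Priority`, `to_wire`, `from_wire`, and `as_mapping` are hypothetical names, and stdlib `json` stands in for the Avro wrapper; the SDK's `to_avro_payload_value` is not called because its signature is not documented here.

```python
import json
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from enum import IntEnum
from uuid import UUID


class Priority(IntEnum):  # hypothetical enum for illustration
    LOW = 1
    HIGH = 2


def to_wire(priority: Priority, amount: Decimal, at: datetime, ref: UUID) -> str:
    """Producer side: reduce each value to its documented wire shape."""
    return json.dumps({
        "priority": int(priority),  # IntEnum -> raw JSON integer
        "amount": str(amount),      # Decimal -> JSON string
        "at": at.isoformat(),       # tz-aware datetime -> ISO 8601 with offset
        "ref": str(ref),            # UUID -> canonical JSON string
    })


def from_wire(blob: str) -> tuple:
    """Consumer side: re-attach the concrete types the wire dropped."""
    raw = json.loads(blob)
    return (
        Priority(raw["priority"]),
        Decimal(raw["amount"]),
        datetime.fromisoformat(raw["at"]),
        UUID(raw["ref"]),
    )


def as_mapping(value: object) -> dict:
    """Accept an empty JSON list where a mapping was meant (the PHP `[]` case)."""
    if value == []:
        return {}
    if isinstance(value, dict):
        return value
    raise TypeError(f"expected a mapping, got {type(value).__name__}")


sent_at = datetime(2024, 5, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
got_priority, got_amount, got_at, got_ref = from_wire(
    to_wire(Priority.HIGH, Decimal("19.99"), sent_at, UUID(int=1))
)
print(got_at.isoformat())  # 2024-05-01T12:00:00+02:00
```

Note that only the `+02:00` offset survives; the consumer cannot tell which named zone produced it.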
### Requires an explicit adapter at the call site

These values are not safe to place in an Avro JSON payload as-is. The
producer MUST adapt them to a value in the clean round-trip set before
encode, or the encoder raises:

- Python `dataclasses` instances (use `to_avro_payload_value`,
  `dataclasses.asdict`, or a hand-written serializer)
- Python `attrs` classes (the SDK's `_attrs_payload_dict` helper covers
  them, but the producer is still opting in)
- Python `pydantic` models (the SDK calls `model_dump(mode="json")`;
  any custom `to_dict` should match that contract)
- Python `pendulum` values (convert with `.isoformat()`)
- Python `bytes` / `bytearray` (encode as a base64 `string`, or wrap
  in a `dict` with explicit `encoding` and `data` fields)
- Python `set` / `frozenset` (convert to a sorted `list`)
- Python custom objects without a registered adapter
- PHP objects that are not plain `stdClass` or arrays (the workflow
  package's serializer rejects them at the boundary; convert to an
  associative array before scheduling the activity or workflow)
- PHP closures and resources (rejected unconditionally)
- PHP `BackedEnum` values (convert to `->value` before scheduling)

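A hand-written producer-side adapter for a few of the Python entries above might look like the following minimal sketch. `adapt_for_payload` and `Attachment` are hypothetical; the adapter picks the `encoding`/`data` dict option for bytes and raises `TypeError` for anything it does not recognise, mirroring the fail-closed rule.

```python
import base64
import dataclasses
from typing import Any


def adapt_for_payload(value: Any) -> Any:
    """Hypothetical producer-side adapter: map values from the list above
    onto the clean round-trip set, failing closed on anything else."""
    # isinstance (not exact type) lets IntEnum/StrEnum through as raw
    # scalars, which the contract lists under documented coercion.
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return adapt_for_payload(dataclasses.asdict(value))
    if isinstance(value, (bytes, bytearray)):
        # One of the two documented options: an explicit encoding/data dict.
        return {"encoding": "base64", "data": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)):
        # sorted() requires mutually comparable elements.
        return [adapt_for_payload(item) for item in sorted(value)]
    if isinstance(value, list):
        return [adapt_for_payload(item) for item in value]
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("payload mappings need string keys")
        return {key: adapt_for_payload(item) for key, item in value.items()}
    raise TypeError(f"no payload adapter for {type(value).__name__}")


@dataclasses.dataclass
class Attachment:  # hypothetical payload type
    name: str
    raw: bytes
    labels: frozenset


print(adapt_for_payload(Attachment("a.txt", b"hi", frozenset({"b", "a"}))))
# {'name': 'a.txt', 'raw': {'encoding': 'base64', 'data': 'aGk='}, 'labels': ['a', 'b']}
```
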
A producer that does not adapt one of these values gets a synchronous
`TypeError` (Python) or `WorkflowPayloadDecodeException` (PHP) at the
call site. The error never crosses the worker protocol; the workflow
never advances on an unadapted value. This is intentional: the codec
boundary is the only place where the workflow author can choose how a
language-specific shape is represented in durable history.

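The ordering that makes this fail closed can be sketched with a hypothetical `schedule_activity` call site and a plain list standing in for durable history: the payload is encoded before anything is recorded, so an unadapted value raises `TypeError` synchronously and the history never advances. Stdlib `json` is only a stand-in and is laxer than this contract (it encodes tuples as lists and coerces some non-string keys), so the SDKs' real checks are assumed to be stricter.

```python
import json

# Hypothetical stand-in for durable history: one encoded entry per command.
history: list = []


def schedule_activity(name: str, args: list) -> None:
    """Hypothetical call site: encode first, record only on success."""
    # json.dumps raises TypeError for sets, bytes, Decimal, datetime, UUID,
    # dataclass instances, and other objects it cannot serialise, so the
    # append below is never reached for an unadapted value.
    blob = json.dumps({"activity": name, "args": args})
    history.append(blob)


schedule_activity("charge", [{"amount": "19.99"}])
try:
    schedule_activity("tag", [{"blue", "red"}])  # a set was not adapted
except TypeError as exc:
    print(f"rejected at the call site: {exc}")
print(len(history))  # 1
```
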
## Test surfaces

The round-trip contract is exercised in CI from three places. A change
to any of the three SHOULD be co-landed with matching changes to the
other two when it moves a value across a category boundary:

- `sdk-python`: `tests/test_serializer.py` covers Python encode/decode
  for every category and the producer-side rejection of unadapted
  values.
- `sdk-python`: `tests/integration/test_polyglot.py` exercises real
  PHP↔Python interop through a running server and asserts that the
  receiving language observes the documented coerced type.
- The sample app (`sample-app`) `polyglot/` smoke runs two scenarios
  end to end against the standalone server: a Python-authored workflow
  on a separate Python image, and a protocol-driver workflow task that
  schedules activities handled by a separate Python worker. Both
  scenarios assert that activity arguments and results round-trip with
  the documented codec envelope. The smoke is wired into the sample-app
  `polyglot` GitHub Actions workflow on every push and pull request.

The sample app's polyglot smoke is a release gate alongside the
sdk-python integration tests: a regression in either is a release
blocker for both packages.

## Operator guidance

Operators of polyglot fleets SHOULD:

- Pin `avro` as the language-neutral `payload_codec` in namespace
  policy. Expose legacy PHP serializer codecs only through the
  engine-specific codec list while old PHP history still needs to drain.
- Treat the `Requires an explicit adapter` set as a workflow-author
  contract, not a runtime fallback. The SDKs deliberately fail closed
  rather than guess at a serialisation for these values.
- Audit search attributes and memos with the same categories. They
  cross the same payload boundary, and the same adapters apply.

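An operator could lint a namespace's codec settings against the first bullet with a check like the sketch below. The policy dict shape and `lint_codec_policy` are hypothetical; only the field names and codec names are taken from this contract.

```python
# Legacy PHP serializer codecs named in this contract.
LEGACY_PHP_CODECS = {"workflow-serializer-y", "workflow-serializer-base64"}


def lint_codec_policy(policy: dict) -> list:
    """Hypothetical lint for a namespace codec policy. The dict shape is an
    assumption; only the field names come from this contract."""
    problems = []
    if policy.get("payload_codec") != "avro":
        problems.append("payload_codec should be pinned to 'avro'")
    # Legacy PHP codecs must not leak into the universal list.
    leaked = sorted(set(policy.get("payload_codecs", [])) & LEGACY_PHP_CODECS)
    if leaked:
        problems.append(f"legacy PHP codecs in the universal list: {leaked}")
    # They may appear only under the `php` engine-specific entry.
    engine_specific = policy.get("payload_codecs_engine_specific", {})
    for engine, codecs in engine_specific.items():
        misplaced = sorted(set(codecs) & LEGACY_PHP_CODECS)
        if engine != "php" and misplaced:
            problems.append(f"legacy PHP codecs under engine {engine!r}: {misplaced}")
    return problems


policy = {
    "payload_codec": "avro",
    "payload_codecs": ["avro", "workflow-serializer-y"],
    "payload_codecs_engine_specific": {"php": ["workflow-serializer-base64"]},
}
for problem in lint_codec_policy(policy):
    print(problem)
# legacy PHP codecs in the universal list: ['workflow-serializer-y']
```
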
A fuller worked example, with side-by-side PHP and Python snippets, is
in the public docs under `polyglot/codec-roundtrip`.
