| 1 | +# Polyglot Codec Round-Trip Contract |
| 2 | + |
| 3 | +This document is the language-neutral contract for which payload values |
| 4 | +round-trip cleanly across the PHP and Python SDKs and which require an |
| 5 | +explicit codec adapter at the call site. It sits downstream of the SDK |
| 6 | +neutrality contract (`docs/architecture/sdk-neutrality.md`) and the |
| 7 | +codec-name advertisement rule (`codec_neutrality`), and is enforced by |
| 8 | +the platform conformance suite. |
| 9 | + |
| 10 | +The `payload_codec` envelope tag on every wire payload identifies the |
| 11 | +codec used to encode the blob. The language-neutral v2 surface advertises |
| 12 | +one universal codec: |
| 13 | + |
| 14 | +| Codec | Use | |
| 15 | +| --- | --- | |
| 16 | +| `avro` | Default for new v2 workflows and activities. The blob is a base64-encoded Avro generic-wrapper around a JSON document. | |
| 17 | + |
| 18 | +Legacy PHP history can still name PHP-engine-specific codecs. Those |
| 19 | +codecs are exposed under `payload_codecs_engine_specific["php"]`, not |
| 20 | +in the universal `payload_codecs` list, so non-PHP workers are not |
| 21 | +required to decode PHP serializer payloads: |
| 22 | + |
| 23 | +| Engine | Codec | Use | |
| 24 | +| --- | --- | --- | |
| 25 | +| `php` | `workflow-serializer-y` | Legacy PHP SerializableClosure payloads with byte-escape encoding. | |
| 26 | +| `php` | `workflow-serializer-base64` | Legacy PHP SerializableClosure payloads with base64 encoding. | |
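The split between the universal and engine-specific lists can be sketched as a small lookup. This is only an illustration: the field names `payload_codecs` and `payload_codecs_engine_specific` come from the text above, but the dictionary shape and the helper `worker_can_decode` are assumptions, not the server's actual advertisement format.

```python
from typing import Any

# Hypothetical advertisement shape; only the two field names come from the contract.
ADVERTISED: dict[str, Any] = {
    "payload_codecs": ["avro"],
    "payload_codecs_engine_specific": {
        "php": ["workflow-serializer-y", "workflow-serializer-base64"],
    },
}


def worker_can_decode(codec: str, advertised: dict[str, Any], engine: str) -> bool:
    """Return True if a worker for `engine` is expected to decode `codec`."""
    # Universal codecs are decodable by every worker.
    if codec in advertised.get("payload_codecs", []):
        return True
    # Engine-specific codecs only bind workers of that engine.
    per_engine = advertised.get("payload_codecs_engine_specific", {})
    return codec in per_engine.get(engine, [])
```

Under this sketch a Python worker answers `True` for `avro` and `False` for `workflow-serializer-y`, which is the behaviour the engine-specific list is meant to guarantee.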
| 27 | + |
| 28 | +## Round-trip categories |
| 29 | + |
| 30 | +The contract sorts every value that can appear on the wire into one of |
| 31 | +three categories. The category determines whether the value crosses the |
| 32 | +boundary unchanged, crosses with a documented loss, or requires an |
| 33 | +explicit adapter the workflow author writes before encode. |
| 34 | + |
| 35 | +### Clean round-trip |
| 36 | + |
| 37 | +These values are JSON-native in both languages and round-trip with |
| 38 | +identical observable behaviour: |
| 39 | + |
| 40 | +| Wire shape | PHP type | Python type | |
| 41 | +| --- | --- | --- | |
| 42 | +| `null` | `null` | `None` | |
| 43 | +| `boolean` | `bool` | `bool` | |
| 44 | +| `integer` | `int` | `int` | |
| 45 | +| `number` | `float` | `float` | |
| 46 | +| `string` | `string` | `str` | |
| 47 | +| `array` | indexed `array<int, mixed>` | `list[Any]` | |
| 48 | +| `object` | associative `array<string, mixed>` | `dict[str, Any]` | |
| 49 | + |
| 50 | +The Avro generic-wrapper accepts any JSON document built from this set. |
| 51 | +Both languages read and write the same `payload_codec: "avro"` envelope |
| 52 | +without further configuration. |
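A producer can check membership in the clean set before encoding. The function below is a minimal sketch and not part of either SDK; the name `is_clean_payload` is hypothetical. It uses exact type checks on purpose, so enum subclasses of `int` or `str` are not reported as clean.

```python
from typing import Any

# Scalar types from the clean round-trip table (None is handled separately).
_CLEAN_SCALARS = (bool, int, float, str)


def is_clean_payload(value: Any) -> bool:
    """Return True if `value` is built only from the clean round-trip set."""
    if value is None or type(value) in _CLEAN_SCALARS:
        return True
    if type(value) is list:
        return all(is_clean_payload(item) for item in value)
    if type(value) is dict:
        # JSON objects require string keys.
        return all(
            type(key) is str and is_clean_payload(item)
            for key, item in value.items()
        )
    # Everything else (bytes, set, custom objects, ...) needs an adapter.
    return False
```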
| 53 | + |
| 54 | +### Round-trip with documented coercion |
| 55 | + |
| 56 | +These values decode in both languages but to a different concrete type |
| 57 | +on the receiving side. Workflows that need the original concrete type |
| 58 | +must adapt the value back at the consumer. |
| 59 | + |
| 60 | +| Producer | Wire shape | Consumer | Coercion | |
| 61 | +| --- | --- | --- | --- | |
| 62 | +| PHP `int` outside the JS-safe range (magnitude above 2^53-1) | JSON `number` | Python `int` | No precision loss from PHP to Python: the Avro wrapper carries the integer exactly, and Python `int` is arbitrary precision. | |
| 63 | +| Python `IntEnum` / `StrEnum` | JSON scalar | PHP `int`/`string` | The receiver sees the raw scalar. Re-attach the enum class on the consumer side if it is significant. | |
| 64 | +| Python `Decimal` | JSON `string` (via `to_avro_payload_value`) | PHP `string` | The receiver must re-parse to its money/fixed-point type. | |
| 65 | +| Python `datetime` / `date` / `time` | ISO 8601 `string` | PHP `string` (parse with `Carbon`/`DateTimeImmutable`) | Time zone is preserved when the producer emits a tz-aware `datetime`; naive datetimes are wire-ambiguous and SHOULD be avoided. | |
| 66 | +| Python `UUID` | JSON `string` | PHP `string` | Parse on the consumer with `Ramsey\Uuid\Uuid::fromString()` or equivalent. | |
| 67 | +| Empty PHP `array` `[]` | JSON `[]` (always) | Python `list` | The PHP encoder always tags an empty `array` as a JSON list. Producers that need an empty mapping must encode `(object)[]` (`stdClass`) or `[]` typed as `array<string, mixed>` via an explicit adapter. | |
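The Python rows of the table can be illustrated with a short round-trip. This sketch uses plain `json` as a stand-in for the Avro wrapper, and `to_wire` and `Priority` are hypothetical names rather than SDK APIs. It shows that the receiver sees raw scalars and has to re-attach the concrete types itself.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from enum import IntEnum
from uuid import UUID


class Priority(IntEnum):  # hypothetical enum, for illustration only
    LOW = 1
    HIGH = 2


def to_wire(value):
    """Hypothetical producer-side mapping to the wire shapes in the table."""
    if isinstance(value, IntEnum):
        return int(value)
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()  # tz-aware input keeps its UTC offset
    if isinstance(value, UUID):
        return str(value)
    return value


produced = {
    "priority": Priority.HIGH,
    "amount": Decimal("19.90"),
    "due": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "order_id": UUID("12345678-1234-5678-1234-567812345678"),
}

# Encode, then decode as a consumer would: only JSON scalars survive.
received = json.loads(json.dumps({k: to_wire(v) for k, v in produced.items()}))

# The consumer re-attaches the concrete types it cares about.
restored = {
    "priority": Priority(received["priority"]),
    "amount": Decimal(received["amount"]),
    "due": datetime.fromisoformat(received["due"]),
    "order_id": UUID(received["order_id"]),
}
```

A PHP consumer would perform the equivalent re-parsing step with the classes named in the table.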
| 68 | + |
| 69 | +### Requires an explicit adapter at the call site |
| 70 | + |
| 71 | +These values are not safe to carry in an Avro JSON payload as-is. The |
| 72 | +producer MUST adapt them to a value in the clean round-trip set before |
| 73 | +encode, or the encoder raises: |
| 74 | + |
| 75 | +- Python `dataclass` instances (use `to_avro_payload_value`, |
| 76 | + `dataclasses.asdict`, or a hand-written serializer) |
| 77 | +- Python `attrs` classes (the SDK's `_attrs_payload_dict` helper covers |
| 78 | + them, but the producer is still opting in) |
| 79 | +- Python `pydantic` models (the SDK calls `model_dump(mode="json")`; |
| 80 | + any custom `to_dict` should match that contract) |
| 81 | +- Python `pendulum` values (convert with `.isoformat()`) |
| 82 | +- Python `bytes` / `bytearray` (encode as base64 `string` or split |
| 83 | + into a `dict` with explicit `encoding` and `data` fields) |
| 84 | +- Python `set` / `frozenset` (convert to a sorted `list`) |
| 85 | +- Python custom objects without a registered adapter |
| 86 | +- PHP objects that are not plain `stdClass` or arrays (the workflow |
| 87 | + package's serializer rejects them at the boundary; convert to an |
| 88 | + associative array before scheduling the activity or workflow) |
| 89 | +- PHP closures and resources (rejected unconditionally) |
| 90 | +- PHP `BackedEnum` values (convert to `->value` before scheduling) |
| 91 | + |
| 92 | +A producer that does not adapt one of these values gets a synchronous |
| 93 | +`TypeError` (Python) or `WorkflowPayloadDecodeException` (PHP) at the |
| 94 | +call site. The error never crosses the worker protocol; the workflow |
| 95 | +never advances on an unadapted value. This is intentional: the codec |
| 96 | +boundary is the only place where the workflow author can choose how a |
| 97 | +language-specific shape is represented in durable history. |
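A hand-written producer-side adapter for a few of the Python entries above might look like the following. It is a sketch under stated assumptions: `adapt` and `Invoice` are hypothetical names, the `encoding`/`data` field layout follows the suggestion in the list, and the SDK's own `to_avro_payload_value` may behave differently. Like the SDKs, it fails closed with `TypeError` instead of guessing.

```python
import base64
import dataclasses
from typing import Any


@dataclasses.dataclass
class Invoice:  # hypothetical payload type, for illustration only
    number: str
    tags: frozenset
    pdf: bytes


def adapt(value: Any) -> Any:
    """Map a value into the clean round-trip set, or raise TypeError."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        # Walk fields by hand so nested bytes and sets are adapted too.
        return {
            f.name: adapt(getattr(value, f.name))
            for f in dataclasses.fields(value)
        }
    if isinstance(value, (bytes, bytearray)):
        data = base64.b64encode(bytes(value)).decode("ascii")
        return {"encoding": "base64", "data": data}
    if isinstance(value, (set, frozenset)):
        # Sorting gives a deterministic list; items must be mutually comparable.
        return sorted(adapt(item) for item in value)
    if isinstance(value, list):
        return [adapt(item) for item in value]
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("payload mappings must have string keys")
        return {key: adapt(item) for key, item in value.items()}
    # Fail closed: no silent fallback for unknown shapes.
    raise TypeError(f"no adapter registered for {type(value).__name__}")
```

Calling `adapt(Invoice("INV-1", frozenset({"b", "a"}), b"\x00\x01"))` yields a plain `dict` whose `tags` entry is `["a", "b"]`, while `adapt(object())` raises `TypeError` at the call site.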
| 98 | + |
| 99 | +## Test surfaces |
| 100 | + |
| 101 | +The round-trip contract is exercised in CI from three places. A change |
| 102 | +that moves a value from one category to another in any of the three |
| 103 | +SHOULD be co-landed with matching changes to the other two: |
| 104 | + |
| 105 | +- `sdk-python` — `tests/test_serializer.py` covers Python encode/decode |
| 106 | + for every category and the producer-side rejection of unadapted |
| 107 | + values. |
| 108 | +- `sdk-python` — `tests/integration/test_polyglot.py` exercises real |
| 109 | + PHP↔Python interop through a running server and asserts the |
| 110 | + receiving language observes the documented coerced type. |
| 111 | +- The sample app (`sample-app`) `polyglot/` smoke runs two scenarios |
| 112 | + end to end against the standalone server: a Python-authored workflow |
| 113 | + on a separate Python image, and a protocol-driver workflow task that |
| 114 | + schedules activities handled by a separate Python worker. Both |
| 115 | + scenarios assert that activity arguments and results round-trip with |
| 116 | + the documented codec envelope. The smoke is wired into the sample-app |
| 117 | + `polyglot` GitHub Actions workflow on every push and pull request. |
| 118 | + |
| 119 | +The sample app's polyglot smoke is a release gate alongside the |
| 120 | +sdk-python integration tests: a regression in either is a release |
| 121 | +blocker for both packages. |
| 122 | + |
| 123 | +## Operator guidance |
| 124 | + |
| 125 | +Operators of polyglot fleets SHOULD: |
| 126 | + |
| 127 | +- Pin `avro` as the language-neutral `payload_codec` in namespace |
| 128 | + policy. Expose legacy PHP serializer codecs only through the |
| 129 | + engine-specific codec list when old PHP history still needs to drain. |
| 130 | +- Treat the `Requires an explicit adapter` set as a workflow-author |
| 131 | + contract, not a runtime fallback. The SDKs deliberately fail closed |
| 132 | + rather than guess at a serialisation for these values. |
| 133 | +- Audit search attributes and memos with the same categories. They |
| 134 | + cross the same payload boundary, and the same adapters apply. |
| 135 | + |
| 136 | +A fuller worked example, with side-by-side PHP and Python snippets, is |
| 137 | +in the public docs under `polyglot/codec-roundtrip`. |