#362: loud typed codec-mismatch errors at ingress and self-describing Avro export
Two pieces of the release-gating Avro parity surface:
1. Loud, typed codec-mismatch ingress errors
Add Workflow\Serializers\CodecDecodeException — a typed exception that
names the declared codec, what the decoder actually saw, and a
remediation hint. The Avro and Json codecs now throw it instead of a
generic RuntimeException when the bytes don't match the declared codec.
The ingress diagnosers cover the two most common cross-codec mistakes:
- JSON bytes labeled `avro` (producer JSON-encoded but tagged it Avro)
- Avro bytes labeled `json` (producer Avro-encoded but tagged it JSON)
In both cases the message names the codec, describes the symptom, and
tells the operator how to fix it ("change the codec tag to X" or
"re-encode the payload"). The wrapping ValidationException at the HTTP
ingress (PayloadEnvelopeResolver) preserves the typed message verbatim
so it shows up in API responses without losing context.
Covers acceptance criteria from the issue:
- "Sending JSON bytes under an `avro` codec tag produces a loud, typed
error — not silent garbage"
- "Sending Avro bytes under a `json` codec tag produces a loud, typed
error"
- "Avro decode failure names the codec, expected schema, and a
remediation — not a generic RuntimeException"
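As a rough illustration of the idea in section 1, a typed decode exception can carry the codec, the symptom, and the remediation as separate fields while still chaining the original error. This is a minimal sketch, not the commit's implementation: `SketchCodecDecodeException` and `sketch_decode_json()` are hypothetical names, and the real `Workflow\Serializers\CodecDecodeException` may expose a different shape.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a typed codec-decode exception.
 * The field names below are assumptions made for this sketch.
 */
final class SketchCodecDecodeException extends RuntimeException
{
    public function __construct(
        public readonly string $codec,
        public readonly string $symptom,
        public readonly string $remediation,
        ?Throwable $previous = null,
    ) {
        // Fold all three parts into the message so nothing is lost
        // if a caller only logs getMessage().
        parent::__construct(
            sprintf('[codec=%s] %s %s', $codec, $symptom, $remediation),
            0,
            $previous,
        );
    }
}

/** Hypothetical decoder that fails loudly with the typed exception. */
function sketch_decode_json(string $data): mixed
{
    try {
        return json_decode($data, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new SketchCodecDecodeException(
            'json',
            'Failed to JSON-decode payload: ' . $e->getMessage() . '.',
            'Re-encode the payload as valid JSON, or change the codec tag.',
            $e,
        );
    }
}

try {
    // Base64 text sent under a "json" tag is not valid JSON.
    sketch_decode_json('AAZmb28=');
} catch (SketchCodecDecodeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Keeping the remediation as its own field would let an HTTP layer surface it separately from the symptom, if that is wanted.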
2. Self-describing Avro history-export bundles
HistoryExport bundles now include a top-level `codec_schemas` map. When
the bundle contains any payload encoded with the Avro codec (whether on
the run, commands, updates, signals, or tasks), the map embeds the
generic-wrapper schema JSON plus the documented prefix bytes (0x00
generic wrapper, 0x01 typed schema). An offline consumer reading the
export — without the workflow runtime in scope — can decode the
0x00-prefixed Avro payloads using only the bundle.
Bundles with no Avro payloads keep `codec_schemas: {}` so the field is
always present and shape-stable, simplifying consumer code.
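The shape-stability point can be sketched from the consumer side. The snippet below assumes only what the text states (the `codec_schemas` key is always present and is an empty map when no Avro payloads exist); the inner layout under the `avro` key, including `wrapper_schema`, is purely illustrative and not taken from the commit.

```php
<?php
declare(strict_types=1);

/**
 * Return the codec names that have embedded schemas in an export bundle.
 * No isset() guard is needed if the field is always present.
 */
function codecs_with_schemas(array $bundle): array
{
    return array_keys($bundle['codec_schemas']);
}

// Hypothetical bundles; real exports contain many more fields.
$withoutAvro = ['codec_schemas' => []];
$withAvro = ['codec_schemas' => ['avro' => ['wrapper_schema' => '...']]];

var_dump(codecs_with_schemas($withoutAvro));
var_dump(codecs_with_schemas($withAvro));
```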
Tests: 7 new ingress-mismatch unit tests; 2 new HistoryExport tests
covering the Avro-present and Avro-absent codec_schemas cases. All 130
tests across the changed surface pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'Avro datum reader failed against schema "%s": %s',
+                        $schemaName ?: 'inline',
+                        $e->getMessage(),
+                    ),
+                    'Verify the writer schema matches the bytes (resolution: writer→reader compatibility per Avro spec). If you intended a different schema, supply it via Avro::withSchema() before decoding.',
+                    $e,
+                );
+            }
         });
     }

@@ -175,29 +218,95 @@ private static function decodeWrapped(string $data): mixed
         return self::suppressDeprecations(function () use ($data): mixed {
             $bytes = base64_decode($data, true);
             if ($bytes === false) {
-                throw new RuntimeException('Failed to base64-decode Avro payload.');
+                self::failWithIngressDiagnosis($data);
             }

             $io = new AvroStringIO($bytes);

             // Read prefix
             $prefix = $io->read(1);
             if ($prefix === "\x01") {
-                throw new RuntimeException('Typed Avro payload requires a schema context. Call Avro::withSchema() before unserialize().');
+                throw new CodecDecodeException(
+                    'avro',
+                    'Typed Avro payload (prefix 0x01) decoded without a schema context.',
+                    'Call Avro::withSchema($writerSchema) before unserialize() so the typed payload can be read with its writer schema.',
'These bytes were not produced by Workflow\\Serializers\\Avro::serialize(). Re-encode the payload with the Avro codec, or change the codec tag if the producer used a different codec.',
'Re-encode the payload with the Avro codec (generic wrapper produces a JSON-string-inside-Avro envelope), or change the codec tag if the producer used a different codec.',
+                    $e,
+                );
+            }
         });
     }

+    /**
+     * Diagnose why the bytes labeled as Avro could not be base64-decoded
+     * and throw a {@see CodecDecodeException} with the most actionable hint.
+     *
+     * The most common ingress mistakes are:
+     *   - A producer JSON-encoded the payload but tagged it `avro`. The bytes
+     *     will start with a JSON character ({, [, ", -, digit, t, f, n) and
+     *     base64_decode() in strict mode rejects them outright.
+     *   - A producer sent raw binary Avro bytes without base64-encoding them.
+     */
+    private static function failWithIngressDiagnosis(string $data): never
+    {
+        if (self::looksLikeJson($data)) {
+            throw new CodecDecodeException(
+                'avro',
+                'Payload bytes look like JSON, not base64-encoded Avro.',
+                'The producer appears to have JSON-encoded the payload but tagged it with codec "avro". Either change the codec tag to "json", or re-encode the payload with Workflow\\Serializers\\Avro::serialize() before tagging it "avro".',
+            );
+        }
+
+        throw new CodecDecodeException(
+            'avro',
+            'Failed to base64-decode Avro payload bytes.',
+            'Avro payloads on the wire must be base64-encoded bytes whose first byte is 0x00 (generic wrapper) or 0x01 (typed schema). Re-encode the payload, or change the codec tag if the producer used a different codec.',
throw new RuntimeException('Failed to JSON-decode payload: ' . $e->getMessage(), 0, $e);
+            if (self::looksLikeBase64Avro($data)) {
+                throw new CodecDecodeException(
+                    'json',
+                    'Payload bytes look like base64-encoded Avro, not JSON: ' . $e->getMessage(),
+                    'The blob is valid base64 starting with an Avro framing prefix (0x00 generic wrapper or 0x01 typed schema). Either change the codec tag to "avro", or re-encode the payload as JSON.',
+                    $e,
+                );
+            }
+
+            throw new CodecDecodeException(
+                'json',
+                'Failed to JSON-decode payload: ' . $e->getMessage(),
+                'Re-encode the payload as valid UTF-8 JSON (RFC 8259), or change the codec tag if a different codec produced these bytes.',
+                $e,
+            );
+        }
+    }
+
+    /**
+     * Heuristic: do these bytes look like base64-encoded Avro?
+     *
+     * The cheapest reliable check: pure base64 alphabet, base64_decode in
+     * strict mode succeeds, and the first decoded byte is 0x00 (generic
+     * Avro wrapper) or 0x01 (typed Avro schema). JSON ASCII text never
+     * decodes to bytes leading with 0x00/0x01 because base64 alphabet
+     * cannot represent control characters in source form, so this check
+     * has effectively no false positives on misformatted JSON.
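The heuristic described in the docblock above can be approximated in a few lines. This is a sketch under stated assumptions, not the body of the commit's `looksLikeBase64Avro()` (which is not shown in this excerpt): `looks_like_base64_avro()` is a hypothetical free function that relies only on `base64_decode()` in strict mode and a check of the first decoded byte.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical approximation of the base64-Avro heuristic.
 * Returns true when $data is strict-mode base64 whose first decoded
 * byte is 0x00 (generic wrapper) or 0x01 (typed schema).
 */
function looks_like_base64_avro(string $data): bool
{
    // Strict mode returns false for characters outside the base64 alphabet,
    // which covers JSON that starts with {, [, or a double quote.
    $bytes = base64_decode($data, true);
    if ($bytes === false || $bytes === '') {
        return false;
    }

    return $bytes[0] === "\x00" || $bytes[0] === "\x01";
}

var_dump(looks_like_base64_avro(base64_encode("\x00\x06foo"))); // Avro-style framing
var_dump(looks_like_base64_avro('{"a":1}'));                    // JSON object
var_dump(looks_like_base64_avro('true'));                       // JSON literal
```

The last case is worth noting: a bare JSON literal such as `true` consists only of base64-alphabet characters, so strict decoding succeeds, and it is the prefix-byte check rather than the alphabet check that rules it out.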