DeflateDecompressor/ZlibDecompressor error on trailing bytes after a complete stream (concatenated-stream policy)
Summary
When a raw-DEFLATE or zlib stream is followed by trailing bytes that are not themselves a valid follow-on stream, transcode (or reading from the decompressor stream) throws a ZlibError, e.g. invalid literal/length code (code: -3), even though the complete and correct output has already been produced.
This happens because, on Z_STREAM_END, the codec resets the inflate state and attempts to decode the remaining input as a concatenated stream. That behavior is required for gzip (RFC 1952 — a gzip file is "a series of members") and CodecZlib handles concatenated gzip correctly. But for raw DEFLATE (RFC 1951) and zlib (RFC 1950), concatenation is not part of the format, so trailing bytes that aren't a valid next stream turn into a hard error — whereas the underlying zlib C API and other bindings simply stop at Z_STREAM_END and leave the extra bytes unconsumed.
Minimal reproducer (no external data)
```julia
using CodecZlib
payload = b"The quick brown fox jumps over the lazy dog."
raw = transcode(DeflateCompressor, payload)  # raw DEFLATE (windowBits = -15)

transcode(DeflateDecompressor, vcat(raw, UInt8[0x00]))  # one extra 0x00 byte
# ERROR: ZlibError: the compressed stream may be truncated
transcode(DeflateDecompressor, vcat(raw, UInt8[0xf3, 0x1b, 0x33]))
# ERROR: ZlibError: invalid literal/length code (code: -3)
transcode(DeflateDecompressor, vcat(raw, UInt8[0xff, 0xff, 0xff]))
# ERROR: ZlibError: invalid block type (code: -3)
```
A single trailing byte is enough. The error message varies with the trailing bytes (truncated, invalid literal/length code, invalid block type), because the codec is trying to interpret them as the start of a new DEFLATE stream.
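The mechanism can be reproduced outside Julia. The following is a minimal Python sketch using only the standard zlib module, assuming Python's inflater behaves like the one CodecZlib drives (both sit on zlib). It first decodes the real stream, then emulates the reset-and-continue step by handing the leftover bytes to a fresh raw inflater as if they began a new stream:

```python
import zlib

payload = b"The quick brown fox jumps over the lazy dog."
comp = zlib.compressobj(wbits=-15)           # raw DEFLATE (windowBits = -15)
raw = comp.compress(payload) + comp.flush()

outcomes = {}
for tail in (b"\x00", b"\xf3\x1b\x33", b"\xff\xff\xff"):
    # Step 1: decode the real stream; the inflater stops at its end.
    d = zlib.decompressobj(wbits=-15)
    out = d.decompress(raw + tail)
    assert out == payload and d.eof and d.unused_data == tail

    # Step 2 (emulated reset-and-continue): treat the leftover bytes as the
    # start of a brand-new raw DEFLATE stream.
    fresh = zlib.decompressobj(wbits=-15)
    try:
        fresh.decompress(d.unused_data)
        outcomes[tail] = "complete" if fresh.eof else "incomplete (truncated)"
    except zlib.error as exc:
        outcomes[tail] = f"zlib.error: {exc}"

for tail, outcome in outcomes.items():
    print(tail.hex(), "->", outcome)
```

Step 1 always succeeds; any complaint comes from the "next stream" attempt, and its form follows from the leading bits of the tail. For example, 0x00 announces a non-final stored block whose LEN/NLEN header is missing (truncated), while 0xff sets BFINAL=1 with the reserved block type 11 (invalid block type).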
GzipDecompressor, by contrast, correctly decodes genuinely concatenated members.
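For comparison, here is a minimal Python sketch (standard gzip and zlib modules, not CodecZlib) of what RFC 1952 multi-member decoding means. The high-level gzip.decompress walks every member, while a single low-level inflater stops at the end of the first member and leaves the rest unconsumed:

```python
import gzip
import zlib

member1 = gzip.compress(b"The quick brown fox ")
member2 = gzip.compress(b"jumps over the lazy dog.")
multi = member1 + member2                  # two gzip members back to back

# gzip.decompress decodes every member, as a multi-member gzip file requires.
whole = gzip.decompress(multi)

# One inflater with wbits=31 (gzip wrapper) stops at the end of the first
# member; the second member is left untouched in unused_data.
d = zlib.decompressobj(wbits=31)
first = d.decompress(multi)

print(whole)
print(first, d.eof, d.unused_data == member2)
```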
How we hit this in the wild
We were handed a file described as "compressed via zlib." It turned out to be a zlib stream with its 2-byte header stripped and its 4-byte Adler-32 trailer truncated to 3 bytes, i.e. [deflate body][first 3 of 4 Adler-32 bytes]. Decoding it as raw DEFLATE recovers the full payload with other decoders, but CodecZlib then tried to decode the 3 orphaned Adler-32 bytes as a concatenated stream and threw invalid literal/length code. The data emitted before the throw was complete and correct: reading byte by byte from a DeflateDecompressorStream yields the full output and throws only at EOF.
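The damaged layout is easy to rebuild from an intact zlib stream. This Python sketch assumes the payload from the reproducer stands in for the real (unavailable) file. It cuts the 2-byte header and the last Adler-32 byte, then decodes the result as raw DEFLATE:

```python
import zlib

payload = b"The quick brown fox jumps over the lazy dog."
stream = zlib.compress(payload)   # 2-byte header + DEFLATE body + 4-byte Adler-32

# Rebuild the file as received: [deflate body][first 3 of 4 Adler-32 bytes]
damaged = stream[2:-1]

d = zlib.decompressobj(wbits=-15)  # raw DEFLATE: no header or trailer expected
out = d.decompress(damaged)

print(out == payload)                   # full payload recovered
print(d.eof)                            # the BFINAL block was reached
print(d.unused_data == stream[-4:-1])   # the 3 orphaned Adler-32 bytes remain
```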
Comparison to the standards
RFC 1951 (DEFLATE): defines a single sequence of blocks ending at the BFINAL block. No concatenation, no trailer, no framing for "what follows."
RFC 1950 (zlib): defines a single stream = 2-byte header (plus a 4-byte DICTID only when FDICT is set) + DEFLATE data + 4-byte Adler-32. The spec has no notion of concatenation.
RFC 1952 (gzip): §2.2 states explicitly that a gzip file "consists of a series of members." Concatenation is required, and CodecZlib's gzip handling is correct.
So the reset-and-continue policy is well-founded for gzip but has no standards basis for raw DEFLATE / zlib.
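To make the RFC 1950 framing concrete, the following Python sketch uses the standard zlib module. The variable names are illustrative only. It splits a stream produced by zlib.compress into header, DEFLATE body and Adler-32 trailer, and checks each part. It assumes no preset dictionary, which is what zlib.compress emits:

```python
import zlib

payload = b"The quick brown fox jumps over the lazy dog."
stream = zlib.compress(payload)

cmf, flg = stream[0], stream[1]
method = cmf & 0x0F                          # CM: 8 means DEFLATE
header_ok = ((cmf << 8) | flg) % 31 == 0     # FCHECK: CMF*256 + FLG is a multiple of 31
has_dictid = bool(flg & 0x20)                # FDICT: would insert a 4-byte DICTID

body = stream[2:-4]                          # RFC 1951 DEFLATE data (no DICTID here)
adler = int.from_bytes(stream[-4:], "big")   # Adler-32, most significant byte first

print(method, header_ok, has_dictid)
print(zlib.decompress(body, wbits=-15) == payload, adler == zlib.adler32(payload))
```

Nothing in this layout says what may follow the trailer. That is why a decoder for raw DEFLATE or zlib has no standards-based reason to reinterpret leftover bytes.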
Comparison to other implementations
zlib C API: inflate() returns Z_STREAM_END when the final block is consumed and does not loop on its own; decoding another concatenated stream requires an explicit inflateReset(). Trailing bytes are simply left unconsumed (avail_in > 0), not treated as an error.
Python (zlib): stops at the stream end and exposes the remainder rather than erroring:
```python
>>> import zlib
>>> payload = b"The quick brown fox jumps over the lazy dog."
>>> raw = zlib.compress(payload)[2:-4]  # raw deflate body
>>> d = zlib.decompressobj(-15)
>>> d.decompress(raw + b"\xf3\x1b\x33") == payload
True
>>> d.unused_data  # trailing bytes surfaced, no error
b'\xf3\x1b3'
```
A direct Zlib_jll inflate call (single call, windowBits = -15) on the same data returns Z_STREAM_END with the full output and leaves the trailing bytes unconsumed. In other words, the bug is not in zlib but in the reset-and-continue driving.
Questions / possible resolutions
Should DeflateDecompressor and ZlibDecompressor attempt concatenated-stream decoding at all, or should that be limited to GzipDecompressor (where the spec mandates it)?
If concatenated raw-DEFLATE/zlib decoding is intentionally supported, could trailing bytes that don't begin a valid stream be treated as end-of-data (stop at Z_STREAM_END) rather than a hard error — at least optionally?
Relatedly, is there a supported way to stop exactly at the first Z_STREAM_END and recover the unconsumed bytes (cf. zlib's avail_in / Python's unused_data)? This is the same need raised in issue #4, "Find end of Zlib stream."
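One possible reading of the second and third questions is sketched below in Python. It uses a hypothetical helper, inflate_concatenated, which is not part of CodecZlib, TranscodingStreams or Python's zlib. The helper keeps the reset-after-each-Z_STREAM_END behaviour, but returns any tail that does not decode as a complete stream instead of raising, unless strict=True is passed:

```python
import zlib

def inflate_concatenated(data: bytes, wbits: int = -15, strict: bool = False):
    """Hypothetical helper: decode back-to-back streams, starting a fresh
    inflater after each end of stream, and return (output, unconsumed_tail).

    The first stream must be valid. With strict=False, a later tail that does
    not decode as a complete stream is returned unconsumed instead of raising."""
    out = bytearray()
    rest = data
    first = True
    while rest:
        d = zlib.decompressobj(wbits)
        try:
            chunk = d.decompress(rest)
        except zlib.error:
            if strict or first:
                raise
            break                      # tail is not a valid stream: stop here
        if not d.eof:                  # input ran out before the stream ended
            if strict or first:
                raise zlib.error("truncated stream")
            break
        out += chunk
        rest = d.unused_data           # bytes after this stream's end
        first = False
    return bytes(out), rest

# Usage: the reproducer's payload plus the three orphaned bytes.
comp = zlib.compressobj(wbits=-15)
raw = comp.compress(b"The quick brown fox jumps over the lazy dog.") + comp.flush()
print(inflate_concatenated(raw + b"\xf3\x1b\x33"))
```

Defaulting to the lenient mode mirrors what zlib and Python already do. The strict flag keeps today's behaviour available for callers who want a hard error.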
Happy to open a PR if there's a preferred direction.
Versions
CodecZlib.jl 0.7.8
TranscodingStreams.jl 0.11.3
Zlib_jll 1.3.1+2 (zlib runtime 1.3.1)
Julia 1.12.6
Investigation and reproducers prepared with Claude Code (Claude Opus 4.8).