Fix ZLibDecompressor dropping data past the first gzip member #12674

Merged
Dreamsorcerer merged 10 commits into aio-libs:master from Ashutosh-177:fix/zlib-multi-member-gzip
May 31, 2026

@Ashutosh-177

Contributor

What do these changes do?

When a response body contains concatenated gzip members (e.g. a server that produces one gzip member per write call, as nginx does under certain configs), zlib.decompressobj sets eof and stores the remaining bytes in unused_data after it finishes the first member. decompress_sync() wasn't checking unused_data at all, so every member after the first was silently discarded. The caller got truncated output with no error.

The fix applies the same while eof and unused_data loop that ZSTDDecompressor already uses for multi-frame zstd streams. Each iteration creates a fresh decompressor and feeds it the leftover bytes, accumulating output across all members. max_length is tracked and honoured across the loop.
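
The truncation and the loop described above can be reproduced with the standard library alone. The following is a minimal sketch, not the code merged in this PR: `decompress_all_members` is a hypothetical helper name, and `max_length` handling is left out for brevity.

```python
import gzip
import zlib

# wbits value that tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every gzip member in ``data`` (hypothetical helper)."""
    d = zlib.decompressobj(wbits=GZIP_WBITS)
    chunks = [d.decompress(data)]
    # Once a member ends, zlib sets eof and parks the remaining bytes
    # in unused_data; hand them to a fresh decompressor.
    while d.eof and d.unused_data:
        leftover = d.unused_data
        d = zlib.decompressobj(wbits=GZIP_WBITS)
        chunks.append(d.decompress(leftover))
    return b"".join(chunks)


# Two gzip members concatenated into one body.
payload = gzip.compress(b"first ") + gzip.compress(b"second")

# A single decompressor stops at the first member boundary.
single = zlib.decompressobj(wbits=GZIP_WBITS)
truncated = single.decompress(payload)

# The loop recovers both members.
full = decompress_all_members(payload)
```

Here `truncated` holds only the first member's data, while `full` holds the data of both members.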

Also added unused_data to ZLibDecompressObjProtocol so the attribute is properly typed, and three tests that mirror the existing ZSTD multi-frame test suite.

Are there changes in behaviour for the end user?

Yes — responses with concatenated gzip members now decompress fully instead of being silently truncated at the first member boundary.

Related issue number

Fixes #7157

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • make fmt has been run N/A (formatting only)
  • All the tests pass
  • Changelog entry added
  • Added myself to CONTRIBUTORS.txt

Drafted with Claude Sonnet 4.6; reviewed by Ashutosh-177.

When a response body contains concatenated gzip members (RFC 1952 §2.2),
zlib sets eof and moves the remaining bytes to unused_data once the
first member is fully consumed. decompress_sync() was not checking
unused_data, so every member after the first was silently discarded.

Apply the same while-eof-and-unused_data loop that ZSTDDecompressor
already uses for multi-frame zstd streams. Add unused_data to
ZLibDecompressObjProtocol so the attribute is typed. Include three
tests mirroring the existing ZSTD multi-frame test suite.

Fixes aio-libs#7157

Signed-off-by: Ashutosh Kumar Singh <ahutoshhjp1067@gmail.com>
@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided There is a change note present in this PR label May 21, 2026
@codecov

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.92%. Comparing base (05129a0) to head (6a5a354).
⚠️ Report is 5 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12674   +/-   ##
=======================================
  Coverage   98.92%   98.92%           
=======================================
  Files         131      131           
  Lines       46790    46881   +91     
  Branches     2425     2431    +6     
=======================================
+ Hits        46285    46376   +91     
  Misses        379      379           
  Partials      126      126           
| Flag | Coverage Δ |
| --- | --- |
| Autobahn | 22.45% <13.09%> (-0.02%) ⬇️ |
| CI-GHA | 98.89% <100.00%> (+<0.01%) ⬆️ |
| OS-Linux | 98.64% <100.00%> (+<0.01%) ⬆️ |
| OS-Windows | 97.02% <100.00%> (+<0.01%) ⬆️ |
| OS-macOS | 97.91% <100.00%> (+<0.01%) ⬆️ |
| Py-3.10 | 98.13% <100.00%> (+<0.01%) ⬆️ |
| Py-3.11 | 98.38% <100.00%> (+<0.01%) ⬆️ |
| Py-3.12 | 98.46% <100.00%> (-0.01%) ⬇️ |
| Py-3.13 | 98.44% <100.00%> (+<0.01%) ⬆️ |
| Py-3.14 | 98.46% <100.00%> (+<0.01%) ⬆️ |
| Py-3.14t | 97.53% <100.00%> (+<0.01%) ⬆️ |
| Py-pypy-3.11 | 97.39% <96.42%> (+<0.01%) ⬆️ |
| VM-macos | 97.91% <100.00%> (+<0.01%) ⬆️ |
| VM-ubuntu | 98.64% <100.00%> (+<0.01%) ⬆️ |
| VM-windows | 97.02% <100.00%> (+<0.01%) ⬆️ |
| cython-coverage | 37.97% <58.33%> (+0.04%) ⬆️ |

Add gzip and decompressor to the spelling wordlist, and replace the
unresolvable Sphinx class cross-reference with a plain code literal.
@Ashutosh-177 Ashutosh-177 marked this pull request as ready for review May 21, 2026 18:28
Copilot AI review requested due to automatic review settings May 21, 2026 18:28

Copilot AI left a comment

Pull request overview

This PR fixes truncated decompression output when servers send concatenated gzip members (multi-member gzip), by teaching ZLibDecompressor.decompress_sync() to continue decompressing unused_data after the first member ends—similar to the existing multi-frame handling in ZSTDDecompressor.

Changes:

  • Add a loop in ZLibDecompressor.decompress_sync() to process concatenated gzip/deflate members via unused_data.
  • Extend typing for the zlib decompressor protocol to include unused_data.
  • Add unit tests for concatenated gzip members and update spelling wordlist/changelog/contributors.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| aiohttp/compression_utils.py | Adds multi-member handling for concatenated gzip/deflate streams via unused_data. |
| tests/test_compression_utils.py | Adds tests for concatenated gzip-member decoding and max_length behavior. |
| docs/spelling_wordlist.txt | Adds “decompressor” and “gzip” to the spelling whitelist. |
| CHANGES/7157.bugfix.rst | Documents the bugfix for concatenated gzip/deflate decompression. |
| CONTRIBUTORS.txt | Adds contributor entry. |

Comment thread aiohttp/compression_utils.py Outdated
Comment thread tests/test_compression_utils.py
@codspeed-hq

codspeed-hq Bot commented May 21, 2026

Merging this PR will not alter performance

✅ 72 untouched benchmarks
⏩ 72 skipped benchmarks [1]


Comparing Ashutosh-177:fix/zlib-multi-member-gzip (6a5a354) with master (534a758)

Footnotes

  1. 72 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Ashutosh-177 and others added 3 commits May 22, 2026 00:11
When the output budget is exhausted in the multi-member loop, store the
leftover compressed bytes in _pending_unused_data so the next
decompress_sync() call picks them up, matching the ZSTDDecompressor
behaviour. Also expose _pending_unused_data in data_available and add a
test that mirrors test_zstd_multi_frame_max_length_exhausted_preserves_unused_data.
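
A rough sketch of that budget handling follows, assuming a positive `max_length` on every call. `BudgetedGzipReader` and its `_pending` attribute are hypothetical stand-ins for illustration only; aiohttp's actual `ZLibDecompressor` and `_pending_unused_data` may be structured differently.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS


class BudgetedGzipReader:
    """Hypothetical stand-in, not aiohttp's ZLibDecompressor."""

    def __init__(self) -> None:
        self._d = zlib.decompressobj(wbits=GZIP_WBITS)
        self._pending = b""  # compressed bytes of members not yet started

    @property
    def data_available(self) -> bool:
        # True while compressed input is still waiting to be decoded.
        return bool(self._pending or self._d.unconsumed_tail)

    def decompress(self, data: bytes, max_length: int) -> bytes:
        # This sketch assumes max_length > 0 (0 means "unlimited" in zlib).
        if self._d.eof:
            # Previous member finished; continue with a fresh decompressor.
            self._d = zlib.decompressobj(wbits=GZIP_WBITS)
            data = self._pending + data
            self._pending = b""
        else:
            data = self._d.unconsumed_tail + data
        out = self._d.decompress(data, max_length)
        chunks = [out]
        remaining = max_length - len(out)
        while self._d.eof and self._d.unused_data:
            leftover = self._d.unused_data
            if remaining <= 0:
                # Budget exhausted: keep the leftover for the next call.
                self._pending = leftover
                break
            self._d = zlib.decompressobj(wbits=GZIP_WBITS)
            piece = self._d.decompress(leftover, remaining)
            chunks.append(piece)
            remaining -= len(piece)
        return b"".join(chunks)


payload = gzip.compress(b"a" * 10) + gzip.compress(b"b" * 10)
reader = BudgetedGzipReader()
pieces = [reader.decompress(payload, 4)]
while reader.data_available:
    pieces.append(reader.decompress(b"", 4))
result = b"".join(pieces)
```

Each call returns at most `max_length` bytes, and repeated calls with empty input drain whatever compressed data was held back.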
Comment thread tests/test_compression_utils.py
Comment thread aiohttp/compression_utils.py Outdated
@Dreamsorcerer Dreamsorcerer added the pr-unfinished The PR is unfinished and may need a volunteer to complete it label May 30, 2026
Ashutosh-177 and others added 3 commits May 31, 2026 00:37
Align the multi-member loop in ZLibDecompressor.decompress_sync() with
the cleaner ZSTD pattern: update max_length in-place rather than
computing a separate remaining variable each iteration.

Also add the eof-at-boundary reset that ZSTD already has: when a gzip
member ends exactly at a chunk boundary, unused_data is empty so the
while loop never runs, but the spent decompressor would error on the
next feed_data() call. Reset it to a fresh decompressobj the same way
the ZSTD path does.
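
The boundary case can be sketched with plain `zlib`, assuming each network chunk happens to contain exactly one complete gzip member. `feed_chunks` is a hypothetical helper, not a function from aiohttp.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS


def feed_chunks(chunks: list[bytes]) -> bytes:
    """Decompress gzip data arriving in separate chunks (hypothetical helper)."""
    d = zlib.decompressobj(wbits=GZIP_WBITS)
    out = []
    for chunk in chunks:
        out.append(d.decompress(chunk))
        # Members that end in the middle of a chunk leave unused_data behind.
        while d.eof and d.unused_data:
            leftover = d.unused_data
            d = zlib.decompressobj(wbits=GZIP_WBITS)
            out.append(d.decompress(leftover))
        # A member that ends exactly at the chunk boundary leaves
        # unused_data empty, so the loop above never runs. Replace the
        # spent decompressor so the next chunk starts a new member.
        if d.eof:
            d = zlib.decompressobj(wbits=GZIP_WBITS)
    return b"".join(out)


# Each chunk is one whole member, so every member ends at a boundary.
chunks = [gzip.compress(b"first "), gzip.compress(b"second")]
combined = feed_chunks(chunks)
```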

Add four HTTP parser-level tests mirroring the ZSTD multi-frame suite:
all-at-once, chunked, split mid-member, and many small members.
@Dreamsorcerer

Member

Seems to have broken something. If you get it working by tomorrow morning, we'll include it in the next release.

@github-actions github-actions Bot removed the pr-unfinished The PR is unfinished and may need a volunteer to complete it label May 30, 2026
The chunk-boundary reset was unconditionally replacing the spent
decompressor with a fresh one, which set eof=False. DeflateBuffer.feed_eof()
checks not decompressor.eof for deflate encoding and raises
ContentEncodingError when it sees False, breaking all deflate responses.

Guard the reset on gzip mode (wbits > MAX_WBITS) only. Deflate has a
single stream and relies on eof=True to signal completion; gzip
multi-member streams need the fresh decompressor for the next chunk.
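
The guard can be illustrated as below. This is only a sketch: `next_decompressor` is a hypothetical helper, and `zlib.MAX_WBITS` is used as the deflate-mode `wbits` value purely for illustration.

```python
import gzip
import zlib


def next_decompressor(d, wbits: int):
    """Return the decompressor to use for the next chunk (hypothetical helper)."""
    # Only gzip mode (wbits > MAX_WBITS) gets a fresh decompressor.
    # Other modes keep the spent object so that eof stays True.
    if d.eof and not d.unused_data and wbits > zlib.MAX_WBITS:
        return zlib.decompressobj(wbits=wbits)
    return d


# gzip mode: the spent decompressor is replaced, so eof is False again.
gzip_wbits = 16 + zlib.MAX_WBITS
g = zlib.decompressobj(wbits=gzip_wbits)
g.decompress(gzip.compress(b"x"))
g_next = next_decompressor(g, gzip_wbits)

# zlib-wrapped deflate: the same object is kept and eof remains True.
z = zlib.decompressobj(wbits=zlib.MAX_WBITS)
z.decompress(zlib.compress(b"x"))
z_next = next_decompressor(z, zlib.MAX_WBITS)
```

Keeping `eof` set in the deflate case lets a later completeness check, such as the one the commit message attributes to `DeflateBuffer.feed_eof()`, still see a finished stream.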
@Ashutosh-177

Contributor Author

Fixed — the eof reset was breaking deflate by resetting eof to False, which DeflateBuffer.feed_eof() treats as an incomplete stream. Guarded it to gzip mode only (wbits > MAX_WBITS). The Autobahn failure is flaky and unrelated to this change.

@Dreamsorcerer Dreamsorcerer added the backport-3.14 Trigger automatic backporting to the 3.14 release branch by Patchback robot label May 31, 2026
@Dreamsorcerer Dreamsorcerer merged commit e8832ae into aio-libs:master May 31, 2026
85 of 88 checks passed
@patchback

patchback Bot commented May 31, 2026

Contributor

Backport to 3.14: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.14/e8832aee7f694979292740b9b56a9c98fb0afc1a/pr-12674

Backported as #12745

Dreamsorcerer pushed a commit that referenced this pull request May 31, 2026
…a past the first gzip member (#12745)

**This is a backport of PR #12674 as merged into master
(e8832ae).**

Co-authored-by: Ashutosh Kumar Singh <144926351+Ashutosh-177@users.noreply.github.com>

Labels

backport-3.14 Trigger automatic backporting to the 3.14 release branch by Patchback robot bot:chronographer:provided There is a change note present in this PR

Development

Successfully merging this pull request may close these issues.

Decompressing concatenated gzip

3 participants