Decompressing concatenated gzip #7157

@davenza

Describe the bug

Recently, I have been consuming a proprietary REST API that returns gzip-encoded text data.

I noticed that the response I got from aiohttp differed from the one returned by the requests library or by Postman. After a lengthy investigation, I found that the REST API is sending concatenated gzip data.

It seems that decompressing concatenated gzip data requires special handling of the unused_data attribute of the object returned by zlib.decompressobj (for example, see this answer on StackOverflow).

I have confirmed that a similar implementation exists in urllib3 (which the requests library uses), where unused data is checked for further decompression.

I think that this is where aiohttp decompresses the gzip data. The unused_data is not handled in any way. I tried changing that line of aiohttp code to something like this:

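ret = self.decompressor.decompress(chunk)
# Bytes left over after the end of one gzip member start the next member.
while self.decompressor.unused_data:
    chunk = self.decompressor.unused_data
    # A finished decompressor cannot be reused, so create a fresh one in gzip mode.
    self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    ret += self.decompressor.decompress(chunk)

chunk = ret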

and I was able to reproduce the response of the library requests.
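The same idea can be sketched outside aiohttp using only the standard library. This is a minimal illustration, not the aiohttp or urllib3 implementation: the helper name decompress_concatenated_gzip is hypothetical, and it assumes the whole body is already in memory and consists only of back-to-back gzip members.

```python
import gzip
import zlib


def decompress_concatenated_gzip(data: bytes) -> bytes:
    """Decompress every gzip member in ``data`` and join the results.

    Hypothetical helper for illustration; assumes ``data`` holds only
    complete, back-to-back gzip members.
    """
    parts = []
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        parts.append(decompressor.decompress(data))
        # Whatever follows the end of this member belongs to the next one.
        data = decompressor.unused_data
    return b"".join(parts)


# Two independently compressed members, concatenated byte-for-byte.
payload = gzip.compress(b"first part, ") + gzip.compress(b"second part")
print(decompress_concatenated_gzip(payload))  # b'first part, second part'
```

A fresh decompression object is created per member because a zlib decompression object that has reached the end of its stream places any further input in unused_data instead of decompressing it.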

I have labeled this issue as bug, since I believe that the desired behavior is the same as in the library requests. If this treatment of gzip data was intentional, is there a simple way to process concatenated gzip data? Currently, aiohttp only returns a fragment of the decompressed response with await response.text().

To Reproduce

Sorry, I cannot offer a way to reproduce because I am using a proprietary REST API.
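Although the API itself cannot be shared, the underlying symptom can probably be approximated locally with the standard library. The sketch below assumes that the server simply sends two gzip members back to back; the payload is made up for illustration and does not go through aiohttp at all.

```python
import gzip
import zlib

# Made-up body: two gzip members concatenated, as the API is assumed to send.
first_member = gzip.compress(b"hello ")
second_member = gzip.compress(b"world")
body = first_member + second_member

# A single decompression object stops at the end of the first member.
single = zlib.decompressobj(zlib.MAX_WBITS | 16)
first_only = single.decompress(body)

print(first_only)                           # b'hello '
print(single.unused_data == second_member)  # True: the rest was not decompressed
print(gzip.decompress(body))                # b'hello world'
```

Here gzip.decompress is used only as a reference point, since it handles multi-member input; the truncated first_only value mirrors the fragment described above.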

Expected behavior

All the concatenated gzip data should be decompressed and concatenated as requests does here.

Logs/tracebacks

No logs.

Python Version

Python 3.8.10

aiohttp Version

aiohttp 3.8.3

multidict Version

multidict 6.0.4

yarl Version

yarl 1.8.2

OS

Windows 10

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
