Describe the bug
Recently, I have been consuming a proprietary REST API that returns gzip-encoded text data.
I noticed that the response I got from aiohttp differed from the one returned by the requests library or by Postman. After some investigation, I found that the REST API is sending concatenated gzip data, that is, several gzip members back to back.
It seems that decompressing concatenated gzip data requires explicit handling of the unused_data attribute of the object returned by zlib.decompressobj() (for example, see this answer on StackOverflow).
I have confirmed that urllib3 (which the requests library uses) has a similar implementation, where unused data is checked and decompressed further.
I believe this is where aiohttp decompresses the gzip data; unused_data is not handled there at all. I tried changing that line of the aiohttp code to something like this:
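ret = self.decompressor.decompress(chunk)
# Bytes left over after the end of one gzip member belong to the next member.
while self.decompressor.unused_data:
    chunk = self.decompressor.unused_data
    # Start a fresh gzip decompressor (wbits = MAX_WBITS | 16) for the next member.
    self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    ret += self.decompressor.decompress(chunk)
chunk = ret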
With this change, I was able to reproduce the response returned by the requests library.
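For reference, the same idea can be written as a standalone function outside aiohttp. This is only a minimal sketch using the standard library: the name decompress_gzip_members and the sample byte strings are made up for illustration, and it assumes the whole body is already in memory rather than arriving in chunks as it does inside aiohttp.

```python
import gzip
import zlib


def decompress_gzip_members(data: bytes) -> bytes:
    """Decompress every gzip member in `data` and join the results.

    Hypothetical helper for illustration; not part of aiohttp or urllib3.
    """
    out = bytearray()
    while data:
        # wbits = MAX_WBITS | 16 makes zlib expect a gzip header and trailer.
        decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        out += decompressor.decompress(data)
        out += decompressor.flush()
        # Anything after the end of this member is the start of the next one.
        data = decompressor.unused_data
    return bytes(out)


# Made-up sample payload: two gzip members concatenated.
payload = gzip.compress(b"first part, ") + gzip.compress(b"second part")
assert decompress_gzip_members(payload) == b"first part, second part"
```

The loop ends once a member leaves no unused bytes behind, so a body with a single gzip member behaves exactly as before.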
I have labeled this issue as a bug, since I believe the desired behavior is the same as that of the requests library. If this treatment of gzip data is intentional, is there a simple way to process concatenated gzip data? Currently, aiohttp returns only a fragment of the decompressed response with await response.text().
To Reproduce
Sorry, I cannot offer a way to reproduce this because I am using a proprietary REST API.
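That said, the shape of the payload can probably be imitated locally with the standard library alone. The snippet below is a stand-in, not the actual API response: it assumes the server simply sends gzip members back to back, and the byte strings are made up. It shows a single decompressor stopping after the first member, which matches the truncated result I see.

```python
import gzip
import zlib

# Two independently compressed gzip members, sent back to back (made-up data).
first = gzip.compress(b"hello, ")
second = gzip.compress(b"world")
payload = first + second

# A single gzip decompressor stops at the end of the first member...
decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
partial = decompressor.decompress(payload)

# ...and leaves the second member untouched in unused_data.
leftover = decompressor.unused_data

print(partial)                    # b'hello, '
print(leftover == second)         # True
print(gzip.decompress(payload))   # b'hello, world'
```

gzip.decompress() from the standard library reads all members, which is why the last line prints the complete text.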
Expected behavior
All of the concatenated gzip members should be decompressed and their output joined, as requests does here.
Logs/tracebacks
Python Version
Python 3.8.10
aiohttp Version
aiohttp 3.8.3
multidict Version
multidict 6.0.4
yarl Version
yarl 1.8.2
OS
Windows 10
Related component
Client
Additional context
No response
Code of Conduct