fix: accept GCS gzip responses without Content-Length #782

Open
nkemnitz wants to merge 1 commit into apache:main from ZettaAI:fix/gcs-gzip-missing-content-length

@nkemnitz nkemnitz commented Jun 25, 2026

Which issue does this PR close?

Part of #774 (does not fully close it — see below).

Rationale for this change

GCS serves large objects stored with `Content-Encoding: gzip` using chunked transfer encoding and no `Content-Length` (and applies decompressive transcoding when the client does not accept gzip encoding). `ObjectStore::get`/`head` on GCS required `Content-Length` unconditionally and failed with `Generic { store: "GCS", source: Header { source: MissingContentLength } }`, even though a chunked, self-delimiting body is a valid response (RFC 9112 §6.2 forbids `Content-Length` alongside `Transfer-Encoding: chunked`).

What changes are included in this PR?

`HeaderConfig` gains `stored_size_header: Option<&'static str>`. When `Content-Length` is absent, `header_meta` reads the object size from this header. GCS sets it to `x-goog-stored-content-length` (always present); S3, Azure and the HTTP store leave it `None`, so a missing `Content-Length` stays a hard error for them.
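
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `HeaderConfig` here is a reduced stand-in with only the new field, and `object_size`, `HeaderError` and the `HashMap` of lower-cased header names are hypothetical names introduced for the example.

```rust
use std::collections::HashMap;

/// Reduced, hypothetical stand-in for the crate's `HeaderConfig`;
/// only the field added by this PR is modelled.
#[derive(Debug, Clone, Copy)]
pub struct HeaderConfig {
    /// Header to read the object size from when `Content-Length` is absent.
    pub stored_size_header: Option<&'static str>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    MissingContentLength,
    InvalidSize(String),
}

/// Resolve the object size from response headers (keys assumed lower-cased).
/// `Content-Length` wins when present; otherwise fall back to the
/// store-specific header, if the store configures one.
pub fn object_size(
    headers: &HashMap<String, String>,
    cfg: &HeaderConfig,
) -> Result<u64, HeaderError> {
    let raw = headers
        .get("content-length")
        .or_else(|| cfg.stored_size_header.and_then(|name| headers.get(name)))
        .ok_or(HeaderError::MissingContentLength)?;

    raw.trim()
        .parse::<u64>()
        .map_err(|_| HeaderError::InvalidSize(raw.clone()))
}
```

With `stored_size_header: Some("x-goog-stored-content-length")` a chunked response that carries only that header resolves to a size, while a config with `None` (as for S3, Azure and HTTP) still yields `MissingContentLength`.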

Are there any user-facing changes?

get()/head() now succeed on chunked gzip GCS objects. On a server-decompressed (transcoded) read, ObjectMeta.size is the stored (compressed) size, since the decompressed length is not known without reading the body; on a passthrough read (Accept-Encoding: gzip) it is exact.
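
For callers, the practical consequence is that the reported size is not always the number of bytes the body will yield. A minimal sketch of that distinction, assuming the caller knows whether the object is stored gzip-encoded and whether it sent `Accept-Encoding: gzip`; `SizeKind` and `classify_size` are hypothetical helpers for illustration and are not part of this PR or of the crate:

```rust
/// Hypothetical caller-side classification of a reported object size.
#[derive(Debug, PartialEq)]
pub enum SizeKind {
    /// The body length equals the reported size (e.g. a passthrough read).
    Exact(u64),
    /// The reported size is the stored (compressed) size; the length of the
    /// decompressed body is unknown until it has been read.
    StoredOnly(u64),
}

/// Decide how far the reported size can be trusted.
/// Assumption: only gzip-stored objects read without `Accept-Encoding: gzip`
/// are transcoded by the server.
pub fn classify_size(
    reported: u64,
    stored_is_gzip: bool,
    client_accepts_gzip: bool,
) -> SizeKind {
    if stored_is_gzip && !client_accepts_gzip {
        SizeKind::StoredOnly(reported)
    } else {
        SizeKind::Exact(reported)
    }
}
```

Code that preallocates buffers or validates the body length against `ObjectMeta.size` would want to treat the `StoredOnly` case as a hint rather than an exact length.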

Not fully resolved: some transcoded GCS responses (default reads without Accept-Encoding: gzip) also omit the ETag entirely and still fail with MissingEtag. Left for a follow-up.


🤖 AI disclaimer:
All the code was written by Claude. I kept the changes as targeted and minimal as possible, for now focusing only on the chunked encoding, because that's my major blocker. Decompressive transcoding feels kind of niche, and the ETag handling needs more thought and knowledge about this repo. E.g. I think the different cloud vendors rely on custom metadata version headers to allow resuming downloads, rather than the ETag(?)...

GCS serves large objects stored with `Content-Encoding: gzip` using chunked
transfer with no `Content-Length` (and decompressive transcoding when the client
does not accept gzip encoding). The GET path required `Content-Length`
unconditionally and failed with `MissingContentLength`, even though a chunked
body is a valid self-delimiting response (RFC 9112 §6.2 forbids `Content-Length`
with `Transfer-Encoding: chunked`).

Add `HeaderConfig::stored_size_header`: when `Content-Length` is absent the size
falls back to this header. GCS sets it to `x-goog-stored-content-length` (always
present); S3, Azure and HTTP leave it `None`, so a missing `Content-Length`
remains an error for them.

This fixes the reported `MissingContentLength` failure. Some transcoded GCS
responses also omit the ETag and still fail with `MissingEtag`; that is left for
a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>