fix: accept GCS gzip responses without Content-Length by nkemnitz · Pull Request #782 · apache/arrow-rs-object-store

nkemnitz · 2026-06-25T16:03:57Z

Which issue does this PR close?

Part of #774 (does not fully close it — see below).

Rationale for this change

GCS serves large objects stored with Content-Encoding: gzip using chunked transfer with no Content-Length (and decompressive transcoding when the client does not accept gzip encoding). ObjectStore::get/head on GCS required
Content-Length unconditionally and failed with Generic { store: "GCS", source: Header { source: MissingContentLength } }, even though a chunked, self-delimiting body is a valid response (RFC 9112 §6.2 forbids Content-Length alongside Transfer-Encoding: chunked).

What changes are included in this PR?

HeaderConfig gains stored_size_header: Option<&'static str>. When Content-Length is absent, header_meta reads the object size from this header. GCS sets it to x-goog-stored-content-length (always present); S3, Azure and the
HTTP store leave it None, so a missing Content-Length stays a hard error for them.
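
A minimal sketch of the fallback described above, not the actual code from this PR. It assumes headers are held in a plain `HashMap` with lowercase keys rather than the crate's real header types. `resolve_size`, `HeaderError::InvalidSize`, and the `headers` helper are hypothetical names used only for illustration; only `HeaderConfig`, `stored_size_header`, `MissingContentLength`, and `x-goog-stored-content-length` come from the description.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's header configuration (illustrative only).
#[derive(Debug, Clone, Copy)]
pub struct HeaderConfig {
    /// Header to read the object size from when Content-Length is absent.
    /// Per the PR text: Some("x-goog-stored-content-length") for GCS, None otherwise.
    pub stored_size_header: Option<&'static str>,
}

/// Hypothetical error type; the real crate's error enum has a different shape.
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    MissingContentLength,
    InvalidSize { header: String, value: String },
}

/// Resolve the object size from response headers.
/// Assumption: header names in the map are already lowercased.
pub fn resolve_size(
    headers: &HashMap<String, String>,
    cfg: &HeaderConfig,
) -> Result<u64, HeaderError> {
    let (name, value) = match headers.get("content-length") {
        // Content-Length present: use it, as before.
        Some(v) => ("content-length", v),
        // Content-Length absent: fall back only if the store configured a header.
        None => {
            let name = cfg
                .stored_size_header
                .ok_or(HeaderError::MissingContentLength)?;
            // Assumption: a missing fallback header is reported the same way.
            let v = headers
                .get(name)
                .ok_or(HeaderError::MissingContentLength)?;
            (name, v)
        }
    };
    value
        .trim()
        .parse::<u64>()
        .map_err(|_| HeaderError::InvalidSize {
            header: name.to_string(),
            value: value.clone(),
        })
}

/// Small helper for building a header map in examples.
pub fn headers(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With `stored_size_header: Some("x-goog-stored-content-length")`, a chunked response that carries only the stored-size header resolves to that value. With `None`, the same response still yields `MissingContentLength`, which matches the behaviour described for S3, Azure and the HTTP store.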

Are there any user-facing changes?

get()/head() now succeed on chunked gzip GCS objects. On a server-decompressed (transcoded) read, ObjectMeta.size is the stored (compressed) size, since the decompressed length is not known without reading the body; on a passthrough read (Accept-Encoding: gzip) it is exact.

Not fully resolved: some transcoded GCS responses (default reads without Accept-Encoding: gzip) also omit the ETag entirely and still fail with MissingEtag. Left for a follow-up.

🤖 AI disclaimer:
All the code was written by Claude. I kept the changes as targeted and minimal as possible, focusing for now only on the chunked encoding, because that's my major blocker. Decompressive transcoding feels kind of niche. And the ETag handling requires more thought and knowledge about this repo. E.g. I think the different cloud vendors rely on custom metadata version headers to allow resuming downloads, rather than the ETag(?)...

GCS serves large objects stored with `Content-Encoding: gzip` using chunked transfer with no `Content-Length` (and decompressive transcoding when the client does not accept gzip encoding). The GET path required `Content-Length` unconditionally and failed with `MissingContentLength`, even though a chunked body is a valid self-delimiting response (RFC 9112 §6.2 forbids `Content-Length` with `Transfer-Encoding: chunked`).

Add `HeaderConfig::stored_size_header`: when `Content-Length` is absent the size falls back to this header. GCS sets it to `x-goog-stored-content-length` (always present); S3, Azure and HTTP leave it `None`, so a missing `Content-Length` remains an error for them.

This fixes the reported `MissingContentLength` failure. Some transcoded GCS responses also omit the ETag and still fail with `MissingEtag`; that is left for a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

nkemnitz wants to merge 1 commit into apache:main from ZettaAI:fix/gcs-gzip-missing-content-length