geotiff: validate HTTP range responses (#1735)#1744
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the GeoTIFF HTTP range reader by validating that server responses actually satisfy Range requests, preventing silent misreads (wrong offsets), truncated bodies, and non-partial responses from being treated as valid byte windows.
Changes:
- Added
_HTTPSource._validate_range_responseto validate status codes, parse/verifyContent-Range, and check body length consistency. - Integrated the validation into both urllib3 and stdlib urllib transport paths in
_HTTPSource.read_range. - Added regression tests that spin up local HTTP servers to simulate misbehaving range responses (issue #1735).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_reader.py |
Adds and wires in HTTP range-response validation for status/headers/body-length correctness. |
xrspatial/geotiff/tests/test_http_range_validation_1735.py |
Adds loopback-server regression tests covering ignored ranges, mismatched Content-Range, short bodies, and a well-formed 206. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
682
to
685
| def read_range(self, start: int, length: int) -> bytes: | ||
| end = start + length - 1 | ||
| headers = {'Range': f'bytes={start}-{end}'} | ||
| if self._pool is not None: |
| ) | ||
| if len(data) < min(length, 1): | ||
| # Empty body but caller wanted bytes. | ||
| raise OSError("HTTP 200 response body is empty.") |
_HTTPSource.read_range previously handed the response body to the caller without checking the status code, Content-Range header, or returned byte length. Three failure modes slipped through silently: - urllib3 by default does not raise on non-2xx; a 4xx/5xx body reached the codec as if it were the requested range. - A server that ignored Range and returned the whole object as a 200 with no Content-Range was trusted for any non-zero start, handing the codec wrong-offset bytes. - A truncated body, or one whose Content-Range header did not match the request, surfaced as an opaque decode error far from the real cause. Add _validate_range_response that checks status, parses Content-Range, and confirms the body length matches what Content-Range advertises. Short reads near EOF stay legitimate: the validator only requires the Content-Range to start at the requested offset and the body length to match what the server itself reported. Applied to both the urllib3 and stdlib urllib code paths. Adds regression coverage in test_http_range_validation_1735.py.
Two Copilot review follow-ups on PR #1744: - _HTTPSource.read_range now short-circuits to b'' for length <= 0, matching the convention used by other source implementations like _BytesIOSource. Without the guard, Range: bytes=<start>-<start-1> goes on the wire, which is invalid and either trips a 416 from a well-behaved server or pulls down arbitrarily large bytes from a misbehaving one. - _validate_range_response now slices the response to length bytes when status == 200, Content-Range is missing, and the body is longer than requested. The implicit contract of read_range is "at most length bytes", so a 16 KiB header prefetch against a 2 GiB object should not drag the whole thing into memory just because the server ignored Range. Tests cover both fixes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_HTTPSource.read_rangepreviously returned the response body without checking status,Content-Range, or length. urllib3 does not raise on non-2xx by default, and a server that ignoredRangereturned the full object as 200 — both slipped through silently._validate_range_responsecovering status, Content-Range parse + offset match, and body-length vs Content-Range agreement.Closes #1735.
Test plan
test_http_range_validation_1735.pycovers a server that ignoresRange(status 200 at non-zero start), a server with mismatchedContent-Range, a truncated body, and a well-formed 206.