Add dtype, compression_level, and VRT output to geotiff I/O (#1083)#1085
Merged
brendancol merged 13 commits intomasterfrom Mar 30, 2026
Merged
Add dtype, compression_level, and VRT output to geotiff I/O (#1083)#1085brendancol merged 13 commits intomasterfrom
brendancol merged 13 commits intomasterfrom
Conversation
Covers dtype on read, compression_level on write, and VRT tiled output from to_geotiff for streaming dask writes.
Thread a compression_level: int | None = None parameter from the public to_geotiff() API down to the compress() call in the writer. Validation rejects out-of-range levels for deflate (1-9), zstd (1-22), and lz4 (0-16); codecs without a level concept (lzw, packbits, jpeg) accept any value and ignore it.
write_geotiff_gpu now accepts compression_level: int | None = None so callers that pass the parameter through to_geotiff on the GPU path no longer get an unexpected keyword argument error. The value is accepted silently since nvCOMP does not currently expose level control.
…port - Use keyword form compression_level=compression_level at the two _write_tiled call sites in write() for clarity and refactor safety. - Update compress() docstring to cover all codecs with level support (deflate, zstd, lz4) instead of only mentioning deflate. - Remove unused tempfile import from test_compression_level.py.
Adds `dtype=None` to `open_geotiff`, `read_geotiff_dask`, `read_geotiff_gpu`, and `read_vrt`. When specified, each path casts the array to the target dtype after nodata masking. Float-to-int casts raise ValueError to prevent accidental fractional data loss. Resolves part of #1083.
The stripped-file fallback in read_geotiff_gpu returned before reaching the dtype cast block. Add the cast before the early return so the dtype parameter is honored for non-tiled GeoTIFFs on the GPU path.
The dask path validated the dtype cast against the raw file dtype (e.g. uint16), but nodata masking promotes integer arrays to float64 inside the delayed worker. Requesting dtype='int32' on a uint16 file with nodata would pass validation then silently cast float64+NaN to int32, producing garbage. Now compute the effective post-masking dtype and validate against that. Also remove the dead `dtype` positional parameter from _delayed_read_window and add tests for the nodata+dtype interaction on both eager and dask paths.
When to_geotiff receives a path ending in .vrt, write a directory of tiled GeoTIFFs with a VRT index file instead of a monolithic TIFF. This lets dask arrays stream to disk without materializing the full array in RAM. - Dask inputs get one tile per chunk, written in parallel via dask.delayed - Numpy inputs are sliced by tile_size - cog=True and overview_levels raise ValueError with VRT output - Non-empty tiles directory raises FileExistsError - 11 new tests cover numpy/dask round-trips, tile naming, relative paths, compression forwarding, and edge cases
- Use scheduler='synchronous' in dask.compute so tiles write one at a time, keeping peak memory to a single chunk instead of N_cores chunks - Reject 3D arrays in both the dask and numpy VRT paths with a clear error message, since raw.chunks[0] would read the band dimension instead of rows for (bands, y, x) shaped arrays - Remove unused gpu parameter from _write_vrt_tiled signature and call site; CuPy detection already happens inside _write_single_tile
Demonstrates dtype on read, compression_level on write, and VRT tiled output using self-contained tempdir examples.
os.path.relpath produces backslashes on Windows, but VRT XML expects forward slashes for cross-platform portability.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
open_geotiff(path, dtype='float32')casts on read, one tile at a time, so the full-precision buffer is never allocated. Works on eager, dask, and GPU paths. Float-to-int casts are rejected (lossy); everything else is allowed.to_geotiff(data, path, compression_level=N)exposes the codec's speed/size knob (zstd 1-22, deflate 1-9, lz4 0-16). Threaded through to the writer and GPU compress paths. Codecs without levels ignore it silently.to_geotiff(data, 'out.vrt')writes a{stem}_tiles/directory of per-chunk GeoTIFFs plus a VRT index with relative paths. For dask inputs, tiles write one at a time viascheduler='synchronous', so peak memory is one chunk. Numpy inputs are sliced bytile_size.cog=Trueandoverview_levelsraise ValueError with VRT output.Closes #1083.
Test plan