Skip to content

Commit 181056e

Browse files
authored
Sync write_geotiff_gpu compression docstring against the actual codec set (#1644) (#1649)
* Sync write_geotiff_gpu compression docstring against the actual codec set (#1644) The api-consistency sweep on 2026-05-11 flagged that write_geotiff_gpu.__doc__ listed only four codecs ('zstd', 'deflate', 'jpeg', 'none') while the implementation accepts every codec to_geotiff does. Codecs without an nvCOMP encoder fall through to the CPU encoders (lzw, packbits, lz4, lerc, jpeg2000 / j2k) so the output matches the CPU writer byte-for-byte. Mirror to_geotiff's wording in the compression block and note which codecs run on GPU vs. CPU fallback so users understand the performance trade-off without surprise. Regression test test_compression_docstring_1644.py pins the codec list and round-trips each CPU-fallback codec end-to-end on a GPU host. Doc-only change. No signature change, no deprecation impact. * Address Copilot review feedback on #1649 Four wording fixes; no behavioural change: - compression='jpeg': the docstring previously implied parity with to_geotiff, but to_geotiff rejects jpeg at runtime (it omits the JPEGTables tag and produces files that don't round-trip through GDAL). Spell out that write_geotiff_gpu DOES accept jpeg, that the on-disk bytes are self-contained JFIF tiles, and that GDAL/rasterio interop is not guaranteed until the JPEGTables fix lands. - compression='jpeg2000' / 'j2k': replace "route to the CPU encoders" with the actual conditional behaviour (nvJPEG2K GPU encode first, CPU glymur fallback when libnvjpeg2k is missing) and call out that the two paths are not byte-stable, so byte-for-byte parity with to_geotiff isn't a contract here. - test module docstring: distinguish "not nvCOMP-accelerated" (lzw, packbits, lz4, lerc — truly CPU-only) from jpeg2000/j2k (GPU first with CPU fallback) and from jpeg (write_geotiff_gpu only, separate test module). - "exercised by test_features.py" referenced a file that doesn't cover JPEG. Repoint the test docstring to test_gpu_writer_compression_modes_2026_05_11.py which actually pins JPEG round-trips for the GPU writer. All 7 tests in test_compression_docstring_1644.py still pass. * Tighten compression-codec docstring for cleaner Sphinx rendering The earlier paragraphs had a multi-line ``literal`` and unbalanced parens nested inside it, which RST handles awkwardly and which may have been contributing to slow/timing-out Sphinx builds on RTD. Convert the per-codec block to a bullet list with no multi-line inline literals. Same content, no behavioural change. All 7 docstring tests still pass.
1 parent 6f4b2d1 commit 181056e

3 files changed

Lines changed: 163 additions & 3 deletions

File tree

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
11
module,last_inspected,issue,severity_max,categories_found,notes
2-
geotiff,2026-05-11,1631,MEDIUM,3,"Filed write_vrt and write_geotiff_gpu signature/docstring drift vs to_geotiff (MEDIUM, #1631). Fix in PR (TBD): explicit write_vrt(relative, crs_wkt, nodata) signature (was **kwargs); 'cubic' added to write_geotiff_gpu overview_resampling docstring; write_geotiff_gpu(data) typed xr.DataArray|cupy.ndarray to match to_geotiff. Prior 1605/1606/1611/1612/1613/1615/1623 all CLOSED."
2+
geotiff,2026-05-11,1644,MEDIUM,3,"Filed write_geotiff_gpu compression docstring drift vs to_geotiff (MEDIUM Cat 3, #1644). Fix on deep-sweep-api-consistency-geotiff-2026-05-11-1778545740: sync the full 9-codec list into the docstring and note GPU vs CPU encode paths; regression test test_compression_docstring_1644.py pins the codec list and exercises each CPU-fallback codec end-to-end. Other potential drifts surveyed: write_vrt returns str while to_geotiff/write_geotiff_gpu return None (LOW, intentional backward-compat); write_vrt nodata typed float|None vs int-accepting siblings (LOW, PEP 484 int->float compat); kwarg-only ordering drift across read functions (LOW, no user impact). Prior issues 1631/1637/1615/1560/1541/1562 all CLOSED."
33
reproject,2026-05-10,1570,HIGH,2;5,"Filed cross-module attrs['vertical_crs'] type collision (string vs EPSG int) vs xrspatial.geotiff. Fixed in PR (TBD): reproject now writes EPSG int and preserves friendly token under vertical_datum. MEDIUM kwarg-order drift (transform_precision vs chunk_size) and missing type hints vs geotiff documented but not fixed (cosmetic, kwarg-only)."

xrspatial/geotiff/__init__.py

Lines changed: 29 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2754,8 +2754,35 @@ def write_geotiff_gpu(data: xr.DataArray | cupy.ndarray | np.ndarray,
27542754
nodata : float, int, or None
27552755
NoData value.
27562756
compression : str
2757-
'zstd' (default, fastest on GPU), 'deflate', 'jpeg', or 'none'.
2758-
JPEG uses nvJPEG when available, falling back to Pillow.
2757+
Codec name. Accepts the same set ``to_geotiff`` lists in its
2758+
own signature: ``'none'``, ``'deflate'``, ``'lzw'``, ``'jpeg'``,
2759+
``'packbits'``, ``'zstd'``, ``'lz4'``, ``'jpeg2000'`` (alias
2760+
``'j2k'``), or ``'lerc'``.
2761+
2762+
Routing per codec:
2763+
2764+
- ``'zstd'`` (default) and ``'deflate'``: nvCOMP batch
2765+
compression on the GPU -- the fastest paths and the reason to
2766+
use this entry point.
2767+
- ``'jpeg'``: nvJPEG when libnvjpeg is loadable, Pillow
2768+
otherwise. Note that ``to_geotiff`` rejects
2769+
``compression='jpeg'`` at runtime because its CPU encoder
2770+
omits the required TIFF JPEGTables tag (347); this GPU entry
2771+
point instead emits self-contained JFIF tiles. The two
2772+
writers therefore disagree about JPEG-in-TIFF interop. Files
2773+
produced here decode through this library's own reader but
2774+
may not round-trip through GDAL, rasterio, or libtiff
2775+
readers that require the JPEGTables tag. Treat the JPEG path
2776+
as experimental and internal-reader-only until the
2777+
JPEGTables fix lands.
2778+
- ``'jpeg2000'`` and ``'j2k'``: nvJPEG2K GPU encode when
2779+
available, glymur CPU encode otherwise. The two paths are
2780+
not byte-for-byte identical (different libraries, different
2781+
default parameters); use ``to_geotiff`` if you need exact
2782+
CPU-writer parity.
2783+
- ``'lerc'``, ``'lzw'``, ``'packbits'``, and ``'lz4'``: no
2784+
nvCOMP/CUDA accelerator, so these fall through to the CPU
2785+
encoder for byte-stable parity with ``to_geotiff``.
27592786
compression_level : int or None
27602787
Compression effort level. Accepted for API compatibility but
27612788
currently ignored -- nvCOMP does not expose level control.
Lines changed: 133 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,133 @@
1+
"""Regression test for #1644: ``write_geotiff_gpu`` compression docstring
2+
parity vs ``to_geotiff``.
3+
4+
The api-consistency sweep on 2026-05-11 flagged that
5+
``write_geotiff_gpu.__doc__`` listed only four codecs (``'zstd'``,
6+
``'deflate'``, ``'jpeg'``, ``'none'``) under the ``compression``
7+
parameter, while the implementation actually accepts every codec
8+
``to_geotiff`` does.
9+
10+
Routing for the additional codecs:
11+
12+
* ``'lzw'``, ``'packbits'``, ``'lz4'``, ``'lerc'`` -- not nvCOMP-
13+
accelerated and have no GPU library, so they fall through to the
14+
CPU encoder. Byte-for-byte identical to ``to_geotiff``.
15+
* ``'jpeg2000'`` / ``'j2k'`` -- attempts an nvJPEG2K *GPU* encode
16+
first via ``_nvjpeg2k_batch_encode`` and falls back to the CPU
17+
``glymur`` encoder only when libnvjpeg2k is unavailable. The two
18+
paths are NOT byte-stable against each other; this module pins the
19+
acceptance contract (the codec name is accepted and a file gets
20+
written), not output-byte parity with the CPU writer.
21+
* ``'jpeg'`` -- accepted here even though ``to_geotiff`` rejects it
22+
(the CPU writer omits the JPEGTables tag, so its output doesn't
23+
round-trip through GDAL). The GPU path emits self-contained JFIF
24+
tiles. Covered separately by
25+
``test_gpu_writer_compression_modes_2026_05_11.py``; this module
26+
excludes it from the parametrized fallback list because the test
27+
data needs to be uint8 with sensible pixel content.
28+
29+
This module pins the full codec list against future drift and confirms
30+
the underlying entry point accepts the codec names that the docstring
31+
now advertises.
32+
"""
33+
from __future__ import annotations
34+
35+
import importlib.util
36+
import os
37+
38+
import numpy as np
39+
import pytest
40+
import xarray as xr
41+
42+
from xrspatial.geotiff import write_geotiff_gpu
43+
44+
45+
def _gpu_available() -> bool:
46+
"""True when cupy imports and CUDA is initialised."""
47+
if importlib.util.find_spec("cupy") is None:
48+
return False
49+
try:
50+
import cupy
51+
52+
return bool(cupy.cuda.is_available())
53+
except Exception:
54+
return False
55+
56+
57+
_HAS_GPU = _gpu_available()
58+
_gpu_only = pytest.mark.skipif(
59+
not _HAS_GPU, reason="cupy + CUDA required",
60+
)
61+
62+
63+
# Codecs to exercise end-to-end through the GPU writer to confirm they
64+
# accept the docstring's advertised names. Excludes ``jpeg`` because
65+
# (a) ``to_geotiff`` rejects it at runtime and (b) the JPEG round-trip
66+
# is covered with appropriate uint8 RGB data in
67+
# ``test_gpu_writer_compression_modes_2026_05_11.py``; keeping it out of
68+
# this parametrize avoids exercising the JPEG path on dtype/shape
69+
# combinations that aren't representative.
70+
_GPU_FALLBACK_CODECS = (
71+
"lzw", "packbits", "lz4", "lerc", "jpeg2000", "j2k",
72+
)
73+
74+
75+
def test_write_geotiff_gpu_docstring_lists_full_codec_set():
76+
"""The ``compression`` docstring lists every codec ``to_geotiff`` accepts.
77+
78+
Prior to #1644 the docstring listed only ``'zstd'``, ``'deflate'``,
79+
``'jpeg'``, and ``'none'``, which made the GPU writer look much
80+
more restrictive than it actually is. The block below pins the
81+
canonical wording.
82+
"""
83+
doc = write_geotiff_gpu.__doc__
84+
assert doc is not None, "write_geotiff_gpu lost its docstring"
85+
block_start = doc.index("compression : str")
86+
block_end = doc.index("compression_level", block_start)
87+
block = doc[block_start:block_end]
88+
# Every codec name in the canonical list must appear in the block.
89+
# Use single-quoted form because that is how the docstring writes them.
90+
for codec in (
91+
"'none'", "'deflate'", "'lzw'", "'jpeg'", "'packbits'",
92+
"'zstd'", "'lz4'", "'jpeg2000'", "'j2k'", "'lerc'",
93+
):
94+
assert codec in block, (
95+
f"compression docstring missing {codec}; current block:\n{block}"
96+
)
97+
98+
99+
@_gpu_only
100+
@pytest.mark.parametrize("codec", _GPU_FALLBACK_CODECS)
101+
def test_write_geotiff_gpu_accepts_cpu_fallback_codecs(tmp_path, codec):
102+
"""Codecs without a GPU encoder still write successfully via CPU.
103+
104+
Confirms the docstring's promise that the GPU writer accepts the
105+
same codec set as ``to_geotiff``. ``jpeg`` is exercised separately
106+
by ``test_gpu_writer_compression_modes_2026_05_11.py`` because the
107+
test data must be uint8 with sensible content. ``jpeg2000`` /
108+
``j2k`` will attempt nvJPEG2K if available and fall back to
109+
``glymur`` otherwise; either way the encoder needs uint8/uint16
110+
input, so pick a uint16 source for those codecs so the encode path
111+
is the one users actually hit, not a dtype-rejected pre-check.
112+
"""
113+
import cupy
114+
115+
if codec in ("jpeg2000", "j2k"):
116+
arr_cpu = np.random.RandomState(0).randint(
117+
0, 65535, size=(64, 64), dtype=np.uint16,
118+
)
119+
else:
120+
arr_cpu = np.random.RandomState(0).rand(64, 64).astype(np.float32)
121+
da = xr.DataArray(
122+
cupy.asarray(arr_cpu), dims=["y", "x"],
123+
coords={"y": np.arange(64.0, 0, -1), "x": np.arange(64.0)},
124+
attrs={"crs": 4326,
125+
"transform": (1.0, 0.0, 0.0, 0.0, -1.0, 64.0)},
126+
)
127+
path = str(tmp_path / f"out_{codec}.tif")
128+
write_geotiff_gpu(da, path, compression=codec)
129+
assert os.path.exists(path), (
130+
f"write_geotiff_gpu(compression={codec!r}) failed to write a file"
131+
)
132+
# File must be non-empty so we know the encode path actually ran
133+
assert os.path.getsize(path) > 0

0 commit comments

Comments
 (0)