Address Copilot review feedback on #1649

brendancol · brendancol · commit 50424ac30233 · 2026-05-11T17:50:37.000-07:00
Four wording fixes; no behavioural change:

- compression='jpeg': the docstring previously implied parity with
  to_geotiff, but to_geotiff rejects jpeg at runtime (its CPU encoder
  omits the JPEGTables tag, so the output wouldn't round-trip through GDAL).
  Spell out that write_geotiff_gpu DOES accept jpeg, that the on-disk
  bytes are self-contained JFIF tiles, and that GDAL/rasterio interop is
  not guaranteed until the JPEGTables fix lands.
- compression='jpeg2000' / 'j2k': replace "route to the CPU encoders"
  with the actual conditional behaviour (nvJPEG2K GPU encode first, CPU
  glymur fallback when libnvjpeg2k is missing) and call out that the two
  paths are not byte-stable, so byte-for-byte parity with to_geotiff
  isn't a contract here.
- test module docstring: distinguish "not nvCOMP-accelerated" (lzw,
  packbits, lz4, lerc — truly CPU-only) from jpeg2000/j2k (GPU first
  with CPU fallback) and from jpeg (write_geotiff_gpu only, separate
  test module).
- "exercised by test_features.py" referenced a file that doesn't cover
  JPEG. Repoint the test docstring to
  test_gpu_writer_compression_modes_2026_05_11.py which actually pins
  JPEG round-trips for the GPU writer.
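
The routing described in the bullets above fits in a small decision table. The following is a minimal Python sketch. It assumes that two boolean flags stand in for whatever library probing the real writer does. `route_codec`, its keyword flags, and its return labels are hypothetical names for illustration, not xrspatial API.

```python
# Hypothetical sketch of write_geotiff_gpu's codec routing as described in
# this commit. route_codec and the backend labels are illustrative names,
# not xrspatial API; the have_* flags stand in for real library probing.
NVCOMP_CODECS = frozenset({"zstd", "deflate"})
CPU_ONLY_CODECS = frozenset({"lzw", "packbits", "lz4", "lerc"})
ALIASES = {"j2k": "jpeg2000"}


def route_codec(codec: str = "zstd", *, have_nvjpeg: bool = True,
                have_nvjpeg2k: bool = True) -> str:
    """Return a label for the encoder a given codec name would use."""
    name = ALIASES.get(codec, codec)
    if name == "none":
        return "uncompressed"
    if name in NVCOMP_CODECS:
        # Fast path: nvCOMP batch compression on the GPU.
        return "nvcomp-gpu"
    if name == "jpeg":
        # nvJPEG when libnvjpeg is loadable, Pillow otherwise.
        return "nvjpeg-gpu" if have_nvjpeg else "pillow-cpu"
    if name == "jpeg2000":
        # nvJPEG2K first, glymur fallback; the two are not byte-stable.
        return "nvjpeg2k-gpu" if have_nvjpeg2k else "glymur-cpu"
    if name in CPU_ONLY_CODECS:
        # Single CPU encoder shared with to_geotiff, so bytes match.
        return name + "-cpu"
    raise ValueError(f"unsupported compression: {codec!r}")


for codec in ("zstd", "jpeg", "j2k", "lerc"):
    print(codec, "->", route_codec(codec, have_nvjpeg2k=False))
```

Only the jpeg and jpeg2000 branches depend on runtime library availability. That is why those two codecs are the ones whose bytes can differ from the CPU writer.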

All 7 tests in test_compression_docstring_1644.py still pass.
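
The JPEG interop caveat depends on whether a file carries TIFF tag 347 (JPEGTables). You can check for that tag without GDAL. The following is a minimal stdlib-only sketch. It assumes a classic (non-BigTIFF) file and inspects only the first IFD. `first_ifd_tags` and `has_jpeg_tables` are hypothetical helpers, not xrspatial functions.

```python
import struct

JPEG_TABLES_TAG = 347  # TIFF JPEGTables


def first_ifd_tags(data: bytes) -> set[int]:
    """Return the tag codes in the first IFD of a classic TIFF byte string."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    (ifd_offset,) = struct.unpack(endian + "I", data[4:8])
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = set()
    for i in range(count):
        # Each IFD entry is 12 bytes; the tag code is its first 2 bytes.
        entry = ifd_offset + 2 + 12 * i
        (tag,) = struct.unpack(endian + "H", data[entry:entry + 2])
        tags.add(tag)
    return tags


def has_jpeg_tables(path) -> bool:
    """True if the first IFD of the TIFF at ``path`` has a JPEGTables tag."""
    with open(path, "rb") as f:
        return JPEG_TABLES_TAG in first_ifd_tags(f.read())
```

If tag 347 is missing from a file written with `compression='jpeg'`, that matches the interop risk this commit documents. The tag being present does not by itself guarantee that GDAL will decode the file.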
diff --git a/xrspatial/geotiff/__init__.py b/xrspatial/geotiff/__init__.py
@@ -2723,18 +2723,38 @@ def write_geotiff_gpu(data: xr.DataArray | cupy.ndarray | np.ndarray,
     nodata : float, int, or None
         NoData value.
     compression : str
-        Codec name. Accepts the same set as ``to_geotiff``: ``'none'``,
-        ``'deflate'``, ``'lzw'``, ``'jpeg'``, ``'packbits'``, ``'zstd'``,
-        ``'lz4'``, ``'jpeg2000'`` (alias ``'j2k'``), or ``'lerc'``.
+        Codec name. Accepts the same codec names that ``to_geotiff``
+        lists in its own signature: ``'none'``, ``'deflate'``, ``'lzw'``,
+        ``'jpeg'``, ``'packbits'``, ``'zstd'``, ``'lz4'``, ``'jpeg2000'``
+        (alias ``'j2k'``), or ``'lerc'``.
 
         ``'zstd'`` (default) and ``'deflate'`` compress on the GPU via
         nvCOMP batch compression -- the fastest paths and the reason to
-        use this entry point. ``'jpeg'`` uses nvJPEG when available and
-        falls back to Pillow otherwise. ``'jpeg2000'`` / ``'j2k'`` and
-        ``'lerc'`` route to the CPU encoders so the output matches the
-        CPU writer byte-for-byte, but lose the GPU compression speedup.
-        ``'lzw'``, ``'packbits'``, and ``'lz4'`` likewise fall through
-        to the CPU encoder for parity with ``to_geotiff``.
+        use this entry point.
+
+        ``'jpeg'`` uses nvJPEG when libnvjpeg is loadable and falls
+        back to Pillow otherwise. ``to_geotiff`` rejects
+        ``compression='jpeg'`` at runtime because its CPU encoder omits
+        the required TIFF JPEGTables tag (347); this GPU entry point
+        instead emits self-contained JFIF tiles. The two writers therefore
+        disagree about JPEG-in-TIFF interop: files produced here decode
+        fine through this library's own reader but may not round-trip
+        through GDAL/rasterio/libtiff readers that require the
+        JPEGTables tag. Treat
+        ``write_geotiff_gpu(..., compression='jpeg')`` as experimental
+        and internal-reader only until the JPEGTables fix lands.
+
+        ``'jpeg2000'`` / ``'j2k'`` attempt an nvJPEG2K GPU encode first
+        and fall back to the CPU encoder (``glymur``) when libnvjpeg2k
+        is unavailable. The GPU and CPU paths are NOT byte-for-byte
+        identical (different libraries, different default parameters);
+        if you need exact CPU-writer parity, use ``to_geotiff`` instead.
+
+        ``'lerc'``, ``'lzw'``, ``'packbits'``, and ``'lz4'`` have no
+        nvCOMP/CUDA accelerator and fall through to the CPU encoder.
+        Because each has only one encoder, shared with ``to_geotiff``,
+        the output is byte-for-byte identical to what the CPU writer
+        produces.
     compression_level : int or None
         Compression effort level. Accepted for API compatibility but
         currently ignored -- nvCOMP does not expose level control.
diff --git a/xrspatial/geotiff/tests/test_compression_docstring_1644.py b/xrspatial/geotiff/tests/test_compression_docstring_1644.py
@@ -5,12 +5,30 @@
 ``write_geotiff_gpu.__doc__`` listed only four codecs (``'zstd'``,
 ``'deflate'``, ``'jpeg'``, ``'none'``) under the ``compression``
 parameter, while the implementation actually accepts every codec
-``to_geotiff`` does. Codecs unsupported by nvCOMP fall through to the
-CPU encoders (``lzw``, ``packbits``, ``lz4``, ``lerc``, ``jpeg2000`` /
-``j2k``) so the output matches the CPU writer byte-for-byte. This
-module pins the full codec list against future drift and confirms the
-underlying entry point accepts the codec names that the docstring now
-advertises.
+``to_geotiff`` does.
+
+Routing for the additional codecs:
+
+* ``'lzw'``, ``'packbits'``, ``'lz4'``, ``'lerc'`` -- not nvCOMP-
+  accelerated and have no GPU library, so they fall through to the
+  CPU encoder. Byte-for-byte identical to ``to_geotiff``.
+* ``'jpeg2000'`` / ``'j2k'`` -- attempts an nvJPEG2K *GPU* encode
+  first via ``_nvjpeg2k_batch_encode`` and falls back to the CPU
+  ``glymur`` encoder only when libnvjpeg2k is unavailable. The two
+  paths are NOT byte-stable against each other; this module pins the
+  acceptance contract (the codec name is accepted and a file gets
+  written), not output-byte parity with the CPU writer.
+* ``'jpeg'`` -- accepted here even though ``to_geotiff`` rejects it
+  (the CPU writer omits the JPEGTables tag, so its output doesn't
+  round-trip through GDAL). The GPU path emits self-contained JFIF
+  tiles. Covered separately by
+  ``test_gpu_writer_compression_modes_2026_05_11.py``; this module
+  excludes it from the parametrized fallback list because the test
+  data needs to be uint8 with sensible pixel content.
+
+This module pins the full codec list against future drift and confirms
+the underlying entry point accepts the codec names that the docstring
+now advertises.
 """
 from __future__ import annotations
 
@@ -42,11 +60,13 @@ def _gpu_available() -> bool:
 )
 
 
-# The full set ``to_geotiff`` accepts, mirrored to ``write_geotiff_gpu``
-# so both entry points stay in lockstep. Excludes ``jpeg`` because PR
-# #1633 already pins that name and the ``to_geotiff`` runtime rejects
-# it -- but it is still listed in the docstring as an accepted codec
-# name, matching ``to_geotiff``'s wording.
+# Codecs to exercise end-to-end through the GPU writer to confirm they
+# accept the docstring's advertised names. Excludes ``jpeg`` because
+# (a) ``to_geotiff`` rejects it at runtime and (b) the JPEG round-trip
+# is covered with appropriate uint8 RGB data in
+# ``test_gpu_writer_compression_modes_2026_05_11.py``; keeping it out of
+# this parametrize avoids exercising the JPEG path on dtype/shape
+# combinations that aren't representative.
 _GPU_FALLBACK_CODECS = (
     "lzw", "packbits", "lz4", "lerc", "jpeg2000", "j2k",
 )
@@ -83,11 +103,12 @@ def test_write_geotiff_gpu_accepts_cpu_fallback_codecs(tmp_path, codec):
 
     Confirms the docstring's promise that the GPU writer accepts the
     same codec set as ``to_geotiff``. ``jpeg`` is exercised separately
-    by ``test_features.py`` because the test data must be 8-bit
-    integer. ``jpeg2000`` / ``j2k`` go through ``glymur`` which only
-    accepts uint8/uint16 -- pick a uint16 source for those codecs so
-    the encode path is the one users actually hit, not a dtype-rejected
-    pre-check inside glymur.
+    by ``test_gpu_writer_compression_modes_2026_05_11.py`` because the
+    test data must be uint8 with sensible content. ``jpeg2000`` /
+    ``j2k`` will attempt nvJPEG2K if available and fall back to
+    ``glymur`` otherwise. Either way the encoder needs uint8/uint16
+    input, so those codecs use a uint16 source; that way the test hits
+    the real encode path rather than a dtype-rejection pre-check.
     """
     import cupy