# RFC: opt-in GPU backend for invert_network (torch.linalg.lstsq, CUDA)

## Motivation

`invert_network` is consistently the longest single step in time-series InSAR processing. Within `run_ifgram_inversion_patch`, the dominant cost on real data is **per-pixel Python looping**.

The current CPU path splits pixels into three cases:

1. OLS pixels with all ifgrams valid → solved in a single `scipy.linalg.lstsq` call (good).
2. OLS pixels with per-pixel NaN masks → **Python loop, one `scipy.linalg.lstsq` call per pixel**.
3. WLS pixels (per-pixel weights) → **Python loop, one `scipy.linalg.lstsq` call per pixel**, regardless of NaN pattern.

On real data, NaN observations (atmosphere, decorrelation) are common and WLS is preferred for accuracy, so cases 2 and 3 dominate. `scipy.linalg.lstsq` cannot vectorize these because pixels effectively have different design matrices (different row masks, different weights).

`torch.linalg.lstsq` accepts a **batched** stack of independent (A_k, y_k) systems and dispatches to CUDA. This eliminates the Python loop in cases 2 and 3 by solving them in a single GPU call.
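One way to see why batching is possible at all: giving an invalid interferogram zero weight is equivalent to dropping its row, so every pixel's system can keep the same shape and be stacked. The sketch below illustrates this with NumPy only, assuming a design matrix `A` shared by all pixels. The function names are hypothetical, and it solves the stacked systems through the normal equations purely for illustration; the proposal itself uses `torch.linalg.lstsq`, to which the same stacked `Aw` and `yw` arrays could be handed.

```python
import numpy as np


def solve_masked_wls_batched(A, y, w):
    """Solve one weighted least-squares system per pixel without a Python loop.

    A : (n_ifg, n_unk) design matrix, assumed shared by all pixels
    y : (n_pix, n_ifg) observations, NaN where an interferogram is invalid
    w : (n_pix, n_ifg) non-negative weights
    """
    valid = ~np.isnan(y)
    sw = np.sqrt(np.where(valid, w, 0.0))   # zero weight is equivalent to dropping the row
    y0 = np.where(valid, y, 0.0)            # keep NaN out of the arithmetic
    Aw = sw[:, :, None] * A[None, :, :]     # (n_pix, n_ifg, n_unk), same shape for every pixel
    yw = (sw * y0)[:, :, None]              # (n_pix, n_ifg, 1)
    AwT = np.swapaxes(Aw, 1, 2)
    # Illustration only: batched normal equations (assumes full-rank pixels).
    return np.linalg.solve(AwT @ Aw, AwT @ yw)[:, :, 0]


def solve_masked_wls_loop(A, y, w):
    """Reference: per-pixel loop that physically drops the invalid rows."""
    out = np.empty((y.shape[0], A.shape[1]))
    for k in range(y.shape[0]):
        keep = ~np.isnan(y[k])
        sw = np.sqrt(w[k, keep])
        out[k] = np.linalg.lstsq(sw[:, None] * A[keep], sw * y[k, keep], rcond=None)[0]
    return out
```

The normal equations square the condition number, so a production path would more likely pass the stacked systems to a dedicated least-squares solver; the point here is only that masking and weighting do not prevent stacking.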

## Proposal

Add an opt-in **CUDA-only** GPU backend for `invert_network`, selectable via template (or `--backend` CLI flag):

    mintpy.networkInversion.backend = auto | cpu | torch

- `auto` (default): resolves to `cpu` via the existing `check_template_auto_value` static lookup. Behavior is **unchanged** for any user who does not modify their template.
- `cpu` (explicit): existing scipy path, byte-for-byte unchanged.
- `torch` (explicit opt-in): batched `torch.linalg.lstsq` on CUDA.
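The three values above could be resolved roughly as follows. This is a minimal stand-in, not the prototype's code: in the proposal the `auto` lookup goes through `check_template_auto_value`, whereas `resolve_backend` and `VALID_BACKENDS` here are hypothetical names used only to show the intended mapping.

```python
VALID_BACKENDS = ("auto", "cpu", "torch")


def resolve_backend(value="auto"):
    """Map the template value to the backend that will actually run.

    Hypothetical helper: 'auto' resolves to 'cpu', so an unmodified
    template keeps the existing scipy path.
    """
    name = str(value).strip().lower()
    if name not in VALID_BACKENDS:
        raise ValueError(
            f"unknown backend {value!r}; expected one of {', '.join(VALID_BACKENDS)}"
        )
    return "cpu" if name == "auto" else name
```

Rejecting unknown values rather than defaulting them keeps a typo in the template from silently selecting the CPU path.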

PyTorch is gated behind a new `[gpu]` extras group in `pyproject.toml`. Users who do not install the extras incur no new dependency and no runtime cost.
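Keeping the runtime cost at zero for everyone else usually means importing the optional package lazily, only when the GPU backend is requested. A minimal sketch of that pattern, assuming a hypothetical helper `import_optional` and assuming the install command would take the form `pip install 'mintpy[gpu]'` once the extras group exists:

```python
import importlib
import importlib.util


def import_optional(module_name, extra):
    """Import a top-level optional dependency on demand, with an install hint.

    Hypothetical helper: called only on the opt-in path, so users of the
    default backend never trigger the import.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required for this backend; "
            f"install it with: pip install 'mintpy[{extra}]'"  # assumed command
        )
    return importlib.import_module(module_name)
```

On the GPU path this would be called as `torch = import_optional("torch", "gpu")`; nothing at module import time refers to PyTorch.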

### Scope (intentionally narrow)

- **CUDA only.** When the user requests `backend='torch'` and CUDA is unavailable, the run **fails fast with an explicit error** rather than silently falling back. Rationale: the user explicitly opted in; silent fallback would mask configuration / driver issues. Users without CUDA simply leave `backend = auto` (default cpu).
- **No CPU torch backend.** Out of scope. Could be a separate proposal later if there is demand; would expand test surface and maintenance.
- **Full-rank pixels only on GPU.** On CUDA, `torch.linalg.lstsq` supports only the `gels` driver, which assumes a full-rank system. Rank-deficient pixels are rare in real SBAS networks; when they occur, those pixels come out as NaN in the output. The CPU path retains its existing rank handling.
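The fail-fast rule in the first item can be sketched as below. The function name `select_device` is hypothetical, and the availability probe is injected so the logic can be exercised without a GPU; in real use it would presumably be `torch.cuda.is_available`.

```python
def select_device(backend, cuda_is_available):
    """Return the device for a resolved backend, refusing to fall back silently.

    backend           : 'cpu' or 'torch' (already resolved from the template)
    cuda_is_available : zero-argument callable, e.g. torch.cuda.is_available
    """
    if backend != "torch":
        return "cpu"
    if not cuda_is_available():
        # Explicit opt-in: surface the driver or configuration problem.
        raise RuntimeError(
            "backend = torch was requested but CUDA is not available; "
            "fix the CUDA setup or set mintpy.networkInversion.backend = auto"
        )
    return "cuda"
```

Raising here, instead of returning `"cpu"`, is what distinguishes an explicit opt-in from the `auto` default.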

## Evidence (reference run, single machine)

On FernandinaSenDT128 with an RTX 5080 (16 GiB VRAM, warm SSD), I observed a **1.43× wall-time speedup** for the `invert_network` step relative to the CPU path, with results matching the CPU output to float32 round-off. This is a modest figure on a tutorial dataset, but on production-scale runs where `invert_network` dominates wall time, the absolute time saved is large enough to matter in practice. A larger-scene benchmark is planned before the PR to confirm how the speedup scales.
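An equivalence check of the kind mentioned above might look like the following. It is an illustration, not the check used for the reference run: the function name and the `n_eps` tolerance factor are assumptions, and pixels that are non-finite in either result (for example rank-deficient pixels on the GPU path) are excluded from the comparison.

```python
import numpy as np


def agree_to_float32(ref, test, n_eps=100.0):
    """True if `test` matches `ref` to within n_eps float32 ulps of the data scale.

    Only elements that are finite in both arrays are compared.
    """
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    both = np.isfinite(ref) & np.isfinite(test)
    if not both.any():
        return False  # nothing comparable, so equivalence is not demonstrated
    f32 = np.finfo(np.float32)
    scale = max(float(np.max(np.abs(ref[both]))), float(f32.tiny))
    tol = n_eps * float(f32.eps) * scale
    return bool(np.max(np.abs(ref[both] - test[both])) <= tol)
```

A stricter or looser `n_eps` may be appropriate depending on how well conditioned the network is.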

- Working prototype:
  https://github.com/s-sasaki-earthsea-wizard/MintPy

## Open questions for maintainers

1. **API surface**: template flag (current) vs CLI flag vs env var — preference?
2. **Extras layout**: `[gpu]` (current) vs more specific name (`[cuda]` / `[torch-cuda]`)?
3. **CI**: comfortable adding a GPU-tagged CI job (or keeping it manual)? A no-op smoke import test is cheap and would catch packaging regressions.
4. **Docs scope**: install steps in existing `docs/installation.md`, or a separate `docs/gpu.md` modelled on `docs/dask.md`?

I am following the Dask integration playbook (#349 → #351 → #357) as a reference for staged contribution, and am happy to split this into smaller issues or PRs as preferred.
