# Backend Parity: Cross-Backend Consistency Audit

Verify that all implemented backends produce consistent results for a given
function or set of functions. The prompt is: $ARGUMENTS

---

## Step 1 -- Identify targets

1. If $ARGUMENTS names specific functions (e.g. `slope`, `aspect`), use those.
2. If $ARGUMENTS names a category (e.g. `hydrology`, `surface`, `focal`), read
   `README.md` to find all functions in that category.
3. If $ARGUMENTS is empty or says "all", scan the full feature matrix in `README.md`
   and test every function that claims support for 2+ backends.
4. For each function, read its source file and find the `ArrayTypeFunctionMapping`
   call to determine which backends are actually implemented (not just what the
   README claims).
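The backend scan can be approximated with a plain text search over each source file. This is a sketch, not project API: it assumes the `ArrayTypeFunctionMapping` call uses keyword arguments named `numpy_func`, `cupy_func`, `dask_func`, and `dask_cupy_func`, and that unimplemented slots are wired to a `*_not_implemented` sentinel; adjust the pattern if the project passes arguments positionally.

```python
import re

# keyword-to-backend pairs assumed for the ArrayTypeFunctionMapping call
BACKEND_KEYWORDS = [
    ("numpy_func", "numpy"),
    ("cupy_func", "cupy"),
    ("dask_func", "dask"),
    ("dask_cupy_func", "dask+cupy"),
]


def implemented_backends(source_text):
    """Heuristic: which backend slots of ArrayTypeFunctionMapping are filled?

    A slot bound to a *_not_implemented sentinel does not count. Pass the
    file contents, e.g. Path(source_file).read_text().
    """
    found = []
    for kw, backend in BACKEND_KEYWORDS:
        m = re.search(rf"\b{kw}\s*=\s*(\w+)", source_text)
        if m and "not_implemented" not in m.group(1):
            found.append(backend)
    return found
```

Cross-check the result against the README's feature matrix and report any mismatch.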
## Step 2 -- Build test inputs

For each target function, create test rasters at three scales:

| Name   | Size    | Purpose                                           |
|--------|---------|---------------------------------------------------|
| tiny   | 8x6     | Fast, easy to inspect cell-by-cell                |
| medium | 64x64   | Catches chunk-boundary artifacts in Dask          |
| large  | 256x256 | Stress test, exposes numerical accumulation drift |

For each size, generate two variants:
- **Clean:** no NaN, realistic value range for the function
  (e.g. 0-5000 m for elevation, 0-1 for NDVI inputs)
- **Dirty:** 5-10% random NaN, some extreme values near dtype limits

Use `np.random.default_rng(42)` for reproducibility. For functions that require
specific input structure (e.g. `flow_direction` needs a DEM with drainage, not
random noise), use the project's `perlin` module or a synthetic cone/valley.

Also test with at least two dtypes: `float32` and `float64`.
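The variant recipe above can be sketched as a small generator. `make_variants` is a hypothetical helper name, and the elevation-like 0-5000 range and ~7% NaN fraction are one concrete choice within the ranges the spec allows:

```python
import numpy as np


def make_variants(shape, rng_seed=42):
    """Build the clean/dirty input pair for one raster size, in both dtypes.

    Illustrative helper, not project API. Uses an elevation-like value
    range; swap in 0-1 for NDVI-style inputs.
    """
    rng = np.random.default_rng(rng_seed)
    clean = rng.uniform(0.0, 5000.0, size=shape)

    dirty = clean.copy()
    nan_mask = rng.random(shape) < 0.07           # ~7% NaN, inside the 5-10% band
    dirty[nan_mask] = np.nan
    n_extreme = max(1, clean.size // 100)         # ~1% extreme values
    idx = rng.choice(clean.size, size=n_extreme, replace=False)
    dirty.flat[idx] = np.finfo(np.float32).max * 0.9

    variants = {}
    for dtype in (np.float32, np.float64):
        name = np.dtype(dtype).name
        variants[("clean", name)] = clean.astype(dtype)
        variants[("dirty", name)] = dirty.astype(dtype)
    return variants
```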
## Step 3 -- Run every backend

For each function, input variant, and dtype:

1. **NumPy:** `create_test_raster(data, backend='numpy')` -- always the baseline.
2. **Dask+NumPy:** test with two chunk configurations:
   - `chunks=(size//2, size//2)` -- even split
   - `chunks=(size//3, size//3)` -- ragged remainder
3. **CuPy:** `create_test_raster(data, backend='cupy')` -- skip if CUDA unavailable.
4. **Dask+CuPy:** `create_test_raster(data, backend='dask+cupy')` -- skip if CUDA
   unavailable.

If the function has parameter variants (e.g. `boundary`, `method`), test the
default parameters first. If $ARGUMENTS includes "thorough", also sweep all
parameter combinations.
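The per-size plan above can be sketched as a list of (backend, chunks) pairs to feed to `create_test_raster`. The `backend_plan` helper is illustrative; only the backend strings come from the list above:

```python
def backend_plan(size, cuda_available):
    """Enumerate (backend, chunks) pairs for one square raster size.

    Hypothetical helper: chunks=None means no chunking; the two Dask
    configurations are the even split and the ragged remainder.
    """
    plan = [("numpy", None)]
    for denom in (2, 3):                      # size//2 even, size//3 ragged
        chunk = (max(1, size // denom), max(1, size // denom))
        plan.append(("dask+numpy", chunk))
    if cuda_available:
        plan.append(("cupy", None))
        plan.append(("dask+cupy", (max(1, size // 2), max(1, size // 2))))
    return plan
```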
## Step 4 -- Pairwise comparison

For every non-NumPy result, compare against the NumPy baseline. Extract data using
the project conventions:
- Dask: `.data.compute()`
- CuPy: `.data.get()`
- Dask+CuPy: `.data.compute().get()`
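One way to fold the three conventions into a single helper is to duck-type on `.compute`/`.get`, so no hard dask or cupy import is needed. `to_numpy` is a sketch, not a project function:

```python
import numpy as np


def to_numpy(agg):
    """Pull a plain ndarray out of a DataArray regardless of backend.

    Illustrative helper: dask arrays expose .compute(), cupy arrays
    expose .get(), and dask+cupy needs both in that order.
    """
    data = agg.data
    if hasattr(data, "compute"):   # dask or dask+cupy
        data = data.compute()
    if hasattr(data, "get"):       # cupy (plain ndarrays have no .get)
        data = data.get()
    return np.asarray(data)
```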
For each pair, compute and record:

### 4a. Value agreement
```python
abs_diff = np.abs(result - baseline)
max_abs = np.nanmax(abs_diff)
rel_diff = abs_diff / (np.abs(baseline) + 1e-30)  # avoid div-by-zero
max_rel = np.nanmax(rel_diff)
mean_abs = np.nanmean(abs_diff)
```

### 4b. NaN mask agreement
```python
nan_match = np.array_equal(np.isnan(result), np.isnan(baseline))
nan_only_in_result = np.sum(np.isnan(result) & ~np.isnan(baseline))
nan_only_in_baseline = np.sum(np.isnan(baseline) & ~np.isnan(result))
```

### 4c. Metadata preservation
Using `general_output_checks` from `general_checks.py`:
- Output type matches input type (DataArray backed by the same array type)
- Shape, dims, coords, attrs preserved

### 4d. Pass/fail thresholds

| Comparison          | rtol | atol |
|---------------------|------|------|
| NumPy vs Dask+NumPy | 1e-5 | 0    |
| NumPy vs CuPy       | 1e-6 | 1e-6 |
| NumPy vs Dask+CuPy  | 1e-6 | 1e-6 |

A comparison **fails** if `max_abs > atol` AND `max_rel > rtol`, or if NaN masks
disagree.
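The threshold table and the NaN rule combine into one check. `parity_status` and the `THRESHOLDS` dict are illustrative names, with the (rtol, atol) pairs taken from the table above:

```python
import numpy as np

THRESHOLDS = {                    # (rtol, atol) per non-NumPy backend
    "dask+numpy": (1e-5, 0.0),
    "cupy": (1e-6, 1e-6),
    "dask+cupy": (1e-6, 1e-6),
}


def parity_status(result, baseline, backend):
    """FAIL if NaN masks disagree, or if max_abs > atol AND max_rel > rtol."""
    rtol, atol = THRESHOLDS[backend]
    if not np.array_equal(np.isnan(result), np.isnan(baseline)):
        return "FAIL"
    abs_diff = np.abs(result - baseline)
    max_abs = np.nanmax(abs_diff)
    max_rel = np.nanmax(abs_diff / (np.abs(baseline) + 1e-30))
    return "FAIL" if (max_abs > atol and max_rel > rtol) else "PASS"
```

Note this rule is looser than `np.allclose` (which fails when `|a-b| > atol + rtol*|b|` at any cell): here a result passes if it clears either the absolute or the relative bar.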
## Step 5 -- Chunk boundary analysis

Dask backends are the most likely source of parity issues due to `map_overlap`
boundary handling. For any Dask comparison that fails or is borderline:

1. Identify which cells diverge from the NumPy result.
2. Map those cells to chunk boundaries (cells within `depth` pixels of a chunk edge).
3. Report what percentage of divergent cells are at chunk boundaries vs interior.
4. If all divergence is at boundaries, the issue is likely in the `map_overlap`
   `depth` or `boundary` parameter. Say so explicitly.
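Steps 2-3 can be sketched as follows, assuming `divergent_mask` is a boolean array of diverging cells and `chunks` is the tuple-of-tuples from the Dask array's `.chunks` attribute; `boundary_fraction` is a hypothetical helper and `depth` should match the `map_overlap` depth:

```python
import numpy as np


def boundary_fraction(divergent_mask, chunks, depth=1):
    """Fraction of divergent cells within `depth` pixels of an internal chunk edge."""
    near_edge = np.zeros(divergent_mask.shape, dtype=bool)
    for axis, axis_chunks in enumerate(chunks):
        edges = np.cumsum(axis_chunks)[:-1]    # internal boundaries only
        coords = np.arange(divergent_mask.shape[axis])
        near = np.zeros_like(coords, dtype=bool)
        for e in edges:
            near |= (coords >= e - depth) & (coords < e + depth)
        shape = [1] * divergent_mask.ndim
        shape[axis] = -1
        near_edge |= near.reshape(shape)       # broadcast along the other axes
    n_divergent = divergent_mask.sum()
    if n_divergent == 0:
        return 0.0
    return float((divergent_mask & near_edge).sum() / n_divergent)
```

A fraction near 1.0 points at `map_overlap` depth/boundary handling; a fraction near the raster's overall boundary-cell share suggests the divergence is unrelated to chunking.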
## Step 6 -- Generate the report

```
## Backend Parity Report

### Functions tested
| Function | Backends implemented         | Source file        |
|----------|------------------------------|--------------------|
| slope    | numpy, cupy, dask, dask+cupy | xrspatial/slope.py |
| ...      | ...                          | ...                |

### Parity Matrix

#### <function_name>
| Comparison          | Input        | Dtype   | Max |Δ| | Max |Δ/ref| | NaN match | Metadata | Status |
|---------------------|--------------|---------|---------|-------------|-----------|----------|--------|
| NumPy vs Dask+NumPy | tiny clean   | float32 | ...     | ...         | yes       | ok       | PASS   |
| NumPy vs Dask+NumPy | medium dirty | float64 | ...     | ...         | yes       | ok       | PASS   |
| NumPy vs CuPy       | tiny clean   | float32 | ...     | ...         | no (3)    | ok       | FAIL   |
| ...                 | ...          | ...     | ...     | ...         | ...       | ...      | ...    |

### Failures
For each FAIL row:
- Which cells diverged
- Whether divergence correlates with chunk boundaries (Dask) or specific
  input values (CuPy)
- Likely root cause
- Suggested fix

### Summary
- Functions tested: N
- Total comparisons: N
- Passed: N
- Failed: N
- Skipped (no CUDA): N
```

---

## General rules

- Do not modify any source or test files. This command is read-only.
- Use `create_test_raster` from `general_checks.py` for all raster construction.
- Any temporary files must include the function name for uniqueness.
- If CUDA is unavailable, skip CuPy and Dask+CuPy gracefully. Report them
  as SKIPPED, not FAIL.
- If $ARGUMENTS includes "fix", still do not auto-fix. Report the issue and ask.
- If a function is not in `ArrayTypeFunctionMapping` (e.g. it only has a numpy
  path), note it as "single-backend only" and skip parity checks for it.
- If $ARGUMENTS includes a specific tolerance (e.g. `rtol=1e-3`), override the
  defaults in the threshold table.