Commit 54f299a: added additional management commands (#1032)
1 parent 080618b

5 files changed, +962 -0 lines

File: .claude/commands/backend-parity.md (159 additions, 0 deletions)
# Backend Parity: Cross-Backend Consistency Audit

Verify that all implemented backends produce consistent results for a given
function or set of functions. The prompt is: $ARGUMENTS

---

## Step 1 -- Identify targets

1. If $ARGUMENTS names specific functions (e.g. `slope`, `aspect`), use those.
2. If $ARGUMENTS names a category (e.g. `hydrology`, `surface`, `focal`), read
   `README.md` to find all functions in that category.
3. If $ARGUMENTS is empty or says "all", scan the full feature matrix in `README.md`
   and test every function that claims support for 2+ backends.
4. For each function, read its source file and find the `ArrayTypeFunctionMapping`
   call to determine which backends are actually implemented (not just what the
   README claims).
## Step 2 -- Build test inputs

For each target function, create test rasters at three scales:

| Name   | Size    | Purpose                                           |
|--------|---------|---------------------------------------------------|
| tiny   | 8x6     | Fast, easy to inspect cell-by-cell                |
| medium | 64x64   | Catches chunk-boundary artifacts in dask          |
| large  | 256x256 | Stress test, exposes numerical accumulation drift |

For each size, generate two variants:
- **Clean:** no NaN, realistic value range for the function
  (e.g. 0-5000m for elevation, 0-1 for NDVI inputs)
- **Dirty:** 5-10% random NaN, some extreme values near dtype limits

Use `np.random.default_rng(42)` for reproducibility. For functions that require
specific input structure (e.g. `flow_direction` needs a DEM with drainage, not
random noise), use the project's `perlin` module or a synthetic cone/valley.

Also test with at least two dtypes: `float32` and `float64`.
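The clean/dirty variant generation above can be sketched as follows. The helper name, shape, and value range are illustrative; only the seed (42) and the 5-10% NaN fraction come from this command's text:

```python
import numpy as np

def make_variants(shape, low, high, dtype=np.float32, nan_frac=0.07, seed=42):
    """Generate a 'clean' and a 'dirty' test array for one raster size.

    Hypothetical helper: the command fixes only the seed and the NaN
    fraction, not this signature.
    """
    rng = np.random.default_rng(seed)
    clean = rng.uniform(low, high, size=shape).astype(dtype)

    dirty = clean.copy()
    # Sprinkle roughly nan_frac of the cells with NaN at random positions.
    mask = rng.random(shape) < nan_frac
    dirty[mask] = np.nan
    # Push one cell toward the dtype limit to probe overflow handling.
    idx = rng.integers(0, shape[0]), rng.integers(0, shape[1])
    dirty[idx] = np.finfo(dtype).max / 2
    return clean, dirty

clean, dirty = make_variants((64, 64), 0.0, 5000.0)  # elevation-like range
```

A real run would feed both variants through `create_test_raster` for each backend rather than using the bare arrays.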
## Step 3 -- Run every backend

For each function, input variant, and dtype:

1. **NumPy:** `create_test_raster(data, backend='numpy')` -- always the baseline.
2. **Dask+NumPy:** test with two chunk configurations:
   - `chunks=(size//2, size//2)` -- even split
   - `chunks=(size//3, size//3)` -- ragged remainder
3. **CuPy:** `create_test_raster(data, backend='cupy')` -- skip if CUDA unavailable.
4. **Dask+CuPy:** `create_test_raster(data, backend='dask+cupy')` -- skip if CUDA
   unavailable.

If the function has parameter variants (e.g. `boundary`, `method`), test the
default parameters first. If $ARGUMENTS includes "thorough", also sweep all
parameter combinations.
## Step 4 -- Pairwise comparison

For every non-NumPy result, compare against the NumPy baseline. Extract data using
the project conventions:
- Dask: `.data.compute()`
- CuPy: `.data.get()`
- Dask+CuPy: `.data.compute().get()`

For each pair, compute and record:

### 4a. Value agreement
```python
abs_diff = np.abs(result - baseline)
max_abs = np.nanmax(abs_diff)
rel_diff = abs_diff / (np.abs(baseline) + 1e-30)  # avoid div-by-zero
max_rel = np.nanmax(rel_diff)
mean_abs = np.nanmean(abs_diff)
```

### 4b. NaN mask agreement
```python
nan_match = np.array_equal(np.isnan(result), np.isnan(baseline))
nan_only_in_result = np.sum(np.isnan(result) & ~np.isnan(baseline))
nan_only_in_baseline = np.sum(np.isnan(baseline) & ~np.isnan(result))
```

### 4c. Metadata preservation
Using `general_output_checks` from `general_checks.py`:
- Output type matches input type (DataArray backed by the same array type)
- Shape, dims, coords, attrs preserved

### 4d. Pass/fail thresholds

| Comparison          | rtol | atol |
|---------------------|------|------|
| NumPy vs Dask+NumPy | 1e-5 | 0    |
| NumPy vs CuPy       | 1e-6 | 1e-6 |
| NumPy vs Dask+CuPy  | 1e-6 | 1e-6 |

A comparison **fails** if `max_abs > atol` AND `max_rel > rtol`, or if NaN masks
disagree.
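The pass/fail rule in 4d combines the quantities from 4a and 4b; a minimal sketch of the predicate (the function name is hypothetical, the logic is the rule as stated):

```python
import numpy as np

def parity_fails(result, baseline, rtol, atol):
    """Return True if this pair violates the thresholds in the table above.

    Fails when NaN masks differ anywhere, or when the max absolute diff
    exceeds atol AND the max relative diff exceeds rtol.
    """
    if not np.array_equal(np.isnan(result), np.isnan(baseline)):
        return True
    abs_diff = np.abs(result - baseline)
    rel_diff = abs_diff / (np.abs(baseline) + 1e-30)  # avoid div-by-zero
    max_abs = np.nanmax(abs_diff)
    max_rel = np.nanmax(rel_diff)
    return bool(max_abs > atol and max_rel > rtol)
```

Note the AND: with `atol=0` (the Dask+NumPy row), any nonzero difference satisfies the absolute test, so the relative tolerance alone decides.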
## Step 5 -- Chunk boundary analysis

Dask backends are the most likely source of parity issues due to `map_overlap`
boundary handling. For any Dask comparison that fails or is borderline:

1. Identify which cells diverge from the NumPy result.
2. Map those cells to chunk boundaries (cells within `depth` pixels of a chunk edge).
3. Report what percentage of divergent cells are at chunk boundaries vs interior.
4. If all divergence is at boundaries, the issue is likely in the `map_overlap`
   `depth` or `boundary` parameter. Say so explicitly.
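Steps 1-3 above can be sketched as one function. The helper name and signature are illustrative (this is not part of the project's API); it assumes a uniform `(row_chunk, col_chunk)` chunking:

```python
import numpy as np

def boundary_fraction(result, baseline, chunks, depth):
    """Fraction of divergent cells within `depth` pixels of a chunk edge."""
    diverged = ~np.isclose(result, baseline, equal_nan=True)
    if not diverged.any():
        return 0.0
    rows, cols = result.shape
    # Interior chunk edges along each axis (the outer border is excluded).
    row_edges = np.arange(chunks[0], rows, chunks[0])
    col_edges = np.arange(chunks[1], cols, chunks[1])
    r_idx, c_idx = np.nonzero(diverged)

    def near(idx, edges):
        if len(edges) == 0:
            return np.zeros(len(idx), dtype=bool)
        return np.min(np.abs(idx[:, None] - edges[None, :]), axis=1) < depth

    at_boundary = near(r_idx, row_edges) | near(c_idx, col_edges)
    return float(at_boundary.mean())
```

A fraction near 1.0 supports the step-4 diagnosis (`map_overlap` `depth`/`boundary`); a fraction near the boundary cells' share of the raster suggests the divergence is unrelated to chunking.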
## Step 6 -- Generate the report

```
## Backend Parity Report

### Functions tested
| Function | Backends implemented         | Source file        |
|----------|------------------------------|--------------------|
| slope    | numpy, cupy, dask, dask+cupy | xrspatial/slope.py |
| ...      | ...                          | ...                |

### Parity Matrix

#### <function_name>
| Comparison          | Input        | Dtype   | Max |Δ| | Max |Δ/ref| | NaN match | Metadata | Status |
|---------------------|--------------|---------|---------|-------------|-----------|----------|--------|
| NumPy vs Dask+NumPy | tiny clean   | float32 | ...     | ...         | yes       | ok       | PASS   |
| NumPy vs Dask+NumPy | medium dirty | float64 | ...     | ...         | yes       | ok       | PASS   |
| NumPy vs CuPy       | tiny clean   | float32 | ...     | ...         | no (3)    | ok       | FAIL   |
| ...                 | ...          | ...     | ...     | ...         | ...       | ...      | ...    |

### Failures
For each FAIL row:
- Which cells diverged
- Whether divergence correlates with chunk boundaries (Dask) or specific
  input values (CuPy)
- Likely root cause
- Suggested fix

### Summary
- Functions tested: N
- Total comparisons: N
- Passed: N
- Failed: N
- Skipped (no CUDA): N
```

---
## General rules

- Do not modify any source or test files. This command is read-only.
- Use `create_test_raster` from `general_checks.py` for all raster construction.
- Any temporary files must include the function name for uniqueness.
- If CUDA is unavailable, skip CuPy and Dask+CuPy gracefully. Report them
  as SKIPPED, not FAIL.
- If $ARGUMENTS includes "fix", still do not auto-fix. Report the issue and ask.
- If a function is not in `ArrayTypeFunctionMapping` (e.g. it only has a numpy
  path), note it as "single-backend only" and skip parity checks for it.
- If $ARGUMENTS includes a specific tolerance (e.g. `rtol=1e-3`), override the
  defaults in the threshold table.

File: .claude/commands/bench.md (127 additions, 0 deletions)
# Bench: Local Performance Comparison

Run ASV benchmarks for the current branch against master and report regressions
and improvements. The prompt is: $ARGUMENTS

---

## Step 1 -- Identify what changed

1. If $ARGUMENTS names specific benchmark classes or functions (e.g. `Slope`,
   `flow_accumulation`), use those directly.
2. If $ARGUMENTS is empty or says "auto", run `git diff origin/master --name-only`
   to find changed source files under `xrspatial/`. Map each changed file to the
   corresponding benchmark module in `benchmarks/benchmarks/`. Use the filename
   and imports to match (e.g. changes to `slope.py` map to `benchmarks/benchmarks/slope.py`).
3. If no benchmark exists for the changed code, note this in the report and
   suggest whether one should be added.
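The filename half of the mapping in item 2 can be sketched as below. The helper is hypothetical and deliberately naive: it matches by basename only, whereas the step also asks you to inspect the benchmark modules' imports:

```python
from pathlib import Path

def map_to_benchmark(changed_path, bench_dir="benchmarks/benchmarks"):
    """Guess the benchmark module for a changed xrspatial source file.

    Illustrative name-only mapping; a real pass should confirm the
    candidate file exists and imports the changed function.
    """
    p = Path(changed_path)
    # Only Python files under xrspatial/ can have benchmark coverage.
    if not changed_path.startswith("xrspatial/") or p.suffix != ".py":
        return None
    return f"{bench_dir}/{p.name}"
```

For example, `map_to_benchmark("xrspatial/slope.py")` yields `benchmarks/benchmarks/slope.py`, while doc changes map to nothing.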
## Step 2 -- Check prerequisites

1. Verify ASV is installed: `python -c "import asv"`. If missing, tell the user
   to install it (`pip install asv`) and stop.
2. Verify the benchmarks directory exists at `benchmarks/`.
3. Read `benchmarks/asv.conf.json` to confirm the project name and branch settings.
4. Check whether the ASV machine file exists (`.asv/machine.json`). If not, run
   `cd benchmarks && asv machine --yes` to initialize it.
## Step 3 -- Run the comparison

Run ASV in continuous-comparison mode from the `benchmarks/` directory:

```bash
cd benchmarks && asv continuous origin/master HEAD -b "<regex>" -e
```

Where `<regex>` is a pattern matching the benchmark classes identified in Step 1
(e.g. `Slope|Aspect` or `FlowAccumulation`). The `-e` flag shows stderr on failure.

If $ARGUMENTS contains "quick", add `--quick` to run each benchmark only once
(faster but noisier).

If $ARGUMENTS contains "full", omit the `-b` filter to run all benchmarks.
## Step 4 -- Parse and interpret results

ASV continuous outputs lines like:
```
BENCHMARKS NOT SIGNIFICANTLY CHANGED.
```
or:
```
REGRESSION: benchmarks.slope.Slope.time_numpy 3.45ms -> 5.67ms (1.64x)
IMPROVED: benchmarks.slope.Slope.time_dask 8.12ms -> 4.23ms (0.52x)
```

Parse the output and classify each result:

| Category   | Criteria                  |
|------------|---------------------------|
| REGRESSION | Ratio > 1.2x (matches CI) |
| IMPROVED   | Ratio < 0.8x              |
| UNCHANGED  | Between 0.8x and 1.2x     |
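A minimal parser for lines in the format shown above, applying the classification table. The regex assumes exactly that line shape (real ASV output varies by version and time unit, so treat this as a sketch):

```python
import re

# Matches e.g. "REGRESSION: benchmarks.slope.Slope.time_numpy 3.45ms -> 5.67ms (1.64x)"
LINE_RE = re.compile(
    r"^(?:REGRESSION|IMPROVED):\s+(?P<name>\S+)\s+"
    r"[\d.]+ms\s+->\s+[\d.]+ms\s+\((?P<ratio>[\d.]+)x\)"
)

def classify(line):
    """Return (benchmark_name, ratio, category) for a result line, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ratio = float(m.group("ratio"))
    if ratio > 1.2:          # matches the CI threshold
        category = "REGRESSION"
    elif ratio < 0.8:
        category = "IMPROVED"
    else:
        category = "UNCHANGED"
    return m.group("name"), ratio, category
```

Classifying from the ratio rather than trusting ASV's own REGRESSION/IMPROVED prefix keeps the thresholds aligned with the table.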
## Step 5 -- Generate the report

```
## Benchmark Report: <branch> vs master

### Changed files
- <list of changed source files>

### Benchmarks run
- <list of benchmark classes/functions matched>

### Results

| Benchmark                   | master  | HEAD    | Ratio | Status    |
|-----------------------------|---------|---------|-------|-----------|
| slope.Slope.time_numpy      | 3.45 ms | 3.51 ms | 1.02x | UNCHANGED |
| slope.Slope.time_dask_numpy | 8.12 ms | 4.23 ms | 0.52x | IMPROVED  |
| ...                         | ...     | ...     | ...   | ...       |

### Regressions
<details for each regression: which benchmark, how much slower, likely cause>

### Improvements
<details for each improvement>

### Missing benchmarks
<list any changed functions that have no benchmark coverage>

### Recommendation
- [ ] Safe to merge (no regressions)
- [ ] Add "performance" label to PR (regressions found, CI will recheck)
- [ ] Consider adding benchmarks for: <uncovered functions>
```
## Step 6 -- Suggest benchmark additions (if gaps found)

If Step 1 found changed functions with no benchmark coverage:

1. Read an existing benchmark file in `benchmarks/benchmarks/` that covers a
   similar function (same category or same backend pattern).
2. Describe what a new benchmark should test:
   - Which function and parameter variants
   - Suggested array sizes (match `common.py` conventions)
   - Which backends to benchmark (numpy at minimum, dask if applicable)
3. Ask the user whether they want you to write the benchmark file.

Do NOT write benchmark files automatically. Report the gap and propose, then wait.

---
## General rules

- Always run benchmarks from the `benchmarks/` directory, not the project root.
- The regression threshold is 1.2x, matching `.github/workflows/benchmarks.yml`.
  Do not change this unless $ARGUMENTS overrides it.
- If ASV setup or machine detection fails, report the error clearly and suggest
  the fix. Do not retry in a loop.
- If benchmarks take longer than 5 minutes per class, note the elapsed time so
  the user can plan accordingly.
- Do not modify any source, test, or benchmark files. This command is read-only
  analysis (unless the user explicitly asks for a benchmark to be written in
  response to Step 6).
- If $ARGUMENTS says "compare <branch1> <branch2>", run
  `asv continuous <branch1> <branch2>` instead of the default origin/master vs HEAD.
