Commit fb6eb6d

test: stabilize JOSS suite for 0.2.2

1 parent 5a59722 commit fb6eb6d

29 files changed

Lines changed: 639 additions & 302 deletions

.github/workflows/python-package.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ["3.10", "3.11", "3.12"]
-        torch-version: ["2.5.0", "2.9.0"]
+        torch-version: ["2.5.0", "2.11.0"]
         torch-type: ["stable"]
         include:
           - python-version: "3.12"
```

JOSS_REVIEWER_RESPONSE.md

Lines changed: 80 additions & 0 deletions
# Draft JOSS Reviewer Response

Thank you for flagging the test-suite issue. I agree that the previous state made the package harder to validate reproducibly, especially on CUDA-visible machines.

The issue had two parts:

1. The default pytest suite included CUDA memory/performance/OOM experiments that were useful during development but were not appropriate for normal CI or reviewer validation.
2. Several stochastic tests used random tensors without consistent global seeding, and some numerical/statistical assertions were too sensitive to CUDA float32 precision differences. This made a few tests appear flaky depending on the environment and random draw.

I have addressed this in a new release-readiness branch and prepared it for the `0.2.2` patch release.

The default test suite now:

- Uses deterministic global seeding for Python `random`, NumPy, PyTorch CPU RNG, and PyTorch CUDA RNGs.
- Keeps normal CPU tests and lightweight CUDA functional tests enabled when CUDA is available.
- Excludes CUDA memory/performance/OOM experiments unless explicitly requested.
- Removes the previous `pytest.mark.flaky(reruns=...)` markers.
- Uses shared dtype-aware tolerances for solver/backend tests.
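The global seeding can be sketched as a single helper that a conftest fixture calls before each test. The helper name and default seed here are illustrative, not necessarily the package's actual conftest code:

```python
# Illustrative sketch of deterministic global seeding; the helper name and
# default seed are assumptions, not the package's actual conftest.py code.
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed Python `random`, NumPy, PyTorch CPU RNG, and all CUDA RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU RNG
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)  # seeds the RNG on every CUDA device
```

Calling such a helper from an autouse pytest fixture makes every random tensor draw reproducible across runs and machines.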
The manual CUDA tests are still preserved for package development and memory-advantage validation. They can now be run explicitly with:

```bash
python -m pytest --run-manual-cuda -m manual_cuda -s
```
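The opt-in gate typically follows the standard pytest pattern: register the marker, add the CLI option, and deselect marked tests unless the flag is present. This is a sketch of that pattern; the real conftest.py may differ in detail:

```python
# conftest.py sketch: skip manual_cuda tests unless --run-manual-cuda is given.
# This mirrors the standard pytest opt-in marker pattern; details are illustrative.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-manual-cuda",
        action="store_true",
        default=False,
        help="run manual CUDA memory/performance/OOM experiments",
    )


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "manual_cuda: manual CUDA memory/performance/OOM experiment"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-manual-cuda"):
        return  # opt-in given: leave marked tests selected
    skip = pytest.mark.skip(reason="needs --run-manual-cuda")
    for item in items:
        if "manual_cuda" in item.keywords:
            item.add_marker(skip)
```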
For normal reviewer validation after the `0.2.2` release, the intended clean-environment commands are:

```bash
uv venv --python 3.12 --seed --managed-python
pip install "torchsparsegradutils[all]==0.2.2"
python -m pytest
```
I also fixed the packaging issue you observed:

```text
WARNING: torchsparsegradutils 0.2.1 does not provide the extra 'all'
```

In `0.2.2`, the `all` extra now lists concrete optional dependencies directly instead of self-referencing the package. The built wheel metadata has been checked and now includes:

```text
Provides-Extra: all
Requires-Dist: cupy-cuda12x>=13.0; extra == "all"
Requires-Dist: jax[cuda12]; extra == "all"
```
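The shape of such a fix in `pyproject.toml` looks roughly like the following; this is an illustrative fragment, not the project's exact file:

```toml
# pyproject.toml sketch -- illustrative, not the exact project file.
# Before (broken): the extra referenced the package itself, which many
# installers resolve against the *published* metadata and therefore miss.
# all = ["torchsparsegradutils[cupy,jax]"]

# After: the extra expands directly to concrete optional dependencies.
[project.optional-dependencies]
cupy = ["cupy-cuda12x>=13.0"]
jax = ["jax[cuda12]"]
all = ["cupy-cuda12x>=13.0", "jax[cuda12]"]
```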
I also updated CI to test the current supported stable PyTorch range:

- PyTorch `2.5.0`
- PyTorch `2.11.0`
- PyTorch nightly, allowed to fail

The README badge has been updated accordingly to:

```text
Tested 2.5 / 2.11 / nightly
```
Local verification on a CUDA-visible workspace now passes:

```text
4289 passed, 822 skipped, 324 warnings
```

I also repeated the known flaky target sweep, including:

- `test_sparse_batch_mv[batch_mv_test_data3]`
- `test_bicgstab.py::test_bicgstab_2d_rhs`
- representative linear CG, sparse solve, distribution sampling, CuPy, and JAX tests

That targeted sweep passed:

```text
242 passed, 30 warnings
```

The remaining skipped tests in the default run are intentional: they are the manual CUDA memory/performance/OOM experiments or tests skipped by existing backend/device capability checks. These are now documented and opt-in rather than part of the normal CI/reviewer command.

This should make the package reproducible for JOSS review while still preserving the GPU experiments I use to validate the memory advantage over native PyTorch behavior.

PR_DESCRIPTION.md

Lines changed: 84 additions & 0 deletions
# Pull Request: JOSS Test Suite and 0.2.2 Release Readiness

## Summary

This PR prepares `torchsparsegradutils` for the JOSS review follow-up and the `0.2.2` patch release. It separates manual CUDA memory/performance experiments from the default pytest suite, makes stochastic tests deterministic, fixes the broken `all` extra, and updates CI/docs for the latest stable PyTorch target.

The default test suite now keeps normal CUDA functional coverage enabled when CUDA is visible, while CUDA memory/performance/OOM-style experiments are preserved behind an explicit opt-in marker.
## What Changed

- Added deterministic global pytest seeding in `torchsparsegradutils/tests/conftest.py`.
- Added seed opt-out environment variables:
  - `TSGU_UNLOCK_SEED=true`
  - legacy `UNLOCK_SEED=true`
- Added a `--run-manual-cuda` pytest option.
- Added and registered a `manual_cuda` marker for GPU memory/performance/OOM experiments.
- Added `torchsparsegradutils/tests/test_config.py` for shared device/dtype/layout constants, dtype-aware tolerances, confidence thresholds, and device comparison helpers.
- Removed `@pytest.mark.flaky(reruns=...)` markers after making tests deterministic.
- Marked all of `test_integration_pairwise_sparse_mvn.py` as `manual_cuda`.
- Marked `test_sparse_mm_memory_advantage` and `test_sparse_mm_memory_stability` as `manual_cuda`.
- Kept lightweight CUDA functional tests in the default suite when CUDA is available.
- Updated the CI stable PyTorch matrix from `2.5.0` / `2.9.0` to `2.5.0` / `2.11.0`, with nightly still allowed to fail.
- Updated the README badge text to `Tested 2.5 / 2.11 / nightly`.
- Bumped the package/docs version from `0.2.1` to `0.2.2`.
- Fixed the `all` extra so it expands to concrete optional dependencies instead of self-referencing `torchsparsegradutils[cupy,jax]`.
- Added reviewer-oriented install/test commands to the README and docs.
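The seed opt-out can be handled with a small environment check in the conftest; the variable names match the PR, while the parsing logic here is illustrative:

```python
# Sketch: honoring a seed opt-out environment variable in conftest.py.
# Variable names match the PR description; the parsing logic is illustrative.
import os


def seed_is_locked() -> bool:
    """Return True unless the user opted out via TSGU_UNLOCK_SEED (or legacy UNLOCK_SEED)."""
    for var in ("TSGU_UNLOCK_SEED", "UNLOCK_SEED"):
        if os.environ.get(var, "").strip().lower() in {"1", "true", "yes"}:
            return False
    return True
```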
## Context Since `v0.2.1`

This branch builds on the post-`v0.2.1` mainline work:

- JOSS paper and documentation revisions.
- More robust PyTorch version comparison using `packaging.version`.
- Sparse matmul and sparse triangular solve reshape optimizations.
- `sparse_eye` optimization avoiding unnecessary `.coalesce()`.
- CuPy binding and sparse solve updates from the JOSS revision work.
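The `packaging.version` point matters because naive string comparison misorders multi-digit components (`"2.9.0" > "2.11.0"` lexicographically). A minimal sketch of the robust comparison, with an illustrative helper name:

```python
# Sketch: robust version comparison with packaging.version instead of string
# comparison. The helper name is illustrative; pass torch.__version__ as `current`.
from packaging import version


def version_at_least(current: str, minimum: str) -> bool:
    """True if `current` >= `minimum`, handling multi-digit parts and dev/rc suffixes."""
    return version.parse(current) >= version.parse(minimum)
```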
## Reviewer / User-Facing Commands

After the `0.2.2` release:

```bash
uv venv --python 3.12 --seed --managed-python
pip install "torchsparsegradutils[all]==0.2.2"
python -m pytest
```

Manual CUDA memory/performance experiments:

```bash
python -m pytest --run-manual-cuda -m manual_cuda -s
```
## Verification

Run in a CUDA-visible workspace:

```bash
black --check .
isort --check-only --diff .
flake8 . --count --show-source --statistics
python -m pytest -q --ignore=torchsparsegradutils/tests/test_doctests.py
python -m pytest -q -m manual_cuda --collect-only
python -m build --wheel
```

Results:

- `black --check .`: passed
- `isort --check-only --diff .`: passed
- `flake8 . --count --show-source --statistics`: passed
- Default pytest excluding doctests: `4289 passed, 822 skipped, 324 warnings`
- Known flaky target sweep: `242 passed, 30 warnings`
- Manual CUDA collection: `676/5128 tests collected`, `4452 deselected`
- Wheel build: passed
- Wheel metadata confirmed:
  - `Version: 0.2.2`
  - `Provides-Extra: all`
  - `Requires-Dist: cupy-cuda12x>=13.0; extra == "all"`
  - `Requires-Dist: jax[cuda12]; extra == "all"`

## Notes

The full manual CUDA suite is intentionally not part of default CI. It remains available for local validation of memory advantage, performance, and OOM behavior on suitable GPU hardware.

README.md

Lines changed: 21 additions & 2 deletions
```diff
@@ -1,6 +1,6 @@
 # torchsparsegradutils: Sparsity-preserving gradient utility tools for PyTorch

-[![PyPI](https://img.shields.io/pypi/v/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) [![Python Versions](https://img.shields.io/pypi/pyversions/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) [![Downloads](https://img.shields.io/pypi/dm/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) ![PyTorch 2.5+](https://img.shields.io/badge/PyTorch-2.5%2B-ee4c2c?logo=pytorch) ![Tested 2.5 / 2.9 / nightly](https://img.shields.io/badge/Tested-2.5%20|%202.9%20|%20nightly-ee4c2c?logo=pytorch) [![Build](https://github.com/cai4cai/torchsparsegradutils/actions/workflows/python-package.yml/badge.svg)](https://github.com/cai4cai/torchsparsegradutils/actions/workflows/python-package.yml) [![Docs](https://readthedocs.org/projects/torchsparsegradutils/badge/?version=latest)](https://readthedocs.org/projects/torchsparsegradutils) [![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License](https://img.shields.io/github/license/cai4cai/torchsparsegradutils)](LICENSE) [![status](https://joss.theoj.org/papers/6da0e92488d06f70c0a03d0a7cbfba7d/status.svg)](https://joss.theoj.org/papers/6da0e92488d06f70c0a03d0a7cbfba7d)
+[![PyPI](https://img.shields.io/pypi/v/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) [![Python Versions](https://img.shields.io/pypi/pyversions/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) [![Downloads](https://img.shields.io/pypi/dm/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) ![PyTorch 2.5+](https://img.shields.io/badge/PyTorch-2.5%2B-ee4c2c?logo=pytorch) ![Tested 2.5 / 2.11 / nightly](https://img.shields.io/badge/Tested-2.5%20|%202.11%20|%20nightly-ee4c2c?logo=pytorch) [![Build](https://github.com/cai4cai/torchsparsegradutils/actions/workflows/python-package.yml/badge.svg)](https://github.com/cai4cai/torchsparsegradutils/actions/workflows/python-package.yml) [![Docs](https://readthedocs.org/projects/torchsparsegradutils/badge/?version=latest)](https://readthedocs.org/projects/torchsparsegradutils) [![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License](https://img.shields.io/github/license/cai4cai/torchsparsegradutils)](LICENSE) [![status](https://joss.theoj.org/papers/6da0e92488d06f70c0a03d0a7cbfba7d/status.svg)](https://joss.theoj.org/papers/6da0e92488d06f70c0a03d0a7cbfba7d)

 A comprehensive collection of utility functions to work with PyTorch sparse tensors, ensuring memory efficiency and supporting various sparsity-preserving tensor operations with automatic differentiation. This package addresses fundamental gaps in PyTorch's sparse tensor ecosystem, providing essential operations that preserve sparsity in gradients during backpropagation.
```

````diff
@@ -106,6 +106,22 @@ pip install scipy matplotlib pandas tqdm pytest

 > **Note:** The CuPy extra installs `cupy-cuda12x>=13.0`. If you are using a different CUDA version, install the appropriate CuPy package manually (e.g. `pip install cupy-cuda11x`).

+### Reviewer Test Environment
+
+For a clean Python 3.12 environment with all optional dependencies after the 0.2.2 release:
+
+```bash
+uv venv --python 3.12 --seed --managed-python
+pip install "torchsparsegradutils[all]==0.2.2"
+python -m pytest
+```
+
+The default pytest suite includes CPU tests and lightweight CUDA functional tests when CUDA is available. CUDA memory, performance, and OOM experiments are preserved for manual validation and can be run explicitly:
+
+```bash
+python -m pytest --run-manual-cuda -m manual_cuda -s
+```
+
 ### Requirements

 - **Python**: ≥ 3.10
````
````diff
@@ -396,6 +412,9 @@ python -m pytest torchsparsegradutils/tests/test_distributions.py

 # Run with coverage
 python -m pytest --cov=torchsparsegradutils
+
+# Run CUDA memory/performance experiments manually
+python -m pytest --run-manual-cuda -m manual_cuda -s
 ```

 ### Running Benchmarks
````
```diff
@@ -662,4 +681,4 @@ dist_stable = SparseMultivariateNormal(
 - **No SPD Constraints**: Doesn't require strict positive definiteness
 - **Better Conditioning**: Diagonal component can be controlled independently

-**Status**: This is a known limitation of the LL^T precision formulation. LDL^T parameterization is the recommended approach for precision matrices.
+**Status**: This is a known limitation of the LL^T precision formulation. LDL^T parameterization is the recommended approach for precision matrices.
```

RELEASE_NOTES_0.2.2.md

Lines changed: 111 additions & 0 deletions
# Release Notes: torchsparsegradutils 0.2.2

Patch release focused on JOSS review readiness, deterministic testing, CUDA test separation, packaging metadata, and refreshed PyTorch compatibility validation.

## Highlights

- The default pytest run is now suitable for reviewers and CI on CPU or CUDA-visible machines.
- CUDA memory/performance/OOM experiments are preserved but moved behind an explicit manual marker.
- Stochastic tests are deterministic by default.
- The `all` optional dependency extra is fixed.
- CI now tests PyTorch `2.5.0`, `2.11.0`, and nightly.
- README badge updated to `Tested 2.5 / 2.11 / nightly`.
## Testing and CI

- Added global deterministic test seeding for:
  - Python `random`
  - NumPy
  - PyTorch CPU RNG
  - PyTorch CUDA RNGs
- Added seed opt-out support:
  - `TSGU_UNLOCK_SEED=true`
  - `UNLOCK_SEED=true`
- Added a `--run-manual-cuda` pytest option.
- Added a `manual_cuda` marker for CUDA memory/performance/OOM experiments.
- Added shared test constants and dtype-aware tolerances in `torchsparsegradutils/tests/test_config.py`.
- Removed flaky rerun markers after stabilizing seeding and tolerance behavior.
- Marked pairwise sparse MVN integration tests as manual CUDA.
- Marked sparse matrix multiplication memory advantage and memory stability tests as manual CUDA.
- Kept small CUDA functional tests in the default suite when CUDA is available.
- Updated the GitHub Actions stable PyTorch matrix:
  - `2.5.0`
  - `2.11.0`
- Kept nightly CPU CI on Python `3.12` with `continue-on-error`.
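Dtype-aware tolerances let float32 CUDA runs pass without loosening float64 checks. A minimal sketch of what such a shared helper can look like; the tolerance values and helper name are assumptions, not the actual contents of `test_config.py`:

```python
# Sketch: dtype-aware tolerances for numerical test assertions.
# The exact values and helper name in test_config.py are assumptions.
import torch

# Looser tolerances for float32, tighter for float64 (values illustrative).
TOLERANCES = {
    torch.float32: {"rtol": 1e-4, "atol": 1e-5},
    torch.float64: {"rtol": 1e-7, "atol": 1e-9},
}


def assert_close(actual: torch.Tensor, expected: torch.Tensor) -> None:
    """Compare tensors with tolerances chosen from the result dtype."""
    tol = TOLERANCES[actual.dtype]
    torch.testing.assert_close(actual, expected, rtol=tol["rtol"], atol=tol["atol"])
```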
## Packaging

- Bumped the version from `0.2.1` to `0.2.2`.
- Fixed `torchsparsegradutils[all]`.
- The `all` extra now expands directly to:
  - `cupy-cuda12x>=13.0`
  - `jax[cuda12]`
- Wheel metadata now includes `Provides-Extra: all`.
## Documentation

- Added reviewer-oriented installation and test commands:

  ```bash
  uv venv --python 3.12 --seed --managed-python
  pip install "torchsparsegradutils[all]==0.2.2"
  python -m pytest
  ```

- Documented the manual CUDA validation command:

  ```bash
  python -m pytest --run-manual-cuda -m manual_cuda -s
  ```

- Updated the README badge to `Tested 2.5 / 2.11 / nightly`.
- Updated the docs version metadata to `0.2.2`.
## Changes Since `v0.2.1`

In addition to this release-readiness work, the current release includes the post-`v0.2.1` mainline changes:

- JOSS paper revisions and rebuilt paper artifacts.
- Installation and benchmark documentation updates from reviewer feedback.
- More robust PyTorch version comparison using `packaging.version`.
- Sparse matrix multiplication reshape optimization.
- Sparse triangular solve reshape optimization.
- `sparse_eye` optimization using the `is_coalesced` flag instead of forcing `.coalesce()`.
- CuPy binding and sparse solve updates from the JOSS review work.
## Verification

Verified in a CUDA-visible workspace:

```bash
black --check .
isort --check-only --diff .
flake8 . --count --show-source --statistics
python -m pytest -q --ignore=torchsparsegradutils/tests/test_doctests.py
python -m pytest -q -m manual_cuda --collect-only
python -m build --wheel
```

Observed results:

- Default pytest excluding doctests: `4289 passed, 822 skipped, 324 warnings`
- Known flaky target sweep: `242 passed, 30 warnings`
- Manual CUDA collection: `676/5128 tests collected`, `4452 deselected`
- Wheel build: passed
- Wheel metadata includes `Provides-Extra: all`
## Upgrade Notes

After publication:

```bash
pip install --upgrade "torchsparsegradutils[all]==0.2.2"
```

Users who do not need CuPy/JAX support can continue installing the base package:

```bash
pip install --upgrade torchsparsegradutils==0.2.2
```

The CuPy extra currently targets CUDA 12 via `cupy-cuda12x`. Users on a different CUDA runtime should install the appropriate CuPy package manually.

docs/source/conf.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -15,8 +15,8 @@
 project = "torchsparsegradutils"
 copyright = "2025, CAI4CAI research group"
 author = "CAI4CAI research group"
-release = "0.2.1"
-version = "0.2.1"
+release = "0.2.2"
+version = "0.2.2"

 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
```

docs/source/contributing.rst

Lines changed: 3 additions & 0 deletions
```diff
@@ -144,6 +144,9 @@ We use pytest for testing. Tests are organized by module:
    # Run with coverage
    pytest --cov=torchsparsegradutils

+   # Run CUDA memory/performance experiments manually
+   pytest --run-manual-cuda -m manual_cuda -s
+
 **Writing Tests**

 Follow these guidelines:
```

docs/source/installation.rst

Lines changed: 20 additions & 0 deletions
```diff
@@ -34,6 +34,26 @@ For additional functionality, you can install optional dependencies:
    CUDA version, install the appropriate CuPy package manually
    (e.g. ``pip install cupy-cuda11x``).

+Reviewer Test Environment
+-------------------------
+
+For a clean Python 3.12 environment with all optional dependencies after the
+0.2.2 release:
+
+.. code-block:: bash
+
+   uv venv --python 3.12 --seed --managed-python
+   pip install "torchsparsegradutils[all]==0.2.2"
+   python -m pytest
+
+The default pytest suite includes CPU tests and lightweight CUDA functional
+tests when CUDA is available. CUDA memory, performance, and OOM experiments are
+preserved for manual validation and can be run explicitly:
+
+.. code-block:: bash
+
+   python -m pytest --run-manual-cuda -m manual_cuda -s
+
 Requirements
 ------------
```
