Commit 63908f9

Sébastien Loisel authored and committed

Add SparseMatrixCSR type alias for CSR storage format

- Add SparseMatrixCSR{Tv,Ti} as alias for Transpose{Tv, SparseMatrixCSC{Tv,Ti}}
- Add SparseMatrixCSR(::SparseMatrixCSC) constructor for CSC→CSR conversion
- Add SparseMatrixCSC(::SparseMatrixCSR) constructor for CSR→CSC conversion
- Update SparseMatrixMPI to use SparseMatrixCSR{T,Int} for the A field
- Update SparseMatrixMPI_local signature to use SparseMatrixCSR
- Optimize triplet-to-CSR construction by building M^T directly (swap I↔J)
- Add comprehensive documentation explaining the dual life of Transpose{SparseMatrixCSC}
- Update CLAUDE.md, api.md, and getting-started.md with CSR explanations

1 parent 66ca717

6 files changed: 214 additions & 66 deletions

CLAUDE.md (23 additions, 4 deletions)

````diff
@@ -40,12 +40,23 @@ Many operations in this module are collective and should not be run on a subset
 ### Core Data Structures
 
+**SparseMatrixCSR{T,Ti}** (Type Alias)
+- `SparseMatrixCSR{T,Ti} = Transpose{T, SparseMatrixCSC{T,Ti}}` - type alias for CSR storage
+- In Julia, `Transpose{SparseMatrixCSC}` has a **dual life**:
+  - **Semantic view**: a lazy transpose of a CSC matrix (what `transpose(A)` returns)
+  - **Storage view**: row-major (CSR) access to sparse data
+- Use `SparseMatrixCSR` when the intent is CSR storage; use `transpose(A)` for the mathematical transpose
+- `SparseMatrixCSR(A::SparseMatrixCSC)` converts CSC to CSR representing the **same** matrix
+- `SparseMatrixCSC(A::SparseMatrixCSR)` converts CSR to CSC representing the **same** matrix
+- For `B = SparseMatrixCSR(A)`, `B[i,j] == A[i,j]` (same matrix, different storage)
+
 **SparseMatrixMPI{T}**
 - Rows are partitioned across MPI ranks
-- `A::Transpose{T,SparseMatrixCSC{T,Int}}`: Local rows wrapped in a Transpose for type clarity
+- `A::SparseMatrixCSR{T,Int}`: Local rows in CSR format for efficient row-wise iteration
 - `A.parent` is the underlying CSC storage with shape `(length(col_indices), local_nrows)`
-- Columns in `A.parent` correspond to local rows; this layout enables efficient row-wise iteration
-- Storage is **compressed**: `A.parent.rowval` uses local column indices (1:length(col_indices)), not global
+- `A.parent.colptr` acts as row pointers for the CSR format
+- `A.parent.rowval` contains LOCAL column indices (1:length(col_indices)), not global
+- Storage is **compressed** to avoid hypersparse issues
 - `row_partition`: Array of size `nranks + 1` defining which rows each rank owns (1-indexed boundaries)
 - `col_partition`: Array of size `nranks + 1` defining column partition (used for transpose operations)
 - `col_indices`: Sorted global column indices that appear in the local part (local→global mapping)
@@ -100,11 +111,19 @@ x = F \ b
 For efficient construction when data is already distributed:
 - `VectorMPI_local(v_local)`: Create from local vector portion
-- `SparseMatrixMPI_local(transpose(AT_local))`: Create from local rows
+- `SparseMatrixMPI_local(SparseMatrixCSR(local_csc))`: Create from local rows in CSR format
+- `SparseMatrixMPI_local(transpose(AT_local))`: Alternative using explicit transpose wrapper
 - `MatrixMPI_local(A_local)`: Create from local dense rows
 
 These infer the global partition via MPI.Allgather of local sizes.
 
+When building from triplets (I, J, V), the efficient pattern is to build M^T directly as CSC by swapping indices, then wrap it in a lazy transpose for CSR:
+```julia
+AT_local = sparse(local_J, local_I, local_V, ncols, local_nrows)  # M^T as CSC
+SparseMatrixMPI_local(transpose(AT_local))  # M in CSR format
+```
+This avoids an unnecessary physical transpose operation.
+
 ### Matrix Multiplication Flow
 
 1. **Plan creation** (`MatrixPlan` constructor): Uses `Alltoall` and `Alltoallv` to exchange row requests and sparse structure (colptr, rowval)
````
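The triplet→CSR pattern documented in this hunk can be checked with the stdlib SparseArrays alone. This is a sketch with made-up triplet data (`Is`, `Js`, `Vs`, `nrows`, `ncols` are hypothetical), and it stops short of calling `SparseMatrixMPI_local`, which requires the package and an MPI session:

```julia
using SparseArrays

# Hypothetical triplets for a 3x4 matrix M
Is, Js, Vs = [1, 2, 2, 3], [1, 1, 3, 4], [1.0, 2.0, 3.0, 4.0]
nrows, ncols = 3, 4

# Build M^T directly as CSC by swapping the row/column index vectors
AT_local = sparse(Js, Is, Vs, ncols, nrows)  # stores M^T column-major

# Wrap in a lazy transpose: M itself, now with row-major (CSR-style) access
M = transpose(AT_local)

# Same matrix as the straightforward construction, no physical transpose
@assert M == sparse(Is, Js, Vs, nrows, ncols)
```

The point of the swap is that `sparse` always compresses by column, so building the transpose and wrapping it yields row compression of M for free.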

docs/src/api.md (46 additions, 0 deletions)

````diff
@@ -22,6 +22,52 @@ MatrixMPI
 VectorMPI
 ```
 
+### SparseMatrixCSR
+
+```@docs
+SparseMatrixCSR
+```
+
+## CSR Storage Format
+
+LinearAlgebraMPI uses CSR (Compressed Sparse Row) format internally for `SparseMatrixMPI` because row-partitioned distributed matrices need efficient row-wise access.
+
+### The Dual Life of Transpose{SparseMatrixCSC}
+
+In Julia, the type `Transpose{T, SparseMatrixCSC{T,Int}}` has two interpretations:
+
+1. **Semantic**: a lazy transpose of a CSC matrix (what you get from `transpose(A)`)
+2. **Storage**: row-major (CSR) access to sparse data
+
+This duality can be confusing. When you call `transpose(A)` on a SparseMatrixCSC, you get a wrapper that represents A^T. But the same wrapper type, when used for storage, provides efficient row iteration.
+
+### The SparseMatrixCSR Type Alias
+
+To clarify intent, LinearAlgebraMPI exports:
+
+```julia
+const SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}
+```
+
+Use `SparseMatrixCSR` when you want row-major storage, and `transpose(A)` when you want the mathematical transpose.
+
+### Converting Between CSC and CSR
+
+```julia
+# CSC to CSR (same matrix, different storage)
+A_csc = sparse([1,2,2], [1,1,2], [1.0, 2.0, 3.0], 2, 2)
+A_csr = SparseMatrixCSR(A_csc)
+A_csr[1,1] == A_csc[1,1]  # true
+
+# CSR to CSC
+A_back = SparseMatrixCSC(A_csr)
+A_back == A_csc  # true
+```
+
+### Why CSR for Distributed Matrices?
+
+`SparseMatrixMPI` partitions matrices by rows across MPI ranks. Each rank needs to iterate efficiently over its local rows for operations like matrix-vector multiplication. CSR format provides O(1) access to each row's nonzeros, while CSC would require scanning the entire column pointer array.
+
 ## Sparse Matrix Operations
 
 ### Arithmetic
````
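The conversions documented in this hunk can be reproduced without the package by restating the alias and writing the conversions as plain helpers. `to_csr` and `to_csc` are hypothetical names used here so the snippet runs standalone; in the package, the conversions are the `SparseMatrixCSR` and `SparseMatrixCSC` constructors themselves:

```julia
using SparseArrays, LinearAlgebra

# Alias as added by this commit, restated so the snippet is self-contained
const SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}

# Hypothetical stand-ins for the package's constructors
to_csr(A::SparseMatrixCSC) = transpose(sparse(transpose(A)))  # materialize A^T, wrap lazily
to_csc(A::SparseMatrixCSR) = sparse(transpose(A.parent))      # materialize back to CSC

A_csc = sparse([1,2,2], [1,1,2], [1.0, 2.0, 3.0], 2, 2)
A_csr = to_csr(A_csc)

@assert A_csr isa SparseMatrixCSR          # the wrapper matches the alias
@assert A_csr[1,1] == A_csc[1,1]           # same matrix, different storage
@assert to_csc(A_csr) == A_csc             # round trip recovers the CSC matrix
```

Note that indexing the wrapper goes through `Transpose`'s `getindex`, so element access is identical in both representations; only the compression axis differs.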

docs/src/getting-started.md (10 additions, 0 deletions)

````diff
@@ -54,6 +54,16 @@ The matrix is partitioned roughly equally by rows. For example, with 4 ranks and
 - Rank 2: rows 51-75
 - Rank 3: rows 76-100
 
+### Internal Storage: CSR Format
+
+Internally, each rank stores its local rows in CSR (Compressed Sparse Row) format using the `SparseMatrixCSR` type. This enables efficient row-wise iteration, which is essential for a row-partitioned distributed matrix.
+
+In Julia, `SparseMatrixCSR{T,Ti}` is a type alias for `Transpose{T, SparseMatrixCSC{T,Ti}}`. This type has a dual interpretation:
+- **Semantic view**: a lazy transpose of a CSC matrix
+- **Storage view**: row-major (CSR) access to the data
+
+You don't need to worry about this for normal usage; it is handled automatically. But if you access the internal storage (e.g., `A.A.parent`), be aware that it stores the transposed data in CSC format, which provides CSR access through the wrapper.
+
 ### Efficient Local-Only Construction
 
 For large matrices, you can avoid replicating data across all ranks by only populating each rank's local portion:
````
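For readers who do poke at the internals, here is what the wrapper's storage looks like with plain SparseArrays (a sketch with a made-up 2×3 matrix; in the package, the same layout sits behind `A.A.parent`):

```julia
using SparseArrays

# A small matrix and its CSR-style wrapper
M = sparse([1, 1, 2], [1, 3, 2], [1.0, 2.0, 3.0], 2, 3)
B = transpose(sparse(transpose(M)))  # same matrix, row-major storage

# The parent CSC stores M^T, so its column pointers index M's rows
rowptr = B.parent.colptr
@assert rowptr == [1, 3, 4]          # row 1 has 2 nonzeros, row 2 has 1

# Nonzeros of row 1 of M, found directly via the row pointers
vals = B.parent.nzval[rowptr[1]:rowptr[2]-1]
@assert vals == [1.0, 2.0]
```

Each row's nonzeros occupy a contiguous slice of `nzval`, which is exactly what makes row-wise iteration cheap.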

src/LinearAlgebraMPI.jl (83 additions, 0 deletions)

````diff
@@ -9,6 +9,7 @@ import LinearAlgebra
 import LinearAlgebra: tr, diag, triu, tril, Transpose, Adjoint, norm, opnorm, mul!, ldlt, BLAS, issymmetric, UniformScaling, dot
 
 export SparseMatrixMPI, MatrixMPI, VectorMPI, clear_plan_cache!, uniform_partition
+export SparseMatrixCSR  # Type alias for Transpose{SparseMatrixCSC} (CSR storage format)
 export # Multithreaded sparse matrix multiplication
 export VectorMPI_local, MatrixMPI_local, SparseMatrixMPI_local  # Local constructors
 export mean  # Our mean function for SparseMatrixMPI and VectorMPI
@@ -21,6 +22,88 @@ export solve, solve!, finalize!
 const Blake3Hash = NTuple{32,UInt8}
 const OptionalBlake3Hash = Union{Nothing, Blake3Hash}
 
+# ============================================================================
+# SparseMatrixCSR Type Alias and Constructors
+# ============================================================================
+
+"""
+    SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}
+
+Type alias for CSR (Compressed Sparse Row) storage format.
+
+## The Dual Life of Transpose{SparseMatrixCSC}
+
+In Julia, the type `Transpose{Tv, SparseMatrixCSC{Tv,Ti}}` has two interpretations:
+
+1. **Semantic interpretation**: A lazy transpose wrapper around a CSC matrix.
+   When you call `transpose(A)` on a SparseMatrixCSC, you get this wrapper, which
+   represents A^T without copying data.
+
+2. **Storage interpretation**: CSR (row-major) access to sparse data.
+   The underlying CSC stores columns contiguously, but through the transpose wrapper
+   we can iterate efficiently over rows instead of columns.
+
+This alias clarifies intent: use `SparseMatrixCSR` when you want row-major storage
+semantics, and `transpose(A)` when you want the mathematical transpose.
+
+## CSR vs CSC Storage
+
+- **CSC (Compressed Sparse Column)**: Julia's native sparse format. Efficient for
+  column-wise operations and matrix-vector products with column access.
+- **CSR (Compressed Sparse Row)**: Efficient for row-wise operations, matrix-vector
+  products with row access, and row-partitioned distributed matrices.
+
+For `SparseMatrixCSR`, the underlying `parent::SparseMatrixCSC` stores the *transposed*
+matrix. If `B = SparseMatrixCSR(A)` represents matrix M, then `B.parent` is a CSC
+storing M^T. This means:
+- `B.parent.colptr` acts as row pointers for M
+- `B.parent.rowval` contains column indices for M
+- `B.parent.nzval` contains values in row-major order
+
+## Usage Note
+
+Julia will still display this type as `Transpose{Float64, SparseMatrixCSC{...}}`,
+not as `SparseMatrixCSR`. The alias improves code clarity but doesn't affect
+type printing.
+"""
+const SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}
+
+"""
+    SparseMatrixCSR(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
+
+Convert a CSC matrix to CSR format representing the **same** matrix.
+
+If A represents matrix M in CSC format, the result represents M in CSR format.
+Element access is unchanged: `B[i,j] == A[i,j]`.
+
+Internally, this:
+1. Materializes A^T as CSC (physical transpose)
+2. Wraps it in a lazy transpose to get M back, but with row-major storage
+
+# Example
+```julia
+A_csc = sparse([1,2,2], [1,1,2], [1.0, 2.0, 3.0], 2, 2)
+A_csr = SparseMatrixCSR(A_csc)  # Same matrix, CSR storage
+A_csr[1,1] == A_csc[1,1]  # true - same elements
+```
+"""
+function SparseMatrixCSR(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
+    return transpose(SparseMatrixCSC(transpose(A)))
+end
+
+"""
+    SparseMatrixCSC(A::SparseMatrixCSR{Tv,Ti}) where {Tv,Ti}
+
+Convert a CSR matrix to CSC format representing the **same** matrix.
+
+This physically transposes the underlying storage to produce a CSC matrix;
+the result represents the same matrix as the input.
+"""
+function SparseArrays.SparseMatrixCSC(A::SparseMatrixCSR{Tv,Ti}) where {Tv,Ti}
+    # Use sparse() to avoid dispatching back to our method
+    return sparse(transpose(A.parent))
+end
+
 # Cache for memoized MatrixPlans
 # Key: (A_hash, B_hash, T) - use full 256-bit hashes
 const _plan_cache = Dict{Tuple{Blake3Hash,Blake3Hash,DataType},Any}()
````
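The docstring's claim that `B.parent.nzval` holds values in row-major order can be seen on a tiny example. Stdlib only; `csr` below is a hypothetical stand-in for the `SparseMatrixCSR` constructor, using `sparse` to materialize the transpose:

```julia
using SparseArrays

# Sketch of the conversion the commit adds, with a hypothetical helper name
csr(A::SparseMatrixCSC) = transpose(sparse(transpose(A)))

A = sparse([1, 1, 2], [1, 2, 1], [10.0, 20.0, 30.0], 2, 2)
B = csr(A)

# CSC stores column-by-column; the CSR wrapper's parent stores row-by-row
@assert A.nzval == [10.0, 30.0, 20.0]         # columns: (1,1), (2,1), (1,2)
@assert B.parent.nzval == [10.0, 20.0, 30.0]  # rows:    (1,1), (1,2), (2,1)
@assert B == A                                # same matrix either way
```

The same values appear in both buffers; only the traversal order that makes them contiguous changes.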

src/blocks.jl (10 additions, 14 deletions)

````diff
@@ -136,13 +136,11 @@ function Base.cat(As::SparseMatrixMPI{T}...; dims) where T
         end
     end
 
-    # Step 3: Build local sparse matrix
-    if isempty(local_I)
-        AT_local = SparseMatrixCSC(total_cols, local_nrows, ones(Int, local_nrows + 1), Int[], T[])
-    else
-        local_sparse = sparse(local_I, local_J, local_V, local_nrows, total_cols)
-        AT_local = sparse(transpose(local_sparse))
-    end
+    # Step 3: Build M^T directly as CSC (swap I↔J), then wrap in lazy transpose for CSR
+    # This avoids an unnecessary physical transpose operation
+    AT_local = isempty(local_I) ?
+        SparseMatrixCSC(total_cols, local_nrows, ones(Int, local_nrows + 1), Int[], T[]) :
+        sparse(local_J, local_I, local_V, total_cols, local_nrows)
 
     return SparseMatrixMPI_local(transpose(AT_local); comm=comm)
 end
@@ -509,13 +507,11 @@ function blockdiag(As::SparseMatrixMPI{T}...) where T
         end
     end
 
-    # Step 4: Build local sparse matrix
-    if isempty(local_I)
-        AT_local = SparseMatrixCSC(total_cols, local_nrows, ones(Int, local_nrows + 1), Int[], T[])
-    else
-        local_sparse = sparse(local_I, local_J, local_V, local_nrows, total_cols)
-        AT_local = sparse(transpose(local_sparse))
-    end
+    # Step 4: Build M^T directly as CSC (swap I↔J), then wrap in lazy transpose for CSR
+    # This avoids an unnecessary physical transpose operation
+    AT_local = isempty(local_I) ?
+        SparseMatrixCSC(total_cols, local_nrows, ones(Int, local_nrows + 1), Int[], T[]) :
+        sparse(local_J, local_I, local_V, total_cols, local_nrows)
 
     return SparseMatrixMPI_local(transpose(AT_local); comm=comm)
 end
````
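The rewritten ternary can be exercised standalone. The empty branch builds an all-zero `total_cols × local_nrows` CSC from raw `colptr`/`rowval`/`nzval` buffers (variable names copied from the diff; the sizes and triplet data are hypothetical):

```julia
using SparseArrays

T = Float64
total_cols, local_nrows = 5, 3
local_I, local_J, local_V = Int[], Int[], T[]

# Empty-triplet branch: colptr of all ones means every column is empty
AT_local = isempty(local_I) ?
    SparseMatrixCSC(total_cols, local_nrows, ones(Int, local_nrows + 1), Int[], T[]) :
    sparse(local_J, local_I, local_V, total_cols, local_nrows)

@assert size(AT_local) == (total_cols, local_nrows)
@assert nnz(AT_local) == 0
@assert size(transpose(AT_local)) == (local_nrows, total_cols)  # M is 3x5

# Non-empty branch for comparison: one triplet, indices swapped to build M^T
AT2 = sparse([2], [1], [7.0], total_cols, local_nrows)  # J, I swapped
@assert transpose(AT2)[1, 2] == 7.0                     # M[1,2] recovered
```

The direct `SparseMatrixCSC(m, n, colptr, rowval, nzval)` constructor is used for the empty case because `sparse` with empty triplets would need explicit dimensions anyway, and the all-ones `colptr` encodes "no nonzeros in any column" exactly.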
