Skip to content

LinearSolve of Block-Tridiagonal matrix is slower than Sparse Arrays #225

@rsenne

Description

@rsenne

So basically I need to do a linear solve on a block-tridiagonal matrix. I figured I would compare against SparseArrays and a custom routine I just wrote. Here is a MWRE.

using Random
using LinearAlgebra
using BlockBandedMatrices
using SparseArrays
using BlockArrays: getblock
using BenchmarkTools

# Parameters
k = 2       # block size
n = 5       # number of blocks

# Create SPD blocks
function make_spd_block(k)
    A = randn(k, k)
    return A'A + k * I
end

Bs = [make_spd_block(k) for _ in 1:n]         # diagonal blocks
Cs = [randn(k, k) for _ in 1:(n-1)]           # off-diagonal blocks

# Build full dense matrix for comparison
function build_full_block_tridiag(Bs, Cs)
    k, n = size(Bs[1], 1), length(Bs)
    A = zeros(k*n, k*n)
    for i in 1:n
        A[(k*(i-1)+1):(k*i), (k*(i-1)+1):(k*i)] .= Bs[i]
        if i < n
            A[(k*(i-1)+1):(k*i), (k*i+1):(k*(i+1))] .= Cs[i]'
            A[(k*i+1):(k*(i+1)), (k*(i-1)+1):(k*i)] .= Cs[i]
        end
    end
    return Symmetric(A)
end

A_dense = build_full_block_tridiag(Bs, Cs)
b = randn(k * n)

# Method 1: Custom block Cholesky solver
function custom_block_cholesky(Bs, Cs, b)
    n = length(Bs)
    k = size(Bs[1], 1)
    Ls = Vector{Matrix{Float64}}(undef, n)
    Loffs = Vector{Matrix{Float64}}(undef, n - 1)

    # Cholesky factorization
    Ls[1] = cholesky(Bs[1]).L
    for i in 2:n
        Loffs[i-1] = Cs[i-1] * inv(Ls[i-1]')
        S = Bs[i] - Loffs[i-1] * Loffs[i-1]'
        Ls[i] = cholesky(S).L
    end

    # Forward substitution
    y = zeros(k * n)
    for i in 1:n
        idx = (k*(i-1)+1):(k*i)
        y[idx] .= b[idx]
        if i > 1
            prev_idx = (k*(i-2)+1):(k*(i-1))
            y[idx] .-= Loffs[i-1] * y[prev_idx]
        end
        y[idx] .= Ls[i] \ y[idx]
    end

    # Backward substitution
    x = zeros(k * n)
    for i in n:-1:1
        idx = (k*(i-1)+1):(k*i)
        x[idx] .= y[idx]
        if i < n
            next_idx = (k*i+1):(k*(i+1))
            x[idx] .-= Loffs[i]' * x[next_idx]
        end
        x[idx] .= Ls[i]' \ x[idx]
    end

    return x
end

x_custom = custom_block_cholesky(Bs, Cs, b)

# Method 2: BlockBandedMatrix solve
block_sizes = fill(k, n)
A_bbm = BlockBandedMatrix{Float64}(undef, block_sizes, block_sizes, (1, 1))

for i in 1:n
    getblock(A_bbm, i, i) .= Bs[i]
end

for i in 1:(n-1)
    getblock(A_bbm, i+1, i) .= Cs[i]
    getblock(A_bbm, i, i+1) .= Cs[i]'
end

x_bbm = A_bbm \ b

# Method 3: Sparse matrix solve
A_sparse = sparse(A_dense)
x_sparse = A_sparse \ b

When doing this I found all methods produced the same results to machine precision. I also found my custom method was the fastest. here were the results i got using the @btime macro.

Mine: 4.343 μs (178 allocations: 8.08 KiB)
BlockBandedMatrices: 23.400 μs (394 allocations: 412.88 KiB)
SparseArrays: 9.200 μs (60 allocations: 7.42 KiB)

Obviously I'm not using any specific matrix types to do this but just using the blocks I know exist inside vectors and then assuming symmetry. But I was curious if potentially this could be a viable improvement over the current linear solving for this specific matrix form.

Edit

Also I realize the use of Cholesky here is very brittle and I'm making some idealized assumptions. I assume someone with more numerical Linear Algebra experience could probably make this better :)

Edit 2

I discovered a more stable LU block factorization here is the code.

	function custom_block_lu(As, Bs, Cs, b)
	
	    n = length(As)  # number of block rows
	    k = size(As[1], 1)  # block size
	    
	    # Storage for LU factors
	    Us = Vector{Matrix{Float64}}(undef, n)      # Upper diagonal blocks
	    Ls = Vector{Matrix{Float64}}(undef, n-1)    # Lower off-diagonal blocks
	    
	    # LU Factorization Phase
	    # =====================
	    
	    # First block: U₁ = A₁
	    Us[1] = copy(As[1])
	    
	    for i = 1:n-1
	        # Compute lower off-diagonal: Lᵢ = Bᵢ * Uᵢ⁻¹
	        Ls[i] = Bs[i] / Us[i]
	        
	        # Update next diagonal block: Uᵢ₊₁ = Aᵢ₊₁ - Lᵢ * Cᵢ
	        Us[i+1] = As[i+1] - Ls[i] * Cs[i]
	    end
	    
	    # Forward Substitution Phase (Solve Ly = b)
	    # ========================================
	    
	    y = zeros(k * n)
	    
	    for i = 1:n
	        idx = (k*(i-1)+1):(k*i)
	        y[idx] = b[idx]
	        
	        # Subtract contribution from previous block
	        if i > 1
	            prev_idx = (k*(i-2)+1):(k*(i-1))
	            y[idx] -= Ls[i-1] * y[prev_idx]
	        end
	    end
	    
	    # Backward Substitution Phase (Solve Ux = y)
	    # ==========================================
	    
	    x = zeros(k * n)
	    
	    for i = n:-1:1
	        idx = (k*(i-1)+1):(k*i)
	        x[idx] = y[idx]
	        
	        # Subtract contribution from next block
	        if i < n
	            next_idx = (k*i+1):(k*(i+1))
	            x[idx] -= Cs[i] * x[next_idx]
	        end
	        
	        # Solve with diagonal block
	        x[idx] = Us[i] \ x[idx]
	    end
	    
	    return x
	end

For a block-tridiagonal system with 50 blocks of dimension 2 here is the relative performance:

Cholesky: Fails (Not PD)
BBM: 829.400 μs (4445 allocations: 4.93 MiB)
Sparse: 282.000 μs (147 allocations: 223.47 KiB)
LU: 135.100 μs (1982 allocations: 89.11 KiB)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions