Interface

To extend the above functionality to a new array type, you should use the types and implement the interfaces listed on this page. GPUArrays is designed around having two different array types to represent a GPU array: one that exists only on the host, and one that actually can be instantiated on the device (i.e. in kernels). Device functionality is then handled by KernelAbstractions.jl.

Host abstractions

You should provide an array type that builds on the AbstractGPUArray supertype, such as:

mutable struct CustomArray{T, N} <: AbstractGPUArray{T, N}
    data::DataRef{Vector{UInt8}}
    offset::Int
    dims::Dims{N}
    ...
end

This will allow your defined type (in this case CustomArray) to use the GPUArrays interface where available. To be able to actually use the functionality that is defined for AbstractGPUArrays, you need to define the backend, like so:

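import KernelAbstractions
# Singleton backend type; KernelAbstractions dispatches kernel launches on it.
struct CustomBackend <: KernelAbstractions.GPU end
# Tell KernelAbstractions which backend owns a CustomArray.
KernelAbstractions.get_backend(::CustomArray) = CustomBackend()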

Existing implementations of this interface include JLArrays, CuArrays (CUDA.jl), and ROCArrays (AMDGPU.jl).

Sparse arrays

A sparse array can't share the AbstractGPUArray supertype — that is a DenseArray, whereas a sparse array must be an AbstractSparseArray — so GPUArrays keeps a parallel sparse hierarchy with its own generic functionality. Integrating a back-end has three parts: the storage types it provides, the methods it implements to plug them in, and the functionality it then gets for free.
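The single-supertype constraint behind this split can be checked on the host with the SparseArrays stdlib. The sketch below walks a type's chain of supertypes (supertype_chain is a hypothetical helper written for this illustration, not part of any package). SparseMatrixCSC reaches AbstractArray without passing through DenseArray, while Vector does pass through it. Because a Julia struct has exactly one such chain, no single type can sit under both AbstractGPUArray and AbstractSparseArray.

```julia
using SparseArrays

# Hypothetical helper: collect the chain of supertypes of T, up to and including Any.
function supertype_chain(T::Type)
    chain = Type[]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

sparse_chain = supertype_chain(SparseMatrixCSC{Float64,Int})  # via AbstractSparseArray
dense_chain = supertype_chain(Vector{Float64})                # via DenseArray
```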

Storage types to provide

One mutable struct per supported format, subtyping the matching abstract type and using the conventional field names (generic code reads them directly):

| supertype | fields |
|---|---|
| AbstractGPUSparseVector{Tv,Ti} | iPtr, nzVal, len, nnz |
| AbstractGPUSparseMatrixCSC{Tv,Ti} | colPtr, rowVal, nzVal, dims, nnz |
| AbstractGPUSparseMatrixCSR{Tv,Ti} | rowPtr, colVal, nzVal, dims, nnz |
| AbstractGPUSparseMatrixCOO{Tv,Ti} | rowInd, colInd, nzVal, dims, nnz |

The pointer/index/value arrays are the back-end's own dense vector type. Provide only the formats you need, but note that several generic operations route through COO.
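To make the field conventions concrete, the sketch below lays out one 3×3 matrix in CSC, CSR and COO form using the SparseArrays stdlib on the host. It assumes the GPU types follow the same 1-based layout as SparseMatrixCSC. The named-tuple keys reuse the GPU field names from the table, but these are ordinary host vectors, not a back-end's types. The CSR arrays are obtained as the CSC storage of the transpose, and findnz supplies the COO triplets.

```julia
using SparseArrays

# 3×3 matrix with stored entries (1,1)=10, (3,1)=20, (2,3)=30.
A = sparse([1, 3, 2], [1, 1, 3], [10.0, 20.0, 30.0], 3, 3)

# CSC: entries stored column by column; colPtr has ncols + 1 entries.
csc = (colPtr = A.colptr, rowVal = A.rowval, nzVal = A.nzval)
# csc.colPtr == [1, 3, 3, 4], csc.rowVal == [1, 3, 2]

# CSR: the CSR arrays of A are exactly the CSC arrays of transpose(A).
At = copy(transpose(A))
csr = (rowPtr = At.colptr, colVal = At.rowval, nzVal = At.nzval)
# csr.rowPtr == [1, 2, 3, 4], csr.colVal == [1, 3, 1], csr.nzVal == [10.0, 30.0, 20.0]

# COO: one (row, col, value) triplet per stored entry.
rowInd, colInd, nzVal = findnz(A)
coo = (rowInd = rowInd, colInd = colInd, nzVal = nzVal)
# coo.rowInd == [1, 3, 2], coo.colInd == [1, 1, 3]
```

All three formats share nnz = 3; only CSC and CSR need a pointer array, which is why COO is the simplest common denominator for generic code.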

Interface to implement

  • Constructors — from component arrays (MyCSR(rowPtr, colVal, nzVal, dims)), between formats (MyCSR(::MyCOO), …), and to/from host SparseArrays (MyCSC(::SparseMatrixCSC), SparseMatrixCSC(::MyCSC)).
  • undef constructors: MyCSC{Tv,Ti}(undef, dims) / MyVec{Tv,Ti}(undef, n), building a structurally empty array (no stored entries), mirroring dense Array{T}(undef, dims) and SparseArrays' SparseMatrixCSC{Tv,Ti}(undef, m, n). This is the empty-of-a-shape allocation primitive. There is no analogue of uninitialized structure: for a sparse array, undef means empty, exactly as in SparseArrays. Implementing these through a spzeros(Tv, Ti, dims…; fmt=…) helper (the value-level analogue of SparseArrays.spzeros, with a format selector) is recommended, since it also serves as a convenient public, format-polymorphic entry point. spzeros itself is not mandated, however: its signature is back-end-flavored (format symbols, storage modes), whereas the undef constructor is uniform.
  • Base.similar — structure-preserving (similar(A), similar(A, ::Type)) and empty-of-a-shape (similar(A, ::Type, dims)), as for dense arrays; generic code allocates its outputs through similar, never by naming a type. The empty-of-a-shape form just delegates to the undef constructor (threading the source's storage mode), so the constructor is the real primitive.
  • Format-conversion hooks GPUArrays.coo_type/csr_type/csc_type — map any of your sparse-matrix types to the type of the named sibling format (coo_type(::Type{<:MyCSC}) = MyCOO); generic code converts with coo_type(A)(A). These are type-level hooks rather than plain convert(Dest, A) because a format is the wrapper's identity (distinct structs), not a type parameter — so, unlike an eltype change, there is no generic wrapper→sibling-wrapper operation, and only the back-end knows its sibling types. The cross-format convert methods above are the engine the resulting constructors route through; the identity case (coo_type(coo)(coo)) is your identity constructor.
  • KernelAbstractions.get_backend for the sparse types (usually get_backend(nonzeros(A))).
  • Adapt.adapt_structure converting each host struct to its device counterpart (GPUArrays.GPUSparseDeviceVector, GPUSparseDeviceMatrixCSC/CSR/COO), so the generic kernels can consume it inside @kernel functions.
  • GPUArrays._sptranspose/_spadjoint — materialize a (conjugate) transpose; used by kron/triu/tril on lazily wrapped operands.
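The constructor, similar and format-hook pattern from the list above can be sketched without a GPU. In the block below, MyCSC and MyCOO are hypothetical stand-ins: plain Vectors replace the back-end's device vectors, the structs do not subtype the GPUArrays abstract types, and coo_type is a local function that mirrors GPUArrays.coo_type rather than extending it. get_backend, adapt_structure and the transpose hooks are omitted. What it does show is the undef constructor as the allocation primitive, similar delegating to it, host round-trips, and conversion through coo_type(A)(A), including the identity case.

```julia
using SparseArrays

# Hypothetical stand-ins: plain Vectors play the role of device vectors,
# and no GPUArrays supertype is used, so this runs on the CPU.
mutable struct MyCSC{Tv,Ti<:Integer}
    colPtr::Vector{Ti}
    rowVal::Vector{Ti}
    nzVal::Vector{Tv}
    dims::NTuple{2,Int}
    nnz::Ti
end

mutable struct MyCOO{Tv,Ti<:Integer}
    rowInd::Vector{Ti}
    colInd::Vector{Ti}
    nzVal::Vector{Tv}
    dims::NTuple{2,Int}
    nnz::Ti
end

# undef constructor: structurally empty (every column pointer is 1, no entries).
MyCSC{Tv,Ti}(::UndefInitializer, dims::NTuple{2,Int}) where {Tv,Ti<:Integer} =
    MyCSC{Tv,Ti}(ones(Ti, dims[2] + 1), Ti[], Tv[], dims, zero(Ti))

# Round-trips with host SparseArrays.
MyCSC(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti<:Integer} =
    MyCSC{Tv,Ti}(copy(A.colptr), copy(A.rowval), copy(A.nzval), size(A), Ti(nnz(A)))
SparseArrays.SparseMatrixCSC(A::MyCSC) =
    SparseMatrixCSC(A.dims..., copy(A.colPtr), copy(A.rowVal), copy(A.nzVal))

# Cross-format constructor; routed through the host only for brevity,
# a real back-end would convert on the device.
function MyCOO(A::MyCSC{Tv,Ti}) where {Tv,Ti<:Integer}
    r, c, v = findnz(SparseMatrixCSC(A))
    return MyCOO{Tv,Ti}(r, c, v, A.dims, A.nnz)
end
MyCOO(A::MyCOO) = A  # identity case, reached by coo_type(coo)(coo)

# Type-level format hook (local stand-in for GPUArrays.coo_type).
coo_type(::Type{<:MyCSC}) = MyCOO
coo_type(::Type{<:MyCOO}) = MyCOO
coo_type(A::Union{MyCSC,MyCOO}) = coo_type(typeof(A))

# Empty-of-a-shape similar delegates to the undef constructor.
Base.similar(A::MyCSC{<:Any,Ti}, ::Type{Tv}, dims::NTuple{2,Int}) where {Tv,Ti<:Integer} =
    MyCSC{Tv,Ti}(undef, dims)
# Structure-preserving similar: same sparsity pattern, fresh value storage.
Base.similar(A::MyCSC{<:Any,Ti}, ::Type{Tv}) where {Tv,Ti<:Integer} =
    MyCSC{Tv,Ti}(copy(A.colPtr), copy(A.rowVal), similar(A.nzVal, Tv), A.dims, A.nnz)
Base.similar(A::MyCSC{Tv}) where {Tv} = similar(A, Tv)
```

Generic code written against this pattern never names MyCOO directly: it asks coo_type for the sibling type and calls it, which is why the hook is type-level rather than a convert method.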

SparseArrays' accessors (nnz, nonzeros, nonzeroinds, rowvals, getcolptr) come for free from the field names. Dense↔sparse conversion is generic and runs on the device: to_sparse(::Type{ST}, dense) scans a dense array into a sparse one (ST must be a vector or COO type; CSR and CSC then follow through the format-conversion constructors), and to_dense(A) scatters back to a dense array of the back-end. A back-end's MyArray(::MySparse…) and dense→sparse constructors can therefore simply call them.
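A scan-based dense-to-sparse conversion typically follows the stream-compaction pattern: flag the nonzeros, take a prefix sum so each one gets an output slot, then scatter. The host-only sketch below illustrates that pattern for COO. dense_to_coo and coo_to_dense are hypothetical names, not the GPUArrays to_sparse/to_dense implementations; on a GPU each step would be a kernel or a library scan, and the loops are written so that every iteration is independent.

```julia
# Hypothetical host-side sketch of scan-based dense -> COO conversion.
function dense_to_coo(A::AbstractMatrix{T}) where {T}
    mask = vec(A .!= zero(T))   # 1. flag nonzeros (column-major order)
    pos = cumsum(mask)          # 2. inclusive prefix sum: output slot of each flagged entry
    n = isempty(pos) ? 0 : pos[end]
    rowInd = Vector{Int}(undef, n)
    colInd = Vector{Int}(undef, n)
    nzVal = Vector{T}(undef, n)
    m = size(A, 1)
    for (k, flagged) in enumerate(mask)  # 3. scatter; iterations are independent
        if flagged
            j, i = divrem(k - 1, m)      # 0-based column and row of linear index k
            rowInd[pos[k]] = i + 1
            colInd[pos[k]] = j + 1
            nzVal[pos[k]] = A[k]
        end
    end
    return rowInd, colInd, nzVal
end

# Inverse direction: scatter stored entries into a zero-filled dense array.
function coo_to_dense(rowInd, colInd, nzVal, dims)
    D = zeros(eltype(nzVal), dims)
    for k in eachindex(nzVal)
        D[rowInd[k], colInd[k]] = nzVal[k]
    end
    return D
end
```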

Functionality you get

Broadcasting; mapreduce and reductions (sum, norm, opnorm); sparse–dense and sparse–vector multiplication (*, mul!, including transposed/adjoint operands); findnz, triu/tril/kron/reshape/droptol!; iszero/issymmetric/ishermitian; scalar and slice indexing; copy/copyto!/collect/Array; and conversion between formats and to/from dense.
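To give a feel for the kind of loop behind sparse-dense multiplication, the host-side sketch below multiplies a CSR matrix by a dense vector. csr_mul is a hypothetical helper, not GPUArrays' kernel. Each row's inner loop reads only that row's slice of colVal/nzVal and writes a single output element, which is why CSR mat-vec maps naturally onto one work-item per row on a GPU.

```julia
# Hypothetical host-side CSR mat-vec: returns y = A * x for A stored as (rowPtr, colVal, nzVal).
function csr_mul(rowPtr, colVal, nzVal, x)
    n = length(rowPtr) - 1
    y = zeros(promote_type(eltype(nzVal), eltype(x)), n)
    for i in 1:n  # rows are independent: one work-item per row on a GPU
        acc = zero(eltype(y))
        for k in rowPtr[i]:(rowPtr[i + 1] - 1)
            acc += nzVal[k] * x[colVal[k]]
        end
        y[i] = acc
    end
    return y
end
```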

Caching Allocator

GPUArrays.@cached
GPUArrays.@uncached