Memory optimizations for large-scale SVD on GPU #111

@sunnowrun

Description

Hi,

I am performing large-scale SVD computations with MAK on GPUs and frequently run into out-of-memory (OOM) errors as the matrix size grows. I would like to ask whether there are ways to make GPU SVD more memory-efficient.

For example, in svd_compact!(A), when size(A, 1) < size(A, 2), I noticed that _gpu_gesvd_maybe_transpose! transposes the input matrix before calling cuSOLVER’s _gpu_gesvd!, since cuSOLVER only supports the m ≥ n case. However, this creates a second, transposed matrix on the GPU while the original is still alive, effectively doubling memory usage. Would it be possible to free the original matrix immediately after the transpose, if MAK no longer needs it? And likewise for the output matrix?
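To make the suggestion concrete, here is a minimal sketch of what I have in mind (a hypothetical helper, not MAK's actual code; it assumes A is not aliased elsewhere):

```julia
using CUDA

# Hypothetical sketch: materialize the transpose that cuSOLVER's gesvd
# path requires, then drop the reference to the original buffer
# immediately instead of waiting for the Julia GC to find it.
function transpose_eagerly(A::CuMatrix)
    At = permutedims(A, (2, 1))  # explicit GPU copy; both buffers alive here
    CUDA.unsafe_free!(A)         # release A's buffer as soon as possible
    return At
end
```

Peak usage is still 2x during the copy itself, but the window in which both buffers are alive shrinks from "until the next GC pass" to a single statement.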

More generally, are there other ways to handle SVDs of larger matrices?

I have run into a similar issue with TensorOperations.jl. For example, a contraction such as @tensor A[a, c, b, d] := B[a, k, b] * C[c, k, d] permutes the tensors internally before contracting, which allocates additional temporaries. If B and C are large CuArrays, these temporary copies can again double memory usage.
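As a workaround, the contraction above can be staged by hand so that each intermediate is released as soon as it is consumed. A minimal sketch (assuming B is indexed [a, k, b] and C is indexed [c, k, d]; the function name is illustrative):

```julia
using CUDA, LinearAlgebra

# Compute A[a,c,b,d] = Σ_k B[a,k,b] * C[c,k,d] as a staged matrix
# multiply, releasing each temporary as soon as it has been consumed,
# so at most one extra copy of each operand is alive at a time.
function contract_staged(B::CuArray{T,3}, C::CuArray{T,3}) where {T}
    a, k, b = size(B)
    c, _, d = size(C)
    Bp = permutedims(B, (1, 3, 2))  # (a, b, k): contraction index last
    Cp = permutedims(C, (2, 1, 3))  # (k, c, d): contraction index first
    M = reshape(Bp, a * b, k) * reshape(Cp, k, c * d)  # (a*b, c*d) via CUBLAS
    # Drop our references; the buffers are reclaimed once no other
    # wrapper (e.g. the reshaped views above) still holds them.
    CUDA.unsafe_free!(Bp)
    CUDA.unsafe_free!(Cp)
    A = permutedims(reshape(M, a, b, c, d), (1, 3, 2, 4))  # -> [a, c, b, d]
    CUDA.unsafe_free!(M)
    return A
end
```

This trades the convenience of @tensor for explicit control over when each temporary dies.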

Additionally, I have locally modified TensorKit so that contractions and factorizations run on GPU. In doing so, I noticed that the Julia garbage collector sometimes delays reclaiming GPU memory, causing the next large allocation to fail with OOM. To avoid this, I sometimes need to call CUDA.unsafe_free! manually. Has this GC behavior been observed by the maintainers?
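Besides calling CUDA.unsafe_free! by hand, the coarser remedy I currently fall back on when a large allocation is about to fail looks like this (a sketch; whether it helps depends on the workload):

```julia
using CUDA

# Before attempting a very large allocation: force a full Julia GC pass
# so unreachable CuArrays release their buffers, then ask CUDA.jl's
# memory pool to hand cached blocks back to the driver.
GC.gc(true)
CUDA.reclaim()
```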

Thanks for the great packages and all your work!
