
Commit e48f38e

Sébastien Loisel authored and committed
Restore automatic synchronized finalization for MUMPS factorizations
Re-adds the automatic cleanup system:
- Each factorization gets a unique ID tracked in a global registry
- Julia finalizers queue IDs to a thread-safe destroy list (no MPI)
- _process_finalizers() gathers pending IDs from all ranks, merges, and finalizes in deterministic order
- Registry check prevents double-finalization

Manual finalize!(F) remains available for explicit control.
1 parent 08eb301 commit e48f38e
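
The merge step described in the commit message can be sketched without MPI. This is a minimal illustration, not the package's code: `pending_by_rank`, `registry`, and `finalized` are hypothetical stand-ins, and the concatenation stands in for what the `Allgatherv!` call would return on every rank.

```julia
# Hypothetical per-rank pending ID lists, as each rank's GC might have queued them.
pending_by_rank = [[3, 1], Int[], [1, 2]]

# Stand-in for the Allgatherv result: every rank sees the same concatenation.
all_ids = reduce(vcat, pending_by_rank)

# Dedupe, then sort, so every rank walks the IDs in the same order.
dead_list = sort!(unique(all_ids))

# Hypothetical registry of live objects, keyed by ID.
registry = Dict(1 => "F1", 2 => "F2", 3 => "F3", 4 => "F4")

finalized = Int[]
for id in dead_list
    # The registry check skips IDs that were already released.
    if haskey(registry, id)
        delete!(registry, id)
        push!(finalized, id)
    end
end
```

Because every rank computes the same `dead_list`, the collective cleanup calls would be issued in the same order everywhere, even though each rank's GC queued a different subset.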

4 files changed

Lines changed: 147 additions & 18 deletions


CLAUDE.md

Lines changed: 4 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -91,17 +91,16 @@ Factorization uses MUMPS (MUltifrontal Massively Parallel Solver) with distribut
9191
- Created by `lu(A)` for general matrices or `ldlt(A)` for symmetric matrices
9292
- Stores COO arrays (irn_loc, jcn_loc, a_loc) to prevent GC while MUMPS holds pointers
9393

94-
**Important: Manual cleanup required.** Unlike other types in this library, factorization objects
95-
require explicit cleanup via `finalize!(F)`. This is because MUMPS cleanup routines call MPI
96-
functions, and Julia's GC may run finalizers after MPI has shut down (causing crashes). Example:
94+
**Automatic cleanup:** Factorization objects are automatically cleaned up when garbage collected.
95+
The cleanup is synchronized across MPI ranks when the next factorization is created. Example:
9796

9897
```julia
9998
F = lu(A)
10099
x = F \ b
101-
finalize!(F) # Required!
100+
# F is automatically cleaned up when GC'd and next factorization is created
102101
```
103102

104-
If `finalize!` is not called, the program still works but MUMPS memory leaks until exit.
103+
Manual `finalize!(F)` is still available for explicit control (must be called on all ranks together).
105104

106105
### Local Constructors
107106

README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -63,7 +63,7 @@ A_sym = A + transpose(A) + 10I # Make symmetric positive definite
6363
A_sym_dist = SparseMatrixMPI{Float64}(A_sym)
6464
F = ldlt(A_sym_dist) # LDLT factorization
6565
x_sol = solve(F, y) # Solve A_sym * x_sol = y
66-
finalize!(F) # Release factorization resources
66+
# F is automatically cleaned up when garbage collected
6767
```
6868

6969
## Running with MPI

docs/src/api.md

Lines changed: 8 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -428,7 +428,10 @@ solve
428428
solve!
429429
```
430430

431-
### Releasing Factorization Resources
431+
### Manual Cleanup (Optional)
432+
433+
Factorization objects are automatically cleaned up when garbage collected.
434+
For explicit control, `finalize!` can be called manually (must be called on all ranks together).
432435

433436
```@docs
434437
finalize!
@@ -456,14 +459,14 @@ x = solve(F, b)
456459
# Or use backslash
457460
x = F \ b
458461

459-
# Release factorization resources when done
460-
finalize!(F)
462+
# F is automatically cleaned up when garbage collected
463+
# (or call finalize!(F) for immediate cleanup on all ranks)
461464

462465
# For non-symmetric matrices, use LU
463466
A_nonsym = SparseMatrixMPI{Float64}(sprand(1000, 1000, 0.01) + 10I)
464467
F_lu = lu(A_nonsym)
465468
x = F_lu \ b
466-
finalize!(F_lu)
469+
# F_lu is automatically cleaned up when garbage collected
467470
```
468471

469472
### Direct Solve Syntax
@@ -481,7 +484,7 @@ x = transpose(b) / A # solve x*A = transpose(b)
481484
x = transpose(b) / transpose(A) # solve x*transpose(A) = transpose(b)
482485
```
483486

484-
Note: One-shot solves like `A \ b` automatically clean up the factorization. For repeated solves with the same matrix, compute the factorization once with `lu()` or `ldlt()`, reuse it, then call `finalize!()` when done.
487+
Note: Factorizations are automatically cleaned up when garbage collected. Cleanup is synchronized across MPI ranks when the next factorization is created.
485488

486489
## Cache Management
487490

src/mumps_factorization.jl

Lines changed: 134 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -11,6 +11,35 @@ using MUMPS
1111
using MUMPS: Mumps, set_icntl!, MUMPS_INT, MUMPS_INT8, suppress_printing!
1212
import MUMPS: invoke_mumps_unsafe!
1313

14+
# ============================================================================
15+
# MUMPS Automatic Finalization Management
16+
# ============================================================================
17+
#
18+
# MUMPS cleanup requires synchronized MPI calls across all ranks, but Julia's
19+
# GC runs asynchronously on each rank. This system handles automatic cleanup:
20+
#
21+
# 1. Each MUMPS factorization gets a unique integer ID (_mumps_count)
22+
# 2. Objects are registered in _mumps_registry by ID
23+
# 3. Julia's GC finalizer queues the ID to _destroy_list (no MPI calls)
24+
# 4. When creating a new factorization, _process_finalizers() is called:
25+
#    - All ranks exchange their _destroy_list (Allgather of counts, then Allgatherv of IDs)
26+
# - Lists are merged, sorted, uniqued
27+
# - Objects are finalized in deterministic order across all ranks
28+
#
29+
# This ensures synchronized cleanup without blocking in finalizers.
30+
31+
# Global counter for unique MUMPS object IDs
32+
const _mumps_count = Ref{Int}(0)
33+
34+
# Registry mapping ID -> MUMPSFactorizationMPI (prevents GC until removed from registry)
35+
const _mumps_registry = Dict{Int, Any}()
36+
37+
# List of MUMPS IDs queued for destruction by this rank's GC
38+
const _destroy_list = Int[]
39+
40+
# Lock for thread-safe access to _destroy_list (finalizers may run from GC thread)
41+
const _destroy_list_lock = ReentrantLock()
42+
1443
# ============================================================================
1544
# MUMPS Factorization Type
1645
# ============================================================================
@@ -20,9 +49,12 @@ import MUMPS: invoke_mumps_unsafe!
2049
2150
Distributed MUMPS factorization result. Can be reused for multiple solves.
2251
23-
**Important:** Call `finalize!(F)` when done to release MUMPS resources.
52+
Factorization objects are automatically cleaned up when garbage collected,
53+
with synchronized finalization across MPI ranks. Manual `finalize!(F)` is
54+
still available for explicit control (must be called on all ranks together).
2455
"""
2556
mutable struct MUMPSFactorizationMPI{T}
57+
id::Int # Unique ID for finalization tracking
2658
mumps::Any # Mumps{T,R} where R is the real type (Float64 for both real and complex)
2759
irn_loc::Vector{MUMPS_INT}
2860
jcn_loc::Vector{MUMPS_INT}
@@ -36,6 +68,72 @@ end
3668
Base.size(F::MUMPSFactorizationMPI) = (F.n, F.n)
3769
Base.eltype(::MUMPSFactorizationMPI{T}) where T = T
3870

71+
# ============================================================================
72+
# Automatic Finalization Functions
73+
# ============================================================================
74+
75+
"""
76+
_queue_for_destruction(F::MUMPSFactorizationMPI)
77+
78+
Julia finalizer callback. Queues the factorization ID for later synchronized
79+
destruction. Does NOT call MPI (unsafe from GC thread).
80+
"""
81+
function _queue_for_destruction(F::MUMPSFactorizationMPI)
82+
lock(_destroy_list_lock) do
83+
push!(_destroy_list, F.id)
84+
end
85+
return nothing
86+
end
87+
88+
"""
89+
_process_finalizers()
90+
91+
Process pending MUMPS finalizations in a synchronized manner across all ranks.
92+
This is a **collective operation** - all ranks must call it together.
93+
94+
Called automatically when creating new factorizations. Gathers pending
95+
destruction requests from all ranks, merges them, and finalizes in
96+
deterministic order.
97+
"""
98+
function _process_finalizers()
99+
comm = MPI.COMM_WORLD
100+
nranks = MPI.Comm_size(comm)
101+
102+
# Thread-safe: detach current destroy list, replace with empty
103+
local_list = lock(_destroy_list_lock) do
104+
list = copy(_destroy_list)
105+
empty!(_destroy_list)
106+
list
107+
end
108+
109+
# Allgather counts of how many IDs each rank has
110+
local_count = Int32(length(local_list))
111+
all_counts = MPI.Allgather(local_count, comm)
112+
113+
# Allgatherv to collect all IDs from all ranks
114+
total_count = sum(all_counts)
115+
if total_count == 0
116+
return # Nothing to finalize
117+
end
118+
119+
all_ids = Vector{Int}(undef, total_count)
120+
MPI.Allgatherv!(local_list, MPI.VBuffer(all_ids, all_counts), comm)
121+
122+
# Sort and unique to get deterministic order across all ranks
123+
dead_list = sort!(unique(all_ids))
124+
125+
# Finalize each in order (check registry to avoid double-finalize)
126+
for id in dead_list
127+
if haskey(_mumps_registry, id)
128+
F = _mumps_registry[id]
129+
delete!(_mumps_registry, id)
130+
# Actually finalize the MUMPS object
131+
F.mumps._finalized = false
132+
MUMPS.finalize!(F.mumps)
133+
end
134+
end
135+
end
136+
39137
# ============================================================================
40138
# Extract COO from SparseMatrixMPI
41139
# ============================================================================
@@ -96,6 +194,13 @@ function _create_mumps_factorization(A::SparseMatrixMPI{T}, symmetric::Bool) whe
96194
comm = MPI.COMM_WORLD
97195
rank = MPI.Comm_rank(comm)
98196

197+
# Process any pending finalizations first (collective operation)
198+
_process_finalizers()
199+
200+
# Assign unique ID for this factorization
201+
id = _mumps_count[]
202+
_mumps_count[] += 1
203+
99204
m, n = size(A)
100205
@assert m == n "Matrix must be square for factorization"
101206

@@ -107,7 +212,7 @@ function _create_mumps_factorization(A::SparseMatrixMPI{T}, symmetric::Bool) whe
107212
# sym=0: unsymmetric, sym=1: SPD, sym=2: general symmetric
108213
mumps_sym = symmetric ? MUMPS.mumps_definite : MUMPS.mumps_unsymmetric
109214
mumps = Mumps{T}(mumps_sym, MUMPS.default_icntl, MUMPS.default_cntl64)
110-
mumps._finalized = true # Disable GC finalizer to avoid post-MPI crash
215+
mumps._finalized = true # Disable MUMPS GC finalizer to avoid post-MPI crash
111216

112217
# Suppress all MUMPS output
113218
suppress_printing!(mumps)
@@ -142,10 +247,19 @@ function _create_mumps_factorization(A::SparseMatrixMPI{T}, symmetric::Bool) whe
142247
# Pre-allocate RHS buffer on rank 0
143248
rhs_buffer = rank == 0 ? zeros(T, n) : T[]
144249

145-
return MUMPSFactorizationMPI{T}(
146-
mumps, irn_loc, jcn_loc, a_loc,
250+
# Create factorization object with ID
251+
F = MUMPSFactorizationMPI{T}(
252+
id, mumps, irn_loc, jcn_loc, a_loc,
147253
n, symmetric, copy(A.row_partition), rhs_buffer
148254
)
255+
256+
# Register in global registry (prevents GC until removed)
257+
_mumps_registry[id] = F
258+
259+
# Attach Julia finalizer to queue for synchronized destruction
260+
finalizer(_queue_for_destruction, F)
261+
262+
return F
149263
end
150264

151265
"""
@@ -168,7 +282,7 @@ end
168282
169283
Compute LU factorization of a distributed sparse matrix using MUMPS.
170284
Returns a `MUMPSFactorizationMPI` for use with `\\` or `solve`.
171-
Call `finalize!(F)` when done.
285+
Factorization is automatically cleaned up when garbage collected.
172286
"""
173287
function LinearAlgebra.lu(A::SparseMatrixMPI{T}) where T
174288
return _create_mumps_factorization(A, false)
@@ -180,7 +294,7 @@ end
180294
Compute LDLT factorization of a distributed symmetric sparse matrix using MUMPS.
181295
The matrix must be symmetric; only the lower triangular part is used.
182296
Returns a `MUMPSFactorizationMPI` for use with `\\` or `solve`.
183-
Call `finalize!(F)` when done.
297+
Factorization is automatically cleaned up when garbage collected.
184298
"""
185299
function LinearAlgebra.ldlt(A::SparseMatrixMPI{T}) where T
186300
return _create_mumps_factorization(A, true)
@@ -257,9 +371,22 @@ end
257371
"""
258372
finalize!(F::MUMPSFactorizationMPI)
259373
260-
Release MUMPS resources. Must be called when done with the factorization.
374+
Manually release MUMPS resources. This is a **collective operation** - all
375+
ranks must call it together for immediate cleanup.
376+
377+
If the factorization has already been cleaned up (by automatic finalization
378+
or a previous manual call), this is a no-op but all ranks must still call it.
261379
"""
262380
function finalize!(F::MUMPSFactorizationMPI)
381+
# Check if already finalized (removed from registry)
382+
if !haskey(_mumps_registry, F.id)
383+
return F # Already finalized, no-op
384+
end
385+
386+
# Remove from registry
387+
delete!(_mumps_registry, F.id)
388+
389+
# Actually finalize the MUMPS object
263390
F.mumps._finalized = false # Re-enable MUMPS finalization
264391
MUMPS.finalize!(F.mumps)
265392
return F
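
The registry guard in `finalize!` above can be sketched in isolation. This is a toy illustration under stated assumptions: `ToyFactorization`, `toy_registry`, and `toy_finalize!` are hypothetical names, and incrementing a counter stands in for the real `MUMPS.finalize!` call.

```julia
# Hypothetical stand-in for a factorization object; not the package's type.
mutable struct ToyFactorization
    id::Int
    released::Int   # counts how many times the underlying resource was freed
end

# Hypothetical registry of objects that still hold resources.
toy_registry = Dict{Int, ToyFactorization}()

function toy_finalize!(F::ToyFactorization)
    # Already removed from the registry: nothing left to release.
    haskey(toy_registry, F.id) || return F
    delete!(toy_registry, F.id)
    F.released += 1   # stand-in for the actual resource release
    return F
end

F = ToyFactorization(7, 0)
toy_registry[F.id] = F
toy_finalize!(F)
toy_finalize!(F)   # second call is a no-op
```

Removing the entry before releasing means a later automatic pass that sees the same ID finds nothing in the registry and skips it.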
