
Commit 66afdef

Sébastien Loisel authored and committed
Document MUMPS threading requirements
MUMPS performance depends on proper thread configuration:
- OMP_NUM_THREADS=1 (disable MUMPS OpenMP parallelism)
- OPENBLAS_NUM_THREADS=ncores (enable BLAS parallelism)

These must be set before starting Julia because OpenBLAS sizes its thread pool at initialization time. With proper settings, MUMPS matches Julia's built-in solver speed at n=1M (1.0x ratio on the 2D Laplacian benchmark).
1 parent ac9c958 commit 66afdef
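The commit message recommends `OMP_NUM_THREADS=1` and `OPENBLAS_NUM_THREADS=ncores`, both set before Julia starts. One way to apply this per invocation is the minimal bash sketch below. It assumes `nproc` (GNU coreutils, common on Linux) or `sysctl -n hw.ncpu` (macOS) is available to count cores. The functions `detect_ncores` and `run_with_mumps_threads` are hypothetical helpers, not part of LinearAlgebraMPI or MUMPS.

```bash
# Sketch: run a command with the thread settings recommended in this commit.
# Assumes `nproc` (GNU coreutils, common on Linux) or `sysctl -n hw.ncpu` (macOS).
# Both report logical CPUs, which can exceed physical cores on SMT/hyper-threaded machines.
# detect_ncores and run_with_mumps_threads are hypothetical helpers, not part of
# LinearAlgebraMPI or MUMPS.

detect_ncores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    echo 1  # core count unknown: fall back to a single BLAS thread
  fi
}

run_with_mumps_threads() {
  # The subshell keeps the caller's environment untouched. Both variables are
  # exported before the child process (e.g. julia) starts, so they are already
  # present when OpenBLAS initializes its thread pool.
  (
    export OMP_NUM_THREADS=1
    # Respect an OPENBLAS_NUM_THREADS value that is already set.
    export OPENBLAS_NUM_THREADS="${OPENBLAS_NUM_THREADS:-$(detect_ncores)}"
    exec "$@"
  )
}
```

For example, run `run_with_mumps_threads julia your_script.jl`. From inside Julia, `LinearAlgebra.BLAS.get_num_threads()` reports the BLAS thread count actually in effect, which is a quick way to confirm the setting was picked up.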

2 files changed

Lines changed: 27 additions & 10 deletions


docs/src/getting-started.md

Lines changed: 26 additions & 10 deletions
@@ -288,25 +288,41 @@ This configuration uses only BLAS-level threading, which is the same strategy Ju
 
 ### Performance Comparison
 
-The following table compares MUMPS (`OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=10`) against Julia's built-in sparse solver (also using 10 BLAS threads) on a 2D Laplacian problem. Benchmarks were run on a 2025 M4 MacBook Pro with 10 CPU cores:
+The following table compares MUMPS (`OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=10`) against Julia's built-in sparse solver (also using the same settings) on a 2D Laplacian problem. Benchmarks were run on a 2025 M4 MacBook Pro with 10 CPU cores:
 
 | n | Julia (ms) | MUMPS (ms) | Ratio |
 |---|------------|------------|-------|
-| 9 | 0.005 | 0.041 | 8.7x |
-| 100 | 0.021 | 0.070 | 3.3x |
-| 992 | 0.261 | 0.412 | 1.6x |
-| 10,000 | 4.25 | 5.00 | 1.18x |
-| 99,856 | 49.1 | 56.7 | 1.16x |
-| 1,000,000 | 636 | 641 | 1.01x |
+| 9 | 0.004 | 0.041 | 9.7x |
+| 100 | 0.023 | 0.070 | 3.0x |
+| 992 | 0.269 | 0.418 | 1.6x |
+| 10,000 | 4.28 | 5.60 | 1.31x |
+| 99,856 | 51.2 | 56.9 | 1.11x |
+| 1,000,000 | 665 | 666 | 1.0x |
 
 Key observations:
 - At small problem sizes, MUMPS has initialization overhead (~0.04ms)
-- At large problem sizes (n ≥ 100,000), MUMPS is within 1-16% of Julia's built-in solver
-- At n = 1,000,000, MUMPS is essentially the same speed (1% slower)
+- At large problem sizes (n ≈ 100,000 and above), MUMPS is within 11% of Julia's built-in solver
+- At n = 1,000,000, MUMPS matches Julia's speed to within 0.2% (666 ms vs. 665 ms, a 1.0x ratio)
 
 ### Default Behavior
 
-Without setting environment variables, both OpenMP and OpenBLAS will attempt to use all available cores, which can lead to thread oversubscription and degraded performance. For predictable results, explicitly set the environment variables before running your program.
+For optimal performance, set threading environment variables **before starting Julia**:
+
+```bash
+export OMP_NUM_THREADS=1
+export OPENBLAS_NUM_THREADS=10 # or your number of CPU cores
+julia your_script.jl
+```
+
+This is necessary because OpenBLAS creates its thread pool during library initialization, before LinearAlgebraMPI has a chance to configure it. LinearAlgebraMPI attempts to set sensible defaults programmatically, but this may not always take effect if the thread pool is already initialized.
+
+You can also add these to your shell profile (`.bashrc`, `.zshrc`, etc.), which is the more reliable choice, or to Julia's `startup.jl` (note that `startup.jl` runs after Julia has started, so by the reasoning above these settings may take effect too late for OpenBLAS):
+
+```julia
+# In ~/.julia/config/startup.jl
+ENV["OMP_NUM_THREADS"] = "1"
+ENV["OPENBLAS_NUM_THREADS"] = string(Sys.CPU_THREADS)
+```
 
 ### Advanced: Combined Threading
 
src/LinearAlgebraMPI.jl

Lines changed: 1 addition & 0 deletions
@@ -982,6 +982,7 @@ function map_rows(f, A...)
 end
 end
 
+
 # ============================================================================
 # Precompilation Workload
 # ============================================================================
