MUMPS performance depends on proper thread configuration:
- OMP_NUM_THREADS=1 (disable MUMPS OpenMP parallelism)
- OPENBLAS_NUM_THREADS=ncores (enable BLAS parallelism)
These must be set before starting Julia because OpenBLAS
sizes its thread pool at initialization time.
With proper settings, MUMPS matches Julia's built-in solver
speed at n=1M (1.0x ratio on 2D Laplacian benchmark).
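
As a minimal sketch of the configuration described above: the snippet derives the core count instead of hard-coding it, and exports both variables before Julia starts. The variable name `ncores` and the use of `getconf _NPROCESSORS_ONLN` are illustrative assumptions, not part of this commit. `getconf` reports logical cores, which may exceed the physical core count on machines with SMT. The final check assumes `julia` is on `PATH` and is skipped otherwise.

```shell
#!/bin/sh
# Count online logical cores (assumption: getconf is available, as on Linux and macOS).
ncores=$(getconf _NPROCESSORS_ONLN)

# Disable MUMPS OpenMP parallelism and give the cores to OpenBLAS instead.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS="$ncores"

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"

# If Julia is installed, print the BLAS thread count it actually picked up.
if command -v julia >/dev/null 2>&1; then
    julia -e 'using LinearAlgebra; println("BLAS threads: ", BLAS.get_num_threads())'
fi
```

If the reported BLAS thread count differs from `OPENBLAS_NUM_THREADS`, the variables were probably set after the Julia process had already started.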
docs/src/getting-started.md: 26 additions & 10 deletions (+26 -10)
@@ -288,25 +288,41 @@ This configuration uses only BLAS-level threading, which is the same strategy Ju

 ### Performance Comparison

-The following table compares MUMPS (`OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=10`) against Julia's built-in sparse solver (also using 10 BLAS threads) on a 2D Laplacian problem. Benchmarks were run on a 2025 M4 MacBook Pro with 10 CPU cores:
+The following table compares MUMPS (`OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=10`) against Julia's built-in sparse solver (also using the same settings) on a 2D Laplacian problem. Benchmarks were run on a 2025 M4 MacBook Pro with 10 CPU cores:

 | n | Julia (ms) | MUMPS (ms) | Ratio |
 |---|------------|------------|-------|
-| 9 | 0.005| 0.041 |8.7x |
-| 100 | 0.021| 0.070 | 3.3x|
-| 992 | 0.261| 0.412| 1.6x |
-| 10,000 | 4.25| 5.00| 1.18x|
-| 99,856 |49.1| 56.7| 1.16x|
-| 1,000,000 |636|641| 1.01x|
+| 9 | 0.004| 0.041 |9.7x |
+| 100 | 0.023| 0.070 | 3.0x|
+| 992 | 0.269| 0.418| 1.6x |
+| 10,000 | 4.28| 5.60| 1.31x|
+| 99,856 |51.2| 56.9| 1.11x|
+| 1,000,000 |665|666| 1.0x|

 Key observations:
 - At small problem sizes, MUMPS has initialization overhead (~0.04ms)
-- At large problem sizes (n ≥ 100,000), MUMPS is within 1-16% of Julia's built-in solver
-- At n = 1,000,000, MUMPS is essentially the same speed (1% slower)
+- At large problem sizes (n ≥ 100,000), MUMPS is within 11% of Julia's built-in solver
+- At n = 1,000,000, MUMPS matches Julia's speed exactly (1.0x ratio)

 ### Default Behavior

-Without setting environment variables, both OpenMP and OpenBLAS will attempt to use all available cores, which can lead to thread oversubscription and degraded performance. For predictable results, explicitly set the environment variables before running your program.
+For optimal performance, set threading environment variables **before starting Julia**:
+
+```bash
+export OMP_NUM_THREADS=1
+export OPENBLAS_NUM_THREADS=10 # or your number of CPU cores
+julia your_script.jl
+```
+
+This is necessary because OpenBLAS creates its thread pool during library initialization, before LinearAlgebraMPI has a chance to configure it. LinearAlgebraMPI attempts to set sensible defaults programmatically, but this may not always take effect if the thread pool is already initialized.
+
+You can also add these to your shell profile (`.bashrc`, `.zshrc`, etc.) or Julia's `startup.jl`: