Skip to content

KLU: add thread-safe solve entry points#1038

Closed
luke-kiernan wants to merge 1 commit intoDrTimothyAldenDavis:devfrom
luke-kiernan:klu-thread-safe-solve
Closed

KLU: add thread-safe solve entry points#1038
luke-kiernan wants to merge 1 commit intoDrTimothyAldenDavis:devfrom
luke-kiernan:klu-thread-safe-solve

Conversation

@luke-kiernan
Copy link
Copy Markdown

Summary

Add thread-safe variants of klu_solve / klu_tsolve so multiple threads can solve against a single shared Numeric.

The legacy entry points use Numeric->Xwork as scratch, so concurrent solves on the same factorization race on that buffer. This PR adds:

  • klu_*_solve_ws / klu_*_tsolve_ws — same as the legacy entry points, but take a caller-supplied void *Work instead of writing to Numeric->Xwork.
  • klu_*_solve_worksize(Symbolic, Common) — returns the required size in bytes (4 * n * sizeof(Entry), with overflow check).

The legacy klu_solve / klu_tsolve are unchanged — they still use Numeric->Xwork — so this is a pure addition. No struct fields touched, no existing signature changed, no SOVERSION bump.

Threading contract: any number of concurrent klu_*_solve_ws / klu_*_tsolve_ws calls against a single Numeric, provided each thread supplies its own Work buffer and its own klu_common. (As before, klu_refactor mutates the Numeric and is never concurrent-safe with solves.)

Implementation: each of klu_solve.c / klu_tsolve.c is split into a static _core helper (takes void *Work) plus public klu_solve / klu_solve_ws / (klu_tsolve / klu_tsolve_ws) wrappers.

Test plan

Demo/klu_thread_demo.c is included as a permanent regression test: 8 pthreads × 200 iters × (solve + tsolve) against a shared Numeric, each with its own Work and Common, results compared bit-exact to a serial reference. Wired into the SUITESPARSE_DEMOS build, gated on non-Windows.

  • Builds clean across all 4 templated variants (real/complex × int32/int64); 12 new symbols exported (nm verified)
  • klu_*_solve_ws / klu_*_tsolve_ws produce bit-identical output to the legacy entry points (single-threaded smoke test)
  • klu_thread_demo passes: 8 threads × 200 iters × 2 solves on n=200 tridiag
  • Negative control: same test against legacy klu_solve (sharing Numeric->Xwork) reliably fails on iter 0 — confirms the test really exercises the race and the new path resolves it
  • User guide and ChangeLog — deferred; happy to add in a follow-up if you like the API shape

Notes

  • Naming: chose _ws (workspace) over _2 for descriptive clarity. Open to bikeshed.
  • Scope: solve + tsolve only. klu_refactor and klu_condest aren't the concurrent-target use case (refactor mutates Numeric regardless).
  • Only x86/macOS tested locally. The pthreads demo is gated on NOT WIN32; on Windows the test is skipped (the library change itself is portable).

🤖 Generated with Claude Code

Legacy klu_solve / klu_tsolve race on Numeric->Xwork when called
concurrently against a shared Numeric.  Add klu_*_solve_ws and
klu_*_tsolve_ws variants that take caller-supplied scratch, plus
klu_*_solve_worksize() to query the size.  Legacy entry points
unchanged; no ABI break.  Includes Demo/klu_thread_demo.c regression
test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@luke-kiernan
Copy link
Copy Markdown
Author

Above was 80% AI. But in principle it seems like a straightforward patch. If there's things I've overlooked, please let me know and I'll do my best to address them.

@DrTimothyAldenDavis
Copy link
Copy Markdown
Owner

I can't include any AI generated code

@luke-kiernan
Copy link
Copy Markdown
Author

I can't include any AI generated code

Understood. I'll close the PR. I may re-write by hand and re-open, but that will take a while.

Related question: is there a concrete reason why the library doesn't already allow for thread-parallel calls to solve!?

@DrTimothyAldenDavis
Copy link
Copy Markdown
Owner

I wrote the library a long time ago, when parallelism was not a concern.

@DrTimothyAldenDavis
Copy link
Copy Markdown
Owner

@luke-kiernan
Copy link
Copy Markdown
Author

You don't have to justify your policy to me. The root problem I'm trying to address is 2 steps removed: PowerNetworkMatrices.jl has KLU.jl as a dependency which in turn provides Julia bindings for KLU. So I used AI out of convenience, hoping for a low-effort fix.

I'll close this PR. I may re-write by hand, but I can't really justify taking the time to do so right now.

@DrTimothyAldenDavis
Copy link
Copy Markdown
Owner

No problem. I will add this to my TODO list. I definitely agree it's something that needs to be done. I'll try to get to it over the summer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants