[FEATURE] True parallel multi-chain using OpenMP #388

Description

@andrewherren

The current run_mcmc_chains implementation in BARTSampler and BCFSampler loops over chains sequentially, restoring a GFR snapshot before each chain so that every chain gets a distinct starting point. The GFR snapshot mechanism is correct, but the chains themselves still run one after another.

This issue proposes true parallel execution: each chain runs on its own thread with no shared mutable state (each chain owns its own sampler state, RNG, and BARTSamples/BCFSamples result object). Concretely, replace the sequential for-loop in run_mcmc_chains with a #pragma omp parallel for schedule(static) loop. Loop iteration i, which handles chain i, restores from gfr_snapshots_[num_chains - 1 - i] and writes only into samples.chain_slice(i), so no coordination between threads is needed.
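The loop structure described above could look roughly like the following. This is a toy sketch, not the actual sampler code: ChainState, run_chains, and the random-walk update are hypothetical stand-ins for the real sampler state, run_mcmc_chains, and the MCMC step, and a vector of per-chain rows stands in for samples.chain_slice(i). Only the pragma and the snapshot indexing are taken from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the per-chain state restored from a GFR snapshot.
struct ChainState {
    double value;
};

// Hypothetical stand-in for run_mcmc_chains. Chain i starts from
// snapshots[num_chains - 1 - i], owns its own RNG seeded with seeds[i],
// and writes only into row i of the result.
std::vector<std::vector<double>> run_chains(
    const std::vector<ChainState>& snapshots,
    const std::vector<std::uint64_t>& seeds,
    int num_draws,
    bool parallel) {
    const int num_chains = static_cast<int>(snapshots.size());
    assert(seeds.size() == snapshots.size());

    // Allocate every chain's output up front so the loop body never resizes shared data.
    std::vector<std::vector<double>> samples(
        num_chains, std::vector<double>(num_draws));

    // Without OpenMP enabled (e.g. no -fopenmp) the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static) if(parallel)
    for (int i = 0; i < num_chains; ++i) {
        ChainState state = snapshots[num_chains - 1 - i];  // private copy per chain
        std::mt19937_64 rng(seeds[i]);                     // private RNG per chain
        std::normal_distribution<double> step(0.0, 1.0);
        for (int d = 0; d < num_draws; ++d) {
            state.value += step(rng);     // toy update, not a real BART/BCF step
            samples[i][d] = state.value;  // chain i writes only to its own row
        }
    }
    return samples;
}
```

Because every variable a chain mutates is declared inside the loop body and each chain's output row is disjoint from the others, the parallel and serial runs in this sketch perform the same arithmetic per chain and produce identical draws for the same seeds.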

The existing warning about within-chain multi-threading conflicting with cross-chain parallelism should be preserved (raise a warning if num_threads > 1 and num_chains > 1).

Acceptance criteria:

  • With num_chains = 4 and the same four seeds, parallel results match sequential results within floating-point tolerance
  • Wall-time speedup >= 3x with 4 chains on a 4-core machine
  • All existing R and Python BART and BCF tests pass
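The first two criteria could be checked with helpers along these lines. This is a sketch under assumptions: all_close and meets_speedup are hypothetical names, draws are assumed to be flattened into vectors of doubles, and the default tolerances are placeholders, since the issue does not specify a tolerance.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when a and b have the same length and every pair
// of elements agrees within a combined absolute and relative tolerance.
bool all_close(const std::vector<double>& a, const std::vector<double>& b,
               double rtol = 1e-9, double atol = 1e-12) {
    if (a.size() != b.size()) return false;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double scale = std::max(std::fabs(a[k]), std::fabs(b[k]));
        if (std::fabs(a[k] - b[k]) > atol + rtol * scale) return false;
    }
    return true;
}

// Hypothetical helper: true when sequential / parallel wall time reaches
// the required speedup (3x in the criteria above).
bool meets_speedup(double sequential_seconds, double parallel_seconds,
                   double min_speedup = 3.0) {
    return parallel_seconds > 0.0 &&
           sequential_seconds / parallel_seconds >= min_speedup;
}
```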

Metadata

Labels: enhancement (New feature or request)
