Skip to content

fix: NaN stability, covariance regularization, and cube_samples support#137

Open
dprelogo wants to merge 44 commits into
PolyChord:wraparoundfrom
dprelogo:wraparound
Open

fix: NaN stability, covariance regularization, and cube_samples support#137
dprelogo wants to merge 44 commits into
PolyChord:wraparoundfrom
dprelogo:wraparound

Conversation

@dprelogo
Copy link
Copy Markdown

Summary

This PR addresses several critical stability issues encountered when running PolyChord on real-world molecular energy landscape problems with high dimensionality, small clusters, and wraparound parameters.

1. NaN propagation and detection

  • Added ieee_arithmetic NaN checks throughout the sampling pipeline: calculate_point, SliceSampling, slice_sample, generate, nested_sampling, mpi_utils, and the C++ likelihood wrapper
  • These traces identify the source of NaN propagation (e.g., zero-magnitude direction vectors, singular Cholesky matrices, bad prior transforms) rather than just detecting the symptom downstream
  • Added warning when slice sampling direction vector has zero magnitude (w < 1e-20), which causes division-by-zero → NaN propagation

2. Covariance matrix regularization for rank-deficient clusters

  • When a cluster has fewer points than dimensions (n < D+1), the covariance matrix is rank-deficient and Cholesky decomposition fails or produces NaN
  • Added eigendecomposition-based regularization (regularize_covmat): decomposes Σ = QΛQᵀ, replaces zero eigenvalues with 1.0 (unit variance in null-space), reconstructs Σ_reg = QΛ_regQᵀ
  • Provides both LAPACK (DSYEV) and pure-Fortran Jacobi fallback implementations via HAVE_LAPACK preprocessor flag
  • calc_cholesky now accepts optional n_points parameter to trigger regularization when needed
  • calc_covmat returns identity matrix for n < 2 points instead of crashing

3. Wraparound atan2(0,0) protection

  • calc_covmat now handles the edge case where all points are identical or antipodal in wraparound dimensions, which produces atan2(0,0) → undefined behavior
  • Falls back to using the first point's value as the circular mean

4. cube_samples kwargs validation fix

  • pypolychord.run() rejected cube_samples as an unknown keyword argument because it wasn't in default_kwargs — the validation check on line 565 fired before the handling code on line 586 could use it
  • Fixed by popping cube_samples before validation (same pattern as paramnames)
  • Added fortranformat to package dependencies (required by _make_resume_file but never declared)

5. Build system improvements

  • Added Intel LLVM compiler support (Makefile_intel_llvm for ifx/icx/icpx)
  • Added conditional LAPACK support across all compiler makefiles (GNU, Intel, Intel LLVM, Cray)
  • Added -fno-finite-math-only flag for GNU to enable NaN detection with -Ofast
  • Added full pyproject.toml metadata for modern Python packaging (uv sync support)
  • Fixed setup.py environment passthrough for build subprocess

Test plan

  • Verify compilation with GNU, Intel classic, and Intel LLVM compilers
  • Run existing test suite (run_pypolychord.py with Gaussian likelihood)
  • Test cube_samples parameter with MPI (verified locally with 4 ranks)
  • Verify NaN traces appear when injecting NaN (commented-out test code in generate.F90)
  • Test with rank-deficient clusters (few live points, high dimensions)

🤖 Generated with Claude Code

dprelogo and others added 14 commits August 13, 2025 16:27
Implements three numerical stability fixes to prevent NaN errors that occur
with high convergence criteria and small clusters: (1) returns regularized
identity matrix when n<2 points, (2) avoids atan2(0,0) in wraparound dimensions,
and (3) pre-regularizes covariance matrices before Cholesky decomposition.
Consolidates calc_cholesky_regularised into calc_cholesky.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
When a cluster has only one point (n < 2), use unit identity matrix
instead of minimal-variance identity matrix (1.0 instead of 1.0e-20).
This provides more reasonable variance for local exploration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add LAPACK=1 option (enabled by default) for eigendecomposition-based
covariance regularization. Includes support for:
- GNU: -llapack -lblas
- Intel: -mkl
- Intel LLVM: -qmkl
- Cray: LibSci (auto-linked)

Set LAPACK=0 to use pure-Fortran Jacobi fallback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ance matrices

When clusters have insufficient points (n < D+1), the covariance matrix is
rank-deficient and cannot have a valid Cholesky decomposition. This commit
implements a hybrid regularization approach:

Algorithm:
- Compute eigendecomposition of the covariance matrix
- Replace eigenvalues below threshold max(λ_max * 1e-10, 1e-20) with 1.0
- This inserts unit variance along null-space directions (valid in unit hypercube)
- Reconstruct the regularized matrix: V * Λ_reg * V^T

Implementation:
- jacobi_eigen: Pure-Fortran cyclic Jacobi eigendecomposition fallback
- sym_eigen: Wrapper that uses LAPACK DSYEV when available, Jacobi otherwise
- regularize_covmat: Main regularization function implementing the algorithm
- calc_cholesky: Now accepts optional n_points parameter to trigger regularization

The Cholesky fallback behavior is restored to original PolyChord: use
sqrt(trace(a)) scaled identity matrix when decomposition fails.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Pass num_points to calc_cholesky to enable rank-deficient regularization
- Remove debug output that was added during development
- Clean up variable declarations

The num_points parameter allows calc_cholesky to determine when
eigendecomposition-based regularization is needed (n < D+1 case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
cube_samples was handled on line 587 but was never included in
default_kwargs, so the validation check on line 565 rejected it as an
unknown keyword argument before it could be used. Follow the same
pattern as paramnames: pop it before validation, use the local variable
after.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add [project] table with dependencies, optional-dependencies, and dev
dependency group so that `uv sync` can create a local environment with
MPI support. Disable build isolation for pypolychord via [tool.uv] so
MPI compilers have access to the full environment. Fix setup.py to pass
HOME and other essential env vars to the make subprocess, which OpenMPI's
mpifort requires for opal_init.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by _make_resume_file for writing Fortran-readable resume files
when cube_samples is provided. Was always imported but never declared
as a dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove debug NaN checks that were added during NaN investigation:
- Remove all commented-out NaN injection test code (generate.F90)
- Remove redundant pipeline tracing (calculate, nested_sampling, mpi_utils)
- Remove C++ NaN check in loglikelihood callback
- Remove "hello world" worker message
- Move useful diagnostic checks behind #ifdef DEBUG preprocessor guard
- Add -DDEBUG to debug FFLAGS in gnu, intel, intel_llvm Makefiles
- Keep defensive recovery guards in regularize_covmat (utils.F90)
- Keep zero-magnitude direction vector warning (chordal_sampling.f90)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ume_file

Remove the fortranformat dependency which crashes on numpy NaN values
(its NaN guard uses `type(val) is float` strict identity check, missing
np.float64). Replace with custom _format_fortran_double (E24.15E3) and
_format_fortran_int (I12) functions that correctly handle NaN, Inf, and
all normal values with proper Fortran 0-based mantissa normalization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant