fix: NaN stability, covariance regularization, and cube_samples support#137
Open
dprelogo wants to merge 44 commits into
Open
fix: NaN stability, covariance regularization, and cube_samples support#137dprelogo wants to merge 44 commits into
dprelogo wants to merge 44 commits into
Conversation
Implements three numerical stability fixes to prevent NaN errors that occur with high convergence criteria and small clusters: (1) returns regularized identity matrix when n<2 points, (2) avoids atan2(0,0) in wraparound dimensions, and (3) pre-regularizes covariance matrices before Cholesky decomposition. Consolidates calc_cholesky_regularised into calc_cholesky. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
When a cluster has only one point (n < 2), use unit identity matrix instead of minimal-variance identity matrix (1.0 instead of 1.0e-20). This provides more reasonable variance for local exploration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add LAPACK=1 option (enabled by default) for eigendecomposition-based covariance regularization. Includes support for: - GNU: -llapack -lblas - Intel: -mkl - Intel LLVM: -qmkl - Cray: LibSci (auto-linked) Set LAPACK=0 to use pure-Fortran Jacobi fallback. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…ance matrices When clusters have insufficient points (n < D+1), the covariance matrix is rank-deficient and cannot have a valid Cholesky decomposition. This commit implements a hybrid regularization approach: Algorithm: - Compute eigendecomposition of the covariance matrix - Replace eigenvalues below threshold max(λ_max * 1e-10, 1e-20) with 1.0 - This inserts unit variance along null-space directions (valid in unit hypercube) - Reconstruct the regularized matrix: V * Λ_reg * V^T Implementation: - jacobi_eigen: Pure-Fortran cyclic Jacobi eigendecomposition fallback - sym_eigen: Wrapper that uses LAPACK DSYEV when available, Jacobi otherwise - regularize_covmat: Main regularization function implementing the algorithm - calc_cholesky: Now accepts optional n_points parameter to trigger regularization The Cholesky fallback behavior is restored to original PolyChord: use sqrt(trace(a)) scaled identity matrix when decomposition fails. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Pass num_points to calc_cholesky to enable rank-deficient regularization - Remove debug output that was added during development - Clean up variable declarations The num_points parameter allows calc_cholesky to determine when eigendecomposition-based regularization is needed (n < D+1 case). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
cube_samples was handled on line 587 but was never included in default_kwargs, so the validation check on line 565 rejected it as an unknown keyword argument before it could be used. Follow the same pattern as paramnames: pop it before validation, use the local variable after. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add [project] table with dependencies, optional-dependencies, and dev dependency group so that `uv sync` can create a local environment with MPI support. Disable build isolation for pypolychord via [tool.uv] so MPI compilers have access to the full environment. Fix setup.py to pass HOME and other essential env vars to the make subprocess, which OpenMPI's mpifort requires for opal_init. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required by _make_resume_file for writing Fortran-readable resume files when cube_samples is provided. Was always imported but never declared as a dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove debug NaN checks that were added during NaN investigation: - Remove all commented-out NaN injection test code (generate.F90) - Remove redundant pipeline tracing (calculate, nested_sampling, mpi_utils) - Remove C++ NaN check in loglikelihood callback - Remove "hello world" worker message - Move useful diagnostic checks behind #ifdef DEBUG preprocessor guard - Add -DDEBUG to debug FFLAGS in gnu, intel, intel_llvm Makefiles - Keep defensive recovery guards in regularize_covmat (utils.F90) - Keep zero-magnitude direction vector warning (chordal_sampling.f90) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ume_file Remove the fortranformat dependency which crashes on numpy NaN values (its NaN guard uses `type(val) is float` strict identity check, missing np.float64). Replace with custom _format_fortran_double (E24.15E3) and _format_fortran_int (I12) functions that correctly handle NaN, Inf, and all normal values with proper Fortran 0-based mantissa normalization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR addresses several critical stability issues encountered when running PolyChord on real-world molecular energy landscape problems with high dimensionality, small clusters, and wraparound parameters.
1. NaN propagation and detection
ieee_arithmeticNaN checks throughout the sampling pipeline:calculate_point,SliceSampling,slice_sample,generate,nested_sampling,mpi_utils, and the C++ likelihood wrapperw < 1e-20), which causes division-by-zero → NaN propagation2. Covariance matrix regularization for rank-deficient clusters
n < D+1), the covariance matrix is rank-deficient and Cholesky decomposition fails or produces NaNregularize_covmat): decomposes Σ = QΛQᵀ, replaces zero eigenvalues with 1.0 (unit variance in null-space), reconstructs Σ_reg = QΛ_regQᵀDSYEV) and pure-Fortran Jacobi fallback implementations viaHAVE_LAPACKpreprocessor flagcalc_choleskynow accepts optionaln_pointsparameter to trigger regularization when neededcalc_covmatreturns identity matrix forn < 2points instead of crashing3. Wraparound
atan2(0,0)protectioncalc_covmatnow handles the edge case where all points are identical or antipodal in wraparound dimensions, which producesatan2(0,0)→ undefined behavior4.
cube_sampleskwargs validation fixpypolychord.run()rejectedcube_samplesas an unknown keyword argument because it wasn't indefault_kwargs— the validation check on line 565 fired before the handling code on line 586 could use itcube_samplesbefore validation (same pattern asparamnames)fortranformatto package dependencies (required by_make_resume_filebut never declared)5. Build system improvements
Makefile_intel_llvmforifx/icx/icpx)-fno-finite-math-onlyflag for GNU to enable NaN detection with-Ofastpyproject.tomlmetadata for modern Python packaging (uv syncsupport)setup.pyenvironment passthrough for build subprocessTest plan
run_pypolychord.pywith Gaussian likelihood)cube_samplesparameter with MPI (verified locally with 4 ranks)generate.F90)🤖 Generated with Claude Code