fix: allow zero-gradient initial points in check_all #63
habakan wants to merge 1 commit into pymc-devs:main
Sorry for the delay, I somehow missed the issue... Rejecting initial points with zero gradient is by design, because we choose the initial mass matrix based on the gradient. Is there any specific reason you need this?
Thank you for the clarification! Some context: I ran into this when testing with an initial point set to the expected value (the mode of the posterior). For example, starting inference from x=0 on N(0,1) produces a zero gradient and gets rejected with
You're right that the check is intentional. After reading the code more carefully, I can see that in
I'll close this PR. If zero-gradient initial points are ever worth supporting in the future, the right fix would be to handle zero explicitly in
Problem
`check_all` calls `array_all_finite_and_nonzero` on `transformed_gradient`, which rejects mathematically valid initial points where the gradient is exactly zero with `BadInitGrad`.
Reproducer
Sampling from a standard normal N(0, 1) starting at the origin:
- logp(x) = -x²/2, gradient = -x
- x = 0, gradient = 0 (the mode)
- `check_all` rejects the point → `BadInitGrad`
Fix
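The difference between the two checks can be sketched in a few lines of Rust. This is only an illustration: `all_finite`, `all_finite_and_nonzero` and `std_normal_grad` are hypothetical stand-ins written for this example, and the actual nuts-rs helpers may have different signatures and operate on different array types.

```rust
// Hypothetical stand-ins for the two predicates discussed in this PR.
// They are NOT the nuts-rs implementations, only a sketch of the semantics.

/// True if every gradient entry is finite. A zero entry is accepted.
fn all_finite(grad: &[f64]) -> bool {
    grad.iter().all(|g| g.is_finite())
}

/// True if every gradient entry is finite and nonzero. A zero entry is rejected.
fn all_finite_and_nonzero(grad: &[f64]) -> bool {
    grad.iter().all(|g| g.is_finite() && *g != 0.0)
}

/// Gradient of logp(x) = -x^2 / 2 (standard normal, up to an additive constant).
fn std_normal_grad(x: f64) -> f64 {
    -x
}
```

At the mode, `std_normal_grad(0.0)` is zero, so the stricter predicate returns `false` while the finite-only predicate returns `true`. NaN and infinite gradients are rejected by both.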
Why this is safe
Consistency with `check_untransformed`
The sibling function has always used `array_all_finite` only. This change makes the two consistent.
Consistent with Stan / PyMC
Stan's `initialize.hpp` checks `std::isfinite(sum(gradient))` only; zero gradients are not rejected.
No impact on step size search
`initialize_trajectory` sets a random momentum `p` before the first leapfrog step, so the position moves via `q += ε * p` even when the initial gradient is zero.
If the search does get stuck, nuts-rs already has two safety nets: a 100-iteration cap and a fallback to `initial_step`.
Background
The `array_all_finite_and_nonzero` check was introduced in "Refactor to always use transformed hamiltonian and implement exact normal trajectory" (#56) as part of a large refactor that added `check_all` as a new function.
"Use 1. as initial mass matrix if grad is zero" (#7) already handles the zero-gradient case on the mass matrix side (falling back to `1.0` when `1/|grad|` is not finite), so there is no longer a reason to reject the initial point itself.
If I'm missing something and this check was intentional, I'd love to understand the reasoning.
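For illustration, the fallback rule described above (use `1.0` whenever `1/|grad|` is not finite) could look roughly like the following. The names `initial_diag_entry` and `initial_diag` are hypothetical and chosen for this sketch; the real nuts-rs code may apply additional clamping or operate on a different representation.

```rust
/// Hypothetical helper: initial diagonal mass-matrix entry for one gradient
/// component, following the rule stated in the PR text: use 1/|grad|, and
/// fall back to 1.0 when that value is not finite.
fn initial_diag_entry(grad: f64) -> f64 {
    let candidate = 1.0 / grad.abs();
    if candidate.is_finite() {
        candidate
    } else {
        // grad == 0.0 gives +inf and grad == NaN gives NaN; both land here.
        1.0
    }
}

/// Apply the per-component rule to a whole gradient vector.
fn initial_diag(grad: &[f64]) -> Vec<f64> {
    grad.iter().map(|&g| initial_diag_entry(g)).collect()
}
```

With this rule a zero gradient component yields a unit entry instead of an infinite one, which is the case the description refers to.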
Changes
- `src/dynamics/transformed_hamiltonian.rs`: `array_all_finite_and_nonzero` → `array_all_finite` (one line)
- `tests/zero_gradient_init.rs`
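The step-size argument in the description (the position still moves when the initial gradient is zero) can be checked by hand with one leapfrog step on the N(0, 1) example. The sketch below assumes a plain leapfrog integrator with unit mass and a fixed momentum in place of a random draw; it is not the transformed Hamiltonian used in nuts-rs, and `grad_logp` and `leapfrog` are hypothetical names.

```rust
/// Gradient of logp(q) = -q^2 / 2 (standard normal, up to an additive constant).
fn grad_logp(q: f64) -> f64 {
    -q
}

/// One leapfrog step with unit mass (an assumption of this sketch).
/// Returns the new (position, momentum).
fn leapfrog(q: f64, p: f64, eps: f64) -> (f64, f64) {
    // Half step for the momentum using the gradient at the current position.
    let p_half = p + 0.5 * eps * grad_logp(q);
    // Full step for the position: this is the `q += eps * p` update.
    let q_new = q + eps * p_half;
    // Second half step for the momentum using the gradient at the new position.
    let p_new = p_half + 0.5 * eps * grad_logp(q_new);
    (q_new, p_new)
}
```

Starting from `q = 0.0` with momentum `p = 1.0` and step size `eps = 0.1`, the first half step leaves the momentum unchanged because the gradient is zero, and the position update then moves `q` to approximately `0.1`. A zero momentum would leave the position at the mode, but a draw from a continuous distribution is exactly zero with probability zero.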