Summary
Make the observation-noise prior in WeightedSumFitter and SoftmaxWeightedSumFitter scale automatically with the outcome's spread, instead of using a fixed HalfNormal(sigma=1). The current default works well only when the outcome is on a unit-ish scale (e.g. standardised z-scores), and silently degrades sampling stability and runtime on outcomes with larger natural scales.
Background
Both WeightedSumFitter and SoftmaxWeightedSumFitter define the likelihood as
default_priors = {
"y_hat": Prior(
"Normal",
sigma=Prior("HalfNormal", sigma=1, dims=["treated_units"]),
dims=["obs_ind", "treated_units"],
),
}
The HalfNormal(1) prior on σ has a median of about 0.67 and a 95th percentile of about 1.96. Whenever the outcome's natural standard deviation is meaningfully larger than 1, the posterior σ is pushed into the tail of the prior. This creates prior–data tension that has several practical costs:
- Stiff posterior geometry. When σ is forced small relative to what the data want, the conditional posterior over the donor weights becomes a narrow ridge, which is hard for NUTS to explore.
- NUTS warmup instability. During the early warmup phase, before the mass matrix has adapted, jittered initial points combined with a stiff posterior produce kinetic-energy overflows (
RuntimeWarning: overflow encountered in dot in pymc/step_methods/hmc/quadpotential.py). PyMC handles them by rejecting the trajectory, but they are emitted to stderr and concern users.
- Reduced effective sample size and slower wall-time. In a controlled comparison on a real-world SC outcome, switching the σ prior from
HalfNormal(1) to a scale that matches the outcome's spread roughly doubled ess_bulk_min and halved sampling wall-time, with no other change.
- Misleading downstream summaries. Power-curve detection rates can look artificially high when the σ posterior is held narrow by an incompatible prior, because credible intervals on the cumulative impact tighten for the wrong reason.
Proposed change
Add a priors_from_data(self, X, y) method to both WeightedSumFitter and SoftmaxWeightedSumFitter that returns a y_hat prior whose σ scale is derived from the data, for example
def priors_from_data(self, X, y):
base = super().priors_from_data(X, y) # preserves Dirichlet/softmax β prior
y_std = float(np.asarray(y).std(ddof=0))
sigma_scale = max(y_std, 1.0) # avoid pathological zero-variance edge cases
base["y_hat"] = Prior(
"Normal",
sigma=Prior("HalfNormal", sigma=sigma_scale, dims=["treated_units"]),
dims=["obs_ind", "treated_units"],
)
return base
The exact functional form is open for discussion — alternatives include a fraction of y_std, a robust spread (MAD / IQR), or treating the prior as HalfNormal(sigma=k * y_std) with a small k (e.g. 0.5) so the prior remains weakly informative rather than centred on the empirical noise. The principle is that the default should adapt to the outcome's scale rather than assume it.
For SoftmaxWeightedSumFitter the change is the same: only the likelihood σ is data-scaled; the softmax β logits prior is unaffected.
The library already follows this pattern elsewhere — LinearRegression documents the priors_from_data hook for exactly this purpose, so adopting it here is consistent.
Backwards compatibility
- Users who explicitly pass
priors={"y_hat": ...} are unaffected: the merge order in PyMCModel.__init__ is default_priors → priors_from_data → user-supplied, so explicit overrides continue to win.
- Users relying on the default will see slightly different posteriors, generally better-conditioned. Worth a
CHANGELOG note flagging it as a behaviour change. It is not a breaking API change.
Acceptance criteria
Out of scope
- Reparameterising the simplex β prior in
SoftmaxWeightedSumFitter. (Separate concern; this issue is only about likelihood σ.)
- Changing the priors on
LinearRegression, InstrumentalVariableRegression, or other PyMC models that already handle σ scaling differently.
Any thoughts on this @thomaspinder ? Sound like a good idea?
Summary
Make the observation-noise prior in
WeightedSumFitterandSoftmaxWeightedSumFitterscale automatically with the outcome's spread, instead of using a fixedHalfNormal(sigma=1). The current default works well only when the outcome is on a unit-ish scale (e.g. standardised z-scores), and silently degrades sampling stability and runtime on outcomes with larger natural scales.Background
Both
WeightedSumFitterandSoftmaxWeightedSumFitterdefine the likelihood asThe
HalfNormal(1)prior on σ has a median of about 0.67 and a 95th percentile of about 1.96. Whenever the outcome's natural standard deviation is meaningfully larger than 1, the posterior σ is pushed into the tail of the prior. This creates prior–data tension that has several practical costs:RuntimeWarning: overflow encountered in dotinpymc/step_methods/hmc/quadpotential.py). PyMC handles them by rejecting the trajectory, but they are emitted to stderr and concern users.HalfNormal(1)to a scale that matches the outcome's spread roughly doubledess_bulk_minand halved sampling wall-time, with no other change.Proposed change
Add a
priors_from_data(self, X, y)method to bothWeightedSumFitterandSoftmaxWeightedSumFitterthat returns ay_hatprior whose σ scale is derived from the data, for exampleThe exact functional form is open for discussion — alternatives include a fraction of
y_std, a robust spread (MAD / IQR), or treating the prior asHalfNormal(sigma=k * y_std)with a smallk(e.g.0.5) so the prior remains weakly informative rather than centred on the empirical noise. The principle is that the default should adapt to the outcome's scale rather than assume it.For
SoftmaxWeightedSumFitterthe change is the same: only the likelihood σ is data-scaled; the softmax β logits prior is unaffected.The library already follows this pattern elsewhere —
LinearRegressiondocuments thepriors_from_datahook for exactly this purpose, so adopting it here is consistent.Backwards compatibility
priors={"y_hat": ...}are unaffected: the merge order inPyMCModel.__init__isdefault_priors → priors_from_data → user-supplied, so explicit overrides continue to win.CHANGELOGnote flagging it as a behaviour change. It is not a breaking API change.Acceptance criteria
WeightedSumFitter.priors_from_datareturns a data-scaledy_hatprior in addition to the existing Dirichletbetaprior.SoftmaxWeightedSumFitter.priors_from_datareturns a data-scaledy_hatprior.priors={"y_hat": ...}to either class continues to take precedence.y.std(); explicit override still wins.CHANGELOGentry under "Changes" notes the default-prior adjustment and links here.Out of scope
SoftmaxWeightedSumFitter. (Separate concern; this issue is only about likelihood σ.)LinearRegression,InstrumentalVariableRegression, or other PyMC models that already handle σ scaling differently.Any thoughts on this @thomaspinder ? Sound like a good idea?