Skip to content

Make y_hat σ prior data-scaled in WeightedSumFitter and SoftmaxWeightedSumFitter #887

@drbenvincent

Description

@drbenvincent

Summary

Make the observation-noise prior in WeightedSumFitter and SoftmaxWeightedSumFitter scale automatically with the outcome's spread, instead of using a fixed HalfNormal(sigma=1). The current default works well only when the outcome is on a unit-ish scale (e.g. standardised z-scores), and silently degrades sampling stability and runtime on outcomes with larger natural scales.

Background

Both WeightedSumFitter and SoftmaxWeightedSumFitter define the likelihood as

default_priors = {
    "y_hat": Prior(
        "Normal",
        sigma=Prior("HalfNormal", sigma=1, dims=["treated_units"]),
        dims=["obs_ind", "treated_units"],
    ),
}

The HalfNormal(1) prior on σ has a median of about 0.67 and a 95th percentile of about 1.96. Whenever the outcome's natural standard deviation is meaningfully larger than 1, the posterior σ is pushed into the tail of the prior. This creates prior–data tension that has several practical costs:

  • Stiff posterior geometry. When σ is forced small relative to what the data want, the conditional posterior over the donor weights becomes a narrow ridge, which is hard for NUTS to explore.
  • NUTS warmup instability. During the early warmup phase, before the mass matrix has adapted, jittered initial points combined with a stiff posterior produce kinetic-energy overflows (RuntimeWarning: overflow encountered in dot in pymc/step_methods/hmc/quadpotential.py). PyMC handles them by rejecting the trajectory, but they are emitted to stderr and concern users.
  • Reduced effective sample size and slower wall-time. In a controlled comparison on a real-world SC outcome, switching the σ prior from HalfNormal(1) to a scale that matches the outcome's spread roughly doubled ess_bulk_min and halved sampling wall-time, with no other change.
  • Misleading downstream summaries. Power-curve detection rates can look artificially high when the σ posterior is held narrow by an incompatible prior, because credible intervals on the cumulative impact tighten for the wrong reason.

Proposed change

Add a priors_from_data(self, X, y) method to both WeightedSumFitter and SoftmaxWeightedSumFitter that returns a y_hat prior whose σ scale is derived from the data, for example

def priors_from_data(self, X, y):
    base = super().priors_from_data(X, y)  # preserves Dirichlet/softmax β prior
    y_std = float(np.asarray(y).std(ddof=0))
    sigma_scale = max(y_std, 1.0)  # avoid pathological zero-variance edge cases
    base["y_hat"] = Prior(
        "Normal",
        sigma=Prior("HalfNormal", sigma=sigma_scale, dims=["treated_units"]),
        dims=["obs_ind", "treated_units"],
    )
    return base

The exact functional form is open for discussion — alternatives include a fraction of y_std, a robust spread (MAD / IQR), or treating the prior as HalfNormal(sigma=k * y_std) with a small k (e.g. 0.5) so the prior remains weakly informative rather than centred on the empirical noise. The principle is that the default should adapt to the outcome's scale rather than assume it.

For SoftmaxWeightedSumFitter the change is the same: only the likelihood σ is data-scaled; the softmax β logits prior is unaffected.

The library already follows this pattern elsewhere — LinearRegression documents the priors_from_data hook for exactly this purpose, so adopting it here is consistent.

Backwards compatibility

  • Users who explicitly pass priors={"y_hat": ...} are unaffected: the merge order in PyMCModel.__init__ is default_priors → priors_from_data → user-supplied, so explicit overrides continue to win.
  • Users relying on the default will see slightly different posteriors, generally better-conditioned. Worth a CHANGELOG note flagging it as a behaviour change. It is not a breaking API change.

Acceptance criteria

  • WeightedSumFitter.priors_from_data returns a data-scaled y_hat prior in addition to the existing Dirichlet beta prior.
  • SoftmaxWeightedSumFitter.priors_from_data returns a data-scaled y_hat prior.
  • Existing user override behaviour is preserved: passing priors={"y_hat": ...} to either class continues to take precedence.
  • Unit tests cover the new behaviour: default prior scale changes with y.std(); explicit override still wins.
  • Docstrings on both classes mention that the σ prior is data-scaled and explain the rationale.
  • CHANGELOG entry under "Changes" notes the default-prior adjustment and links here.

Out of scope

  • Reparameterising the simplex β prior in SoftmaxWeightedSumFitter. (Separate concern; this issue is only about likelihood σ.)
  • Changing the priors on LinearRegression, InstrumentalVariableRegression, or other PyMC models that already handle σ scaling differently.

Any thoughts on this @thomaspinder ? Sound like a good idea?

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions