Skip to content

fix(tweedie-post): use plain PyTensor for logp and add timing#19

Open
williambdean wants to merge 13 commits into
mainfrom
tweedie-regression
Open

fix(tweedie-post): use plain PyTensor for logp and add timing#19
williambdean wants to merge 13 commits into
mainfrom
tweedie-regression

Conversation

@williambdean
Copy link
Copy Markdown
Owner

@williambdean williambdean commented May 19, 2026

Adds the Pearson φ is Broken blog post

Preview

A live preview of this branch is deployed at: https://williambdean.github.io/pr-19/

Preview

A live preview of this branch is deployed at: https://williambdean.github.io/pr-19/

@ricardoV94
Copy link
Copy Markdown

the random function is not used correctly. dist argument of custom dist expects a PyTensor random expression but you used numpy code. the draws won't be correct for multiple evals.

Either use customdist random method or implement it using PyTensor random ops

@ricardoV94
Copy link
Copy Markdown

logp didn't work with PyMC 6.0, so replaced it with functionally

what does this mean?

@williambdean
Copy link
Copy Markdown
Owner Author

Opening pymc-devs/pymc#8311 in order to migrate to pymc.dims

Copy link
Copy Markdown

@NathanielF NathanielF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool post. I'm reviewing on my phone, so not running the code. Comments are mostly about framing. It's an excellent demonstration of why proper uncertainty calibration matters. I would try and emphasise the prevalence of the problem so that your solution lands more cleanly as an important corrective rather than a technical detail.

# Pearson φ is Broken: Bayesian Tweedie GLMs for Insurance Pure Premiums

!!! tip "TL;DR"
Pearson's chi-square dispersion estimator inflates φ by 7× for zero-inflated Tweedie models, making the model predict 99%+ zeros against 93% observed. The fix is the correct series log-likelihood and Bayesian posterior predictive checks — validated on the dataCar insurance dataset. Code and figures are fully reproducible.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe start this call out with why we care about \phi first, then describe the result. Just to land a hook.


Insurance pure premium data has a distinctive shape: 90%+ of policies have zero claims, while the remaining few have positive amounts that are right-skewed and occasionally extreme. The [Tweedie distribution](https://doi.org/10.1007/s11222-005-4070-y) is the standard tool for this setting — it naturally handles the zero-mass point and continuous positive tail through a single compound Poisson-Gamma process.

Here is the paradox. A [well-known blog post on Tweedie GLMs for insurance](https://akshat.blog/posts/fitting-tweedie-models-to-claims-data/) reported something strange: the posterior predictive check predicted **99.95% zeros** against an observed **~94%**. The model collapsed to almost-all-zero predictions. This is not because Tweedie is the wrong distribution — it is because the dispersion parameter φ was estimated using the wrong tool.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can emphasise how much of a disaster this collapse is if you front load the why.

??? info "Why the Likelihood Ratio Matters"
The log-likelihood difference ΔLL is not just a statistical nicety. It directly translates to predictive performance. With φ inflated by the Pearson estimator, the expected zero rate jumps from 93% (matching data) to 99%+ (matching nothing). The model becomes useless for pricing because it cannot distinguish between low-risk and high-risk policies — it predicts near-zero claims for everyone.

## The Fix: Series Log-PDF
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super interesting. Hadn't heard of these.


## The Fix: Series Log-PDF

The reason practitioners reach for Pearson φ is that the Tweedie density does not have a closed form. The likelihood is an infinite series ([Dunn & Smyth, 2005](https://doi.org/10.1007/s11222-005-4070-y)):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wonder if it would be useful to restructure the post a bit to lead with the hook as predictive failure in the reference distribution and then lead into explanation of why with approaches to modelling the likelihood.

The post reads to me as; clear misfire due to expressive limitation >> mathematical intractability of vanilla likelihood >> interesting re-expression of the likelihood >> more expressive modelling of target phenomena

But as it stands you kind of frontload the technical detail before we know why its interesting

- px.math.gammaln(j + 1)
- px.math.gammaln(px.math.maximum(j * alpha, 1e-10))
)
log_a = px.math.log((px.math.exp(log_Wj)).sum(dim="term")) - log_v
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool.

return model
```

The sigmoid transform on $p$ ensures it stays in $(1.1, 1.9)$ — the range where the Poisson-Gamma compound representation is valid and the series converges reliably. The log-link keeps $\mu$ positive, which is natural for claim amounts.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say more about validity here. Even maybe worth clarifying the idea of a compound likelihood.


But how much does the posterior actually tighten relative to these priors? We can compare the full prior and posterior distributions directly:

![Prior vs Posterior: 5000 synthetic Tweedie observations tighten all parameter distributions.](../images/fig_prior_posterior.png)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe (3, 1) figure render would work better than the current (1,3). It's hard to see rhe impact of what you've learned when the visual is spread thin. Note (reading on a phonw)


The **total claim** statistic is worth pausing on. Total claim divided by number of policies is the mean pure premium — the most basic pricing metric. If the model gets this wrong, nothing else matters. The PPC confirms our model gets it right: the observed total claim falls well inside the posterior predictive distribution for the dataCar dataset.

A standard GLM with Pearson $\phi$ also produces the same mean — the mean structure $(\mu)$ is identical regardless of how $\phi$ is estimated. But the PPC reveals what the point estimate cannot: the Bayesian model's full predictive distribution correctly clusters around the observed value, while the Pearson model's distribution would be shifted (inflated $\phi$ smears the predictive variance). A point estimate hides this; the PPC exposes it.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This point needs more prominence. For example you give a call out box to a detail about computational time, which is closer to a footnote.


This is the uniquely Bayesian insight: the PPC validates the entire predictive distribution, not just a point estimate. A standard GLM reports the same mean ($293) but cannot tell you whether the distribution around that mean is realistic. The PPC — available only through the Bayesian posterior predictive — catches the Pearson failure. The dispersion estimate that looked reasonable in a coefficient table turns out to produce a predictive distribution that does not resemble the data at all. Without the PPC, you would never know.

A model that cannot distinguish a 2.3% tail risk from 0.001% is not usable for pricing, reinsurance, or reserve setting. The PPC catches this; the Pearson dispersion does not.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat undersells your finding. This is a catastrophic failing.

Three practical recommendations:

1. **Estimate $p$ and $\phi$ jointly** — fixing $p$ to an arbitrary value (like 1.6) and then estimating $\phi$ via Pearson creates a cascade of errors
2. **Use the full likelihood** — the series expansion is numerically tractable and converges rapidly; there is no reason to settle for method-of-moments estimates
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really points to expressive limitations for me. If you front load the insurance context and prevalence of zero inflated distributions in that domain, the argument of the post is that a standard implementation encodes a subtle silent failure mode.

You can this picture, then show how (a) the series approximation is a boost to expressive power and (b) the hand roller pytensor class articulates the problem properly.

The xtensor-based logp function doesn't work with PyMC 6.0. Replaced
with functionally equivalent plain PyTensor ops. Also added measured
sampling time (~3 min for 60k obs with nutpie).
Link key claims to their sources: Dunn & Smyth (2005) paper, statsmodels
GLM docs, statmod tweedie docs, Akshat's blog post, dataCar dataset,
PyMC WAIC, and pymc-bart.
…on dataCar alone

The blog post's French TPL φ values (φ(Pearson)=14,861) and dataset statistics (96.3% zeros,  weighted mean) could not be traced to any known data source — not to Akshat's actual 60k subset nor to any random subset of the freMTPL2 merged data. Removed all specific φ values and dataset stats for French TPL from tables and text.

- Replaced "7-56×" → "7×" throughout (TL;DR, text body)
- Removed French TPL row from φ comparison, dataset, and parameter tables
- Removed p-comparison paragraph (p=1.574 vs p=1.633)
- Updated figure scripts: "French TPL" → "High-Inflation" labels
- Fixed terminology: "MLE (Bayesian)" → "Posterior Mean"
- Softened max-claim underprediction language for dataCar fit
- Added compute_pearson_phi.py verification script
- dataCar φ(Pearson)=1,227 at p=1.574 verified reproducible
…planation

- Change CustomDist  to
  per Ricardo's review (dist expects a PyTensor expression, not numpy)
- Add docstring to tweedie_logp_series explaining why plain PyTensor
  ops are used (xtensor named axes fail under PyMC 6.0 broadcasting)
- Clean up leftover @classmethod dist corruption in code block
- Update text references from 'dist' / 'tweedie_dist' to 'random' / 'tweedie_random'
…plot_prior_posterior

- Write fig_prior_posterior.py that runs pm.sample_prior_predictive and
  pm.sample on synthetic Tweedie data, then compares using arviz
- Add arviz, pymc, nutpie dependencies to pyproject.toml (misc group)
- Add figure and explanatory text to blog post Bayesian validation section
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants