Commit 1c8b61c

verification of the compound generation
1 parent 173bd3d commit 1c8b61c

4 files changed

Lines changed: 209 additions & 7 deletions

docs/blog/posts/2026/pearson-phi-broken-tweedie.md

Lines changed: 18 additions & 5 deletions
@@ -142,14 +142,14 @@ def tweedie_dist(mu, phi, p):
     """
     lam = mu ** (2 - p) / (phi * (2 - p))
     alpha_term = (2 - p) / (p - 1)
-    beta = phi * (p - 1) * mu ** (p - 1)
+    beta = 1.0 / (phi * (p - 1) * mu ** (p - 1))
     N = pmd.Poisson.dist(mu=lam)
     Y = pmd.Gamma.dist(alpha=px.math.maximum(N * alpha_term, 1e-10), beta=beta)
     return px.math.where(N > 0, Y, 0.0)
 ```
 
 !!! tip "Gamma Sum Property"
-    This exploits a key property of the Gamma distribution: the sum of $N$ i.i.d. $\text{Gamma}(\alpha, \beta)$ variables is $\text{Gamma}(N \cdot \alpha, \beta)$. So instead of summing $N$ individual Gamma draws, we draw a single Gamma with shape $N \cdot \alpha_{\text{term}}$. When $N = 0$, the `where` returns 0 — the point mass at zero that characterizes the Tweedie.
+    This exploits the [additivity property of the Gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution#Sum_of_gamma_distributions): the sum of $N$ i.i.d. $\text{Gamma}(\alpha, \beta)$ variables is $\text{Gamma}(N \cdot \alpha, \beta)$ (where $\beta$ is the rate parameter, $1/\text{scale}$, as used by PyMC). So instead of summing $N$ individual Gamma draws, we draw a single Gamma with shape $N \cdot \alpha_{\text{term}}$. When $N = 0$, the `where` returns 0 — the point mass at zero that characterizes the Tweedie.
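The rate-versus-scale fix and the additivity shortcut can be sanity-checked outside PyMC. The following is a minimal NumPy sketch, not the post's code: the helper name `tweedie_compound_draws` and the test parameters are illustrative assumptions. NumPy's `Generator.gamma` takes a scale, so the rate from the corrected line is inverted before sampling.

```python
import numpy as np


def tweedie_compound_draws(mu, phi, p, size, rng):
    """Draw Tweedie variates for 1 < p < 2 via the Poisson-Gamma compound.

    Hypothetical helper for illustration; it mirrors the corrected tweedie_dist.
    """
    lam = mu ** (2 - p) / (phi * (2 - p))         # Poisson rate (expected claim count)
    alpha_term = (2 - p) / (p - 1)                # Gamma shape of a single claim
    beta = 1.0 / (phi * (p - 1) * mu ** (p - 1))  # Gamma rate, as in the corrected line

    n = rng.poisson(lam, size=size)
    # Additivity: the sum of n i.i.d. Gamma(alpha, beta) draws is Gamma(n * alpha, beta).
    # The shape is floored so the sampler stays valid when n == 0;
    # those draws are replaced by exact zeros below.
    shape = np.maximum(n * alpha_term, 1e-10)
    y = rng.gamma(shape, 1.0 / beta)  # NumPy parameterizes by scale = 1 / rate
    return np.where(n > 0, y, 0.0)


rng = np.random.default_rng(0)
mu, phi, p = 10.0, 2.0, 1.5
samples = tweedie_compound_draws(mu, phi, p, size=200_000, rng=rng)

lam = mu ** (2 - p) / (phi * (2 - p))
print("sample mean:", samples.mean(), "theory:", mu)
print("sample var: ", samples.var(), "theory:", phi * mu ** p)
print("zero rate:  ", (samples == 0).mean(), "theory:", np.exp(-lam))
```

With the uncorrected line, the value `phi * (p - 1) * mu ** (p - 1)` would be passed as a rate when it is in fact the scale, so for these parameters the expected value would come out near 1 instead of 10.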
 
 ```python title="Tweedie wrapper and intercept-only model"
 class Tweedie:
@@ -182,6 +182,9 @@ def build_intercept_only_model(y, p_range=(1.1, 1.9)):
     return model
 ```
 
+!!! tip "Geometric Prior Stabilizer"
+    The `p_logit` → sigmoid → scale to $(p_{\text{min}}, p_{\text{max}})$ pipeline is a reusable design pattern for bounded parameters in PPLs. Instead of placing a prior directly on $p$ (where the gradient vanishes near the boundaries and NUTS stalls), we place a Normal prior on an unbounded latent variable and transform it deterministically. NUTS then explores a smooth, unbounded space, and the series is never evaluated at the gradient cliffs near $p = 1.0$ and $p = 2.0$. The same idea carries over to other constrained parameters: a log or softplus transform for positive reals, a stick-breaking or softmax transform for simplex-valued parameters.
+
 The sigmoid transform on $p$ keeps it in $(1.1, 1.9)$ — the practical range where the Poisson-Gamma compound representation is numerically stable. The log-link keeps $\mu$ positive, which is natural for claim amounts.
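The transform itself is small enough to sketch in plain NumPy. This is an illustration under assumptions rather than the post's implementation: `scaled_sigmoid` is a hypothetical name, and the default bounds are taken from `p_range=(1.1, 1.9)` in the hunk header above.

```python
import numpy as np


def scaled_sigmoid(z, lower=1.1, upper=1.9):
    """Map an unbounded latent value z into the interval [lower, upper].

    Illustrative helper. The tanh form equals the logistic sigmoid
    1 / (1 + exp(-z)) but does not overflow for large |z|.
    """
    sigmoid = 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))
    return lower + (upper - lower) * sigmoid


# A latent value of 0 lands at the midpoint; large |z| approaches the bounds.
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = scaled_sigmoid(z)
print(p)
```

A Normal prior on the latent `z` (the post's `p_logit`) then induces a prior on $p$ that is concentrated in the interior of the interval, with the mapping monotone in `z`.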
 
 ??? info "Why p ∈ (1, 2) Matters"
@@ -249,7 +252,7 @@ But how do we know these estimates are *correct*, not just well-behaved? A param
 | True (generating) | 293 | 174 | 1.574 |
 | Series posterior | 274 ± 10 | 174 ± 6 | 1.58 ± 0.01 |
 
-All three truth values fall within one posterior standard deviation of the posterior mean — the inference procedure correctly recovers the parameters that produced the data. This is a stronger validation than PPC alone, because it tests the inference machinery itself. A model that cannot recover known truth on data it generated can hardly be trusted on real data.
+The posterior means for φ and p recover the generating values to within one posterior standard deviation (φ matches to the reported precision), and μ lies within two: the gap of 19 is 1.9 posterior standard deviations. The Tweedie mean posterior is mildly asymmetric, a consequence of the long right tail in the data, so a symmetric ±1 SD check understates how well the posterior covers the truth. This is the expected pattern: with only 5,000 observations, the sample mean itself has a standard error of roughly $\sqrt{\phi \mu^p / n} \approx 16$, so the posterior's approximate 95% interval (~274 ± 20) still contains 293, though only just. On the full dataCar dataset (67,856 policies) the recovery should be considerably tighter. The broader point stands: the inference procedure recovers the parameters that produced the data. A model that cannot recover known truth on data it generated can hardly be trusted on real data.
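The standard-error figure in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only the generating values and posterior summary from the table above and the stated sample size of 5,000; the variable names are illustrative.

```python
import math

# Generating values from the recovery table; n_obs from the text.
mu_true, phi_true, p_true, n_obs = 293.0, 174.0, 1.574, 5_000
# Posterior summary for mu from the same table.
post_mean_mu, post_sd_mu = 274.0, 10.0

variance = phi_true * mu_true ** p_true        # Tweedie variance: phi * mu^p
se_sample_mean = math.sqrt(variance / n_obs)   # standard error of the sample mean
gap_in_sd = abs(mu_true - post_mean_mu) / post_sd_mu

print(f"SE of the sample mean: {se_sample_mean:.1f}")
print(f"Truth vs posterior mean: {gap_in_sd:.1f} posterior SDs apart")
```

The standard error comes out at about 16, the same order as the gap of 19 between the true mean and the posterior mean, which is why a miss of this size is unsurprising at this sample size.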
 
 ??? tip "Computational Cost"
     Sampling 4 chains with 1000 draws each takes about 3 minutes for a 60k-observation model with nutpie on an Apple M3 (8-core CPU, 16 GB RAM) — the series expansion is the bottleneck, but it parallelizes across chains and observations. For comparison, a standard GLM with Pearson φ takes under a second.
@@ -485,6 +488,14 @@ Three practical recommendations:
 2. **Use the full likelihood** — the series expansion is numerically tractable and converges rapidly; there is no reason to settle for method-of-moments estimates
 3. **Validate with PPC** — if your model predicts 99%+ zeros when the data has 94%, the estimation method is likely the culprit, not the distribution
 
+### Adverse Selection: The Balance-Sheet Stakes
+
+Because the frequentist Pearson φ inflates dispersion, traditional GLMs yield standard errors that are too wide: they overestimate the uncertainty around predicted claim amounts. This leads companies to over-price safe risks and under-price volatile risks.
+
+The consequence is textbook adverse selection. A competitor using the stable Bayesian series model can spot these mispriced segments and undercut your price on the clean, profitable risks (the low-claim drivers you have over-priced), steadily taking your best customers. Meanwhile, you are left with the under-priced, volatile risks: the drivers whose claim costs were systematically underestimated. Your loss ratio deteriorates from both directions simultaneously.
+
+The Bayesian model addresses this at the source: by recovering φ and p jointly with full posterior uncertainty, it prices each risk at its expected cost rather than through the inflated dispersion of the Pearson pipeline. The PPC checks that the predictive distribution matches the observed data, so when the model says a segment has a 2.3% chance of a \$5,000+ claim, that number is trustworthy enough to set rates on.
+
 ## Related Work
 
 The problem of Pearson φ failing for Tweedie models is acknowledged across the actuarial literature, but almost always in passing:
@@ -501,6 +512,8 @@ The problem of Pearson φ failing for Tweedie models is acknowledged across the
 
 - **Wüthrich (2021)** compares Poisson-gamma vs Tweedie parametrizations rigorously ([Springer](https://link.springer.com/article/10.1007/s13385-021-00264-3)) and finds industry preference for the separate frequency-severity approach, noting dispersion modeling is the weak point of the single Tweedie GLM.
 
+- **The saddlepoint approximation** is frequently cited as a closed-form alternative to the exact series for Tweedie density evaluation ([Dunn & Smyth, 2005](https://doi.org/10.1007/s11222-005-4070-y)). However, as demonstrated above, its $-\frac{1}{2}\log(\phi)$ term introduces a spurious gradient with no counterpart in the true likelihood: under joint Bayesian inference, where the log-density is differentiated with respect to $\phi$ and $p$ together, this inflates $\phi$ by 28× and drives $p$ to its lower bound. The saddlepoint works for $\mu$-estimation alone (the standard GLM use case with fixed $p$ and $\phi$), but joint Bayesian inference requires the series expansion.
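For reference, the saddlepoint approximation in its usual textbook form for $y > 0$ is written out below. Whether the post's implementation used exactly this form (for example, how it treated $y = 0$) is an assumption here. Taking logs shows where the $-\frac{1}{2}\log(\phi)$ term comes from:

$$
f(y; \mu, \phi, p) \approx \left(2\pi\phi\, y^{p}\right)^{-1/2} \exp\!\left(-\frac{d(y, \mu)}{2\phi}\right),
\qquad
\log f \approx -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log(\phi) - \tfrac{p}{2}\log(y) - \frac{d(y, \mu)}{2\phi},
$$

where the unit deviance for $1 < p < 2$ is

$$
d(y, \mu) = 2\left[\frac{y^{2-p}}{(1-p)(2-p)} - \frac{y\,\mu^{1-p}}{1-p} + \frac{\mu^{2-p}}{2-p}\right].
$$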
+
 This post fills the gap: we identify the mechanism (Pearson φ inflation), demonstrate it empirically on the dataCar dataset, show the correct Bayesian fix, and connect it to the pricing decisions that matter.
 
 ## Possible Extensions
@@ -509,8 +522,8 @@ This post fills the gap: we identify the mechanism (Pearson φ inflation), demon
 - **BART for the mean structure** — nonparametric mean estimation via [`pymc-bart`](https://www.pymc.io/projects/bart) for automatic interaction and nonlinearity detection
 - **Hurdle models** — separate models for claim frequency and severity for heavy-tailed data
 - **Double GLM (μ-φ DGLM)** — regressing dispersion $\phi$ on covariates could capture heteroskedasticity by risk class
-- **Hierarchical (partial pooling) models** — policies are nested in territories, vehicle types, and driver classes. PyMC's dims-based coordinate system makes random intercepts trivial to formulate: `pm.Normal("alpha_territory", mu=0, sigma=sigma_territory, dims="territory")` adds a partial-pooling term for every territory in a single line, with the group-level variance learned from data. The same pattern extends to random slopes and nested hierarchies. Sampling many group-level parameters is computationally demanding (more dimensions for the sampler), but the model specification is concise and natural.
-- **Bayesian optimization for pricing** — the posterior over (μ, φ, p) can drive pricing decisions under uncertainty. [`pymc.vectorize_over_posterior`](https://www.pymc.io/projects/docs/en/stable/api/generated/pymc.vectorize_over_posterior.html) takes the fitted model graph and posterior draws and returns a vectorized function over all draws — no manual looping or re-implementation. Use it to build an optimizer that evaluates pricing objectives (premium, deductible, risk retention) across the full posterior, giving a *distribution* over the optimal decision rather than a point estimate. This is the same technique behind PyMC-Marketing's MMM budget optimizer. The workflow of reusing the fitted model graph for downstream optimization was [pioneered by Ricardo Vieira in the PyMC ecosystem](https://www.youtube.com/watch?v=85jPmkMTfck).
+- **Hierarchical (partial pooling) models** — policies are nested in territories, vehicle types, and driver classes. Traditional actuarial workflows handle this with credibility theory (Bühlmann-Straub models), whose credibility factors are typically computed one dimension at a time by hand. PyMC's dims-based partial-pooling formulation (`pm.Normal("alpha_territory", mu=0, sigma=sigma_territory, dims="territory")`) functions as **Automated, Multi-Dimensional Credibility**: it estimates credibility-style shrinkage across nested and intersecting hierarchies simultaneously, with all group-level variances learned from data. The same pattern extends to random slopes and nested hierarchies. Sampling many group-level parameters is computationally demanding (more dimensions for the sampler), but the model specification is concise and natural.
+- **Bayesian optimization for pricing** — the posterior over (μ, φ, p) can drive pricing decisions under uncertainty. The standard actuarial approach to risk-adjusted pricing under parameter uncertainty is nested Monte Carlo: an outer loop samples parameters, and for each set an inner loop simulates loss realizations. That means two layers of hand-written loops with no shared computation, which is easy to get wrong. [`pymc.vectorize_over_posterior`](https://www.pymc.io/projects/docs/en/stable/api/generated/pymc.vectorize_over_posterior.html) removes most of that work: it takes outputs of the fitted model graph together with the posterior draws and returns those outputs vectorized over all draws, so PyTensor compiles a single computation graph with no manual looping or re-implementation. Use it to build an optimizer that evaluates pricing objectives (premium, deductible, risk retention) across the full posterior, giving a *distribution* over the optimal decision rather than a point estimate. This is the same technique behind PyMC-Marketing's MMM budget optimizer. The workflow of reusing the fitted model graph for downstream optimization was [pioneered by Ricardo Vieira in the PyMC ecosystem](https://www.youtube.com/watch?v=85jPmkMTfck).
 
 ## Reproducibility
 
docs/blog/posts/scripts/pearson-phi-broken-tweedie/fig_prior_posterior.py

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ def tweedie_dist(mu, phi, p):
     """Tweedie random draws via Poisson-Gamma compound (p ∈ (1, 2))."""
     lam = mu ** (2 - p) / (phi * (2 - p))
     alpha_term = (2 - p) / (p - 1)
-    beta = phi * (p - 1) * mu ** (p - 1)
+    beta = 1.0 / (phi * (p - 1) * mu ** (p - 1))
     N = pmd.Poisson.dist(mu=lam)
     Y = pmd.Gamma.dist(alpha=px.math.maximum(N * alpha_term, 1e-10), beta=beta)
     return px.math.where(N > 0, Y, 0.0)

docs/blog/posts/scripts/pearson-phi-broken-tweedie/time_sampling.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ def tweedie_dist(mu, phi, p):
     """Tweedie random draws via Poisson-Gamma compound (p ∈ (1, 2))."""
     lam = mu ** (2 - p) / (phi * (2 - p))
     alpha_term = (2 - p) / (p - 1)
-    beta = phi * (p - 1) * mu ** (p - 1)
+    beta = 1.0 / (phi * (p - 1) * mu ** (p - 1))
     N = pmd.Poisson.dist(mu=lam)
     Y = pmd.Gamma.dist(alpha=px.math.maximum(N * alpha_term, 1e-10), beta=beta)
     return px.math.where(N > 0, Y, 0.0)
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
+# /// script
+# dependencies = ["numpy", "pymc", "pytensor"]
+# ///
+
+"""Verify the Tweedie dist function against theoretical values.
+
+Compares the symbolic PyMC tweedie_dist (used by sample_prior_predictive
+and sample_posterior_predictive) against:
+- theoretical values (E[Y]=mu, Var[Y]=phi*mu^p, P(Y=0)=exp(-lambda))
+- the numpy tweedie_random reference (known correct)
+
+Tests two versions: the blog post version (suspected bug) and the corrected
+version (beta as rate = 1/(phi*(p-1)*mu^(p-1))).
+"""
+
+import sys
+
+import numpy as np
+import pymc as pm
+import pymc.dims as pmd
+import pytensor
+import pytensor.tensor as pt
+import pytensor.xtensor as px
+
+# ---- Import the known-correct numpy reference ----
+sys.path.insert(0, str(__import__("pathlib").Path(__file__).parent))
+from tweedie_utils import tweedie_random
+
+
+def tweedie_dist_buggy(mu, phi, p):
+    """Original blog post version (suspected wrong)."""
+    lam = mu ** (2 - p) / (phi * (2 - p))
+    alpha_term = (2 - p) / (p - 1)
+    beta = phi * (p - 1) * mu ** (p - 1)
+    N = pmd.Poisson.dist(mu=lam)
+    Y = pmd.Gamma.dist(alpha=px.math.maximum(N * alpha_term, 1e-10), beta=beta)
+    return px.math.where(N > 0, Y, 0.0)
+
+
+def tweedie_dist_correct(mu, phi, p):
+    """Corrected version: beta is rate = 1/scale."""
+    lam = mu ** (2 - p) / (phi * (2 - p))
+    alpha_term = (2 - p) / (p - 1)
+    beta = 1.0 / (phi * (p - 1) * mu ** (p - 1))
+    N = pmd.Poisson.dist(mu=lam)
+    Y = pmd.Gamma.dist(alpha=px.math.maximum(N * alpha_term, 1e-10), beta=beta)
+    return px.math.where(N > 0, Y, 0.0)
+
+
+def theoretical_values(mu, phi, p):
+    """Compute theoretical Tweedie moments."""
+    lam = mu ** (2 - p) / (phi * (2 - p))
+    zero_rate = np.exp(-lam)
+    return {
+        "mean": float(mu),
+        "std": float(np.sqrt(phi * mu**p)),
+        "zero_rate": float(zero_rate),
+    }
+
+
+def draw_and_stats(dist, draws=50_000, rng=None):
+    """Draw from a symbolic dist and return mean, std, zero_rate."""
+    samples = pm.draw(dist, draws=draws, random_seed=rng)
+    samples = np.asarray(samples).ravel()
+    return {
+        "mean": float(np.mean(samples)),
+        "std": float(np.std(samples)),
+        "zero_rate": float(np.mean(samples == 0)),
+    }
+
+
+def compare(name, stats, theo, tol_mean=0.05, tol_zero=0.01):
+    """Compare sampled stats against theory (relative error; absolute for zero_rate)."""
+    results = []
+    passed = True
+
+    for key in ("mean", "std", "zero_rate"):
+        s, t = stats[key], theo[key]
+        err = abs(s - t) if key == "zero_rate" else abs(s - t) / max(abs(t), 1e-10)
+        tol = tol_zero if key == "zero_rate" else tol_mean
+        ok = err < tol
+        if not ok:
+            passed = False
+        status = "✓" if ok else "✗"
+        results.append(f" {status} {key:<10s}: {s:>14.4f} (theory={t:>14.4f}, err={err:>8.4%})")
+
+    print(f"\n{'='*75}")
+    print(f" {name}")
+    print(f"{'='*75}")
+    for r in results:
+        print(r)
+    print(f" {'ALL PASS' if passed else 'FAILURES DETECTED'}")
+    return passed
+
+
+def compare_vs_numpy(name, pymc_stats, numpy_stats):
+    """Compare PyMC dist stats against numpy reference."""
+    print(f"\n{'='*75}")
+    print(f" {name} vs numpy reference")
+    print(f"{'='*75}")
+    all_ok = True
+    for key in ("mean", "std", "zero_rate"):
+        s_pymc, s_np = pymc_stats[key], numpy_stats[key]
+        rel_diff = abs(s_pymc - s_np) / max(abs(s_np), 1e-10)
+        ok = rel_diff < 0.05
+        if not ok:
+            all_ok = False
+        status = "✓" if ok else "✗"
+        print(f" {status} {key:<10s}: pymc={s_pymc:>14.4f} numpy={s_np:>14.4f} diff={rel_diff:>8.4%}")
+    print(f" {'MATCHES NUMPY' if all_ok else 'DIVERGES FROM NUMPY'}")
+    return all_ok
+
+
+def main():
+    test_cases = [
+        {"mu": 10.0, "phi": 2.0, "p": 1.5, "label": "μ=10, φ=2.0, p=1.5"},
+        {"mu": 50.0, "phi": 1.5, "p": 1.3, "label": "μ=50, φ=1.5, p=1.3"},
+        {"mu": 293.0, "phi": 174.0, "p": 1.574, "label": "μ=293, φ=174, p=1.574 (dataCar-like)"},
+    ]
+
+    draws = 50_000
+    buggy_failed, buggy_diverged = False, False
+    correct_passed, correct_matches_numpy = True, True
+
+    for tc in test_cases:
+        mu = tc["mu"]
+        phi = tc["phi"]
+        p = tc["p"]
+        label = tc["label"]
+
+        print(f"\n{'#'*75}")
+        print(f"# {label}")
+        print(f"{'#'*75}")
+
+        # Seed for reproducibility
+        rng = np.random.default_rng(42)
+        seed = 42
+
+        theo = theoretical_values(mu, phi, p)
+        print(f"\n Theoretical: mean={theo['mean']:.4f}, std={theo['std']:.4f}, zero_rate={theo['zero_rate']:.4%}")
+
+        # ---- 1. Numpy reference ----
+        np_samples = tweedie_random(mu, phi, p, size=draws, rng=rng)
+        np_stats = {
+            "mean": float(np.mean(np_samples)),
+            "std": float(np.std(np_samples)),
+            "zero_rate": float(np.mean(np_samples == 0)),
+        }
+        compare("numpy reference (tweedie_random)", np_stats, theo)
+
+        # ---- 2. Buggy PyMC dist ----
+        dist_buggy = tweedie_dist_buggy(mu, phi, p)
+        buggy_stats = draw_and_stats(dist_buggy, draws=draws, rng=seed)
+        passed_buggy = compare("BUGGY tweedie_dist (blog post)", buggy_stats, theo)
+        match_buggy = compare_vs_numpy("BUGGY tweedie_dist", buggy_stats, np_stats)
+        if not passed_buggy:
+            buggy_failed = True
+        if not match_buggy:
+            buggy_diverged = True
+
+        # ---- 3. Corrected PyMC dist ----
+        dist_correct = tweedie_dist_correct(mu, phi, p)
+        correct_stats = draw_and_stats(dist_correct, draws=draws, rng=seed)
+        passed_correct = compare("CORRECT tweedie_dist", correct_stats, theo)
+        match_correct = compare_vs_numpy("CORRECT tweedie_dist", correct_stats, np_stats)
+        if not passed_correct:
+            correct_passed = False
+        if not match_correct:
+            correct_matches_numpy = False
+
+    # ---- Final verdict ----
+    print(f"\n{'='*75}")
+    print(f" SUMMARY")
+    print(f"{'='*75}")
+    if buggy_failed or buggy_diverged:
+        print(f" Buggy version FAILED theory check: CONFIRMED BUG")
+    else:
+        print(f" Buggy version MAY have passed: UNEXPECTED, investigate")
+
+    print(f" Corrected version matches numpy reference: {'YES' if correct_matches_numpy else 'NO'}")
+
+    if (buggy_failed or buggy_diverged) and correct_passed and correct_matches_numpy:
+        print(f"\n >>> Bug confirmed. Proceed with fix in 3 locations. <<<")
+    else:
+        print(f"\n >>> Unexpected results. Investigate before proceeding. <<<")
+
+
+if __name__ == "__main__":
+    main()
