Near-threshold density: calibrate to OBR/HMRC £1k-band counts instead of leaving within-band shape to the optimizer #23

## Problem

On the corrected model (#21), the weight optimizer places more mass just above £85k than just below it (local step **up** of ≈ +35% across the threshold; visible in `turnover_distribution_85k.png` and the Section 5 figure). As a picture of the real world this is backwards: administrative data (OBR EFO Mar-2023 Chart C; Liu et al. 2021) show excess mass **below** the threshold and a hole above it.
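
One way to make the "local step" concrete is the relative change in weighted mass between equal-width windows either side of £85k. A minimal sketch, assuming per-record arrays of turnover and calibrated weights and a hypothetical £5k window; the function name and window width are illustrative and not necessarily how the Section 5 estimator defines the step:

```python
import numpy as np


def local_step(turnover, weights, threshold=85_000.0, width=5_000.0):
    """Relative change in weighted mass from [threshold - width, threshold)
    to [threshold, threshold + width). Hypothetical diagnostic.

    Positive = step up across the threshold (the optimizer artifact);
    negative = step down (the administratively observed pattern).
    """
    turnover = np.asarray(turnover, dtype=float)
    weights = np.asarray(weights, dtype=float)
    in_below = (turnover >= threshold - width) & (turnover < threshold)
    in_above = (turnover >= threshold) & (turnover < threshold + width)
    below = weights[in_below].sum()
    above = weights[in_above].sum()
    if below == 0:
        raise ValueError("no weighted mass below the threshold in the window")
    # Equal-width windows, so the mass ratio is also the density ratio.
    return above / below - 1.0


# Toy input with 35% more weighted mass just above £85k than just below it.
print(local_step([82_000, 84_000, 86_000, 88_000], [10, 10, 13.5, 13.5]))
```

Because the two windows have the same width, a value of about 0.35 corresponds to the ≈ +35% step-up described above; a negative value would be the administratively expected step-down.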

Both the old step-down (produced by the mis-scaled liabilities) and the new step-up are optimizer equilibria, not behaviour; the paper says so, and no costing reads the local shape behaviourally. The headline numbers are only mildly shape-sensitive: the [85k,90k) anchor band is £235.7m (direct) vs £233.5m (smooth counterfactual ignoring local shape), 0.9% apart; [85k,100k) is −£698.2m vs −£686.1m, 1.8% apart. But the figure invites misreading, and within-band shape should not be an optimizer side-effect at all.
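
The quoted gaps can be checked directly. A sketch, assuming the percentage is taken relative to the smooth counterfactual (this base reproduces both quoted figures; using the direct estimate as the base gives 1.7% for the wider band):

```python
def pct_gap(direct, smooth):
    """Absolute gap between two costings, as a share of the smooth counterfactual."""
    return abs(direct - smooth) / abs(smooth)


# Costings in £m, as quoted above.
print(f"[85k,90k):  {pct_gap(235.7, 233.5):.1%}")    # 0.9%
print(f"[85k,100k): {pct_gap(-698.2, -686.1):.1%}")  # 1.8%
```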

## Proposed fix: calibrate the near-threshold region to published £1k-band counts

The OBR March-2023 EFO Chart C underlying data (HMRC £1,000-band counts of businesses, £65k–£90k, outturn years + projections) is already in this repo (`data/processed/obr_vat_bunching.csv`, plotted by `scripts/plot_obr_bunching.py`). Add these fine near-threshold band counts as calibration targets (2023-24-appropriate series; rescale to the coarse-band totals to avoid double counting), so:

- the synthetic file reproduces the administratively observed bunching shape **as an explicit, cited target** — consistent with the paper's transparency framework (target-inherited shape, no behavioural claims, placebo still applies);
- the [85k,90k) mass — the anchor band — is disciplined by real data rather than an optimizer equilibrium;
- the Section 5 story sharpens: the estimator applied to the calibrated file recovers the **target-inherited** step, and the placebo (remove the OBR targets) collapses it, exactly the pattern the paper documents.

Universe caveat to handle explicitly: the OBR/HMRC chart counts are VAT-population based (registered traders incl. voluntary), narrower than the ONS registered-business frame below the threshold; use the series as a within-band **shape** target (relative £1k-band densities), not as level targets.
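
A minimal sketch of turning the £1k-band series into shape-only targets: normalise the OBR/HMRC counts to within-band shares, then scale them to the existing coarse-band target so the fine targets sum exactly to it (no double counting, and no level information inherited from the VAT-population universe). The function name, the dict keyed by band lower edge in £k, and the example counts are hypothetical; the column layout of `data/processed/obr_vat_bunching.csv` is not assumed, and the coarse band must lie inside the £65k–£90k range the series covers:

```python
def fine_band_targets(obr_counts, coarse_total):
    """Turn OBR/HMRC £1k-band counts into fine-band calibration targets.

    Hypothetical helper. obr_counts maps each £1k band (lower edge, £k) inside
    one coarse band to its OBR/HMRC count for the chosen year. Only the
    relative shape is used; targets are rescaled to sum to coarse_total.
    """
    total = sum(obr_counts.values())
    if total <= 0:
        raise ValueError("OBR band counts must have a positive total")
    return {band: coarse_total * count / total for band, count in obr_counts.items()}


# Illustrative counts (not from the repo data) for the [85k,90k) anchor band,
# rescaled to an illustrative coarse-band target of 1,000 businesses.
targets = fine_band_targets({85: 30, 86: 20, 87: 20, 88: 15, 89: 15}, 1000.0)
print(targets)
```

Because only shares enter, any uniform rescaling of the OBR counts (e.g. a different universe size) leaves the targets unchanged, which is what treating the series as a shape rather than a level target requires.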

## Alternative (weaker)

If the OBR-target route stalls: impose a smooth monotone within-band density prior near band edges (penalise weight-density curvature within ±£10k of any calibration band edge), so within-band shape is a stated modelling assumption rather than an optimizer artifact. Doesn't reproduce real bunching, but removes the misleading spike.
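
For this weaker alternative, a minimal sketch of the curvature penalty: bin weighted mass into £1k bins within ±£10k of each calibration band edge and sum the squared second differences of the binned density. Binned mass is linear in the weights, so the penalty is a convex quadratic in the weights and could be added to the calibration loss with a tuning multiplier. The function name and the £1k bin width are assumptions; monotonicity is not enforced here and would need a separate constraint or penalty:

```python
import numpy as np


def curvature_penalty(turnover, weights, band_edges,
                      half_width=10_000.0, bin_width=1_000.0):
    """Sum of squared second differences of the binned weighted density
    within ±half_width of each calibration band edge (hypothetical helper).

    Zero for a density that is linear within each window; grows with
    spikes or kinks such as an optimizer-induced step at an edge.
    """
    turnover = np.asarray(turnover, dtype=float)
    weights = np.asarray(weights, dtype=float)
    penalty = 0.0
    for edge in band_edges:
        # Bin edges from edge - half_width to edge + half_width inclusive.
        bins = np.arange(edge - half_width, edge + half_width + bin_width, bin_width)
        mass, _ = np.histogram(turnover, bins=bins, weights=weights)
        density = mass / bin_width  # weighted businesses per £
        penalty += float(np.sum(np.diff(density, n=2) ** 2))
    return penalty


# One record per £1k bin around £85k: linearly rising weights, then a spike.
centers = np.arange(75_500, 95_000, 1_000)
linear = np.arange(1, 21, dtype=float)
spiked = linear.copy()
spiked[10] += 100.0  # extra mass in the [85k,86k) bin
print(curvature_penalty(centers, linear, [85_000.0]))
print(curvature_penalty(centers, spiked, [85_000.0]))
```

The linearly rising density scores (numerically) zero, while the spike just above the edge is penalised.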

Refs: #15, #21 (paper rewrite), the two-vintage spurious-signal exhibit in Section 5.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Near-threshold density: calibrate to OBR/HMRC £1k-band counts instead of leaving within-band shape to the optimizer #23

Problem

Proposed fix: calibrate the near-threshold region to published £1k-band counts

Alternative (weaker)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Near-threshold density: calibrate to OBR/HMRC £1k-band counts instead of leaving within-band shape to the optimizer #23

Description

Problem

Proposed fix: calibrate the near-threshold region to published £1k-band counts

Alternative (weaker)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions