fix(billing): F1 reconciler orphan sweep + F6 entitlement drift signal
Billing-trust audit 2026-05-19 — worker-repo findings.
F1 (P0) — billing reconciler blind spot. The primary sweep starts from
teams WHERE stripe_customer_id IS NOT NULL — the persisted Razorpay
subscription id, written by a best-effort, non-fatal UPDATE at checkout.
If that write is lost and the customer then pays, the team is
structurally invisible to the reconciler forever: Razorpay bills the
card, the DB stays on free/hobby, nothing corrects it.
Fix: add a Razorpay-authoritative orphan sweep (billing_reconciler.go
runOrphanSweep) that runs after the primary sweep. It enumerates
pending_checkouts — which records the (subscription_id, team_id) pair
for EVERY checkout, independent of the lost UPDATE — fetches each live
Razorpay subscription, and elevates any team Razorpay reports
paid-and-active whose DB tier is still below the entitled tier. It then
backfills teams.stripe_customer_id so the team is visible to the primary
sweep thereafter. Fully fail-open: a query error logs and returns, a
per-checkout Razorpay error / circuit-open aborts or skips that row
only. No api-side change needed — pending_checkouts already carries the
pair the worker needs.
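
The orphan-sweep flow described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in billing_reconciler.go: the types pendingCheckout, team and subscription, the tierRank table, the fetch callback and errUnavailable are all hypothetical stand-ins for the real database rows and Razorpay client, and the sketch always skips a failing row where the real sweep may abort instead.

```go
package main

import (
	"errors"
	"fmt"
)

// Everything in this file is a hypothetical stand-in for illustration only.

// pendingCheckout mirrors the (subscription_id, team_id) pair recorded per checkout.
type pendingCheckout struct {
	SubscriptionID string
	TeamID         string
}

// team holds the DB-side view. SubscriptionID stands in for
// teams.stripe_customer_id; the empty string plays the role of NULL.
type team struct {
	Tier           string
	SubscriptionID string
}

// subscription is the provider-side view of one subscription.
type subscription struct {
	Status string // "active" means paid-and-active in this sketch
	Tier   string // tier the plan entitles the team to
}

// tierRank orders tiers so "below the entitled tier" can be compared (assumed tiers).
var tierRank = map[string]int{"free": 0, "hobby": 1, "pro": 2}

// errUnavailable simulates a provider error or an open circuit breaker.
var errUnavailable = errors.New("provider unavailable")

// orphanSweep walks every pending checkout, asks the provider for the live
// subscription, and elevates teams whose DB tier is below the entitled tier.
// It is fail-open: a per-row fetch error skips that row and the sweep continues.
func orphanSweep(
	checkouts []pendingCheckout,
	teams map[string]*team,
	fetch func(subID string) (subscription, error),
) (scanned, corrected int) {
	for _, pc := range checkouts {
		scanned++
		sub, err := fetch(pc.SubscriptionID)
		if err != nil {
			fmt.Printf("orphan sweep: skipping %s: %v\n", pc.SubscriptionID, err)
			continue
		}
		t, ok := teams[pc.TeamID]
		if !ok || sub.Status != "active" {
			continue // unknown team or not paid-and-active: never upgrade
		}
		if tierRank[t.Tier] < tierRank[sub.Tier] {
			t.Tier = sub.Tier
			// Backfill so the primary sweep can see this team from now on.
			t.SubscriptionID = pc.SubscriptionID
			corrected++
		}
	}
	return scanned, corrected
}

func main() {
	teams := map[string]*team{
		"t1": {Tier: "free"},                        // paid, but the checkout UPDATE was lost
		"t2": {Tier: "pro", SubscriptionID: "sub_2"}, // already correct
		"t3": {Tier: "free"},                        // provider call fails for this one
	}
	checkouts := []pendingCheckout{
		{SubscriptionID: "sub_1", TeamID: "t1"},
		{SubscriptionID: "sub_2", TeamID: "t2"},
		{SubscriptionID: "sub_3", TeamID: "t3"},
	}
	fetch := func(id string) (subscription, error) {
		if id == "sub_3" {
			return subscription{}, errUnavailable
		}
		return subscription{Status: "active", Tier: "pro"}, nil
	}
	scanned, corrected := orphanSweep(checkouts, teams, fetch)
	fmt.Println(scanned, corrected, teams["t1"].Tier, teams["t1"].SubscriptionID)
}
```

The two return values correspond loosely to the scanned and corrected counters listed under "New metrics"; in this sketch a row counts as scanned even when the provider call fails.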
F6 (P2) — infra entitlement drift unalerted. The entitlement reconciler
corrected Postgres connection-cap / Redis maxmemory drift but only via a
generic per-resource INFO line, so monitoring could not alert on a
rising drift-correction rate. Fix: emit a dedicated WARN-level
jobs.entitlement_reconciler.drift_corrected signal + a 1:1
instant_entitlement_drift_corrected_total counter (labelled by
resource_type) on every applied correction, for both the Postgres and
Redis paths.
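
A rough sketch of the "one WARN line plus one counter increment per applied correction" pattern follows. It assumes Go's standard log/slog for the signal and uses a small in-memory driftCounter as a stand-in for whatever metrics client the worker actually uses; reconcileLimit, newBufferLogger and the attribute names other than resource_type are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"sync"
)

// driftCounter is a hypothetical in-memory stand-in for a labelled counter
// such as instant_entitlement_drift_corrected_total{resource_type="..."}.
type driftCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newDriftCounter() *driftCounter {
	return &driftCounter{counts: map[string]int{}}
}

// Inc adds one to the counter for the given resource_type label.
func (c *driftCounter) Inc(resourceType string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[resourceType]++
}

// Value returns the current count for the given resource_type label.
func (c *driftCounter) Value(resourceType string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[resourceType]
}

// newBufferLogger returns a text logger that writes into a buffer, so the
// emitted lines can be inspected in this example.
func newBufferLogger() (*slog.Logger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return slog.New(slog.NewTextHandler(buf, nil)), buf
}

// reconcileLimit compares the observed limit with the entitled one. When they
// differ it emits exactly one WARN signal and one counter increment, then
// returns the entitled value. With no drift it emits nothing.
func reconcileLimit(logger *slog.Logger, c *driftCounter, resourceType, resourceID string, observed, entitled int64) int64 {
	if observed == entitled {
		return observed
	}
	logger.Warn("jobs.entitlement_reconciler.drift_corrected",
		"resource_type", resourceType,
		"resource_id", resourceID,
		"observed", observed,
		"entitled", entitled,
	)
	c.Inc(resourceType)
	return entitled
}

func main() {
	logger, buf := newBufferLogger()
	c := newDriftCounter()

	reconcileLimit(logger, c, "postgres", "res-1", 50, 20) // connection cap drifted
	reconcileLimit(logger, c, "redis", "res-2", 256, 256)  // no drift, no signal

	fmt.Print(buf.String())
	fmt.Println(c.Value("postgres"), c.Value("redis"))
}
```

Keeping the log line and the counter 1:1 means an alert on the counter's rate and a search on the WARN message should agree on how many corrections occurred.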
Tests (fail without the fix, pass with it):
- TestBillingReconciler_OrphanSweep_PaidTeamNoSubID_Recovered — paid
team with NULL stripe_customer_id is detected and elevated.
- TestBillingReconciler_OrphanSweep_AlreadyCorrectTier_NoUpgrade
- TestBillingReconciler_OrphanSweep_NonActiveStatus_NoUpgrade
- TestBillingReconciler_OrphanSweep_QueryFailure_FailOpen
- TestEntitlementReconciler_PostgresDriftCorrected_EmitsSignal
- TestEntitlementReconciler_RedisDriftCorrected_EmitsSignal
- TestEntitlementReconciler_NoDrift_NoSignal
The 14 existing billing-reconciler Work() tests were updated to expect
the new orphan-sweep query (expectEmptyOrphanSweep helper).
New metrics: instant_billing_reconciler_orphan_scanned_total,
instant_billing_reconciler_orphan_corrected_total,
instant_entitlement_drift_corrected_total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>