Commit 0d7e46c
worker: Wave 3 P2 fixes — BugBash 2026-05-20
MR-P1-5 (T5 P0-3): reaper SELECT-then-deprovision races the upgrade
webhook clearing expires_at. Wrap the per-row check + deprovision in a
tx that holds a FOR UPDATE row lock, serializing the reaper against the
webhook, so a concurrent subscription.charged that clears expires_at +
sets tier='pro' between batch SELECT and per-row work cannot result in
DROP DATABASE on the customer's just-paid DB.
The fix opens BeginTx per row, re-confirms the reaper predicate under
'SELECT EXISTS … FOR UPDATE OF r', and runs the deprovision + UPDATE
inside the same tx. The upgrade webhook either blocks on the row lock
(and then finds a 'deleted' row, so its UPDATE is a no-op because
ElevateResourceTiersByTeam only updates rows whose status is not
'deleted') or completes before our SELECT (in which case the EXISTS
returns false and the reaper skips with
metrics.ExpireRaceSkippedTotal.Inc()).
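
The two orderings described above can be sketched with a small in-memory model. This is an illustration only, not the worker's code: a sync.Mutex stands in for the Postgres row lock taken by FOR UPDATE, and the resource type, its fields, and the reapRow/upgrade helpers are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical in-memory stand-in for one resources row.
type resource struct {
	mu      sync.Mutex // stands in for the row lock taken by FOR UPDATE
	tier    string
	status  string
	expired bool // true while expires_at is set and in the past
	dropped bool // true once the simulated DROP DATABASE has run
}

// reapRow models the per-row tx: lock, re-confirm the predicate, destroy.
// A false return is the point where the real worker would bump its
// race-skipped counter.
func (r *resource) reapRow() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.tier != "free" || r.status == "deleted" || !r.expired {
		return false
	}
	r.dropped = true
	r.status = "deleted"
	return true
}

// upgrade models the webhook: it only touches rows that are not deleted.
func (r *resource) upgrade() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.status == "deleted" {
		return false
	}
	r.tier = "pro"
	r.expired = false
	return true
}

func main() {
	// Ordering A: the upgrade commits first, so the reaper skips the row.
	a := &resource{tier: "free", status: "active", expired: true}
	upgraded := a.upgrade()
	reaped := a.reapRow()
	fmt.Println(upgraded, reaped, a.dropped)

	// Ordering B: the reaper commits first, so the upgrade is a no-op.
	b := &resource{tier: "free", status: "active", expired: true}
	reaped = b.reapRow()
	upgraded = b.upgrade()
	fmt.Println(reaped, upgraded, b.tier)
}
```

In both orderings the check and the destructive step run under the same lock, so no interleaving can drop a row that has already been upgraded.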
MR-P1-7 (T5 P1-7): reaper free-tier predicate ignored team deletion
state — a 'free' resource of a team in deletion_requested 30-day grace
was being reaped (DROP DATABASE) before the restore window elapsed.
Add LEFT JOIN teams + (r.team_id IS NULL OR t.status = 'active') to
the batch SELECT and to the per-row FOR UPDATE re-confirm (defense in
depth). The team-deletion executor is the authorized destructor for
that data path; the reaper stays out of it.
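
As a rough sketch of the team guard, the snippet below pairs an illustrative SQL fragment with a pure-Go mirror of the same predicate. The table names (resources, teams), the join column t.id, and the row type are assumptions for the example; only r.team_id, t.status, tier, and expires_at come from the message above.

```go
package main

import "fmt"

// teamGuardSQL is an illustrative shape for the batch SELECT, not the
// worker's actual query. Table names and the join column are assumed.
const teamGuardSQL = `
SELECT r.id
FROM resources r
LEFT JOIN teams t ON t.id = r.team_id
WHERE r.tier = 'free'
  AND r.expires_at < now()
  AND (r.team_id IS NULL OR t.status = 'active')`

// row is a hypothetical projection of one joined result row.
type row struct {
	tier       string
	expired    bool
	teamStatus *string // nil models r.team_id IS NULL (no owning team)
}

// reapable mirrors the WHERE clause: free, expired, and either team-less
// or owned by an active team.
func reapable(r row) bool {
	if r.tier != "free" || !r.expired {
		return false
	}
	return r.teamStatus == nil || *r.teamStatus == "active"
}

func main() {
	active, grace := "active", "deletion_requested"
	fmt.Println(reapable(row{tier: "free", expired: true}))                      // true
	fmt.Println(reapable(row{tier: "free", expired: true, teamStatus: &active})) // true
	fmt.Println(reapable(row{tier: "free", expired: true, teamStatus: &grace}))  // false
}
```

The LEFT JOIN keeps team-less (anonymous) rows in the result set, while any team status other than 'active' takes the row out of the reaper's reach.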
T8 worker followup: entitlement_reconciler Mongo arm — verify the
unsupported-MONGODB skip (provisioner returns {Applied:false,
SkipReason:'unsupported resource type for regrade'}) is logged at DEBUG
not WARN. Existing code already does this; added a regression test
(TestEntitlementReconciler_MongoArm_UnsupportedSkip_LogsAtDebug_NotWARN)
that captures slog records and fails loudly if mongo.regrade_skipped
ever fires at WARN — that path runs once per Mongo resource per 5min
tick indefinitely until provisioner.regradeMongo lands, so a WARN
would be ~12/h/resource of alert-fatigue spam.
T22 P2 (DEPLOY_DOMAIN/R2_ENDPOINT defaults) + T21 P1-5 (e2e_bypass
INFO spam): SKIPPED — both findings are in the api repo
(api/internal/config/config.go and api/internal/middleware/fingerprint.go
respectively), not in the worker module. The api repo has uncommitted
in-flight work (OTel tracing restoration, ~5 files); a worker-scoped
brief should not drag those in. Flagged in the report so a follow-up
api-repo Wave can claim them.
Coverage block (per CLAUDE.md rule 17):
  Symptom:       reaper race vs upgrade webhook → DROP paid DB; reaper
                 reaps free-row of deletion_requested team → restore
                 returns active account with no data
  Enumeration:   rg -n 'tier=.free.\|expires_at' worker/internal/jobs/
                 rg -n 'DeprovisionResource' worker/internal/jobs/expire.go
  Sites found:   1 reaper SQL + 1 per-row deprovision call
  Sites touched: both: batch SELECT now LEFT JOINs teams,
                 per-row deprovision wrapped in FOR UPDATE tx
  Coverage test: TestExpireAnonymousWorker_T5_P0_3_UpgradeWebhookWinsRace
                 TestExpireAnonymousWorker_T5_P1_7_TeamInDeletionRequestedIsExcluded
                 TestExpireAnonymousWorker_T5_P1_7_PerRowGuardRedundantlyChecksTeamStatus
                 TestEntitlementReconciler_MongoArm_UnsupportedSkip_LogsAtDebug_NotWARN
  Live verified: pending; pushes to origin/master auto-deploy. The
                 on-cluster effect is the per-tick metric
                 instant_expire_race_skipped_total, which stays at 0
                 until the race guard fires (any increment is the
                 positive signal); behavioural verification waits
                 for a real subscription.charged collision (rare).
No regression in the existing P0-1a / paused-suspended /
free-tier expiry / storage-expiry test coverage.
Build/vet/test all green:
go build ./... → clean
go vet ./... → clean
go test ./... → ok (all packages, including 22.6s jobs suite)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ebbdfef commit 0d7e46c
4 files changed
Lines changed: 606 additions & 156 deletions