Commit b10df7e
fix(reaper+reconciler): MR-P0-1 + MR-P0-2 — stop the customer-namespace leak; wire real prober
BugBash 2026-05-20 P0s (cross-confirmed across 4 tracks: T1/T5/T20/T24).
The MR-P0-1 leak left 188 instant-customer-* namespaces in the prod cluster,
each running a live Postgres/Redis/Mongo pod with no DB record.

MR-P0-1a (the headline finding): the reaper used to mark a resources row
status='deleted' EVEN WHEN provisioner.DeprovisionResource had returned an
error. A 'deleted' row is terminal and invisible to every reconciler, so the
backend (customer namespace + its live pod) was orphaned forever. Now the
reaper only advances the row to 'deleted' when the backend teardown
genuinely succeeded; on failure the row stays in its reapable status and the
next tick retries. A new instant_expire_deprovision_failed_total counter
surfaces a sustained leak rate to NR. Added a deprovisioner-seam interface
(ResourceDeprovisioner) so the regression test can inject a failing fake.
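
The ordering fix above can be sketched as follows. This is a minimal illustration, not the code from this commit: the method signature on ResourceDeprovisioner, the in-memory resource struct, the 'expired' status value, and the plain int counter standing in for instant_expire_deprovision_failed_total are all assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ResourceDeprovisioner is the seam named in the commit message. The method
// signature is an assumption made for this sketch.
type ResourceDeprovisioner interface {
	DeprovisionResource(ctx context.Context, token string) error
}

// resource is a hypothetical in-memory stand-in for a resources row.
type resource struct {
	Token  string
	Status string
}

// failingDeprovisioner always fails, like the fake a regression test injects.
type failingDeprovisioner struct{}

func (failingDeprovisioner) DeprovisionResource(context.Context, string) error {
	return errors.New("backend teardown failed")
}

// okDeprovisioner always succeeds.
type okDeprovisioner struct{}

func (okDeprovisioner) DeprovisionResource(context.Context, string) error { return nil }

// reapOne advances the row to 'deleted' only after the teardown succeeded.
// On failure it leaves Status untouched so the next tick retries, and bumps
// failed (a stand-in for instant_expire_deprovision_failed_total).
func reapOne(ctx context.Context, d ResourceDeprovisioner, r *resource, failed *int) error {
	if err := d.DeprovisionResource(ctx, r.Token); err != nil {
		*failed++
		return fmt.Errorf("deprovision %s: %w", r.Token, err)
	}
	r.Status = "deleted"
	return nil
}

func main() {
	failed := 0
	r := &resource{Token: "abc123", Status: "expired"} // 'expired' is a hypothetical reapable status

	// First tick: teardown fails, so the row stays reapable.
	err := reapOne(context.Background(), failingDeprovisioner{}, r, &failed)
	fmt.Println(r.Status, failed, err)

	// Next tick: teardown succeeds, so the row becomes terminal.
	err = reapOne(context.Background(), okDeprovisioner{}, r, &failed)
	fmt.Println(r.Status, failed, err)
}
```

Taking the deprovisioner as an interface is what lets a test swap in the failing fake without touching a real cluster.
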
MR-P0-1b: the orphan_sweep_reconciler's PASS 3 only swept instant-deploy-*
namespaces — there was no sweep for instant-customer-*. New PASS 4 lists
every instant-customer-<token> namespace and deletes any whose token has no
active/paused/suspended resources row. Like PASS 3, it fails open: on an RBAC
forbidden error or a transient DB failure it logs one WARN and reports zero
orphans for that sweep. This is the durable backstop that stops the leak from
recurring; 121 of the 188 leaked namespaces had to be cleaned up by hand.
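
The PASS 4 selection rule might look roughly like the sketch below. It is an illustration under stated assumptions: the function name, its parameters, and the errLookup sentinel are hypothetical, and the real reconciler talks to the Kubernetes API and the database rather than taking slices and maps.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const customerPrefix = "instant-customer-"

// errLookup is a hypothetical error standing in for "RBAC forbidden" or a
// transient DB failure.
var errLookup = errors.New("lookup failed")

// orphanedCustomerNamespaces returns the instant-customer-<token> namespaces
// whose token is absent from liveTokens (tokens that still have an active,
// paused, or suspended resources row). If lookupErr is non-nil the sweep
// fails open: it logs one warning and reports zero orphans, so nothing is
// deleted on the basis of incomplete data.
func orphanedCustomerNamespaces(namespaces []string, liveTokens map[string]bool, lookupErr error) []string {
	if lookupErr != nil {
		fmt.Println("WARN: pass 4 skipped this sweep:", lookupErr)
		return nil
	}
	var orphans []string
	for _, ns := range namespaces {
		if !strings.HasPrefix(ns, customerPrefix) {
			continue // not a customer namespace; other passes own it
		}
		token := strings.TrimPrefix(ns, customerPrefix)
		if token == "" || liveTokens[token] {
			continue // malformed name, or the token still has a live row
		}
		orphans = append(orphans, ns)
	}
	return orphans
}

func main() {
	namespaces := []string{
		"instant-customer-aaa",
		"instant-customer-bbb",
		"instant-deploy-ccc",
		"kube-system",
	}
	live := map[string]bool{"aaa": true}

	fmt.Println(orphanedCustomerNamespaces(namespaces, live, nil))
	fmt.Println(orphanedCustomerNamespaces(namespaces, live, errLookup))
}
```

Returning no orphans on any lookup error trades a delayed cleanup for the guarantee that a partial listing can never delete a namespace that still has a live row.
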
MR-P0-2: workers.go constructed the provisioner_reconciler with a nil prober,
which selects the fallback NoopProber, whose Probe always returns
ProbeReachable. The reconciler therefore promoted stuck 'pending' rows to
status='active' WITHOUT checking the backend. It is now wired with the same
NewRealProber(cfg) that resource_heartbeat already uses two lines below.
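
Why a nil prober is dangerous can be shown with a small sketch. NoopProber and ProbeReachable are named in this commit; the Prober interface shape, ProbeUnreachable, unreachableProber, and the reconciler type below are assumptions made for illustration.

```go
package main

import "fmt"

// ProbeResult is a hypothetical result type; only ProbeReachable is named
// in the commit message.
type ProbeResult int

const (
	ProbeReachable ProbeResult = iota
	ProbeUnreachable
)

// Prober is an assumed interface shape for the backend reachability check.
type Prober interface {
	Probe(token string) ProbeResult
}

// NoopProber reports every backend as reachable.
type NoopProber struct{}

func (NoopProber) Probe(string) ProbeResult { return ProbeReachable }

// unreachableProber stands in for a real prober that finds a dead backend.
type unreachableProber struct{}

func (unreachableProber) Probe(string) ProbeResult { return ProbeUnreachable }

type reconciler struct{ prober Prober }

// newReconciler falls back to NoopProber when given nil, which is the
// behaviour that made the nil call site unsafe.
func newReconciler(p Prober) *reconciler {
	if p == nil {
		p = NoopProber{}
	}
	return &reconciler{prober: p}
}

// reconcile promotes a 'pending' row to 'active' only if the probe says the
// backend is reachable; any other row is returned unchanged.
func (r *reconciler) reconcile(token, status string) string {
	if status == "pending" && r.prober.Probe(token) == ProbeReachable {
		return "active"
	}
	return status
}

func main() {
	// nil prober: the row is promoted without any real check.
	fmt.Println(newReconciler(nil).reconcile("abc123", "pending"))
	// A prober that reports the backend as down: the row stays pending.
	fmt.Println(newReconciler(unreachableProber{}).reconcile("abc123", "pending"))
}
```
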
Regression tests (each fails without the fix, verified):
  P0-1a → TestExpireAnonymousWorker_P0_1a_DeprovisionFailure_DoesNotMarkDeleted
          (+ companion: ..._DeprovisionSuccess_StillMarksDeleted)
  P0-1b → TestOrphanSweep_Pass4_ReclaimsOrphanedCustomerNamespace
          (+ companion: TestOrphanSweep_Pass4_NoCustomerNamespaces_NoQuery)
  P0-2  → TestProvisionerReconciler_P0_2_RealProberUnreachable_DoesNotPromote
          + TestProvisionerReconciler_StartWorkersCallSite_PassesRealProber
            (call-site text-grep binding so a future nil revert fails CI)
          + TestProvisionerReconciler_WorkerKeepsRealProber
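
A call-site text-grep binding of the kind mentioned above could be approximated as below. This is only a sketch of the idea: the constructor name NewProvisionerReconciler is hypothetical, the check assumes the call fits on one line, and the real test presumably reads workers.go from disk rather than taking the source as a string.

```go
package main

import (
	"fmt"
	"strings"
)

// callSitePassesRealProber scans source text for the first line that calls
// ctor and reports whether that same line also calls NewRealProber. A missing
// call site counts as a failure, so renaming the constructor cannot silently
// disable the check.
func callSitePassesRealProber(src, ctor string) bool {
	for _, line := range strings.Split(src, "\n") {
		if strings.Contains(line, ctor+"(") {
			return strings.Contains(line, "NewRealProber(")
		}
	}
	return false
}

func main() {
	// Both snippets are invented examples of what a call site might look like.
	fixed := "rec := NewProvisionerReconciler(db, NewRealProber(cfg))"
	reverted := "rec := NewProvisionerReconciler(db, nil)"

	fmt.Println(callSitePassesRealProber(fixed, "NewProvisionerReconciler"))
	fmt.Println(callSitePassesRealProber(reverted, "NewProvisionerReconciler"))
}
```

A text check like this is brittle by design: it binds the test to the literal call site, so a revert to nil fails CI even if no behavioural test exercises that wiring.
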
go build / go vet / go test ./... all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7169493 commit b10df7e
9 files changed
Lines changed: 691 additions & 34 deletions