Commit e7d23c6
fix(customer-backup): stop recovery UPDATE binding NULL to NOT-NULL started_at + classify mongo/redis auth failures (#106)

BUG 1 (P1: recovery wedged + log flood): recoverStuckRows ran
`UPDATE resource_backups SET status='pending', started_at=NULL ...`
but resource_backups.started_at is `TIMESTAMPTZ NOT NULL` (api
migration 031_backups.sql). The explicit NULL bind violated the
constraint on EVERY 30-60s recovery tick, so stuck-row recovery NEVER
worked and the worker log flooded with `stuck_row_recovery_failed: pq:
null value in column "started_at" ... violates not-null constraint`.
Drop the started_at clause. Leaving the stale value is correct because
a recovered row is 'pending', and the recovery WHERE floor only
re-matches rows that are 'running' again and have again exceeded the
timeout. The re-claim in processBackup runs `SET started_at=now()`
first, so a re-claimed row is judged by its fresh start time. The old
regex-matcher test stayed green because sqlmock never executes SQL and
so cannot enforce NOT NULL; the new test inspects the literal SQL and
fails if started_at is set to NULL.
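
A minimal Go sketch of both halves of the BUG 1 reasoning, under a simplified model. The statement text is a hypothetical reconstruction: only `SET status='pending'` comes from the commit, while the WHERE clause, the `$1` interval parameter and the 30-minute timeout are assumptions. `startedAtNull`, `backupRow` and `isStuck` are illustrative names, not the repository's. The regex is the kind of literal-SQL check a test can make where a regex query matcher cannot catch the constraint violation. The row model shows why the stale `started_at` is harmless.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// recoverSQL is a HYPOTHETICAL reconstruction of the fixed statement: only the
// status reset is from the commit; the WHERE clause and $1 are assumptions.
const recoverSQL = `UPDATE resource_backups SET status = 'pending'
WHERE status = 'running' AND started_at < now() - $1::interval`

// startedAtNull flags any "started_at = NULL" fragment (any case or spacing),
// the pattern that violated the NOT NULL constraint on every recovery tick.
var startedAtNull = regexp.MustCompile(`(?i)\bstarted_at\s*=\s*NULL\b`)

// backupRow models only the two resource_backups columns the argument needs.
type backupRow struct {
	Status    string
	StartedAt time.Time // NOT NULL in the schema, so it always holds a value
}

// isStuck mirrors the ASSUMED recovery WHERE clause: still 'running' and
// started longer ago than timeout.
func isStuck(r backupRow, now time.Time, timeout time.Duration) bool {
	return r.Status == "running" && now.Sub(r.StartedAt) > timeout
}

func main() {
	old := `UPDATE resource_backups SET status='pending', started_at=NULL WHERE status='running'`
	fmt.Println(startedAtNull.MatchString(old))        // true
	fmt.Println(startedAtNull.MatchString(recoverSQL)) // false

	timeout := 30 * time.Minute // assumed timeout
	t0 := time.Unix(0, 0)
	row := backupRow{Status: "running", StartedAt: t0}
	now := t0.Add(45 * time.Minute)
	fmt.Println(isStuck(row, now, timeout)) // true: running past the timeout

	row.Status = "pending"                  // recovery: started_at is left stale
	fmt.Println(isStuck(row, now, timeout)) // false: 'pending' rows never match

	row.Status, row.StartedAt = "running", now                   // re-claim: SET started_at=now()
	fmt.Println(isStuck(row, now.Add(10*time.Minute), timeout)) // false: fresh start time
	fmt.Println(isStuck(row, now.Add(31*time.Minute), timeout)) // true: stuck again
}
```

sqlmock only compares expected and actual statement text and returns canned results; it never runs the SQL. A constraint violation like this one can therefore only be caught by a check on the statement text or by a test against a real Postgres.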

BUG 2 (misclassification surfaced by the 2026-06-11 mongo backup P1):
backupFailReason only recognized Postgres auth error strings, so
mongodump ("auth error: ... SCRAM-SHA-256") and redis-cli
("WRONGPASS", "NOAUTH") credential failures were bucketed as the
transient "dump" reason. The customer was told "briefly unreachable,
we'll retry" for a credential failure that will not heal on its own,
and ops were NOT paged. Extend the classifier to match the auth error
strings of all three backends.
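
A minimal Go sketch of the extended classifier. `classifyBackupFailure`, `failureClass`, the `"auth"` reason and the `PageOps` flag are hypothetical names, not the real `backupFailReason` signature. The mongodump (`auth error`) and redis-cli (`WRONGPASS`, `NOAUTH`) markers come from this commit. The Postgres marker is an assumed example, because the commit does not list the existing Postgres strings.

```go
package main

import (
	"fmt"
	"strings"
)

// authMarkers are substrings that identify a credential failure, which will
// not heal on retry. The Postgres entry is an ASSUMED example wording.
var authMarkers = []string{
	"password authentication failed", // Postgres (assumed)
	"auth error",                     // mongodump, e.g. SCRAM-SHA-256 failures
	"WRONGPASS",                      // redis-cli: wrong credentials
	"NOAUTH",                         // redis-cli: authentication required
}

// failureClass is a HYPOTHETICAL result type for this sketch.
type failureClass struct {
	Reason  string // "auth" (credential) or "dump" (transient, retried)
	PageOps bool   // credential failures need a human, so page ops
}

// classifyBackupFailure is a HYPOTHETICAL stand-in for backupFailReason.
func classifyBackupFailure(output string) failureClass {
	for _, m := range authMarkers {
		if strings.Contains(output, m) {
			return failureClass{Reason: "auth", PageOps: true}
		}
	}
	// Anything unrecognized stays in the transient "dump" bucket.
	return failureClass{Reason: "dump", PageOps: false}
}

func main() {
	for _, out := range []string{
		"mongodump: auth error",
		"redis-cli: WRONGPASS",
		"redis-cli: NOAUTH",
		"connection reset by peer",
	} {
		c := classifyBackupFailure(out)
		fmt.Println(out, "->", c.Reason, c.PageOps)
	}
}
```

Every auth marker maps to the same result, so the order of the list does not matter. The trade-off is that each new backend needs its own markers. An unmatched credential failure silently falls back to "dump", which is the failure mode this commit fixes.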

Note: the customer's actual mongo backup failure stems from an
underlying DATA/INFRA incident (the shared mongo pod was OOMKilled and
lost db_poold8ad5fc1 and its user), which requires operator action.
This commit fixes the two CODE defects the incident exposed (broken
recovery and the mislabeled failure reason).

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

1 parent 6692610
3 files changed: 125 additions, 8 deletions
Files changed under internal/jobs (file names and line contents were not preserved):
File 1, hunk summary:

| Original lines | New lines | Added | Removed |
|---|---|---|---|
| 27-34 | 27-45 | 11 (new 30-38, 41-42) | 0 |
File 2, hunk summary:

| Original lines | New lines | Added | Removed |
|---|---|---|---|
| 348-357 | 348-366 | 10 (new 351-360) | 1 (original 354) |
| 592-614 | 601-643 | 27 (new 604-616, 623, 628-640) | 7 (original 595-600, 611) |
File 3, hunk summary:

| Original lines | New lines | Added | Removed |
|---|---|---|---|
| 8-13 | 8-14 | 1 (new 11) | 0 |
| 425-430 | 426-507 | 76 (new 429-504) | 0 |