Commit da495a1
feat(jobs): propagation_runner — eager retry consumer for pending_propagations
Adds the worker-side consumer of the new pending_propagations queue
(api repo migration 058). Every 30s it picks eligible rows under
FOR UPDATE SKIP LOCKED, dispatches by kind, and:
- success → applied_at = now(), plus a propagation.applied audit row (INFO)
- failure → attempts++, next_attempt_at = now() + exp_backoff,
  last_error persisted, plus a propagation.retrying audit row (DEBUG)
- attempts reach maxAttempts → failed_at = now(), plus a
  propagation.dead_lettered audit row and a structured slog.Error
  (CRITICAL; the NR alert keys on this)
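The three outcomes above can be sketched as a single state transition. This is a minimal illustration, not the code in this commit: the `row` struct, `settle`, and `simulate` are hypothetical names, and it assumes that the failing attempt which reaches maxAttempts is itself counted and has its last_error persisted.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// row is a hypothetical in-memory view of one pending_propagations row.
type row struct {
	Attempts      int
	NextAttemptAt time.Time
	AppliedAt     *time.Time
	FailedAt      *time.Time
	LastError     string
}

// settle applies one handler outcome to the row and returns the name of
// the audit event that would be written for it.
func settle(r *row, handlerErr error, now time.Time, maxAttempts int, backoff func(int) time.Duration) string {
	if handlerErr == nil {
		r.AppliedAt = &now
		return "propagation.applied"
	}
	r.Attempts++
	r.LastError = handlerErr.Error()
	if r.Attempts >= maxAttempts {
		// Assumption: the attempt that hits the cap dead-letters the row.
		r.FailedAt = &now
		return "propagation.dead_lettered"
	}
	r.NextAttemptAt = now.Add(backoff(r.Attempts))
	return "propagation.retrying"
}

// simulate runs a row whose handler fails `failures` times and then
// succeeds, returning the audit events in order.
func simulate(failures, maxAttempts int) []string {
	r := &row{}
	now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	fixed := func(int) time.Duration { return time.Minute }
	var events []string
	for i := 0; ; i++ {
		var err error
		if i < failures {
			err = errors.New("provisioner unavailable")
		}
		events = append(events, settle(r, err, now, maxAttempts, fixed))
		if r.AppliedAt != nil || r.FailedAt != nil {
			return events // terminal rows are never picked again
		}
		now = r.NextAttemptAt
	}
}

func main() {
	fmt.Println(simulate(2, 5))
	fmt.Println(simulate(5, 3))
}
```

Keeping the transition free of database calls makes each branch easy to test in isolation; the real runner would wrap it in the transaction that holds the row lock.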
Backoff schedule (cumulative ≈ 24h to dead-letter):
1m, 5m, 15m, 30m, 1h, 2h, 4h, 8h, 16h, 24h
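A lookup with clamping is one plausible way to implement the schedule above. The sketch below is an assumption about shape only: `backoffSchedule` and `backoffFor` are hypothetical names, and treating `attempts` as 1-based is a guess, not something this message states.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule mirrors the delays listed in the commit message.
var backoffSchedule = []time.Duration{
	1 * time.Minute, 5 * time.Minute, 15 * time.Minute, 30 * time.Minute,
	1 * time.Hour, 2 * time.Hour, 4 * time.Hour, 8 * time.Hour,
	16 * time.Hour, 24 * time.Hour,
}

// backoffFor returns the delay to wait after the given number of failed
// attempts (assumed 1-based). Out-of-range values clamp to the first or
// last entry instead of panicking.
func backoffFor(attempts int) time.Duration {
	if attempts < 1 {
		return backoffSchedule[0]
	}
	if attempts > len(backoffSchedule) {
		return backoffSchedule[len(backoffSchedule)-1]
	}
	return backoffSchedule[attempts-1]
}

func main() {
	for _, n := range []int{1, 4, 10, 25} {
		fmt.Printf("after failure %d: wait %s\n", n, backoffFor(n))
	}
}
```

Summing all ten listed delays gives 55h51m; how that squares with the ≈ 24h figure depends on maxAttempts, which this message does not state.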
Kind registry (rule 18 / CLAUDE.md):
  tier_elevation → handleTierElevation
    Iterates the team's active resources and calls the provisioner's
    RegradeResource per resource. Idempotent: the provisioner's
    CONFIG GET / applied_conn_limit guard makes re-running an
    already-regraded resource a no-op.
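A kind registry with a drift guard could look roughly like the following. Everything here is a hypothetical sketch: `allKinds`, `handlers`, `missingHandlers`, and `dispatch` are illustrative names, the handler signature is simplified, and the body of `handleTierElevation` is a placeholder rather than the real regrade loop.

```go
package main

import "fmt"

type kind string

const kindTierElevation kind = "tier_elevation"

// allKinds is the assumed authoritative list of kinds the queue may hold.
var allKinds = []kind{kindTierElevation}

// handler processes one propagation for a team (simplified signature).
type handler func(teamID string) error

// handleTierElevation is a placeholder; the real handler would call the
// provisioner's RegradeResource for each active resource of the team.
func handleTierElevation(teamID string) error {
	return nil
}

// handlers maps each kind to the function that applies it.
var handlers = map[kind]handler{
	kindTierElevation: handleTierElevation,
}

// missingHandlers iterates the registry and reports every kind that has
// no handler, which is what a drift-guard test would assert is empty.
func missingHandlers() []kind {
	var missing []kind
	for _, k := range allKinds {
		if _, ok := handlers[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

// dispatch routes a row to its handler, returning an error for unknown
// kinds so the row goes through the normal retry path.
func dispatch(k kind, teamID string) error {
	h, ok := handlers[k]
	if !ok {
		return fmt.Errorf("no handler registered for kind %q", k)
	}
	return h(teamID)
}

func main() {
	fmt.Println("missing handlers:", missingHandlers())
	fmt.Println("dispatch error:", dispatch("unknown_kind", "team-1"))
}
```

Iterating the kind list in the guard, rather than naming each kind in the test, means a newly added kind fails the test until a handler is registered for it.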
The existing entitlement_reconciler remains the eventually-consistent
5-min sweep backstop; this runner is the eager event-driven retry that
makes per-team retries durable + alert-able.
6 regression tests:
- EveryKindHasAHandler — rule 18 registry-iterating drift guard
- AppliesEligibleRow — happy path
- RetryOnFailure_PersistsBackoff — attempts++, backoff schedule, last_error
- DeadLettersAfterMaxAttempts — failed_at + audit row
- IdempotentReRun_AppliedRowSkipped — terminal rows invisible to picker
- BackoffSchedule_IsMonotonicAndClamps — pin schedule + clamp behaviour
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 3c05f31 commit da495a1
3 files changed
Lines changed: 1129 additions & 0 deletions