
Commit 8bcc320

feat(worker): Layer-3 payment prober — the continuous money heartbeat (flag-gated OFF) (#98)
Builds the Layer-3 payment-health synthetic that the tooling forum identified as the real gap (`grep instant_payment_probe` → none today). Forum verdict, docs/ci/FORUM-PAYMENT-E2E-TOOLING.md §4: the fastest, most deterministic money-path signal is an in-cluster, iframe-free Go prober, NOT a browser driver. It mirrors the auth_probe / deploy_probe / flow_synthetic pattern exactly.

payment_probe.go is a River periodic job (every 5 min), flag-gated by PAYMENT_PROBE_ENABLED (default OFF, fully inert until the operator lights it).

Prod-safe legs (non-charging, contract-only):
- checkout_reachable: POST /api/v1/billing/checkout → no crash. A 402/409/502 blocked-but-alive shape is a PASS while Razorpay live-recurring is operator-blocked; any other 5xx (a crash) fails.
- billing_state: GET /api/v1/billing → non-5xx.
- invoices_reachable: GET /api/v1/billing/invoices → non-5xx.
- webhook_security: POST /razorpay/webhook with a garbage UNSIGNED payload MUST be rejected with 400 invalid_signature (positive proof that the signature gate is live; an accepted unsigned payload is a CRITICAL fail).

Optional upgrade leg (runs only when the TEST webhook secret and the test plan id are set, otherwise it skips clean/degraded; NO live Razorpay, NO real money):
- upgrade_webhook_e2e: mint a fresh is_test_cohort=true team → inject a correctly signed TEST-mode subscription.charged (HMAC-SHA256 over the raw body, the api's verifier scheme) → assert that teams.plan_tier flipped (the rule-12 downstream truth surface, NOT the webhook 200) → reap the cohort team (always, even on failure).

Observability (rule 25; the infra PR ships in lockstep): instant_payment_probe_outcome_total{leg,result} and instant_payment_probe_latency_seconds{leg} (lazy *Vec, primed in metrics_test.go), the InstantPaymentProbe NR event (cohort=synthetic, excluded from billing/revenue dashboards), an audit_log row, and a structured ERROR slog line on fail.
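The leg rules above can be sketched in Go. This is a minimal illustration, not payment_probe.go itself. The helper names (classifyContractLeg, classifyWebhookSecurity, runUpgradeLeg, upgradeDeps) are hypothetical, and the upgrade leg's side effects are injected as plain functions. Two choices are assumptions: a failed team mint counts as a fail, and a non-200 webhook response surfaces as an error from injectWebhook. The sketch shows three things: the 502 carve-out for checkout_reachable, the accepted-unsigned-payload case flagged as critical, and a deferred reap so the cohort team is removed on every path once it has been minted.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Result is the per-leg outcome label (the result label on the metric).
type Result string

const (
	Pass     Result = "pass"
	Fail     Result = "fail"
	Degraded Result = "degraded" // config-unset or slow-but-correct; never pages
)

// classifyContractLeg grades a non-charging contract leg by HTTP status.
// checkout_reachable also tolerates 502 (Razorpay live-recurring blocked
// upstream); 402/409 are already below 500. Any other 5xx is a crash.
func classifyContractLeg(leg string, status int) Result {
	if leg == "checkout_reachable" && status == http.StatusBadGateway {
		return Pass
	}
	if status >= 500 {
		return Fail
	}
	return Pass
}

// classifyWebhookSecurity expects the unsigned garbage payload to be rejected
// with 400 invalid_signature. The bool is true when the payload was accepted,
// which means the signature gate is not enforcing (the CRITICAL case).
func classifyWebhookSecurity(status int, errCode string) (Result, bool) {
	switch {
	case status == http.StatusBadRequest && errCode == "invalid_signature":
		return Pass, false
	case status >= 200 && status < 300:
		return Fail, true
	default:
		return Fail, false
	}
}

// upgradeDeps holds hypothetical hooks for the upgrade leg's side effects.
type upgradeDeps struct {
	mintTeam      func() (string, error)               // creates an is_test_cohort team
	injectWebhook func(teamID string) error            // assumed: error on non-200
	planTier      func(teamID string) (string, error)  // reads teams.plan_tier
	reapTeam      func(teamID string)                  // deletes the cohort team
}

// runUpgradeLeg judges success by the persisted plan tier, never by the
// webhook's 200, and reaps the cohort team on every path after minting.
func runUpgradeLeg(d upgradeDeps) Result {
	teamID, err := d.mintTeam()
	if err != nil {
		return Fail
	}
	defer d.reapTeam(teamID)
	if err := d.injectWebhook(teamID); err != nil {
		return Fail
	}
	tier, err := d.planTier(teamID)
	if err != nil || tier != "pro" {
		return Fail
	}
	return Pass
}

func main() {
	fmt.Println(classifyContractLeg("checkout_reachable", 502))
	fmt.Println(classifyContractLeg("billing_state", 502))
	reaped := false
	r := runUpgradeLeg(upgradeDeps{
		mintTeam:      func() (string, error) { return "team-1", nil },
		injectWebhook: func(string) error { return errors.New("webhook returned 500") },
		planTier:      func(string) (string, error) { return "free", nil },
		reapTeam:      func(string) { reaped = true },
	})
	fmt.Println(r, reaped)
}
```

Judging the upgrade leg by the stored tier is the rule-12 point. A 200 from the webhook handler proves only that the request was accepted. It does not prove that teams.plan_tier changed.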

result="degraded" is the config-unset or slow-but-correct state and never pages. The prober therefore stays inert AND non-paging until the operator wires the flag and, for the upgrade leg, the test webhook secret.

Tests: flag-off proven inert (zero probes, no HTTP, no DB); each leg's pass/fail/degraded outcome; the unsigned-webhook-rejected and accepted-unsigned security cases; the upgrade tier-flip pass plus the no-flip and webhook-non-200 fails (rule-12 discipline); cohort reap on every upgrade path; signer parity (64-hex HMAC matching the api verifier); the leg vocabulary registry; the panic boundary. 97.9% per-func coverage on payment_probe.go.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
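Signer parity comes down to both sides computing the same HMAC-SHA256 over the exact raw body bytes, keyed by the TEST webhook secret. The sketch below uses only the Go standard library. signWebhook and verifyWebhook are hypothetical names. The verifier is a stand-in for the api's, and it is assumed to hex-decode the signature and compare digests in constant time with hmac.Equal. A SHA-256 digest is 32 bytes, so the hex signature is always 64 characters.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signWebhook returns the lowercase hex HMAC-SHA256 of the raw request body.
// 32-byte digest -> 64 hex characters.
func signWebhook(secret string, rawBody []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(rawBody) // hash.Hash.Write never returns an error
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyWebhook recomputes the MAC over the same raw bytes and compares it
// with the presented signature in constant time. A signature that is not
// valid hex (or is missing) is rejected.
func verifyWebhook(secret string, rawBody []byte, sigHex string) bool {
	want, err := hex.DecodeString(sigHex)
	if err != nil || len(want) == 0 {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(rawBody)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	body := []byte(`{"event":"subscription.charged"}`)
	sig := signWebhook("test-webhook-secret", body)
	fmt.Println(len(sig), verifyWebhook("test-webhook-secret", body, sig))
	// Tampering with the body after signing must break verification.
	fmt.Println(verifyWebhook("test-webhook-secret", append(body, ' '), sig))
}
```

Signing the raw bytes matters. If the JSON is re-marshalled before verification, keys can reorder or whitespace can change, and a correct signature then fails to match.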
1 parent 247ef49 commit 8bcc320

7 files changed

Lines changed: 2168 additions & 0 deletions


internal/config/config.go

Lines changed: 32 additions & 0 deletions
@@ -180,6 +180,24 @@ type Config struct {
 	FlowSyntheticDisabled string // FLOW_SYNTHETIC_DISABLED — comma list of per-flow kill switches
 	FlowSyntheticJWTSecret string // JWT_SECRET — shared with api; mints the synthetic session JWT
 
+	// Layer-3 payment prober (payment_probe.go) — the money heartbeat. Drives the
+	// iframe-free payment-funnel contract path against prod every 5 min: checkout
+	// reachability + billing/invoices read surfaces + the webhook signature
+	// security contract, plus an OPTIONAL test-mode upgrade proof. INERT unless
+	// PaymentProbeEnabled is true (the master flag) — a single env flip turns the
+	// whole prober off. JWTSecret reuses JWT_SECRET to mint the Brevo-free session
+	// JWT for the authed prod-safe legs. The upgrade leg is gated on
+	// PaymentProbeTestWebhookSecret + PaymentProbeTestPlanIDPro being set (skips
+	// clean otherwise) and drives NO live Razorpay / no real money. NEVER wire the
+	// LIVE webhook secret here.
+	PaymentProbeEnabled bool // PAYMENT_PROBE_ENABLED — master flag (default false)
+	PaymentProbeBaseURL string // PAYMENT_PROBE_BASE_URL — default https://api.instanode.dev
+	PaymentProbeJWTSecret string // JWT_SECRET — shared with api; mints the synthetic session JWT
+	PaymentProbeEmail string // PAYMENT_PROBE_EMAIL — synthetic team primary-user email
+	PaymentProbeTier string // PAYMENT_PROBE_TIER — seeded tier (default free)
+	PaymentProbeTestWebhookSecret string // RAZORPAY_TEST_WEBHOOK_SECRET — gates the optional upgrade leg (TEST secret only, never live)
+	PaymentProbeTestPlanIDPro string // PAYMENT_PROBE_TEST_PLAN_ID_PRO — Razorpay TEST plan_id resolving to "pro"
+
 	// Scale-to-zero idle-scaler (deploy_idle_scaler.go, Task #54). INERT unless
 	// DeployScaleToZeroEnabled is true — the master flag (shared name with the
 	// api's wake-path flag). When off, the idle-scaler sweep is a no-op (no k8s
@@ -305,6 +323,20 @@ func Load() *Config {
 		FlowSyntheticDisabled: os.Getenv("FLOW_SYNTHETIC_DISABLED"),
 		FlowSyntheticJWTSecret: os.Getenv("JWT_SECRET"),
 
+		// Layer-3 payment prober. INERT unless PAYMENT_PROBE_ENABLED=true
+		// (default off, the DoD habit). JWTSecret reuses JWT_SECRET so the worker
+		// mints a Brevo-free session JWT the api verifies against. The upgrade leg
+		// is gated on the TEST webhook secret + test plan id (skips clean
+		// otherwise) — NEVER the live secret, NEVER a real charge. Defaults applied
+		// inside jobs.PaymentProbeConfig.Defaults().
+		PaymentProbeEnabled: os.Getenv("PAYMENT_PROBE_ENABLED") == "true",
+		PaymentProbeBaseURL: os.Getenv("PAYMENT_PROBE_BASE_URL"),
+		PaymentProbeJWTSecret: os.Getenv("JWT_SECRET"),
+		PaymentProbeEmail: os.Getenv("PAYMENT_PROBE_EMAIL"),
+		PaymentProbeTier: os.Getenv("PAYMENT_PROBE_TIER"),
+		PaymentProbeTestWebhookSecret: os.Getenv("RAZORPAY_TEST_WEBHOOK_SECRET"),
+		PaymentProbeTestPlanIDPro: os.Getenv("PAYMENT_PROBE_TEST_PLAN_ID_PRO"),
+
 		// Scale-to-zero idle-scaler (Task #54). Default OFF; idle threshold
 		// default 30 min (parsed below).
 		DeployScaleToZeroEnabled: os.Getenv("DEPLOY_SCALE_TO_ZERO_ENABLED") == "true",
