
Cache warmer thundering herd on rolling deploys (N pods warming in parallel) #59

@jensens


Summary

On Plone-pod startup, zodb_pgjsonb.cache_warmer preloads the top-N most-referenced objects into the L2 cache (target=8803 in the observed production site). This is valuable for request-latency SLOs after startup, but every pod runs the warmer independently in its own process, multiplying DB load by the replica count during the startup window.

Observed on aaf-prod (6 Plone backend pods, 3.9M-row catalog, b54 deploy):

INFO  zodb_pgjsonb.storage:356  Cache warmer started (target=8803, decay=0.8)
INFO  zodb_pgjsonb.cache_warmer:165  Cache warmer: loaded 8803 objects into L2

When a rolling deploy lands 3 pods within ~30s, the cache warmer runs 3× concurrently: each pod issues its own ~8803 object fetches against the same PostgreSQL primary. Combined with the parallel plone.pgcatalog schema-check and ANALYZE object_state chatter, this pushes the primary's CPU from its ~10 % baseline to saturation (3.7 of 4 cores observed), feeding a cascade of slow /ok probes → sick backends → 503s until the queues drain.

Why it's worse than it looks

  • The L2 cache is per-process, so the warmer genuinely has to populate each pod's cache; running it on one pod does not help the others.
  • But N pods all issuing the same preload queries at the same time gain nothing over serialised preloading: each does the same work on the same hot rows, so the total work is unchanged while the peak load on the primary is N× higher, with no per-pod advantage.

Ideas

Three independent options, usable alone or in combination:

1. Defer cache warmer until after pod is Ready

Currently the warmer runs during Plone startup on the main-thread path. The pod is not serving traffic yet, but its container readiness probe is gated on the "Zope: Ready to handle requests" message, which fires only after the warmer completes. The warmer's load therefore lands in the "vulnerable" startup window, where DB headroom is tight.

Change: spawn the warmer as a background thread / async task that runs after the HTTP server starts accepting traffic. The pod reports Ready as soon as the server binds, and the warmer fills L2 in the background over the next few seconds.
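A minimal sketch of the deferral, assuming the warm-up can be wrapped in a zero-argument callable. The `warm` parameter and `start_background_warmer` are hypothetical names, not zodb_pgjsonb's actual API. The thread is a daemon so it never holds up shutdown, and failures are logged rather than raised, since a failed warm-up should only cost latency.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger("cache_warmer_sketch")


def start_background_warmer(warm: Callable[[], int]) -> threading.Thread:
    """Run the hypothetical ``warm`` callable off the startup path.

    ``warm`` stands in for whatever fills the L2 cache; it is assumed to
    return the number of objects it loaded.
    """

    def _run() -> None:
        try:
            loaded = warm()
            log.info("background warm-up loaded %d objects", loaded)
        except Exception:
            # A failed warm-up only means a colder cache; never crash the process.
            log.exception("background warm-up failed; continuing with a cold cache")

    # daemon=True: a still-running warmer never blocks interpreter shutdown.
    thread = threading.Thread(target=_run, name="cache-warmer", daemon=True)
    thread.start()
    return thread
```

In this sketch the call would sit in whatever hook runs once the HTTP server is bound, e.g. `start_background_warmer(lambda: warm_l2(storage, target=8803))`, where `warm_l2` is a placeholder for the existing warm-up routine.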

Trade-off: the first few requests land on a cold cache and are slightly slower until the warmer completes. This is generally acceptable because Varnish absorbs most anonymous traffic.

2. Jittered / rate-limited warm-up

Even as a background task, N pods starting concurrently still hit the DB simultaneously. Add a random jitter (uniform in 0 .. N × interval) before the warmer starts, plus pacing: e.g. a token bucket on each warmer's fetch rate, and/or a cap so that at any moment only K pods are actively warming.
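The per-warmer pacing can be sketched with a standard token bucket that limits one warmer's fetch rate. `load` and the oid list are hypothetical stand-ins for the warmer's real fetch loop. The cluster-wide "only K pods at once" cap needs cross-pod coordination and is not covered by this in-process limiter.

```python
import threading
import time
from typing import Callable, Iterable


class TokenBucket:
    """Thread-safe token bucket: refills ``rate`` tokens per second, up to ``capacity``."""

    def __init__(self, rate: float, capacity: float) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._tokens = capacity  # start full: allows an initial burst
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self, tokens: float = 1.0) -> None:
        """Block until ``tokens`` are available, then consume them."""
        if tokens > self.capacity:
            raise ValueError("request exceeds bucket capacity")
        while True:
            with self._lock:
                now = time.monotonic()
                # Refill in proportion to elapsed time, capped at capacity.
                self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
                self._last = now
                if self._tokens >= tokens:
                    self._tokens -= tokens
                    return
                wait = (tokens - self._tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock


def paced_warm(oids: Iterable[int], load: Callable[[int], object], bucket: TokenBucket) -> int:
    """Fetch each oid via the hypothetical ``load`` callable, one token per fetch."""
    count = 0
    for oid in oids:
        bucket.acquire()
        load(oid)
        count += 1
    return count
```

For example, `TokenBucket(rate=300, capacity=50)` would stretch an 8803-object warm-up from ~5 s to roughly 30 s, trading warm-up time for a much lower per-pod query rate.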

Jitter alone (cheap, single-line change) would already break up the thundering herd on rolling deploys: pod 1 starts warming at 0s, pod 2 at ~5s, pod 3 at ~12s, etc., so the DB load is smeared out instead of stacked.
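A sketch of the jitter, assuming the pod count N and a per-pod `interval` are known at startup (both are hypothetical parameters). With the ~5 s warm-up observed here and 6 pods, `interval=5.0` spreads the starts over ~30 s, roughly the window a rolling deploy already takes.

```python
import random
import time
from typing import Callable, Optional


def jitter_delay(n_pods: int, interval: float, rng: Optional[random.Random] = None) -> float:
    """Uniform random start delay in [0, n_pods * interval] seconds."""
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.0, n_pods * interval)


def warm_with_jitter(warm: Callable[[], int], n_pods: int, interval: float) -> int:
    """Sleep for a random jitter, then run the hypothetical ``warm`` callable."""
    time.sleep(jitter_delay(n_pods, interval))  # the "single-line change"
    return warm()
```

Each pod draws its delay independently, so no coordination is needed; the cost is that two pods can still draw similar delays, which is what the pacing options address.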

Pacing (slightly more involved) would cap concurrent warmers cluster-wide, e.g. via a PostgreSQL advisory-lock slot allocator or a small Redis-less leader election.
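A sketch of the advisory-lock slot allocator, assuming K consecutive lock keys are reserved for warmers (the `WARMER_LOCK_BASE` value is hypothetical) and that each pod warms over a dedicated connection whose DB-API cursor uses the `%s` parameter style, as psycopg does. `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions. Session-level locks are also released when the session ends, so a crashed warmer does not hold its slot forever.

```python
import time
from contextlib import contextmanager
from typing import Any, Iterator, Optional

# Hypothetical: K consecutive advisory-lock keys reserved for cache warmers.
WARMER_LOCK_BASE = 0x5A0DB000


def try_claim_slot(cur: Any, slots: int, base: int = WARMER_LOCK_BASE) -> Optional[int]:
    """Try each of ``slots`` advisory-lock keys; return the key obtained, or None."""
    for key in range(base, base + slots):
        cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
        if cur.fetchone()[0]:
            return key
    return None


@contextmanager
def warmer_slot(cur: Any, slots: int, poll: float = 1.0) -> Iterator[int]:
    """Wait for one of ``slots`` cluster-wide warmer slots, hold it, then release it."""
    key = try_claim_slot(cur, slots)
    while key is None:
        time.sleep(poll)  # all K slots busy: K other pods are warming right now
        key = try_claim_slot(cur, slots)
    try:
        yield key
    finally:
        # Session-level advisory locks outlive transactions, so release explicitly.
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.fetchone()
```

Because this relies on session-level locks, it would need a direct connection to the primary rather than one through a transaction-pooling proxy.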

3. Opt-out / config knob

Allow deployments to disable the warmer entirely via an environment variable or config flag, for situations where Varnish absorbs most traffic anyway and the cold-cache penalty on the first few authenticated requests is acceptable.
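A sketch of the opt-out check, assuming an environment variable. The name `ZODB_PGJSONB_CACHE_WARMER` is hypothetical, and zodb-pgjsonb's real configuration surface may differ. Unrecognised values raise instead of silently falling back, so a typo in a deployment manifest does not go unnoticed.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; not an existing zodb-pgjsonb setting.
WARMER_ENV = "ZODB_PGJSONB_CACHE_WARMER"

_FALSE = {"0", "false", "no", "off", "disabled"}
_TRUE = {"1", "true", "yes", "on", "enabled"}


def warmer_enabled(env: Optional[Mapping[str, str]] = None, default: bool = True) -> bool:
    """Return False when the deployment has opted out of cache warming."""
    env = os.environ if env is None else env
    raw = env.get(WARMER_ENV)
    if raw is None or raw.strip() == "":
        return default  # unset: keep today's behaviour
    value = raw.strip().lower()
    if value in _FALSE:
        return False
    if value in _TRUE:
        return True
    raise ValueError(f"{WARMER_ENV}={raw!r}: expected a boolean-like value")
```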

Observed numbers

On our baseline (post-warm, steady state):

  • DB primary: ~0.4 cores
  • Warmer per pod: ~8800 queries in ~5 s ≈ 1700 qps per warmer

6 pods starting in a 30s window: a burst of up to 6 × 1700 ≈ 10 kqps when their warm-ups overlap, on a DB that normally handles ~100 qps. That is the ~90th-percentile CPU spike we see on every redeploy.
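The numbers above can be reproduced, and the effect of jitter checked, with a small interval-overlap calculation. It is a back-of-envelope model that assumes each warm-up runs for a fixed ~5 s at a constant rate; real fetch latencies will vary.

```python
from typing import Sequence


def per_warmer_qps(queries: int, seconds: float) -> float:
    """Average query rate of one warmer run."""
    return queries / seconds


def peak_concurrent(starts: Sequence[float], duration: float) -> int:
    """Maximum number of warm-ups [start, start + duration) running at the same instant."""
    # +1 at each start, -1 at each end. At equal times (t, -1) sorts before (t, 1),
    # so a warm-up ending exactly when another starts does not count as overlap.
    events = sorted([(s, 1) for s in starts] + [(s + duration, -1) for s in starts])
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def peak_qps(starts: Sequence[float], duration: float, queries: int) -> float:
    """Worst-case combined query rate on the primary under this model."""
    return peak_concurrent(starts, duration) * per_warmer_qps(queries, duration)
```

With all six starts stacked at t=0 this gives 6 × 1760 = 10 560 qps. With the staggered starts from the jitter example (0, 5 and 12 s), at most one warmer runs at a time.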

Related

  • plone-pgcatalog startup gate proposal (filed separately): reduces the DDL-probe herd but not the cache-warmer herd.

Environment

  • zodb-pgjsonb 1.11.x
  • PostgreSQL 16 (CloudNativePG), 3.9M rows
  • aaf-prod, 6 backend pods
