Summary
On Plone-pod startup, zodb_pgjsonb.cache_warmer preloads the top-N most-referenced objects into the L2 cache (target=8803 in the observed production site). This is valuable for request-latency SLOs after startup, but every pod runs the warmer independently in its own process, multiplying DB load by the replica count during the startup window.
Observed on aaf-prod (6 Plone backend pods, 3.9M-row catalog, b54 deploy):
INFO zodb_pgjsonb.storage:356 Cache warmer started (target=8803, decay=0.8)
INFO zodb_pgjsonb.cache_warmer:165 Cache warmer: loaded 8803 objects into L2
When a rolling deploy lands 3 pods within ~30s, the cache warmer runs 3× concurrently — each pod issues target object fetches against the same PostgreSQL primary. Combined with the parallel plone.pgcatalog schema-check + ANALYZE object_state chatter, this pushes the primary CPU from its baseline ~10 % to saturation (3.7/4 cores observed), feeding the cascade of slow /ok probes → sick backends → 503s until queues drain.
Why it's worse than it looks
- The L2 cache is per-process. So the warmer does need to populate each pod's cache. You can't just run it on one pod and have others benefit.
- But N pods starting concurrently all issuing the same preload queries produces no additional benefit over serialised preloading — they do the same work on the same hot rows, paying N× DB cost for no per-pod advantage.
Ideas
Three independent options, any combination:
1. Defer cache warmer until after pod is Ready
Currently the warmer runs during Plone startup on the main thread path. The pod is not serving traffic yet, but the pod's container readiness probe is gated on the "Zope: Ready to handle requests" message, which fires only after the warmer completes. That means warmer load happens during the "vulnerable" startup window where DB headroom is tight.
Change: spawn the warmer as a background thread / async task that runs after the HTTP server starts accepting traffic. Pod reports Ready as soon as the server binds, warmer fills L2 in the background over the next N seconds.
Trade-off: first few requests land on a cold cache → slightly slower until warmer completes. Generally acceptable because Varnish absorbs most anonymous traffic.
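A minimal sketch of the background-thread variant, assuming the warmer can be expressed as a loop over object ids. The names warm_cache, start_warmer_in_background and load_object are hypothetical and not part of zodb_pgjsonb; where exactly to call this after the HTTP server binds is left open.

```python
import threading


def warm_cache(load_object, oids, cache):
    """Stand-in for the real warmer: load each oid and store it in `cache`."""
    for oid in oids:
        cache[oid] = load_object(oid)


def start_warmer_in_background(load_object, oids, cache):
    """Start warming on a daemon thread and return immediately.

    The caller (hypothetically, the code path that runs once the HTTP
    server is accepting traffic) is not blocked, so readiness no longer
    waits for the warmer to finish.
    """
    thread = threading.Thread(
        target=warm_cache,
        args=(load_object, oids, cache),
        name="cache-warmer",
        daemon=True,  # do not keep the process alive just for warming
    )
    thread.start()
    return thread


if __name__ == "__main__":
    l2 = {}
    t = start_warmer_in_background(lambda oid: f"state-{oid}", range(5), l2)
    print("pod can report Ready while the warmer is still running")
    t.join()
    print(f"warmed {len(l2)} objects")
```

A plain dict is used here only for illustration; the real L2 cache would have to tolerate request threads reading it while the warmer is still writing.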
2. Jittered / rate-limited warm-up
Even as a background task, N pods starting concurrently still hit the DB simultaneously. Add random jitter (0 .. N×interval) before warmer start + token-bucket pacing so at any moment only K pods are actively warming.
Jitter alone (cheap, single-line change) would already break the thundering-herd on rolling deploys: pod 1 starts warming at 0s, pod 2 at ~5s, pod 3 at ~12s, etc., so the DB load is smeared instead of stacked.
Pacing (slightly more involved) would cap concurrent warmers cluster-wide — e.g. via an advisory lock slot allocator or a small Redis-less leader election.
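The jitter and slot-allocator ideas could look roughly like the sketch below. It assumes a try_lock callable that wraps PostgreSQL's pg_try_advisory_lock(key) on a connection held open for the duration of the warm-up; the function names, the base key and the slot count are all hypothetical choices for illustration, not existing zodb_pgjsonb settings.

```python
import random


def jitter_delay(max_delay_s, rng=random):
    """Pick a uniform random start delay in [0, max_delay_s] seconds."""
    return rng.uniform(0.0, max_delay_s)


def acquire_warm_slot(try_lock, slots, base_key=880300):
    """Return the index of the first free warm-up slot, or None.

    `try_lock(key)` must return True when it obtained the lock for `key`
    and False otherwise. In a real deployment it would run
    `SELECT pg_try_advisory_lock(key)`; session-level advisory locks are
    released when unlocked explicitly or when the session ends.
    `base_key` is an arbitrary, hypothetical key namespace.
    """
    for slot in range(slots):
        if try_lock(base_key + slot):
            return slot
    return None


if __name__ == "__main__":
    # In-memory stand-in for the database so the sketch runs on its own.
    held = set()

    def fake_try_lock(key):
        if key in held:
            return False
        held.add(key)
        return True

    print(f"start delay: {jitter_delay(30.0):.1f}s")
    for pod in range(3):
        # With K=2 slots, the third pod gets None and should retry later.
        print(pod, acquire_warm_slot(fake_try_lock, slots=2))
```

A pod that gets None would sleep for another jittered interval and try again, so at most K warmers run against the primary at once.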
3. Opt-out / config knob
Allow deployments to disable the warmer entirely via environment variable or config flag for situations where Varnish absorbs most traffic anyway and the cold-cache penalty on the first few authenticated requests is acceptable.
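One possible shape for the opt-out, sketched under the assumption that an environment variable is the chosen mechanism. The variable name ZODB_PGJSONB_CACHE_WARMER and the helper warmer_enabled are invented for this example; no such setting is confirmed to exist today.

```python
import os

# Values treated as "disabled"; anything else, or an unset variable,
# keeps the current behaviour (warmer on).
_FALSE_VALUES = {"0", "false", "no", "off"}


def warmer_enabled(environ=None, var="ZODB_PGJSONB_CACHE_WARMER"):
    """Return False only when the (hypothetical) variable opts out."""
    env = os.environ if environ is None else environ
    value = env.get(var)
    if value is None:
        return True
    return value.strip().lower() not in _FALSE_VALUES


if __name__ == "__main__":
    print(warmer_enabled({}))
    print(warmer_enabled({"ZODB_PGJSONB_CACHE_WARMER": "off"}))
```

Defaulting to enabled keeps existing deployments unchanged unless they opt out explicitly.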
Observed numbers
On our baseline (post-warm, steady state):
- DB primary: ~0.4 cores
- Warmer per pod: ~8800 queries in ~5s ≈ 1760 qps per warmer
6 pods starting in a 30s window: worst-case burst of 6 × ~1760 ≈ 10.5 kqps on a DB that normally handles ~100 qps. That's the ~90th-percentile CPU spike on every redeploy.
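The back-of-envelope figures above can be reproduced as follows. This assumes one query per warmed object and that all pods warm at the same moment (the worst case); both are simplifications, and the function names are illustrative only.

```python
TARGET_OBJECTS = 8803   # "target" from the cache warmer log line
WARM_SECONDS = 5.0      # approximate per-pod warm-up duration
PODS = 6                # backend pods on aaf-prod
BASELINE_QPS = 100      # approximate steady-state load on the primary


def warmer_qps(objects, seconds):
    """Queries per second for one warmer, assuming one query per object."""
    return objects / seconds


def burst_qps(pods, per_pod):
    """Worst-case combined rate if every pod warms simultaneously."""
    return pods * per_pod


per_pod_qps = warmer_qps(TARGET_OBJECTS, WARM_SECONDS)
total_qps = burst_qps(PODS, per_pod_qps)

if __name__ == "__main__":
    print(f"per pod: {per_pod_qps:.0f} qps")
    print(f"burst:   {total_qps:.0f} qps")
    print(f"ratio:   {total_qps / BASELINE_QPS:.0f}x baseline")
```

With the logged target this gives roughly 1760 qps per pod and a little over 10 kqps combined, about two orders of magnitude above the baseline.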
Related
- plone-pgcatalog startup gate proposal (filed separately): reduces the DDL-probe herd but not the cache-warmer herd.
Environment
- zodb-pgjsonb 1.11.x
- PostgreSQL 16 (CloudNativePG), 3.9M rows
- aaf-prod, 6 backend pods