| 1 | +<!-- diataxis: explanation --> |
| 2 | + |
| 3 | +# Caching |
| 4 | + |
| 5 | +Between a catalog query and the objects it returns, the result passes through |
| 6 | +several caches. |
| 7 | +Each cache has a different scope, a different lifetime, and a different |
| 8 | +invalidation rule. |
| 9 | +Understanding how they compose answers two recurring questions: why some |
| 10 | +queries are almost free after the first call, and why some things that |
| 11 | +could be cached deliberately are not. |
| 12 | + |
| 13 | +This page maps the full chain from query dict to object, explains what |
| 14 | +each layer caches and when it lets go, and documents the invariant that |
| 15 | +brains do not memoize the resolved object. |
| 16 | + |
| 17 | +## The layers at a glance |
| 18 | + |
| 19 | +The table below lists every cache that a `catalog.searchResults(...)` call |
| 20 | +touches, in the order it walks through them. |
| 21 | + |
| 22 | +| # | Layer | Owner | Scope | Lifetime and eviction | |
| 23 | +|---|-------|-------|-------|-----------------------| |
| 24 | +| 1 | Query result cache | plone.pgcatalog (`cache.py`) | Process | Cost-based LRU eviction; whole cache cleared when the catalog change counter (`pgcatalog_change_seq`) advances | |
| 25 | +| 2 | Prepared statement cache | psycopg | Connection | Connection lifetime; invalidated by schema changes | |
| 26 | +| 3 | Request connection pool | plone.pgcatalog (`pool.py`) | Request | Released at `IPubEnd` | |
| 27 | +| 4 | zodb-pgjsonb `LoadCache` | zodb-pgjsonb (`PGJsonbStorageInstance`) | ZODB Connection | LRU by bytes (`cache_local_mb`, default 16 MB); entries invalidated on TID change | |
| 28 | +| 5 | ZODB Connection object cache | ZODB | ZODB Connection | `cache-size` / `cache-size-bytes` in `zope.conf`; invalidation messages from storage | |
| 29 | +| 6 | PostgreSQL `shared_buffers` | PostgreSQL | Database process | PG lifetime; LRU | |
| 30 | + |
| 31 | +Layers 1 and 3, and part of the prefetch path, belong to plone.pgcatalog. |
| 32 | +Layer 4 belongs to zodb-pgjsonb. |
| 33 | +Layers 2, 5, and 6 (psycopg, ZODB, and PostgreSQL) are standard components |
| 34 | +that plone.pgcatalog relies on without controlling. |
| 35 | + |
| 36 | +## What is not cached, and why |
| 37 | + |
| 38 | +Two things you might expect to find cached are not, deliberately. |
| 39 | + |
| 40 | +### Brains |
| 41 | + |
| 42 | +Brains are rebuilt from scratch on every `searchResults` call, even when |
| 43 | +the underlying rows come from the query result cache (layer 1). |
| 44 | +The rebuild is cheap: a `PGCatalogBrain` has only two slots, `_row` (a |
| 45 | +reference to the row dict) and `_result_set`. |
| 46 | +Keeping brains disposable means the catalog never hands out a brain that |
| 47 | +was created in an earlier request, so the catalog itself cannot introduce |
| 48 | +staleness across requests. |
| 49 | + |
| 50 | +### The object returned by `getObject()` |
| 51 | + |
| 52 | +`PGCatalogBrain.getObject()` does not memoize the resolved object. |
| 53 | +Every call traverses the ZODB tree again via |
| 54 | +`root.unrestrictedTraverse()` and `restrictedTraverse()`. |
| 55 | +The traversal is cheap in practice because the ZODB Connection cache |
| 56 | +(layer 5) already holds the unpickled instances along the path; all that |
| 57 | +a repeat call pays for is a fresh Acquisition wrapper chain. |
| 58 | + |
| 59 | +Memoization is avoided because a brain, unlike most short-lived |
| 60 | +objects, can in principle outlive the request that produced |
| 61 | +it. |
| 62 | +If a caller stashes brains in a session, a `plone.memoize` cache, or any |
| 63 | +other request-external container, a memoized object on the brain would |
| 64 | +go stale: traversal subscribers fire only during traversal, and some of |
| 65 | +the state they set up (security manager, site hook, language) is |
| 66 | +request-local. |
| 67 | +Keeping brains pure rules out that whole class of bugs. |
| 68 | +The place that legitimately caches unpickled instances is the ZODB |
| 69 | +Connection, where the cache is scoped to the connection and invalidated |
| 70 | +through the normal TID mechanism. |
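The no-memoization rule can be illustrated with a toy class. This is a minimal sketch, not the real `PGCatalogBrain`: `ToyBrain`, `resolve`, and the `path` key are hypothetical names chosen for the example, and the resolver stands in for the real traversal.

```python
class ToyBrain:
    """Hypothetical stand-in for a catalog brain; not the real PGCatalogBrain."""

    # Two slots only, so there is no per-instance __dict__ to stash an object in.
    __slots__ = ("_row", "_result_set")

    def __init__(self, row, result_set=None):
        self._row = row                # shared reference to the row dict
        self._result_set = result_set  # owning result set, if any

    def getPath(self):
        return self._row["path"]

    def getObject(self, resolve):
        # Resolve on every call; nothing is stored on the brain.
        return resolve(self.getPath())


calls = []


def resolve(path):
    # Stand-in for traversal: records each call and returns a fresh wrapper.
    calls.append(path)
    return {"path": path}


brain = ToyBrain({"path": "/plone/news/item-1"})
first = brain.getObject(resolve)
second = brain.getObject(resolve)
```

Both calls reach the resolver and return equal but distinct results, which is the behaviour the section describes: the cost of a repeat call is a fresh wrapper, not a stale reference.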
| 71 | + |
| 72 | +## How the layers compose |
| 73 | + |
| 74 | +The Mermaid diagram below shows the sequence of cache lookups and |
| 75 | +misses for a typical request that runs a catalog query and then calls |
| 76 | +`getObject()` on one of the brains. |
| 77 | + |
| 78 | +```{mermaid} |
| 79 | +:alt: Cache lookup sequence for a catalog query followed by getObject |
| 80 | +:caption: Query and getObject walk-through |
| 81 | + |
| 82 | +sequenceDiagram |
| 83 | + participant V as View |
| 84 | + participant C as portal_catalog |
| 85 | + participant Q as Query cache (1) |
| 86 | + participant P as Prepared stmt (2) |
| 87 | + participant PG as PostgreSQL (6) |
| 88 | + participant B as Brain |
| 89 | + participant S as zodb-pgjsonb LoadCache (4) |
| 90 | + participant Z as ZODB Connection cache (5) |
| 91 | + |
| 92 | + V->>C: searchResults(query) |
| 93 | + C->>Q: get(normalized_query, tid) |
| 94 | + alt Cache hit |
| 95 | + Q-->>C: cached rows |
| 96 | + else Cache miss |
| 97 | + C->>P: execute(sql, params) |
| 98 | + P->>PG: wire protocol |
| 99 | + PG-->>P: rows |
| 100 | + P-->>C: rows |
| 101 | + C->>Q: put(rows, cost_ms, tid) |
| 102 | + end |
| 103 | + C-->>V: CatalogSearchResults(brains) |
| 104 | + |
| 105 | + V->>B: brain.getObject() |
| 106 | + B->>S: load_multiple(neighbourhood oids) |
| 107 | + note over S: Prefetch warms layer 4 |
| 108 | + B->>Z: traverse path |
| 109 | + Z->>S: load(oid) per segment |
| 110 | + alt Bytes cached |
| 111 | + S-->>Z: pickle bytes |
| 112 | + else Bytes missing |
| 113 | + S->>PG: SELECT state FROM object_state |
| 114 | + PG-->>S: rows |
| 115 | + S-->>Z: pickle bytes |
| 116 | + end |
| 117 | + Z-->>B: aq-wrapped object |
| 118 | + B-->>V: object |
| 119 | +``` |
| 120 | + |
| 121 | +A few properties are worth pulling out of the diagram. |
| 122 | + |
| 123 | +The query path ends at layer 1: on a cache hit no SQL is sent at all, |
| 124 | +and the brains are assembled from the cached row dicts. |
| 125 | + |
| 126 | +The `getObject()` path never touches the query cache; it goes through |
| 127 | +ZODB and the zodb-pgjsonb storage instance. |
| 128 | +The two halves are coupled only through commit-driven invalidation: every |
| 129 | +ZODB commit sends layers 4 and 5 invalidation messages for the specific |
| 130 | +OIDs, and a commit that includes a catalog write also makes layer 1 drop all entries. |
| 131 | + |
| 132 | +## Prefetch: priming the byte cache |
| 133 | + |
| 134 | +A listing that iterates over brains and calls `getObject()` on each would |
| 135 | +cause one `load()` per brain without prefetch, each a separate query to |
| 136 | +`object_state`. |
| 137 | +Prefetch turns that into a single `load_multiple()` for a neighbourhood |
| 138 | +window. |
| 139 | + |
| 140 | +The mechanism is implemented in `CatalogSearchResults._maybe_prefetch_objects`. |
| 141 | +When the first `getObject()` call lands on a brain that belongs to a |
| 142 | +result set, the result set computes a half-open window |
| 143 | +`[i, i + PGCATALOG_PREFETCH_BATCH)` starting at the brain's position `i`, issues |
| 144 | +one `SELECT ... FROM object_state WHERE zoid = ANY(...)`, and inserts |
| 145 | +the returned pickle bytes into the zodb-pgjsonb `LoadCache` (layer 4). |
| 146 | +Subsequent traversals for OIDs in the window find their bytes already |
| 147 | +cached and return without a database round-trip. |
| 148 | +A `_prefetched_ranges` set on the result set prevents fetching the |
| 149 | +same window twice. |
| 150 | + |
| 151 | +What prefetch does and does not do: |
| 152 | + |
| 153 | +- It warms only layer 4 (pickle bytes). |
| 154 | +- It does not unpickle, does not wrap with Acquisition, and does not |
| 155 | + traverse. |
| 156 | + The work that turns bytes into an object instance still happens in |
| 157 | + layer 5 when traversal actually accesses the segment. |
| 158 | +- It is idempotent: OIDs already present in the `LoadCache` are skipped |
| 159 | + inside `load_multiple()`. |
| 160 | +- It degrades gracefully: if the storage has no `load_multiple()` method |
| 161 | + (for example, a non-pgjsonb storage during testing), the prefetch call |
| 162 | + returns silently. |
| 163 | + |
| 164 | +Disable prefetch by setting `PGCATALOG_PREFETCH_BATCH=0`. |
| 165 | +The default of 100 matches the most common Plone listing shapes |
| 166 | +(navigation trees, folder listings, news overviews) without keeping |
| 167 | +material amounts of state in memory. |
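The window logic described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `CatalogSearchResults._maybe_prefetch_objects`: `ToyResultSet`, `ToyStorage`, and `maybe_prefetch` are hypothetical names, the window is keyed by its exact bounds, and the "database" is a counter.

```python
import os


def _batch_size(default=100):
    # Reads the documented environment variable; 0 disables prefetch.
    return int(os.environ.get("PGCATALOG_PREFETCH_BATCH", default))


class ToyStorage:
    """Hypothetical storage with a byte cache and a batch loader."""

    def __init__(self):
        self.cache = {}
        self.queries = 0

    def load_multiple(self, zoids):
        # Idempotent: OIDs already cached are skipped.
        missing = [z for z in zoids if z not in self.cache]
        if not missing:
            return
        self.queries += 1  # one round-trip for the whole window
        for z in missing:
            self.cache[z] = b"pickle-bytes"


class ToyResultSet:
    """Hypothetical stand-in for a catalog result set."""

    def __init__(self, zoids, storage, batch=None):
        self._zoids = list(zoids)
        self._storage = storage
        self._batch = _batch_size() if batch is None else batch
        self._prefetched_ranges = set()

    def maybe_prefetch(self, i):
        if self._batch <= 0:
            return  # prefetch disabled
        load_multiple = getattr(self._storage, "load_multiple", None)
        if load_multiple is None:
            return  # storage without batch loading: degrade silently
        # Half-open window [i, i + batch), clipped to the result length.
        window = (i, min(i + self._batch, len(self._zoids)))
        if window in self._prefetched_ranges:
            return
        self._prefetched_ranges.add(window)
        load_multiple(self._zoids[window[0]:window[1]])


storage = ToyStorage()
results = ToyResultSet(range(250), storage, batch=100)
results.maybe_prefetch(0)    # fetches OIDs 0..99
results.maybe_prefetch(0)    # same window, skipped
results.maybe_prefetch(200)  # fetches OIDs 200..249
```

In this sketch the three calls cause two batch loads and leave 150 entries in the byte cache. Nothing is unpickled or traversed, which mirrors the point that prefetch warms only layer 4.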
| 168 | + |
| 169 | +## Invalidation matrix |
| 170 | + |
| 171 | +The table below ties each write event to the caches it invalidates. |
| 172 | + |
| 173 | +| Event | Layer 1 | Layer 4 | Layer 5 | |
| 174 | +|-------|---------|---------|---------| |
| 175 | +| Catalog write (`catalog_object`, `reindexObject`, `uncatalog_object`, move) | Cleared when `pgcatalog_change_seq` advances past `_last_tid` | Per-OID invalidate on TID change | Per-OID invalidate on TID change | |
| 176 | +| ZODB commit that does not touch the catalog (sessions, scales, annotations) | Not cleared (counter does not advance) | Per-OID invalidate on TID change | Per-OID invalidate on TID change | |
| 177 | +| `pack` (history-free or history-preserving) | Not cleared directly; next catalog write triggers clear | Per-OID invalidate as objects reload at new TIDs | Per-OID invalidate as objects reload at new TIDs | |
| 178 | +| DDL (new column, index created) | Not cleared | Not cleared | Not cleared | |
| 179 | + |
| 180 | +Two entries deserve extra context. |
| 181 | + |
| 182 | +The query cache uses a counter that only advances on catalog writes, |
| 183 | +which means `plone.memoize`-wrapped views that depend on catalog results |
| 184 | +keep their hit rate even on busy sites where the ZODB TID increments on |
| 185 | +every session write. |
| 186 | +This is the same trick that lets the tool expose a stable `getCounter()` |
| 187 | +to `plone.memoize.ram`. |
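The effect of keying invalidation on a catalog-only counter can be shown with a small model. This is a hedged sketch rather than the implementation in `cache.py`: `ToyQueryCache` and its method signatures are hypothetical, and eviction here is plain LRU, not the cost-based policy the real cache uses.

```python
from collections import OrderedDict


class ToyQueryCache:
    """Hypothetical sketch; not the plone.pgcatalog implementation."""

    def __init__(self, max_entries=200):
        self._entries = OrderedDict()
        self._max_entries = max_entries
        self._last_counter = None
        self.hits = self.misses = self.invalidations = 0

    def _sync(self, counter):
        # Drop everything as soon as the catalog change counter moves.
        if counter != self._last_counter:
            if self._entries:
                self.invalidations += 1
            self._entries.clear()
            self._last_counter = counter

    def get(self, key, counter):
        self._sync(counter)
        if key in self._entries:
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return None

    def put(self, key, rows, counter):
        self._sync(counter)
        self._entries[key] = rows
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # plain LRU, not cost-based


cache = ToyQueryCache()
cache.put("q1", ["row"], counter=7)
hit_same = cache.get("q1", counter=7)   # session writes: counter unchanged
hit_after = cache.get("q1", counter=8)  # catalog write: counter advanced
```

The first lookup is a hit because commits that do not touch the catalog leave the counter alone. The second is a miss because the counter moved, which clears the whole cache.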
| 188 | + |
| 189 | +DDL does not propagate to any cache automatically. |
| 190 | +A column added while a worker is running will not appear in queries |
| 191 | +issued by that worker's pooled connections until the prepared statement |
| 192 | +cache (layer 2) forgets the old plan, which typically means recycling |
| 193 | +the connection. |
| 194 | +In practice this only matters during upgrade steps; runtime DDL is not |
| 195 | +expected. |
| 196 | + |
| 197 | +## Configuration |
| 198 | + |
| 199 | +The knobs live in environment variables for plone.pgcatalog, in |
| 200 | +`zope.conf` sections for ZODB, and in `postgresql.conf` for PostgreSQL. |
| 201 | + |
| 202 | +### plone.pgcatalog environment variables |
| 203 | + |
| 204 | +`PGCATALOG_QUERY_CACHE_SIZE` |
| 205 | +: Maximum number of entries in the query result cache (layer 1). |
| 206 | + Default `200`. |
| 207 | + Set to `0` to disable. |
| 208 | + |
| 209 | +`PGCATALOG_QUERY_CACHE_TTR` |
| 210 | +: Time-to-round, in seconds, for datetime parameters during cache key |
| 211 | + normalization (not a time-to-live). |
| 212 | + Default `60`. |
| 213 | + Two queries with `modified > now()` issued within the same minute |
| 214 | + hash to the same key and share a cache slot. |
| 215 | + Set to `0` to disable rounding. |
| 216 | + |
| 217 | +`PGCATALOG_PREFETCH_BATCH` |
| 218 | +: Window size for `_maybe_prefetch_objects`. |
| 219 | + Default `100`. |
| 220 | + Set to `0` to disable prefetch entirely. |
| 221 | + |
| 222 | +`PGCATALOG_SLOW_QUERY_MS` |
| 223 | +: Threshold in milliseconds above which a query is logged as slow and |
| 224 | + recorded in `pgcatalog_slow_queries`. |
| 225 | + Default `10`. |
| 226 | + |
| 227 | +`PGCATALOG_LOG_ALL_QUERIES` |
| 228 | +: When truthy, log every query (not just slow ones) at `INFO` level. |
| 229 | + Off by default. |
| 230 | + Checked per query, so you can flip it at runtime without a restart. |
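To make the effect of `PGCATALOG_QUERY_CACHE_TTR` concrete, here is a minimal sketch of datetime rounding for cache keys. It assumes the rounding floors to a multiple of the TTR, which the text above does not state explicitly; `round_for_cache_key` is a hypothetical helper, not a function from plone.pgcatalog.

```python
import math
from datetime import datetime, timezone


def round_for_cache_key(value, ttr=60):
    """Floor a datetime to a multiple of ``ttr`` seconds (assumed rounding mode)."""
    if ttr <= 0:
        return value  # rounding disabled
    ts = value.timestamp()
    return datetime.fromtimestamp(math.floor(ts / ttr) * ttr, tz=timezone.utc)


a = datetime(2024, 5, 1, 12, 0, 5, tzinfo=timezone.utc)
b = datetime(2024, 5, 1, 12, 0, 55, tzinfo=timezone.utc)
c = datetime(2024, 5, 1, 12, 1, 5, tzinfo=timezone.utc)

same_bucket = round_for_cache_key(a) == round_for_cache_key(b)
next_bucket = round_for_cache_key(a) != round_for_cache_key(c)
```

Under this assumption `a` and `b` normalize to the same value and would share a cache slot, while `c` falls into the next bucket. With `ttr=0` the value passes through unchanged, so every distinct timestamp yields a distinct key.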
| 231 | + |
| 232 | +### zope.conf |
| 233 | + |
| 234 | +`cache-size` and `cache-size-bytes` control the ZODB Connection object |
| 235 | +cache (layer 5). |
| 236 | +This is the primary performance lever for warm-cache page loads; raising |
| 237 | +it is the single biggest win on large sites. |
| 238 | +See {doc}`performance` for concrete benchmark numbers. |
| 239 | + |
| 240 | +### zodb-pgjsonb |
| 241 | + |
| 242 | +The `cache_local_mb` option on the `<pgjsonb>` storage section sets the |
| 243 | +byte budget for layer 4, per ZODB Connection instance. |
| 244 | +Default is 16 MB. |
| 245 | +Each worker process typically holds several instances (one per |
| 246 | +open connection), so resident memory for layer 4 can reach |
| 247 | +`workers * connections_per_worker * cache_local_mb`. |
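As a worked example of that product, the helper below computes the budget when every cache is full. The function name and the deployment figures (4 workers, 4 connections each) are illustrative assumptions, not recommendations.

```python
def load_cache_budget_mb(workers, connections_per_worker, cache_local_mb=16):
    """Upper bound on layer 4 memory across all workers, in MB."""
    return workers * connections_per_worker * cache_local_mb


# 4 workers x 4 connections x 16 MB (the default) = 256 MB
budget = load_cache_budget_mb(workers=4, connections_per_worker=4)
```

The result is a ceiling rather than a measurement: a cache that has not filled up occupies less than its budget.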
| 248 | + |
| 249 | +### PostgreSQL |
| 250 | + |
| 251 | +`shared_buffers` sizes layer 6, and `work_mem` governs per-query sort |
| 252 | +and hash memory (which is not a cache but does affect whether a query |
| 253 | +spills to disk). |
| 254 | +Neither of these is plone.pgcatalog-specific; follow general PostgreSQL |
| 255 | +tuning advice for your workload. |
| 256 | + |
| 257 | +## Debugging cache behavior |
| 258 | + |
| 259 | +Cache stats for layer 1 are available through `get_query_cache().stats()` |
| 260 | +and in the ZMI under the catalog tool's management tabs. |
| 261 | +The output includes `hits`, `misses`, `hit_rate`, `invalidations`, the |
| 262 | +top entries by cost, and the `last_tid` the cache is pinned to. |
| 263 | + |
| 264 | +When a query unexpectedly hits PostgreSQL on every call, the most common |
| 265 | +causes are: a datetime parameter that is not being rounded (check |
| 266 | +`PGCATALOG_QUERY_CACHE_TTR` and whether your query uses a type that |
| 267 | +implements `timeTime()`), a non-normalizable object in the query value |
| 268 | +(unsortable mixed types in a list), and frequent catalog writes on the |
| 269 | +same worker (counter advances faster than hits accumulate). |
| 270 | + |
| 271 | +When `getObject()` is slower than expected for a warm request, first |
| 272 | +rule out layer 5 being undersized: if the ZODB cache is full, every |
| 273 | +traversal segment re-unpickles from layer 4 bytes. |
| 274 | +If that check passes, rule out prefetch being off |
| 275 | +(`PGCATALOG_PREFETCH_BATCH=0` or the brain being constructed outside a |
| 276 | +result set). |
| 277 | + |
| 278 | +When you suspect cross-request staleness on an object returned by |
| 279 | +`getObject()`, remember that brains themselves hold no object state, |
| 280 | +and that layer 5 invalidates on TID change. |
| 281 | +Staleness in that path usually traces back to either a view that cached |
| 282 | +the result of `getObject()` in its own scope across requests, or to a |
| 283 | +`_v_` attribute written by a traversal subscriber on a persistent object |
| 284 | +that then survived in layer 5 until the next commit. |
| 285 | + |
| 286 | +```{seealso} |
| 287 | +{doc}`performance` covers benchmark results and tuning for end-to-end |
| 288 | +query and `getObject()` latency. |
| 289 | +{doc}`architecture` describes the write path that drives catalog-side |
| 290 | +invalidation (`pgcatalog_change_seq`). |
| 291 | +``` |