Commit b7eb64e

jensens and claude authored
docs(explanation): add caching chapter covering all layers (#160)
Maps the full cache chain from query dict to object: query result cache, prepared statements, request connection pool, zodb-pgjsonb LoadCache, ZODB Connection object cache, PG shared_buffers. Explains the invariant that brains and getObject results are deliberately not memoized (request-boundary hygiene, ZODB handles it in the right scope), and documents prefetch, invalidation rules, and every configuration knob.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d2da632 commit b7eb64e

3 files changed

Lines changed: 296 additions & 0 deletions

docs/.vale/styles/config/vocabularies/Project/accept.txt

Lines changed: 4 additions & 0 deletions
@@ -246,3 +246,7 @@ datetime
246246
[Pp]refetch(ed|er|es|ing)?
247247
[Pp]luggable
248248
roundtrips
249+
[Mm]emoiz(e|ed|es|ing|ation)
250+
[Uu]npickl(e|ed|es|ing)
251+
truthy
252+
unsortable
Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
1+
<!-- diataxis: explanation -->
2+
3+
# Caching
4+
5+
Between a catalog query and the objects it returns, the result passes through
6+
several caches.
7+
Each cache has a different scope, a different lifetime, and a different
8+
invalidation rule.
9+
Understanding how they compose answers two recurring questions: why some
10+
queries are almost free after the first call, and why some things that
11+
could be cached are deliberately not.
12+
13+
This page maps the full chain from query dict to object, explains what
14+
each layer caches and when it lets go, and documents the invariant that
15+
brains do not memoize the resolved object.
16+
17+
## The layers at a glance
18+
19+
The table below lists every cache that a `catalog.searchResults(...)` call
20+
touches, in the order it walks through them.
21+
22+
| # | Layer | Owner | Scope | Lifetime and eviction |
23+
|---|-------|-------|-------|-----------------------|
24+
| 1 | Query result cache | plone.pgcatalog (`cache.py`) | Process | Cost-based LRU evict; whole cache cleared when `pgcatalog_change_seq` advances |
25+
| 2 | Prepared statement cache | psycopg | Connection | Connection lifetime; invalidated by schema changes |
26+
| 3 | Request connection pool | plone.pgcatalog (`pool.py`) | Request | Released at `IPubEnd` |
27+
| 4 | zodb-pgjsonb `LoadCache` | zodb-pgjsonb (`PGJsonbStorageInstance`) | ZODB Connection | LRU by bytes (`cache_local_mb`, default 16 MB); entries invalidated on TID change |
28+
| 5 | ZODB Connection object cache | ZODB | ZODB Connection | `cache-size` / `cache-size-bytes` in `zope.conf`; invalidation messages from storage |
29+
| 6 | PostgreSQL `shared_buffers` | PostgreSQL | Database process | PG lifetime; LRU |
30+
31+
Layers 1, 3, and part of the prefetch path belong to plone.pgcatalog.
32+
Layer 4 belongs to zodb-pgjsonb.
33+
Layers 5 and 6 are standard components that plone.pgcatalog relies on
34+
without controlling.
35+
36+
## What is not cached, and why
37+
38+
Two things you might expect to find cached are not, deliberately.
39+
40+
### Brains
41+
42+
Brains are rebuilt from scratch on every `searchResults` call, even when
43+
the underlying rows come from the query result cache (layer 1).
44+
The rebuild is cheap: a `PGCatalogBrain` holds one dict reference and two
45+
slots (`_row`, `_result_set`).
46+
Keeping brains disposable means plone.pgcatalog never hands a brain from
47+
one request to another, which rules out catalog-side staleness across
48+
requests by construction.
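
The sketch below illustrates why the rebuild is cheap, assuming a brain is little more than a thin wrapper over a cached row dict.
`SketchBrain` and `build_brains` are hypothetical names for illustration; only the `_row` and `_result_set` slots come from the description above.

```python
class SketchBrain:
    """Hypothetical stand-in for a brain: two slots, no object state."""

    __slots__ = ("_row", "_result_set")

    def __init__(self, row, result_set=None):
        self._row = row                # shared reference to the cached row dict
        self._result_set = result_set  # back-reference used for prefetch

    def __getattr__(self, name):
        # Only reached when normal lookup fails: serve metadata from the row.
        try:
            return self._row[name]
        except KeyError:
            raise AttributeError(name) from None


def build_brains(rows, result_set=None):
    """Wrap each row in a fresh brain; the rows themselves are not copied."""
    return [SketchBrain(row, result_set) for row in rows]
```

In this sketch two calls over the same cached rows yield distinct brain instances that share the same row dicts, so the per-call cost is one small allocation per row.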
49+
50+
### The object returned by `getObject()`
51+
52+
`PGCatalogBrain.getObject()` does not memoize the resolved object.
53+
Every call traverses the ZODB tree again via
54+
`root.unrestrictedTraverse()` and `restrictedTraverse()`.
55+
The traversal is cheap in practice because the ZODB Connection cache
56+
(layer 5) already holds the unpickled instances along the path; all that
57+
a repeat call pays for is a fresh Acquisition wrapper chain.
58+
59+
The reason this memoization is avoided is that the brain, unlike most
60+
short-lived objects, could in principle survive the request that produced
61+
it.
62+
If a caller stashes brains in a session, a `plone.memoize` cache, or any
63+
other request-external container, a memoized object on the brain would
64+
go stale: traversal subscribers fire only during traversal, and some of
65+
the state they set up (security manager, site hook, language) is
66+
request-local.
67+
Keeping brains pure rules out that whole class of bugs.
68+
The place that legitimately caches unpickled instances is the ZODB
69+
Connection, where the cache is scoped to the connection and invalidated
70+
through the normal TID mechanism.
71+
72+
## How the layers compose
73+
74+
The Mermaid diagram below shows the sequence of cache lookups and
75+
misses for a typical request that runs a catalog query and then calls
76+
`getObject()` on one of the brains.
77+
78+
```{mermaid}
79+
:alt: Cache lookup sequence for a catalog query followed by getObject
80+
:caption: Query and getObject walk-through
81+
82+
sequenceDiagram
83+
participant V as View
84+
participant C as portal_catalog
85+
participant Q as Query cache (1)
86+
participant P as Prepared stmt (2)
87+
participant PG as PostgreSQL (6)
88+
participant B as Brain
89+
participant S as zodb-pgjsonb LoadCache (4)
90+
participant Z as ZODB Connection cache (5)
91+
92+
V->>C: searchResults(query)
93+
C->>Q: get(normalized_query, tid)
94+
alt Cache hit
95+
Q-->>C: cached rows
96+
else Cache miss
97+
C->>P: execute(sql, params)
98+
P->>PG: wire protocol
99+
PG-->>P: rows
100+
P-->>C: rows
101+
C->>Q: put(rows, cost_ms, tid)
102+
end
103+
C-->>V: CatalogSearchResults(brains)
104+
105+
V->>B: brain.getObject()
106+
B->>S: load_multiple(neighbourhood oids)
107+
note over S: Prefetch warms layer 4
108+
B->>Z: traverse path
109+
Z->>S: load(oid) per segment
110+
alt Bytes cached
111+
S-->>Z: pickle bytes
112+
else Bytes missing
113+
S->>PG: SELECT state FROM object_state
114+
PG-->>S: rows
115+
S-->>Z: pickle bytes
116+
end
117+
Z-->>B: aq-wrapped object
118+
B-->>V: object
119+
```
120+
121+
A few properties are worth pulling out of the diagram.
122+
123+
The query path ends at layer 1: on a cache hit no SQL is sent at all,
124+
and the brains are assembled from the cached row dicts.
125+
126+
The `getObject()` path never touches the query cache; it goes through
127+
ZODB and the zodb-pgjsonb storage instance.
128+
The two halves are coupled only through commit-driven invalidation: a commit
129+
that writes to the catalog makes layer 1 drop all entries, and every ZODB commit
130+
sends layers 4 and 5 invalidation messages for the changed OIDs.
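
The layer 1 behavior can be sketched as a small in-process cache.
This is a minimal illustration, not the code in `cache.py`: the class name, the `put()` signature with an explicit key, and the eviction rule (drop the cheapest entry, oldest first on ties) are assumptions.
The `tid` argument stands for whatever counter value the cache is pinned to.

```python
from collections import OrderedDict


class SketchQueryCache:
    """Hypothetical process-wide result cache keyed by normalized query."""

    def __init__(self, max_entries=200):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # key -> (rows, cost_ms), oldest first
        self._last_tid = None
        self.hits = self.misses = self.invalidations = 0

    def _sync(self, tid):
        # Drop every entry as soon as the pinned counter value changes.
        if tid != self._last_tid:
            if self._entries:
                self.invalidations += 1
            self._entries.clear()
            self._last_tid = tid

    def get(self, key, tid):
        self._sync(tid)
        entry = self._entries.get(key)
        if entry is None:
            self.misses += 1
            return None
        self.hits += 1
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, rows, cost_ms, tid):
        self._sync(tid)
        if self.max_entries <= 0:
            return  # cache disabled
        self._entries[key] = (rows, cost_ms)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            # Assumed policy: evict the entry that is cheapest to recompute.
            victim = min(self._entries, key=lambda k: self._entries[k][1])
            del self._entries[victim]
```

On a hit the caller gets the stored rows back and sends no SQL; on a counter change the next `get()` or `put()` empties the cache before doing anything else.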
131+
132+
## Prefetch: priming the byte cache
133+
134+
A listing that iterates over brains and calls `getObject()` on each would
135+
cause one `load()` per brain without prefetch, each a separate query to
136+
`object_state`.
137+
Prefetch turns that into a single `load_multiple()` for a neighbourhood
138+
window.
139+
140+
The mechanism is implemented in `CatalogSearchResults._maybe_prefetch_objects`.
141+
When the first `getObject()` call lands on a brain that belongs to a
142+
result set, the result set computes a half-open window
143+
`[i, i + PGCATALOG_PREFETCH_BATCH)` starting at the brain's position `i`, issues
144+
one `SELECT ... FROM object_state WHERE zoid = ANY(...)`, and inserts
145+
the returned pickle bytes into the zodb-pgjsonb `LoadCache` (layer 4).
146+
Subsequent traversals for OIDs in the window find their bytes already
147+
cached and return without a database round-trip.
148+
A `_prefetched_ranges` set on the result set prevents fetching the
149+
same window twice.
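
A minimal sketch of the window bookkeeping, assuming the result set records each fetched range as a `(start, stop)` tuple and skips any position already covered.
`SketchResultSet`, `prefetch_window`, and the `fetches` list are hypothetical; the real method may track ranges differently.

```python
def prefetch_window(index, total, batch):
    """Return the half-open range [start, stop) to prefetch, or None."""
    if batch <= 0 or not 0 <= index < total:
        return None  # prefetch disabled or position out of range
    return index, min(index + batch, total)


class SketchResultSet:
    """Hypothetical result set that batches loads around getObject() calls."""

    def __init__(self, oids, batch=100):
        self.oids = list(oids)
        self.batch = batch
        self._prefetched_ranges = set()
        self.fetches = []  # one entry per simulated batched SELECT

    def maybe_prefetch(self, index):
        # Skip positions that an earlier window already covered.
        if any(start <= index < stop for start, stop in self._prefetched_ranges):
            return
        window = prefetch_window(index, len(self.oids), self.batch)
        if window is None:
            return
        self._prefetched_ranges.add(window)
        start, stop = window
        self.fetches.append(self.oids[start:stop])
```

Iterating a 250-item result in order with a batch of 100 produces three batched fetches in this sketch instead of 250 single loads.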
150+
151+
What prefetch does and does not do:
152+
153+
- It warms only layer 4 (pickle bytes).
154+
- It does not unpickle, does not wrap with Acquisition, and does not
155+
traverse.
156+
The work that turns bytes into an object instance still happens in
157+
layer 5 when traversal actually accesses the segment.
158+
- It is idempotent: OIDs already present in the `LoadCache` are skipped
159+
inside `load_multiple()`.
160+
- It degrades gracefully: if the storage has no `load_multiple()` method
161+
(for example, a non-pgjsonb storage during testing), the prefetch call
162+
returns silently.
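
The properties listed above can be illustrated with a toy storage.
Everything here is an assumption made for the sketch: `SketchStorage`, its dict-backed `backend`, and the `prefetch` helper are hypothetical, and the real `load_multiple()` may have a different signature.

```python
class SketchStorage:
    """Hypothetical storage with a byte cache in front of a backend dict."""

    def __init__(self, backend):
        self.backend = backend  # stands in for the object_state table
        self.cache = {}         # oid -> pickle bytes (layer 4)
        self.queries = 0        # number of simulated round-trips

    def load_multiple(self, oids):
        # Idempotent: OIDs already cached are not requested again.
        missing = [oid for oid in oids if oid not in self.cache]
        if not missing:
            return
        self.queries += 1
        for oid in missing:
            if oid in self.backend:
                # Bytes are stored as-is; nothing is unpickled here.
                self.cache[oid] = self.backend[oid]


def prefetch(storage, oids):
    """Warm the byte cache if the storage supports it; otherwise do nothing."""
    load_multiple = getattr(storage, "load_multiple", None)
    if load_multiple is None:
        return False  # graceful degradation for other storages
    load_multiple(oids)
    return True
```

Calling `prefetch()` twice with the same OIDs costs one round-trip in this sketch, and passing a storage without `load_multiple()` is a no-op.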
163+
164+
Disable prefetch by setting `PGCATALOG_PREFETCH_BATCH=0`.
165+
The default of 100 matches the most common Plone listing shapes
166+
(navigation trees, folder listings, news overviews) without keeping
167+
material amounts of state in memory.
168+
169+
## Invalidation matrix
170+
171+
The table below ties each write event to the caches it invalidates.
172+
173+
| Event | Layer 1 | Layer 4 | Layer 5 |
174+
|-------|---------|---------|---------|
175+
| Catalog write (`catalog_object`, `reindexObject`, `uncatalog_object`, move) | Cleared when `pgcatalog_change_seq` advances past `_last_tid` | Per-OID invalidate on TID change | Per-OID invalidate on TID change |
176+
| ZODB commit that does not touch the catalog (sessions, scales, annotations) | Not cleared (counter does not advance) | Per-OID invalidate on TID change | Per-OID invalidate on TID change |
177+
| `pack` (history-free or history-preserving) | Not cleared directly; next catalog write triggers clear | Per-OID invalidate as objects reload at new TIDs | Per-OID invalidate as objects reload at new TIDs |
178+
| DDL (new column, index created) | Not cleared | Not cleared | Not cleared |
179+
180+
Two entries deserve extra context.
181+
182+
The query cache uses a counter that only advances on catalog writes,
183+
which means `plone.memoize`-wrapped views that depend on catalog results
184+
keep their hit rate even on busy sites where the ZODB TID increments on
185+
every session write.
186+
This is the same trick that lets the tool expose a stable `getCounter()`
187+
to `plone.memoize.ram`.
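
The difference between the two counters can be sketched as follows.
`SketchCatalog`, its `commit()` method, and `listing_cache_key` are hypothetical; only the idea of a `getCounter()` that advances on catalog writes comes from the text above.

```python
class SketchCatalog:
    """Hypothetical catalog tracking the ZODB TID and a catalog-only counter."""

    def __init__(self):
        self.zodb_tid = 0    # advances on every commit
        self.change_seq = 0  # advances only when the catalog is written

    def commit(self, touches_catalog):
        self.zodb_tid += 1
        if touches_catalog:
            self.change_seq += 1

    def getCounter(self):
        return self.change_seq


def listing_cache_key(catalog, path):
    """Cache key for a view whose output depends on catalog results."""
    return (path, catalog.getCounter())
```

A session write changes `zodb_tid` but leaves the key untouched in this sketch, so a memoized listing stays valid until the next catalog write.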
188+
189+
DDL does not propagate to any cache automatically.
190+
A column added while a worker is running will not appear in queries
191+
issued by that worker's pooled connections until the prepared statement
192+
cache (layer 2) forgets the old plan, which typically means recycling
193+
the connection.
194+
In practice this only matters during upgrade steps; runtime DDL is not
195+
expected.
196+
197+
## Configuration
198+
199+
The knobs live in environment variables for plone.pgcatalog, in
200+
`zope.conf` sections for ZODB, and in `postgresql.conf` for PostgreSQL.
201+
202+
### plone.pgcatalog environment variables
203+
204+
`PGCATALOG_QUERY_CACHE_SIZE`
205+
: Maximum number of entries in the query result cache (layer 1).
206+
Default `200`.
207+
Set to `0` to disable.
208+
209+
`PGCATALOG_QUERY_CACHE_TTR`
210+
: Time-to-round, in seconds, for datetime parameters during cache key
211+
normalization (not a time-to-live).
212+
Default `60`.
213+
Two queries with `modified > now()` issued within the same minute
214+
hash to the same key and share a cache slot.
215+
Set to `0` to disable rounding.
216+
217+
`PGCATALOG_PREFETCH_BATCH`
218+
: Window size for `_maybe_prefetch_objects`.
219+
Default `100`.
220+
Set to `0` to disable prefetch entirely.
221+
222+
`PGCATALOG_SLOW_QUERY_MS`
223+
: Threshold in milliseconds above which a query is logged as slow and
224+
recorded in `pgcatalog_slow_queries`.
225+
Default `10`.
226+
227+
`PGCATALOG_LOG_ALL_QUERIES`
228+
: When truthy, log every query (not just slow ones) at `INFO` level.
229+
Off by default.
230+
Checked per query, so you can flip it at runtime without a restart.
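
A minimal sketch of how these variables could be read, using the defaults documented above.
The helper names, the fallback to the default on unparsable values, and the set of strings treated as truthy are assumptions, not the package's actual parsing rules.

```python
import os


def _int_env(name, default, environ=None):
    """Read an integer setting, falling back to the default if unset or invalid."""
    environ = os.environ if environ is None else environ
    try:
        return int(environ.get(name, ""))
    except ValueError:
        return default


def read_settings(environ=None):
    """Collect the integer knobs with their documented defaults."""
    return {
        "query_cache_size": _int_env("PGCATALOG_QUERY_CACHE_SIZE", 200, environ),
        "query_cache_ttr": _int_env("PGCATALOG_QUERY_CACHE_TTR", 60, environ),
        "prefetch_batch": _int_env("PGCATALOG_PREFETCH_BATCH", 100, environ),
        "slow_query_ms": _int_env("PGCATALOG_SLOW_QUERY_MS", 10, environ),
    }


def log_all_queries(environ=None):
    """Evaluated on each call, so a changed value takes effect without a restart."""
    environ = os.environ if environ is None else environ
    value = environ.get("PGCATALOG_LOG_ALL_QUERIES", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

Passing a plain dict as `environ` makes the helpers easy to exercise in tests without touching the process environment.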
231+
232+
### zope.conf
233+
234+
`cache-size` and `cache-size-bytes` control the ZODB Connection object
235+
cache (layer 5).
236+
This is the primary performance lever for warm-cache page loads; raising
237+
it is the single biggest win on large sites.
238+
See {doc}`performance` for concrete benchmark numbers.
239+
240+
### zodb-pgjsonb
241+
242+
The `cache_local_mb` option on the `<pgjsonb>` storage section sets the
243+
byte budget for layer 4, per ZODB Connection instance.
244+
Default is 16 MB.
245+
Each worker process typically holds several instances (one per
246+
open connection), so layer 4 can occupy up to
247+
`workers * connections_per_worker * cache_local_mb`.
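
As a worked example of that product, with deployment numbers that are purely illustrative:

```python
def load_cache_budget_mb(workers, connections_per_worker, cache_local_mb=16):
    """Upper bound on layer 4 memory across all workers, in megabytes."""
    return workers * connections_per_worker * cache_local_mb


# Hypothetical deployment: 4 workers, 4 open connections each, default budget.
budget = load_cache_budget_mb(workers=4, connections_per_worker=4)
print(f"{budget} MB")  # 4 * 4 * 16 = 256 MB
```

The figure is a ceiling: a `LoadCache` only grows toward its budget as objects are loaded.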
248+
249+
### PostgreSQL
250+
251+
`shared_buffers` sizes layer 6, and `work_mem` governs per-query sort
252+
and hash memory (which is not a cache but does affect whether a query
253+
spills to disk).
254+
Neither of these is plone.pgcatalog-specific; follow general PostgreSQL
255+
tuning advice for your workload.
256+
257+
## Debugging cache behavior
258+
259+
Cache stats for layer 1 are available through `get_query_cache().stats()`
260+
and in the ZMI under the catalog tool's management tabs.
261+
The output includes `hits`, `misses`, `hit_rate`, `invalidations`, the
262+
top entries by cost, and the `last_tid` the cache is pinned to.
263+
264+
When a query unexpectedly hits PostgreSQL on every call, the most common
265+
causes are: a datetime parameter that is not being rounded (check
266+
`PGCATALOG_QUERY_CACHE_TTR` and whether your query uses a type that
267+
implements `timeTime()`), a non-normalizable object in the query value
268+
(unsortable mixed types in a list), and frequent catalog writes on the
269+
same worker (counter advances faster than hits accumulate).
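
The first two causes can be reproduced with a simplified key normalizer.
This is a sketch under assumptions: `cache_key` and `normalize` are hypothetical names, rounding is done by flooring the timestamp to a multiple of the TTR, and an unsortable value simply makes the query uncacheable.

```python
from datetime import datetime, timezone


def normalize(value, ttr=60):
    """Turn a query value into a hashable, order-independent form."""
    if hasattr(value, "timeTime"):  # Zope DateTime-like objects
        ts = value.timeTime()
        return ("dt", ts - (ts % ttr) if ttr > 0 else ts)
    if isinstance(value, datetime):
        ts = value.timestamp()
        return ("dt", ts - (ts % ttr) if ttr > 0 else ts)
    if isinstance(value, dict):
        return tuple(sorted((k, normalize(v, ttr)) for k, v in value.items()))
    if isinstance(value, (list, tuple, set)):
        # sorted() raises TypeError for unsortable mixed types.
        return tuple(sorted(normalize(v, ttr) for v in value))
    return value


def cache_key(query, ttr=60):
    """Return a hashable key, or None when the query cannot be normalized."""
    try:
        return normalize(query, ttr)
    except TypeError:
        return None
```

With a TTR of 60, two timestamps in the same minute normalize to the same key; with a TTR of 0, or with a list that mixes integers and strings, every call misses.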
270+
271+
When `getObject()` is slower than expected for a warm request, first
272+
rule out layer 5 being undersized: if the ZODB cache is full, every
273+
traversal segment re-unpickles from layer 4 bytes.
274+
If that check passes, rule out prefetch being off
275+
(`PGCATALOG_PREFETCH_BATCH=0` or the brain being constructed outside a
276+
result set).
277+
278+
When you suspect cross-request staleness on an object returned by
279+
`getObject()`, remember that brains themselves hold no object state,
280+
and that layer 5 invalidates on TID change.
281+
Staleness in that path usually traces back to either a view that cached
282+
the result of `getObject()` in its own scope across requests, or to a
283+
`_v_` attribute written by a traversal subscriber on a persistent object
284+
that then survived in layer 5 until the next commit.
285+
286+
```{seealso}
287+
{doc}`performance` covers benchmark results and tuning for end-to-end
288+
query and `getObject()` latency.
289+
{doc}`architecture` describes the write path that drives catalog-side
290+
invalidation (`pgcatalog_change_seq`).
291+
```

docs/sources/explanation/index.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ architecture
1414
why-postgresql
1515
fulltext-search
1616
tika-extraction
17+
caching
1718
performance
1819
security
1920
bm25-design
