Commit d813fdb

jensens and claude authored
docs: update reference, how-to, and explanation docs for b21-b28 features (#55) (#56)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 22afc0b commit d813fdb

4 files changed

Lines changed: 41 additions & 8 deletions

docs/sources/explanation/performance.md

Lines changed: 28 additions & 0 deletions
@@ -235,3 +235,31 @@ the current floor are:
 queries per request.
 - **Increase batch sizes.** For bulk operations (reindex, migration), larger batches
 amortize per-statement overhead.
+
+### Composite indexes for multi-field queries
+
+PostgreSQL rarely combines individual single-column indexes via BitmapAnd.
+When a catalog query filters on multiple fields simultaneously (the common
+case: folder listings filter on `path_parent` + `portal_type` +
+`allowedRolesAndUsers`), PG picks one index and sequentially filters the
+rest. On a 137K-object database this means 3+ second query times.
+
+plone.pgcatalog ships composite indexes for the most common patterns:
+
+- `(path_parent, portal_type)` for folder listings and navigation
+- `(path pattern, portal_type)` for collections and search
+- `(path pattern, path_depth, portal_type)` for navigation tree
+- `(portal_type, review_state)` for workflow-filtered listings
+
+Custom catalog indexes registered via GenericSetup also get btree
+expression indexes automatically at startup.
+
+### Slow query detection
+
+Queries exceeding `PGCATALOG_SLOW_QUERY_MS` (default: 10 ms) are logged
+as warnings and recorded in the `pgcatalog_slow_queries` PostgreSQL table.
+The ZMI "Slow Queries" tab aggregates these by query field pattern and
+suggests composite index DDL for frequent patterns.
+
+This is a self-tuning feedback loop: deploy the site, let it accumulate
+slow query data under real load, then add the suggested indexes.
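The "aggregate by field pattern, then suggest composite DDL" step described in the performance.md change can be sketched in plain Python. This is a hypothetical illustration, not plone.pgcatalog's code: the `(fields, duration_ms)` record shape, the `object_state` table name, the `idx_slow_*` naming scheme, and the `(idx->>'field')` expressions are all assumptions made for the example.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Assumption: the table holding the catalog's `idx` JSONB column.
TABLE = "object_state"


def pattern_of(fields: Iterable[str]) -> tuple[str, ...]:
    """Normalize a query's filter fields into an order-independent pattern."""
    return tuple(sorted(set(fields)))


def suggest_composite_indexes(
    slow_queries: Iterable[tuple[Iterable[str], float]],
    min_count: int = 5,
) -> list[str]:
    """Suggest CREATE INDEX statements for frequent multi-field slow patterns.

    `slow_queries` holds (fields, duration_ms) pairs, a hypothetical stand-in
    for rows read back from the pgcatalog_slow_queries table.
    """
    counts = Counter(pattern_of(fields) for fields, _ms in slow_queries)
    statements = []
    for pattern, count in counts.most_common():
        # Skip single-field patterns (usually covered by a single-column
        # index) and patterns that have not recurred often enough.
        if len(pattern) < 2 or count < min_count:
            continue
        name = "idx_slow_" + "_".join(field.lower() for field in pattern)
        # Assumption: every field is stored as a key of the `idx` JSONB column.
        columns = ", ".join(f"(idx->>'{field}')" for field in pattern)
        statements.append(
            f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} "
            f"ON {TABLE} ({columns});"
        )
    return statements
```

`CREATE INDEX CONCURRENTLY` avoids blocking writes during the build but cannot run inside a transaction block. Column order in a composite B-tree matters (the leading column is the one the index can search most efficiently), so the alphabetical order this sketch produces is only a starting point to review before applying any suggestion.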

docs/sources/explanation/tika-extraction.md

Lines changed: 5 additions & 1 deletion
@@ -246,7 +246,11 @@ Enabling Tika does not change how existing search works:
 - **Title and Description** are still indexed synchronously during
 `catalog_object()`, with immediate availability.
 - **Rich-text body** (SearchableText from `portal_transforms`) is
-still indexed synchronously.
+still indexed synchronously for non-File content types. For `IFile`
+objects, `portal_transforms` is skipped when Tika is active, so the
+expensive `pdftotext`/`wv` calls and BFS graph traversal of the
+transform registry are avoided entirely. See
+{doc}`../how-to/custom-blob-searchabletext` for custom types.
 - **Tika extraction** adds to the existing tsvector asynchronously.
 A brief window (seconds to minutes, depending on queue
 depth and Tika processing time) exists where the blob content is not yet
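The routing rule in the tika-extraction.md change (Title and Description always synchronous, `portal_transforms` skipped for `IFile` objects only when Tika is active) can be sketched as a small decision function. Everything named here is hypothetical: `plan_text_extraction`, the step labels, and especially the enqueue rule (a blob is queued whenever Tika is active and its MIME type is in the configured list), which is an assumption, not documented behaviour. `parse_mime_list` only mirrors the comma-separated format of `PGCATALOG_TIKA_CONTENT_TYPES`.

```python
from __future__ import annotations


def parse_mime_list(raw: str) -> set[str]:
    """Parse a comma-separated MIME list like PGCATALOG_TIKA_CONTENT_TYPES."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def plan_text_extraction(
    is_file: bool,
    blob_mime_type: str | None,
    tika_url: str | None,
    tika_types: set[str],
) -> list[str]:
    """Return the text-extraction steps for one object (illustrative only).

    is_file: whether the object provides IFile (assumed already known).
    tika_url: value of PGCATALOG_TIKA_URL, or None when Tika is disabled.
    """
    # Title and Description are always indexed synchronously.
    steps = ["title_description_sync"]
    tika_active = bool(tika_url)
    if not (is_file and tika_active):
        # Non-File content, or Tika disabled: portal_transforms still runs.
        steps.append("portal_transforms_sync")
    # Assumed rule: queue the blob for async extraction if its type is listed.
    if tika_active and blob_mime_type and blob_mime_type.lower() in tika_types:
        steps.append("tika_enqueue")
    return steps
```

For example, with `tika_url="http://localhost:9998"` an `IFile` holding a PDF yields `["title_description_sync", "tika_enqueue"]`: no synchronous `pdftotext` call, and the body text arrives later through the queue.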

docs/sources/how-to/rebuild-catalog.md

Lines changed: 7 additions & 7 deletions
@@ -32,7 +32,7 @@ Expected timing: approximately 15 ms per object.
 
 ## Selective reindex (reindexIndex)
 
-Re-extracts a single index for all objects:
+Re-extracts a single index from all ZODB objects:
 
 ```python
 catalog.reindexIndex("review_state")
@@ -53,7 +53,7 @@ This happens automatically and does not trigger ZODB serialization of the object
 | `clearFindAndRebuild()` | Yes | Yes | ~15 ms/obj | Schema changes, corrupt data, major upgrades |
 | `refreshCatalog(clear=0)` | No | Re-catalogs existing | ~15 ms/obj | Reindex all without losing uncataloged objects |
 | `refreshCatalog(clear=1)` | Yes | Yes | Same | Equivalent to `clearFindAndRebuild()` |
-| `reindexIndex("name")` | No (single key) | No (PG only) | Fast | Single index changed, new indexer deployed |
+| `reindexIndex("name")` | No (single key) | Yes (ZODB load) | ~5 ms/obj | Single index changed, new indexer deployed |
 
 **`clearFindAndRebuild()`** NULLs all catalog columns (path, idx,
 searchable_text, backend extras), then traverses the entire portal tree
@@ -66,11 +66,11 @@ resolves each from ZODB, and re-extracts index values.
 It does not
 discover objects that were never cataloged.
 
-**`reindexIndex("name")`** is a PostgreSQL-only operation: it reads the
-existing `idx` JSONB for all objects that have the named key and
-re-applies it.
-It does not re-extract values from the live Zope object.
-To re-extract from objects, use `refreshCatalog()`.
+**`reindexIndex("name")`** loads each cataloged object from ZODB via
+`unrestrictedTraverse`, extracts the requested index value, and writes
+a JSONB merge update. This is faster than `refreshCatalog()` because
+it only re-extracts the single requested index, not all of them.
+Available via ZMI: Indexes & Metadata tab > [reindex] button per index.
 
 ## Troubleshooting
 
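The new `reindexIndex("name")` behaviour in rebuild-catalog.md (load each object, extract one value, write a JSONB merge) can be modelled without a database. In this sketch `records`, `load`, and `extract` are hypothetical stand-ins for the cataloged rows, the ZODB load, and the indexer; the dict merge imitates PostgreSQL's top-level `jsonb || jsonb` semantics, where keys from the right operand replace keys on the left.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


def reindex_single_index(
    records: Iterable[tuple[str, dict[str, Any]]],
    load: Callable[[str], Any],
    name: str,
    extract: Callable[[Any, str], Any],
) -> dict[str, dict[str, Any]]:
    """Re-extract one index for every cataloged object (illustrative only).

    records: (path, idx) pairs standing in for cataloged rows.
    load: resolves a path to an object, or returns None if it is gone.
    extract: hypothetical stand-in for the indexer of index `name`.
    Returns the merged idx per path, mimicking PostgreSQL's `idx || patch`.
    """
    updated: dict[str, dict[str, Any]] = {}
    for path, idx in records:
        obj = load(path)
        if obj is None:
            continue  # unresolvable object: leave its row untouched
        patch = {name: extract(obj, name)}
        # Shallow, top-level merge like JSONB `||`: only `name` is replaced,
        # every other key in idx keeps its stored value.
        updated[path] = {**idx, **patch}
    return updated
```

In a running site, `load` could plausibly be `lambda p: site.unrestrictedTraverse(p, None)` and the write an `UPDATE ... SET idx = idx || %s::jsonb` statement; the table and column names there are assumptions. For scale, the table's rough figures put 100,000 objects at about 100,000 × 5 ms ≈ 8 minutes with `reindexIndex()`, versus about 100,000 × 15 ms = 25 minutes with `refreshCatalog()`.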
docs/sources/reference/configuration.md

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ need a separate `%import` directive.
 | `PGCATALOG_TIKA_URL` | (none) | Tika server URL, for example `http://localhost:9998`. Enables async text extraction from binary content (PDFs, Office docs, images). When set, the queue table and merge function are created at startup. See {doc}`../how-to/enable-tika-extraction`. |
 | `PGCATALOG_TIKA_CONTENT_TYPES` | common office/PDF/image types | Comma-separated MIME types to send to Tika. Default includes PDF, MS Office, OpenDocument, RTF, and common image formats. |
 | `PGCATALOG_TIKA_INPROCESS` | (none) | Set to `true`, `1`, or `yes` to start the extraction worker as a daemon thread inside the Zope process. Requires `PGCATALOG_TIKA_URL`. |
+| `PGCATALOG_SLOW_QUERY_MS` | `10` | Threshold in milliseconds for slow query detection. Queries exceeding this are logged as warnings and recorded in the `pgcatalog_slow_queries` table for analysis via the ZMI Slow Queries tab. Set to `0` to disable. |
 | `ZODB_TEST_DSN` | `dbname=zodb_test host=localhost port=5433 user=zodb password=zodb` | DSN for test database (tests only). |
 | `BM25_TEST_DSN` | `dbname=zodb_test host=localhost port=5434 user=zodb password=zodb` | DSN for BM25 integration tests (tests only). |
 
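The `PGCATALOG_SLOW_QUERY_MS` semantics in the configuration table (default `10`, `0` disables, slow queries logged as warnings and recorded) can be illustrated with a timing wrapper. This is a sketch, not the package's implementation: the logger name, `timed_query`, and the `record` callback (standing in for an insert into `pgcatalog_slow_queries`) are hypothetical.

```python
from __future__ import annotations

import logging
import os
import time
from typing import Any, Callable, Iterable, Mapping

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("pgcatalog.slowquery.sketch")


def slow_query_threshold_ms(env: Mapping[str, str] = os.environ) -> float:
    """Read PGCATALOG_SLOW_QUERY_MS (default 10); 0 disables detection."""
    return float(env.get("PGCATALOG_SLOW_QUERY_MS", "10"))


def timed_query(
    run: Callable[[], Any],
    fields: Iterable[str],
    threshold_ms: float,
    record: Callable[[tuple[str, ...], float], None],
) -> Any:
    """Run a query; log and record it when it exceeds threshold_ms.

    record: stand-in for an INSERT into the pgcatalog_slow_queries table.
    """
    start = time.perf_counter()
    result = run()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if threshold_ms > 0 and elapsed_ms > threshold_ms:
        pattern = tuple(sorted(fields))  # order-independent field pattern
        log.warning("slow catalog query (%.1f ms): %s", elapsed_ms, pattern)
        record(pattern, elapsed_ms)
    return result
```

`time.perf_counter()` is used because it is monotonic, so the measurement is not affected by wall-clock adjustments.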