Suggestions engine PR α: Bundle model + partial GIN + hybrid templates (#122) by jensens · Pull Request #162 · bluedynamics/plone-pgcatalog

jensens · 2026-04-23T22:41:16Z

Summary

Closes the canonical slow-query suggestion gap from the #122 gap-analysis comment — the AT26-style query (`portal_type=Event + review_state=published + Subject=AT26 + effective range + sort_on=effective`) previously produced no useful suggestion because the engine thought in single btree composites and had no partial-GIN or hybrid-bundle output shape.

New Bundle data model. `suggest_indexes()` now returns `list[Bundle]`; each Bundle groups one or more `BundleMember`s (DDLs) that together address a slow-query shape. Back-compat flatten in `manage_get_slow_query_stats` keeps existing DTML rendering unchanged; full bundle structure exposed under new `suggestions_bundles` key for PR β's JSON/JS UI.
New templates: T4 partial GIN (GIN with AND-joined equality predicates baked into WHERE), T5 hybrid bundle (btree composite + one or more partial GINs working together; planner BitmapAnds them). T1/T3/T6 unchanged in DDL, re-wrapped in Bundle shape.
Data-driven partial-predicate scoping. Selectivity probe uses `pg_stats.most_common_vals` (zero round-trip for common values) with live `COUNT(*)` fallback. Equality filters below 10 % selectivity (configurable via `PGCATALOG_PARTIAL_SELECTIVITY_THRESHOLD` env var) are baked into partial WHERE. Request-scoped cache.
Shape classifier routes filter sets to BTREE_ONLY / KEYWORD_ONLY / MIXED / TEXT_ONLY / UNKNOWN. Customer-specific index types work automatically (rules operate on `IndexType`, not field names).
WHERE-aware coverage detection — existing plain GIN on Subject no longer falsely marks a stricter partial GIN as `already_covered` (they are genuinely distinct indexes).

No DDL changes, no schema migration. Safe to roll out mid-fleet. PR β (EXPLAIN-driven grading + DTML → JSON+JS UI + opt-in ANALYZE) gets its own brainstorm and design spec when this lands — see `docs/superpowers/specs/2026-04-15-suggestions-engine-pr-beta-notes.md` for the carryover decisions.

Spec: `docs/superpowers/specs/2026-04-15-suggestions-engine-pr-alpha-design.md`
Plan: `docs/plans/2026-04-15-suggestions-engine-pr-alpha-plan.md`

Test plan

`tests/test_suggestions.py` — 99 tests pass (52 baseline + 47 new covering Bundle types, filter extraction, shape classification, btree/GIN/hybrid builders, probe semantics, partial-WHERE rules, GIN coverage, and the canonical AT26 regression)
Full project test suite — 1481 passing, 39 skipped
`ruff check` clean on modified files
Manual smoke in a live instance: trigger the issue-Slow Queries page: >100ms patterns with only filter-only / dedicated-column keys get no index suggestion, even when buffer hits are high #122 query pattern and confirm the Slow Queries ZMI tab now renders a two-row suggestion (btree + partial GIN) via the back-compat flatten
After merge: verify PR β notes file is sufficient starting material for the next brainstorm

Known pre-existing flaky tests unrelated to this PR — 3 `test_clear_find_and_rebuild_*` tests in `tests/test_pg_integration.py::TestMaintenanceOps` pass in isolation but fail when run consecutively as a class (state contamination between tests). Filed as a separate issue for followup.

Refs #122.

🤖 Generated with Claude Code

Eleven tasks: Bundle dataclasses → filter-field extractor → shape classifier → btree-builder → GIN-builder → signature migration (list[Bundle]) → probe → partial WHERE → hybrid dispatch → GIN coverage → CHANGES.md. Refs #122. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Foundational types for PR α's multi-index suggestion output. Frozen dataclasses so the engine output is safe to pass around and serialize. No callers yet — introduced as a separate commit to keep the refactor surface reviewable. Refs #122.

Pure helper that turns (query_keys, params, registry) into a structured list of (name, IndexType, operator, value) tuples. Handles virtual field expansion (effectiveRange -> effective), equality/range/equality_multi operator inference, and all the drop rules (pagination / sort / dedicated / skip / unknown). Not yet wired into suggest_indexes — subsequent tasks will use it via the new shape classifier. Refs #122.

Pure helper that maps a filter-field list to one of five shape classifications (BTREE_ONLY, KEYWORD_ONLY, MIXED, TEXT_ONLY, UNKNOWN) — the routing input for the bundle dispatcher. Refs #122.

Wraps PR 2's btree-composite + sort-covering logic into a Bundle output shape. Not yet consumed by suggest_indexes — that migration happens in a later commit. Refs #122.

Handles both T3 (plain GIN) when partial_where_terms is empty and T4 (partial GIN) when provided. Multiple KEYWORD filters produce multiple members in one bundle so the planner can BitmapAnd them. Refs #122.

Signature gains conn=None. Return type migrates from list[dict] to list[Bundle] with BundleMember. catalog.py flattens bundles for back-compat DTML rendering and additionally exposes the full bundle structure under a new 'suggestions_bundles' key for PR β's JSON/JS consumer. Legacy _add_btree_suggestions and _add_standalone_suggestion helpers removed — their logic lives in _build_btree_bundle and _build_keyword_gin_bundle. Refs #122. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two-stage selectivity probe for partial-predicate scoping. pg_stats.most_common_vals gives zero-round-trip answers for common values; unknown values fall back to live COUNT. All results cached in a module-level dict reset per ZMI page load. Refs #122.

Pure helper that builds the list of AND-joined WHERE predicates for a partial index, applying the 10%-selectivity threshold (configurable via PGCATALOG_PARTIAL_SELECTIVITY_THRESHOLD env var). Excludes DATE and KEYWORD values, multi-value equality, and range operators. SQL-escapes single quotes. Refs #122.

MIXED shape (btree + KEYWORD filters together) now produces a single Bundle containing a btree composite member plus one partial-GIN member per KEYWORD, all scoped by the same partial WHERE predicates (the planner BitmapAnds them). Closes the issue-#122 AT26 gap: portal_type=Event + review_state=published + Subject=AT26 + effective+expires range + sort_on=effective now yields btree (portal_type, review_state, effective) AND partial GIN on Subject WHERE portal_type='Event' AND review_state='published' Dedicated KEYWORD fields (e.g. Subject) are kept in filter_fields so the MIXED shape can be detected alongside btree-eligible fields. The KEYWORD_ONLY dispatch path strips dedicated fields before calling _build_keyword_gin_bundle (avoiding duplicate suggestions), while the MIXED hybrid path passes them through (enabling partial GIN suggestions that are genuinely new, more selective indexes). Probe cache is reset in manage_get_slow_query_stats so unit tests can seed it independently. Refs #122. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Simplifies the dedicated-keyword pass-through logic in _extract_filter_fields. No behavior change. Refs #122.

Existing plain-GIN indexes on a KEYWORD field now match as 'already_covered' for plain-GIN suggestions on the same field. Partial-GIN suggestions with STRICTER WHERE than the existing index remain 'new' — they are genuinely distinct indexes with different plan usability. _check_covered now compares WHERE clauses as predicate sets: suggested ⊆ existing = covered; else new. The plan doc had the subset direction inverted — the test is ground truth and the implementation follows it. Refs #122. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Per final review: module-level _probe_cache/_pg_stats_cache assumes Zope's default single-thread-per-worker model. Documented the caveat and the ContextVar mitigation path if workers ever go multi-threaded. Refs #122.

jensens and others added 14 commits April 23, 2026 23:46

feat: add _classify_filter_shape

8ec0cf6

Pure helper that maps a filter-field list to one of five shape classifications (BTREE_ONLY, KEYWORD_ONLY, MIXED, TEXT_ONLY, UNKNOWN) — the routing input for the bundle dispatcher. Refs #122.

feat: extract _build_btree_bundle — Bundle-returning btree path

b2720c2

Wraps PR 2's btree-composite + sort-covering logic into a Bundle output shape. Not yet consumed by suggest_indexes — that migration happens in a later commit. Refs #122.

feat: add _build_keyword_gin_bundle — plain and partial GIN

a899fa9

Handles both T3 (plain GIN) when partial_where_terms is empty and T4 (partial GIN) when provided. Multiple KEYWORD filters produce multiple members in one bundle so the planner can BitmapAnd them. Refs #122.

refactor: remove dead _is_dedicated_keyword variable

636e041

Simplifies the dedicated-keyword pass-through logic in _extract_filter_fields. No behavior change. Refs #122.

docs: changelog for PR α suggestions engine (#122)

246282e

docs: acknowledge multi-threaded-WSGI caveat on probe cache

cffd7df

Per final review: module-level _probe_cache/_pg_stats_cache assumes Zope's default single-thread-per-worker model. Documented the caveat and the ContextVar mitigation path if workers ever go multi-threaded. Refs #122.

jensens mentioned this pull request Apr 23, 2026

Test-ordering flakiness in tests/test_pg_integration.py::TestMaintenanceOps::test_clear_find_and_rebuild_* #163

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggestions engine PR α: Bundle model + partial GIN + hybrid templates (#122)#162

Suggestions engine PR α: Bundle model + partial GIN + hybrid templates (#122)#162
jensens wants to merge 14 commits intomainfrom
feat/suggestions-pr-alpha-122

jensens commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jensens commented Apr 23, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant