Skip to content

Suggestions engine PR α: Bundle model + partial GIN + hybrid templates (#122)#162

Open
jensens wants to merge 14 commits intomainfrom
feat/suggestions-pr-alpha-122
Open

Suggestions engine PR α: Bundle model + partial GIN + hybrid templates (#122)#162
jensens wants to merge 14 commits intomainfrom
feat/suggestions-pr-alpha-122

Conversation

@jensens
Copy link
Copy Markdown
Member

@jensens jensens commented Apr 23, 2026

Summary

Closes the canonical slow-query suggestion gap from the #122 gap-analysis comment — the AT26-style query (`portal_type=Event + review_state=published + Subject=AT26 + effective range + sort_on=effective`) previously produced no useful suggestion because the engine thought in single btree composites and had no partial-GIN or hybrid-bundle output shape.

  • New Bundle data model. `suggest_indexes()` now returns `list[Bundle]`; each Bundle groups one or more `BundleMember`s (DDLs) that together address a slow-query shape. Back-compat flatten in `manage_get_slow_query_stats` keeps existing DTML rendering unchanged; full bundle structure exposed under new `suggestions_bundles` key for PR β's JSON/JS UI.
  • New templates: T4 partial GIN (GIN with AND-joined equality predicates baked into WHERE), T5 hybrid bundle (btree composite + one or more partial GINs working together; planner BitmapAnds them). T1/T3/T6 unchanged in DDL, re-wrapped in Bundle shape.
  • Data-driven partial-predicate scoping. Selectivity probe uses `pg_stats.most_common_vals` (zero round-trip for common values) with live `COUNT(*)` fallback. Equality filters below 10 % selectivity (configurable via `PGCATALOG_PARTIAL_SELECTIVITY_THRESHOLD` env var) are baked into partial WHERE. Request-scoped cache.
  • Shape classifier routes filter sets to BTREE_ONLY / KEYWORD_ONLY / MIXED / TEXT_ONLY / UNKNOWN. Customer-specific index types work automatically (rules operate on `IndexType`, not field names).
  • WHERE-aware coverage detection — existing plain GIN on Subject no longer falsely marks a stricter partial GIN as `already_covered` (they are genuinely distinct indexes).

No DDL changes, no schema migration. Safe to roll out mid-fleet. PR β (EXPLAIN-driven grading + DTML → JSON+JS UI + opt-in ANALYZE) gets its own brainstorm and design spec when this lands — see `docs/superpowers/specs/2026-04-15-suggestions-engine-pr-beta-notes.md` for the carryover decisions.

Spec: `docs/superpowers/specs/2026-04-15-suggestions-engine-pr-alpha-design.md`
Plan: `docs/plans/2026-04-15-suggestions-engine-pr-alpha-plan.md`

Test plan

  • `tests/test_suggestions.py` — 99 tests pass (52 baseline + 47 new covering Bundle types, filter extraction, shape classification, btree/GIN/hybrid builders, probe semantics, partial-WHERE rules, GIN coverage, and the canonical AT26 regression)
  • Full project test suite — 1481 passing, 39 skipped
  • `ruff check` clean on modified files
  • Manual smoke in a live instance: trigger the issue-Slow Queries page: >100ms patterns with only filter-only / dedicated-column keys get no index suggestion, even when buffer hits are high #122 query pattern and confirm the Slow Queries ZMI tab now renders a two-row suggestion (btree + partial GIN) via the back-compat flatten
  • After merge: verify PR β notes file is sufficient starting material for the next brainstorm

Known pre-existing flaky tests unrelated to this PR — 3 `test_clear_find_and_rebuild_*` tests in `tests/test_pg_integration.py::TestMaintenanceOps` pass in isolation but fail when run consecutively as a class (state contamination between tests). Filed as a separate issue for followup.

Refs #122.

🤖 Generated with Claude Code

jensens and others added 14 commits April 23, 2026 23:46
Eleven tasks: Bundle dataclasses → filter-field extractor →
shape classifier → btree-builder → GIN-builder → signature
migration (list[Bundle]) → probe → partial WHERE → hybrid
dispatch → GIN coverage → CHANGES.md.

Refs #122.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Foundational types for PR α's multi-index suggestion output.
Frozen dataclasses so the engine output is safe to pass around
and serialize. No callers yet — introduced as a separate commit
to keep the refactor surface reviewable.

Refs #122.
Pure helper that turns (query_keys, params, registry) into a
structured list of (name, IndexType, operator, value) tuples.
Handles virtual field expansion (effectiveRange -> effective),
equality/range/equality_multi operator inference, and all the
drop rules (pagination / sort / dedicated / skip / unknown).

Not yet wired into suggest_indexes — subsequent tasks will use it
via the new shape classifier.

Refs #122.
Pure helper that maps a filter-field list to one of five shape
classifications (BTREE_ONLY, KEYWORD_ONLY, MIXED, TEXT_ONLY,
UNKNOWN) — the routing input for the bundle dispatcher.

Refs #122.
Wraps PR 2's btree-composite + sort-covering logic into a
Bundle output shape.  Not yet consumed by suggest_indexes —
that migration happens in a later commit.

Refs #122.
Handles both T3 (plain GIN) when partial_where_terms is empty and
T4 (partial GIN) when provided.  Multiple KEYWORD filters produce
multiple members in one bundle so the planner can BitmapAnd them.

Refs #122.
Signature gains conn=None.  Return type migrates from
list[dict] to list[Bundle] with BundleMember.  catalog.py
flattens bundles for back-compat DTML rendering and
additionally exposes the full bundle structure under a new
'suggestions_bundles' key for PR β's JSON/JS consumer.

Legacy _add_btree_suggestions and _add_standalone_suggestion
helpers removed — their logic lives in _build_btree_bundle and
_build_keyword_gin_bundle.

Refs #122.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two-stage selectivity probe for partial-predicate scoping.
pg_stats.most_common_vals gives zero-round-trip answers for
common values; unknown values fall back to live COUNT.  All
results cached in a module-level dict reset per ZMI page load.

Refs #122.
Pure helper that builds the list of AND-joined WHERE predicates
for a partial index, applying the 10%-selectivity threshold
(configurable via PGCATALOG_PARTIAL_SELECTIVITY_THRESHOLD env
var).  Excludes DATE and KEYWORD values, multi-value equality,
and range operators.  SQL-escapes single quotes.

Refs #122.
MIXED shape (btree + KEYWORD filters together) now produces a
single Bundle containing a btree composite member plus one
partial-GIN member per KEYWORD, all scoped by the same
partial WHERE predicates (the planner BitmapAnds them).

Closes the issue-#122 AT26 gap:
  portal_type=Event + review_state=published + Subject=AT26
  + effective+expires range + sort_on=effective
now yields
  btree (portal_type, review_state, effective)  AND
  partial GIN on Subject
    WHERE portal_type='Event' AND review_state='published'

Dedicated KEYWORD fields (e.g. Subject) are kept in filter_fields
so the MIXED shape can be detected alongside btree-eligible fields.
The KEYWORD_ONLY dispatch path strips dedicated fields before calling
_build_keyword_gin_bundle (avoiding duplicate suggestions), while the
MIXED hybrid path passes them through (enabling partial GIN suggestions
that are genuinely new, more selective indexes).

Probe cache is reset in manage_get_slow_query_stats so unit
tests can seed it independently.

Refs #122.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Simplifies the dedicated-keyword pass-through logic in
_extract_filter_fields. No behavior change.

Refs #122.
Existing plain-GIN indexes on a KEYWORD field now match as
'already_covered' for plain-GIN suggestions on the same field.
Partial-GIN suggestions with STRICTER WHERE than the existing
index remain 'new' — they are genuinely distinct indexes with
different plan usability.

_check_covered now compares WHERE clauses as predicate sets:
suggested ⊆ existing = covered; else new.  The plan doc had
the subset direction inverted — the test is ground truth and
the implementation follows it.

Refs #122.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per final review: module-level _probe_cache/_pg_stats_cache assumes
Zope's default single-thread-per-worker model.  Documented the
caveat and the ContextVar mitigation path if workers ever go
multi-threaded.

Refs #122.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant