diff --git a/CHANGES.md b/CHANGES.md
Lines changed: 27 additions & 0 deletions
diff --git a/docs/plans/2026-04-15-strip-path-from-idx-jsonb.md b/docs/plans/2026-04-15-strip-path-from-idx-jsonb.md
Lines changed: 1094 additions & 0 deletions
diff --git a/docs/superpowers/specs/2026-04-15-suggestions-engine-pr-alpha-design.md b/docs/superpowers/specs/2026-04-15-suggestions-engine-pr-alpha-design.md
Lines changed: 325 additions & 0 deletions
diff --git a/docs/superpowers/specs/2026-04-15-suggestions-engine-pr-beta-notes.md b/docs/superpowers/specs/2026-04-15-suggestions-engine-pr-beta-notes.md
Lines changed: 99 additions & 0 deletions
diff --git a/src/plone/pgcatalog/catalog.py b/src/plone/pgcatalog/catalog.py
Lines changed: 4 additions & 6 deletions
diff --git a/src/plone/pgcatalog/indexing.py b/src/plone/pgcatalog/indexing.py
Lines changed: 2 additions & 4 deletions
diff --git a/src/plone/pgcatalog/migrations/__init__.py b/src/plone/pgcatalog/migrations/__init__.py
Lines changed: 1 addition & 0 deletions
diff --git a/src/plone/pgcatalog/migrations/strip_path_keys.py b/src/plone/pgcatalog/migrations/strip_path_keys.py
Lines changed: 102 additions & 0 deletions
diff --git a/src/plone/pgcatalog/processor.py b/src/plone/pgcatalog/processor.py
Lines changed: 17 additions & 14 deletions
diff --git a/src/plone/pgcatalog/query.py b/src/plone/pgcatalog/query.py
Lines changed: 18 additions & 6 deletions
@@ -1,5 +1,32 @@
 # Changelog
 
+## 1.0.0b54
+
+### Changed
+
+- Stop duplicating `path`, `path_parent`, and `path_depth` between the typed
+  columns on `object_state` and the `idx` JSONB.  These three fields now live
+  exclusively in their typed columns (`path`, `parent_path`, `path_depth`) —
+  previously identical values were stored in both places, wasting ~10 % of
+  JSONB storage and (more importantly) blocking the planner from collecting
+  selectivity statistics on path-subtree filters.  Indexes and extended
+  statistics on these fields have been migrated to reference the typed columns
+  directly.  Custom `PATH`-type indexes (e.g. `tgpath`) are unaffected and
+  continue to store their data in `idx`.
+
+  **Migration:** Schema and writer changes are picked up automatically on
+  startup (the eight affected indexes and three extended-statistics objects
+  are reissued with idempotent `DROP … IF EXISTS` / `CREATE … IF NOT EXISTS`
+  pairs).  To strip the obsolete keys from existing JSONB on large catalogs,
+  run:
+
+  ```python
+  from plone.pgcatalog.migrations.strip_path_keys import run
+  run(conn, batch_size=5000)
+  ```
+
+  Safe to run online, idempotent, batched.  Issue #132.
+
 ## 1.0.0b53
 
 ### Fixed
 
@@ -0,0 +1,99 @@
+# PR β (Suggestions Engine) — Pre-brainstorm Notes
+
+> **This is NOT a design spec.** It captures decisions and open questions agreed during the PR α brainstorm (2026-04-15) that belong to PR β but were deferred. Use this as the starting material for the PR β brainstorm; don't re-derive these answers.
+
+**Parent spec:** `2026-04-15-suggestions-engine-pr-alpha-design.md`
+**Issue:** [bluedynamics/plone-pgcatalog#122](https://github.com/bluedynamics/plone-pgcatalog/issues/122)
+
+## Scope of PR β
+
+Build on PR α's Bundle data model to add:
+
+1. **EXPLAIN-driven grading** — live-DB insight per slow-query row and per bundle.
+2. **JSON endpoint** — server renders grades + plans as JSON; UI fetches async.
+3. **ZMI UI refactor** — DTML → JSON + vanilla JS for the Slow Queries tab (no heavy framework, per project rule).
+4. **Opt-in ANALYZE** — user-triggered deep inspection per row.
+
+## Decisions already locked
+
+### Q5 (EXPLAIN's role) — two-stage: rules propose, EXPLAIN grades + annotates
+
+Bundle candidates come from PR α's rule-based dispatcher. For each slow-query group, PR β runs `EXPLAIN (FORMAT JSON)` on the representative row and:
+
+- Extracts the chosen plan (indexes used, top cost nodes, estimated rows, filter predicates).
+- Grades each candidate bundle against the baseline plan.
+- Attaches a diagnostic panel: the top filter-cost node (node type, estimated Rows Removed by Filter when inferable from plan shape), so users see **why** the query is slow even if no bundle changes it.
+
+### Q7 (EXPLAIN trigger) — eager plain + opt-in ANALYZE
+
+- **Default on tab load:** JS fetches plain `EXPLAIN (FORMAT JSON)` per visible slow-query row. Cheap (~10 ms per plan). Grades computed on the server, returned with the plan.
+- **On click:** per-row "deep inspect" button triggers `EXPLAIN (ANALYZE, TIMING off, BUFFERS off)` with `SET LOCAL statement_timeout = '10s'`. UI shows a spinner. On completion, Rows Removed by Filter and actual row counts replace the estimates. Timeout → user sees "query too slow to analyze; here are the estimates we have".
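Reviewer sketch for the two EXPLAIN modes above, in Python. This is a minimal illustration, not the PR β implementation: the helper name `build_explain_statements` is hypothetical, and adding `FORMAT JSON` to the ANALYZE variant is an assumption (the notes only list `ANALYZE, TIMING off, BUFFERS off`). `SET LOCAL` only takes effect inside a transaction block, so both statements are assumed to run in one transaction.

```python
def build_explain_statements(query_text, analyze=False, timeout="10s"):
    """Return the SQL statements to execute, in order, within one transaction.

    query_text: the logged slow-query SQL (parameters are bound separately).
    analyze:    False -> eager plain EXPLAIN; True -> opt-in deep inspect.
    timeout:    trusted constant, NOT user input (it is interpolated directly).
    """
    if not analyze:
        # Cheap path: planner estimates only, the query is not executed.
        return [f"EXPLAIN (FORMAT JSON) {query_text}"]
    # Deep inspect: actually runs the query, so cap its runtime first.
    return [
        f"SET LOCAL statement_timeout = '{timeout}'",
        # FORMAT JSON here is an assumption, added so both modes parse alike.
        f"EXPLAIN (ANALYZE, TIMING off, BUFFERS off, FORMAT JSON) {query_text}",
    ]
```

Keeping the statement construction separate from execution would make the timeout handling easy to unit-test without a database.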
+
+### Grading semantics
+
+Three grades per bundle, on top of the row-level fact that the query is listed in the slow-queries table:
+
+| Grade | Meaning | Signal |
+|---|---|---|
+| `already_fast` | Query wouldn't be in `pgcatalog_slow_queries`. Not shown — the row wouldn't be listed. | N/A |
+| `covered_but_slow` | Current plan uses indexes that match the bundle's target predicates, but the query is still observed slow (by definition, since it is in the slow-queries table). | Planner's chosen indexes cover the filter set and Rows Removed by Filter is low, so something else is the bottleneck: stats drift, bloat, correlation, or a GIN index that is too large (in which case a partial GIN bundle would help). |
+| `uncovered` | Current plan has a Seq Scan on `object_state`, OR chosen indexes don't cover the bundle's target predicates. Bundle would plausibly help. | Seq Scan node present, OR index-set difference between chosen and target non-empty. |
+
+Grade applies per-bundle, not per-row. A single slow query may yield multiple bundles each with its own grade.
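Reviewer sketch of the grading table above, assuming the baseline plan arrives as parsed `EXPLAIN (FORMAT JSON)` output. The function names and the `target_indexes` input are hypothetical, and "index-set difference" is interpreted here as "target indexes not used by the plan", which is one possible reading of the notes.

```python
def walk_plan(node):
    """Yield every node of an EXPLAIN (FORMAT JSON) plan tree, depth-first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def grade_bundle(explain_json, target_indexes, table="object_state"):
    """Grade one bundle against the baseline plan.

    explain_json:   parsed EXPLAIN output (a one-element list with a "Plan" key).
    target_indexes: index names the bundle relies on (hypothetical input shape).
    """
    used = set()
    seq_scan = False
    for node in walk_plan(explain_json[0]["Plan"]):
        if node.get("Node Type") == "Seq Scan" and node.get("Relation Name") == table:
            seq_scan = True
        if "Index Name" in node:
            used.add(node["Index Name"])
    missing = set(target_indexes) - used
    # already_fast never appears: such queries are not in the slow-query table.
    return "uncovered" if seq_scan or missing else "covered_but_slow"
```

Because the grade is computed per bundle, the same plan can be passed in repeatedly with different `target_indexes` sets.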
+
+### Async JS fetch model
+
+- Page initial load: fast (bundles only, no per-row plans).
+- On DOMContentLoaded: JS iterates visible slow-query rows, issues `fetch()` to the per-row plan endpoint. Fills in grades and diagnostic panels as responses arrive.
+- Per-row "deep inspect" button triggers a second fetch with `?analyze=true`. Separate endpoint or query-param on the same endpoint — design open.
+
+### HypoPG as optional enhancement, graceful degradation
+
+- At startup, check `SELECT 1 FROM pg_extension WHERE extname = 'hypopg'`.
+- If present, bundle grading gains a "what-if plan" mode: `SELECT hypopg_create_index('<ddl without CONCURRENTLY>')`, run `EXPLAIN` again, compare. Gives confidence that applying the bundle would actually change the plan. Hypothetical indexes are kept in the backend's private memory for the session and are not guaranteed to be discarded by a rollback, so clean up explicitly with `hypopg_reset()` (or `hypopg_drop_index`) once the comparison is done.
+- If absent, grade only from the baseline plan + index-set comparison heuristic.
+
+### UI refactor scope
+
+- Replace `manage_slow_queries.dtml` with a small JSON+JS page (or keep DTML as HTML shell + JS doing the rendering — **decide in PR β brainstorm**).
+- No heavy framework. Vanilla JS, optional lightweight helpers (Alpine.js is acceptable per user's earlier "lightweight is ok"; React/Vue are not).
+- Per-row expansion → shows plan tree, bundles with grade badges, "deep inspect" button.
+
+### PR α → PR β contract
+
+PR β's UI consumes what PR α's `manage_get_slow_query_stats` puts under `row["suggestions_bundles"]`:
+
+- `bundles[*].name` — stable identifier for UI keyed updates.
+- `bundles[*].rationale` — rendered as bundle description.
+- `bundles[*].shape_classification` — shown as a small badge.
+- `bundles[*].members[*]` — rendered as DDL code blocks with Apply / Drop buttons.
+
+## Open questions deferred to PR β brainstorm
+
+1. **JSON endpoint URL shape.** Per-row endpoint keyed by `query_keys` hash? Single batch endpoint? Separate endpoints for baseline plan vs. ANALYZE? Affects caching.
+2. **Plan caching.** Key by `(query_keys, representative_params_hash)`? TTL? Invalidate on index apply/drop? How do we handle plans that are valid for 5 seconds but not 5 hours (stats drift)?
+3. **Deep-inspect authorization + UX.** ANALYZE actually runs the slow query. Should the UI show a confirmation dialog ("this may take up to 10 s — proceed?") or silently trigger with a spinner? Who is allowed to trigger — `Manage portal` only, or any ZMI user?
+4. **Handling missing `query_text`.** PR 2's slow-query log stores the SQL. If an old row has null `query_text` (logged under a pre-PR-2 schema?), we can't EXPLAIN it. Skip? Reconstruct from `query_keys + params`? How do we reconstruct safely?
+5. **`_derived_attname_for_key` resolution for MCV stats** — carried over from PR α. If PR α ships COUNT-only, PR β may want to improve it once we know which pg_stats path is reachable for expression indexes.
+6. **Bundle-level grade aggregation.** A bundle has N members, each with their own `status` (new / already_covered). Bundle grade aggregates: what's the rule when some members are new and some already covered? Needed for the UI badge.
+7. **Rate limiting.** 50 slow queries × 1 EXPLAIN each on every tab load. At 10 ms each that's 500 ms total. Acceptable. If EXPLAIN is ever slow (stats issues), do we need a global timeout / circuit breaker?
+8. **Drop-bundle flow.** If a bundle's members were applied together, should there be a "drop bundle" action (drops all members in sequence)? Or only per-member?
+9. **HypoPG production readiness.** Is `hypopg` available in typical Plone PG deployments? Is asking customers to install an extension acceptable? (Extended statistics from PR 1 didn't need an extension; this would be new.)
+10. **Auto-apply opt-in.** Far future — a "trust the engine" mode where highly-confident uncovered-grade bundles get applied automatically during low-traffic windows? Definitely not PR β, but worth capturing the ergonomic target.
+
+## Non-goals for PR β
+
+- New templates beyond PR α's T1/T3/T4/T5/T6.
+- Changing the shape classifier or the partial-scoping threshold.
+- Cross-pod coordination for EXPLAIN caching (each pod EXPLAINs independently; that's fine).
+- Writing to `pgcatalog_slow_queries` from PR β (read-only).
+
+## Recommended next actions when PR β becomes the active work
+
+1. Re-invoke `superpowers:brainstorming` with this notes file as context.
+2. Answer the open questions above in the same Q-by-Q flow as PR α.
+3. Produce a proper design spec (`YYYY-MM-DD-suggestions-engine-pr-beta-design.md`).
+4. Feed spec into `superpowers:writing-plans`.
+5. Execute via `superpowers:subagent-driven-development`.
+
+All PR α infrastructure (Bundle dataclass, `suggestions_bundles` catalog.py output key, test helpers) is ready for PR β to consume — no re-plumbing needed.
@@ -28,7 +28,6 @@
 from plone.pgcatalog.backends import BM25Backend
 from plone.pgcatalog.backends import get_backend
 from plone.pgcatalog.brain import CatalogSearchResults
-from plone.pgcatalog.columns import compute_path_info
 from plone.pgcatalog.columns import get_registry
 from plone.pgcatalog.extraction import extract_from_translators
 from plone.pgcatalog.extraction import extract_idx
@@ -496,12 +495,11 @@ def _set_pg_annotation(self, obj, uid=None):
         wrapper = self._wrap_object(obj)
         idx = self._extract_idx(wrapper)
         searchable_text = self._extract_searchable_text(wrapper)
-        parent_path, path_depth = compute_path_info(uid)
 
-        # Store built-in path data in idx JSONB for unified path queries
-        idx["path"] = uid
-        idx["path_parent"] = parent_path
-        idx["path_depth"] = path_depth
+        # Path data lives in typed columns only.  The typed `path` column
+        # is set below via pending_data["path"]; parent_path and path_depth
+        # are derived from it by CatalogStateProcessor.
+        # See: docs/plans/2026-04-15-strip-path-from-idx-jsonb.md (#132)
 
         pending_data = {
             "path": uid,
 
@@ -40,10 +40,8 @@ def catalog_object(conn, zoid, path, idx, searchable_text=None, language="simple
     """
     parent_path, path_depth = compute_path_info(path)
 
-    # Store path data in idx JSONB for unified path queries
-    idx.setdefault("path", path)
-    idx.setdefault("path_parent", parent_path)
-    idx.setdefault("path_depth", path_depth)
+    # Path data lives in typed columns only (path, parent_path, path_depth).
+    # See: docs/plans/2026-04-15-strip-path-from-idx-jsonb.md (#132)
 
     # Extract registered extra idx columns (pops from idx → dedicated columns)
     extra = extract_extra_idx_columns(idx)
 
@@ -0,0 +1 @@
+"""Migration helpers for evolving plone-pgcatalog schema/data in place."""
@@ -0,0 +1,102 @@
+"""One-shot migration: remove path/path_parent/path_depth from idx JSONB.
+
+After issue #132, these three keys live exclusively in typed columns
+(path, parent_path, path_depth).  This script removes them from existing
+rows that were written before the cleanup.
+
+Idempotent and batched.  Safe to re-run; safe to interrupt.
+
+Usage (from a shell with a configured psycopg connection):
+
+    from plone.pgcatalog.migrations.strip_path_keys import run
+    result = run(conn, batch_size=5000)
+    print(result)  # {"batches": N, "rows_updated": M}
+"""
+
+import logging
+
+
+log = logging.getLogger(__name__)
+
+# Rows that still have any of the three keys present in idx
+_DIRTY_PREDICATE = "idx ?| ARRAY['path', 'path_parent', 'path_depth']"
+
+_BATCH_SQL = f"""
+    WITH batch AS (
+        SELECT zoid
+        FROM object_state
+        WHERE zoid > %(after_zoid)s
+          AND idx IS NOT NULL
+          AND {_DIRTY_PREDICATE}
+        ORDER BY zoid
+        LIMIT %(batch_size)s
+    )
+    UPDATE object_state os
+       SET idx = idx - 'path' - 'path_parent' - 'path_depth'
+      FROM batch
+     WHERE os.zoid = batch.zoid
+    RETURNING os.zoid
+"""
+
+
+def run(conn, batch_size: int = 5000) -> dict:
+    """Strip path keys from idx in batches.
+
+    The connection is switched to autocommit for the duration of the
+    migration so each batch commits independently; the prior autocommit
+    state is restored on return (even if an exception escapes).  If the
+    caller had an open transaction, ``conn.commit()`` is called first to
+    flush pending work before flipping autocommit on.
+
+    Args:
+        conn: psycopg connection.  Its transaction and autocommit state
+              are handled as described above.
+        batch_size: rows per batch.  Default 5000 keeps each UPDATE under
+                    ~10 MB WAL and ~1 s on a typical pod.
+
+    Returns: {"batches": int, "rows_updated": int}
+    """
+    original_autocommit = conn.autocommit
+    if not original_autocommit:
+        conn.commit()  # flush any pending work; migration needs per-batch commits
+        conn.autocommit = True
+
+    after_zoid = -1
+    batches = 0
+    total = 0
+
+    try:
+        while True:
+            with conn.cursor() as cur:
+                cur.execute(
+                    _BATCH_SQL,
+                    {
+                        "after_zoid": after_zoid,
+                        "batch_size": batch_size,
+                    },
+                )
+                zoids = [
+                    row[0] if isinstance(row, tuple) else row["zoid"]
+                    for row in cur.fetchall()
+                ]
+
+            if not zoids:
+                break
+
+            batches += 1
+            total += len(zoids)
+            after_zoid = max(zoids)
+            log.info(
+                "strip_path_keys: batch %d, %d rows, last zoid=%d, total=%d",
+                batches,
+                len(zoids),
+                after_zoid,
+                total,
+            )
+
+        log.info("strip_path_keys: done. %d batches, %d rows updated.", batches, total)
+        return {"batches": batches, "rows_updated": total}
+    finally:
+        conn.autocommit = original_autocommit
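Reviewer sketch: an in-memory Python model of the keyset-batched loop above, useful for reasoning about idempotence and interruption without a database. The dict-based `rows` store and the function name are hypothetical stand-ins for `object_state` and `run()`.

```python
PATH_KEYS = ("path", "path_parent", "path_depth")


def strip_in_batches(rows, batch_size=2):
    """Model of the migration: rows maps zoid -> idx dict (or None)."""
    after_zoid, batches, total = -1, 0, 0
    while True:
        # Same predicate as the SQL: past the cursor, non-NULL idx, any key present.
        dirty = sorted(
            zoid
            for zoid, idx in rows.items()
            if zoid > after_zoid
            and idx is not None
            and any(key in idx for key in PATH_KEYS)
        )[:batch_size]
        if not dirty:
            break
        for zoid in dirty:
            # Equivalent of: idx - 'path' - 'path_parent' - 'path_depth'
            rows[zoid] = {k: v for k, v in rows[zoid].items() if k not in PATH_KEYS}
        batches += 1
        total += len(dirty)
        after_zoid = max(dirty)  # keyset cursor: never revisit earlier zoids
    return {"batches": batches, "rows_updated": total}
```

A second call over the same data finds no dirty rows and returns zero batches, which mirrors the "safe to re-run" claim in the module docstring.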
@@ -6,6 +6,7 @@
 """
 
 from plone.pgcatalog.backends import get_backend
+from plone.pgcatalog.columns import compute_path_info
 from plone.pgcatalog.columns import extract_extra_idx_columns
 from plone.pgcatalog.columns import get_extra_idx_columns
 from plone.pgcatalog.pending import _MISSING
@@ -228,10 +229,19 @@ def process(self, zoid, class_mod, class_name, state):
         idx = pending.get("idx")
         extra_values = extract_extra_idx_columns(idx)
 
+        # Path data lives in typed columns only.  parent_path and path_depth
+        # are derived from the canonical `path` field — not from idx.
+        # See: docs/plans/2026-04-15-strip-path-from-idx-jsonb.md (#132)
+        path = pending.get("path")
+        if path:
+            parent_path, path_depth = compute_path_info(path)
+        else:
+            parent_path, path_depth = None, None
+
         result = {
-            "path": pending.get("path"),
-            "parent_path": idx.get("path_parent") if idx else None,
-            "path_depth": idx.get("path_depth") if idx else None,
+            "path": path,
+            "parent_path": parent_path,
+            "path_depth": path_depth,
             "idx": Json(idx) if idx else None,
             "searchable_text": pending.get("searchable_text"),
             **extra_values,
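Reviewer sketch of the derivation in the hunk above. The real `compute_path_info` is not shown in this diff, so `derive_path_info` below is a hypothetical stand-in that assumes depth means "number of non-empty path segments" and that the parent of a top-level path is `/`; the actual helper may differ.

```python
def derive_path_info(path):
    """Hypothetical stand-in for compute_path_info: (parent_path, path_depth)."""
    parts = [segment for segment in path.split("/") if segment]
    parent = "/" + "/".join(parts[:-1])
    return parent, len(parts)


def typed_path_columns(pending):
    """Mirror of the new process() logic: derive from `path`, never from idx."""
    path = pending.get("path")
    if path:
        parent_path, path_depth = derive_path_info(path)
    else:
        parent_path, path_depth = None, None
    return {"path": path, "parent_path": parent_path, "path_depth": path_depth}
```

The point of the change is visible here: stale `path_parent` or `path_depth` values left in `idx` can no longer leak into the typed columns.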
@@ -275,25 +285,18 @@ def finalize(self, cursor):
                     {"zoid": zoid, "patch": Json(idx_updates), **extra_params},
                 )
 
-        # Execute bulk path moves (one SQL per moved subtree)
+        # Execute bulk path moves (one SQL per moved subtree).
+        # Touches typed columns only — idx no longer carries path keys.
+        # See: docs/plans/2026-04-15-strip-path-from-idx-jsonb.md (#132)
         moves = pop_all_pending_moves()
         for old_prefix, new_prefix, depth_delta in moves:
             cursor.execute(
                 """
                 UPDATE object_state SET
                     path = %(new)s || substring(path FROM length(%(old)s) + 1),
                     parent_path = %(new)s || substring(parent_path FROM length(%(old)s) + 1),
-                    path_depth = path_depth + %(dd)s,
-                    idx = idx || jsonb_build_object(
-                        'path',
-                        %(new)s || substring(idx->>'path' FROM length(%(old)s) + 1),
-                        'path_parent',
-                        %(new)s || substring(idx->>'path_parent' FROM length(%(old)s) + 1),
-                        'path_depth',
-                        (idx->>'path_depth')::int + %(dd)s
-                    )
+                    path_depth = path_depth + %(dd)s
                 WHERE path LIKE %(like)s
-                  AND idx IS NOT NULL
                 """,
                 {
                     "old": old_prefix,
 
@@ -514,12 +514,19 @@ def _handle_path(self, name, idx_key, spec):
 
         paths = [_validate_path(p) for p in paths]  # validates AND normalizes
 
-        # All path indexes (built-in "path" and additional like "tgpath")
-        # store their data in idx JSONB and query via expression indexes.
-        key = name if idx_key is None else idx_key
-        expr_path = f"idx->>'{key}'"
-        expr_parent = f"idx->>'{key}_parent'"
-        expr_depth = f"(idx->>'{key}_depth')::integer"
+        # Dispatch: the built-in "path" index lives in typed columns
+        # (path, parent_path, path_depth).  Custom path indexes
+        # (e.g. "tgpath") still store their data in idx JSONB.
+        # See: docs/plans/2026-04-15-strip-path-from-idx-jsonb.md (#132)
+        if idx_key is None and name == "path":
+            expr_path = "path"
+            expr_parent = "parent_path"
+            expr_depth = "path_depth"
+        else:
+            key = name if idx_key is None else idx_key
+            expr_path = f"idx->>'{key}'"
+            expr_parent = f"idx->>'{key}_parent'"
+            expr_depth = f"(idx->>'{key}_depth')::integer"
 
         if navtree:
             self._path_navtree(expr_path, expr_parent, paths[0], depth, navtree_start)
@@ -652,6 +659,11 @@ def _process_sort(self, sort_on_list, sort_order_list):
                 continue
 
             idx_type, idx_key, _source_attrs = entry
+            # Built-in "path" sort lives in the typed `path` column (#132).
+            # Custom PATH indexes (e.g. "tgpath") still store data in idx JSONB.
+            if idx_type == IndexType.PATH and idx_key is None and sort_on == "path":
+                parts.append(f"path {direction}")
+                continue
             if idx_key is None:
                 if idx_type == IndexType.PATH:
                     idx_key = sort_on
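Reviewer sketch of the dispatch rule that both query.py hunks implement: the built-in `path` index resolves to typed columns, while any other PATH index keeps using JSONB expressions. The standalone function `path_expressions` is hypothetical and only restates the branch from `_handle_path` for quick experimentation.

```python
def path_expressions(name, idx_key=None):
    """Return (path, parent, depth) SQL expressions for a PATH index."""
    if idx_key is None and name == "path":
        # Built-in path index: typed columns, visible to planner statistics.
        return ("path", "parent_path", "path_depth")
    # Custom PATH index (e.g. "tgpath"): data still lives in idx JSONB.
    key = name if idx_key is None else idx_key
    return (
        f"idx->>'{key}'",
        f"idx->>'{key}_parent'",
        f"(idx->>'{key}_depth')::integer",
    )
```

For example, `path_expressions("tgpath")` yields the three `idx->>` expressions, so custom indexes behave exactly as before the change.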