fix(metrics): rank don't filter — keep pg_catalog/pg_toast/timescale visible

claude · claude · commit bd2f16c5b612 · 2026-05-15T16:25:44.000Z
The first revision read from pg_stat_user_*/pg_statio_user_*, which the
Postgres views define as 'pg_stat_all_* WHERE schemaname NOT IN
(pg_catalog, information_schema) AND schemaname !~ ^pg_toast'. That's
identity-based filtering wearing a different hat: it silently hides bloat
in pg_toast and hot scans in pg_catalog. (_timescaledb_internal is not
matched by that predicate, but a hand-rolled _timescaledb% exclusion
would hide chunk-level issues the same way.) If a TOAST table is bloated
or a catalog index is being hammered, the operator wouldn't see it.
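A minimal Python sketch of that predicate, useful for checking which
schemas the _user_ views drop before any ranking happens.
visible_in_user_view is a hypothetical helper that mirrors the view
definition quoted above; it is not pgwatch or PostgreSQL code.

```python
import re

# Schemas excluded outright by the pg_stat_user_* / pg_statio_user_* views.
EXCLUDED_SCHEMAS = {"pg_catalog", "information_schema"}
# Mirrors `schemaname !~ '^pg_toast'` (also catches pg_toast_temp_N).
TOAST_PATTERN = re.compile(r"^pg_toast")


def visible_in_user_view(schemaname: str) -> bool:
    """Return True if a relation in `schemaname` survives the _user_ view filter.

    Hypothetical helper mirroring the view's WHERE clause.
    """
    if schemaname in EXCLUDED_SCHEMAS:
        return False
    return TOAST_PATTERN.search(schemaname) is None


if __name__ == "__main__":
    for schema in ("public", "pg_catalog", "pg_toast", "pg_toast_temp_3"):
        print(schema, visible_in_user_view(schema))
```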

Rework the four metrics to read pg_stat_all_*/pg_statio_all_* directly
and rely PURELY on cardinality control:

- Top 100 per database, ranked by idx_scan (pg_stat_all_indexes),
  pg_total_relation_size (pg_stat_all_tables), heap_blks_read
  (pg_statio_all_tables) and idx_blks_read (pg_statio_all_indexes).
- Tail aggregated into a single 'other' row so totals stay correct.
- No pg_temp%, no pg_toast%, no _timescaledb% schema filtering anywhere.
  A relation enters the top-N by activity or by size; if it's not in the
  top-N, it's in 'other'.
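The ranking rule above, sketched in Python for a single metric (the
real work happens in the metric SQL via row_number() over (...) with
rownum <= 100 / rownum > 100). top_n_with_other and the relname tag key
are hypothetical names, and emitting 'other' only when a tail exists is
an assumption of this sketch, not necessarily what the SQL does.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def top_n_with_other(rows: List[Row], key: str, counters: List[str], n: int = 100) -> List[Row]:
    """Keep the n rows with the largest `key`; sum each counter of the rest into one 'other' row.

    Ranking looks only at `key`, never at schema or relation name, so any
    relation (catalog, TOAST, extension chunk) can land in the head.
    """
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    result = [dict(r) for r in head]
    if tail:  # assumption: no 'other' row when nothing falls below the cap
        other: Row = {"relname": "other"}
        for c in counters:
            other[c] = sum(r[c] for r in tail)
        result.append(other)
    return result


if __name__ == "__main__":
    rows = [
        {"relname": "pg_toast_16384", "size": 900},
        {"relname": "orders", "size": 500},
        {"relname": "pg_class", "size": 50},
        {"relname": "users", "size": 30},
        {"relname": "events_tmp", "size": 20},
    ]
    top = top_n_with_other(rows, key="size", counters=["size"], n=2)
    print([(r["relname"], r["size"]) for r in top])
    # [('pg_toast_16384', 900), ('orders', 500), ('other', 100)]
```

With n=2, the TOAST table outranks the user tables purely by size, and
the three smallest relations collapse into 'other' without changing the
total (1500 either way).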

The only WHERE filter kept is the zero-counter row skip on the two statio
metrics — those rows literally carry no information (every gauge is 0)
and cannot mask any issue, so dropping them is information-preserving,
not identity-based.
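Why the zero-counter skip is safe, as a quick Python check: the decision
looks only at counter values (never at schemaname), and because block
counters are non-negative, dropping all-zero rows leaves every sum
unchanged. skip_zero_rows and totals are hypothetical helpers mirroring
`where idx_blks_read > 0 or idx_blks_hit > 0` from the index statio
metric.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Counters checked by the pg_statio_all_indexes WHERE clause.
INDEX_COUNTERS = ["idx_blks_read", "idx_blks_hit"]


def skip_zero_rows(rows: List[Row], counters: List[str]) -> List[Row]:
    """Drop rows whose every counter is zero; schema and name are never consulted."""
    return [r for r in rows if any(r[c] > 0 for c in counters)]


def totals(rows: List[Row], counters: List[str]) -> Dict[str, int]:
    """Sum each counter across rows, as a dashboard total would."""
    return {c: sum(r[c] for r in rows) for c in counters}


if __name__ == "__main__":
    rows = [
        {"schemaname": "pg_toast", "idx_blks_read": 12, "idx_blks_hit": 340},
        {"schemaname": "public", "idx_blks_read": 0, "idx_blks_hit": 0},
        {"schemaname": "pg_catalog", "idx_blks_read": 0, "idx_blks_hit": 95},
    ]
    kept = skip_zero_rows(rows, INDEX_COUNTERS)
    print([r["schemaname"] for r in kept])  # ['pg_toast', 'pg_catalog']
    print(totals(kept, INDEX_COUNTERS) == totals(rows, INDEX_COUNTERS))  # True
```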

Smoke-tested against PG16:
- pg_stat_all_tables: 101 rows (top 100 + 'other'), 75 of the top 100
  from pg_catalog and other system schemas.
- pg_stat_all_indexes: 101 rows, 98 from system schemas.
- pg_statio_all_tables / pg_statio_all_indexes: catalog/toast rows
  appear in top-N once they have any I/O.

Regression tests updated to assert: reads pg_stat_all_*/pg_statio_all_*
(never the _user_ views), no schemaname/nspname LIKE patterns, no
'pg_toast'/'pg_catalog' literals and no _timescaledb reference. Top-N +
'other' is the only mechanism.
diff --git a/config/pgwatch-prometheus/metrics.yml b/config/pgwatch-prometheus/metrics.yml
@@ -1552,13 +1552,15 @@ metrics:
       - total_relation_size_bytes
     statement_timeout_seconds: 15
   pg_stat_all_indexes:
-    # Top-N + "other" bucket pattern ported from pgwatch2 postgres.ai edition
-    # (gitlab.com/postgres-ai/pgwatch2 — our fork of Cybertec's pgwatch2,
-    # used as gen2 of our monitoring stack before postgresai). Reads
-    # pg_stat_user_indexes so pg_catalog/information_schema/pg_toast are
-    # excluded by the Postgres view itself, no hand-curated nspname pattern.
-    # The "other" row aggregates the tail so totals stay correct under a
-    # hard cardinality cap.
+    # Bound cardinality by ranking — NOT by identity. Reads pg_stat_all_indexes
+    # directly (NOT pg_stat_user_indexes) so pg_catalog, pg_toast and
+    # _timescaledb_internal indexes stay visible: a heavily-scanned catalog
+    # index or a hot Timescale chunk index will naturally rank into the
+    # top-N. Everything below the cap is aggregated into a single `'other'`
+    # row so dashboard totals stay correct. Pattern adapted from pgwatch2
+    # postgres.ai edition (gitlab.com/postgres-ai/pgwatch2 — our fork of
+    # Cybertec's pgwatch2), but without that edition's pg_temp%/user-view
+    # filters which would silently hide system-schema problems.
     sqls:
       11: |
         with ranked as ( /* pgwatch_generated */
@@ -1570,8 +1572,7 @@ metrics:
             idx_scan,
             idx_tup_read,
             idx_tup_fetch
-          from pg_stat_user_indexes
-          where not schemaname like E'pg\\_temp%'
+          from pg_stat_all_indexes
         )
         select
           current_database() as tag_datname,
@@ -1601,11 +1602,15 @@ metrics:
       - idx_tup_fetch
     statement_timeout_seconds: 15
   pg_stat_all_tables:
-    # Top-N + "other" bucket pattern ported from pgwatch2 postgres.ai edition
-    # (gitlab.com/postgres-ai/pgwatch2). Ranks by
-    # pg_total_relation_size — large tables are usually the interesting ones,
-    # which avoids starving big-but-static tables out of the top-N (the old
-    # n_live_tup+n_dead_tup ordering did exactly that).
+    # Bound cardinality by ranking — NOT by identity. Reads pg_stat_all_tables
+    # directly (NOT pg_stat_user_tables) so pg_catalog, pg_toast and
+    # _timescaledb_internal tables stay visible: a bloated TOAST table or a
+    # huge Timescale chunk will naturally rank into the top-N by
+    # pg_total_relation_size. Everything below the cap is summed into a
+    # single `'other'` row.
+    #
+    # Ordering by total relation size (vs the previous n_live_tup+n_dead_tup)
+    # keeps big-but-static tables — including pg_toast — in scope.
     sqls:
       11: |
         with ranked as ( /* pgwatch_generated */
@@ -1631,7 +1636,7 @@ metrics:
             autovacuum_count,
             analyze_count,
             autoanalyze_count
-          from pg_stat_user_tables
+          from pg_stat_all_tables
         )
         select
           current_database() as tag_datname,
@@ -2961,13 +2966,16 @@ metrics:
     statement_timeout_seconds: 15
   pg_statio_all_tables:
     description: >
-      Retrieves table-level I/O statistics from `pg_statio_user_tables`, returning
-      block-level read and hit counters for heap, index, TOAST and TOAST-index pages.
-      Ports the top-N + `'other'` bucket pattern from pgwatch2 postgres.ai
-      edition (gitlab.com/postgres-ai/pgwatch2): ranks tables by
-      heap_blks_read, keeps the top 100, and folds the tail into a single `'other'`
-      row so totals remain accurate while cardinality stays bounded. Drops rows
-      with no I/O activity at all (every counter zero).
+      Retrieves table-level I/O statistics from `pg_statio_all_tables`, returning
+      block-level read and hit counters for heap, index, TOAST and TOAST-index
+      pages. Adapts the top-N + `'other'` bucket pattern from pgwatch2 postgres.ai
+      edition (gitlab.com/postgres-ai/pgwatch2): ranks tables by heap_blks_read,
+      keeps the top 100, and folds the tail into a single `'other'` row so totals
+      remain accurate while cardinality stays bounded.
+      Reads pg_statio_all_tables (not pg_statio_user_tables) so I/O on pg_catalog,
+      pg_toast and _timescaledb_internal stays visible — those tables enter the
+      top-N by activity, not by schema membership. The zero-counter row skip is
+      kept (those rows literally carry no information and are not identity-based).
       Compatible with all PostgreSQL versions.
     sqls:
       11: |-
@@ -2984,7 +2992,7 @@ metrics:
             toast_blks_hit,
             tidx_blks_read,
             tidx_blks_hit
-          from pg_statio_user_tables
+          from pg_statio_all_tables
           where
             heap_blks_read > 0 or heap_blks_hit > 0
             or idx_blks_read > 0 or idx_blks_hit > 0
@@ -3028,12 +3036,15 @@ metrics:
     statement_timeout_seconds: 15
   pg_statio_all_indexes:
     description: >
-      Retrieves index-level I/O statistics from `pg_statio_user_indexes`, returning
-      block-level read and hit counters per index. Ports the pgwatch2
-      postgres.ai edition (gitlab.com/postgres-ai/pgwatch2)
-      top-N + `'other'` bucket pattern: ranks indexes by idx_blks_read, keeps the
+      Retrieves index-level I/O statistics from `pg_statio_all_indexes`, returning
+      block-level read and hit counters per index. Adapts the top-N + `'other'`
+      bucket pattern from pgwatch2 postgres.ai edition
+      (gitlab.com/postgres-ai/pgwatch2): ranks indexes by idx_blks_read, keeps the
       top 100, folds the tail into a single `'other'` row, and drops indexes with
-      no I/O activity. Filters temp schemas.
+      no I/O activity (zero-counter rows carry no information).
+      Reads pg_statio_all_indexes (not pg_statio_user_indexes) so catalog,
+      pg_toast and _timescaledb_internal indexes stay visible: a hot catalog
+      index will rank into the top-N by activity, not be hidden by schema name.
       Compatible with all PostgreSQL versions.
     sqls:
       11: |-
@@ -3045,10 +3056,8 @@ metrics:
             indexrelname,
             idx_blks_read,
             idx_blks_hit
-          from pg_statio_user_indexes
-          where
-            not schemaname like E'pg\\_temp%'
-            and (idx_blks_read > 0 or idx_blks_hit > 0)
+          from pg_statio_all_indexes
+          where idx_blks_read > 0 or idx_blks_hit > 0
         )
         select
           (extract(epoch from now()) * 1e9)::int8 as epoch_ns,
diff --git a/tests/compliance_vectors/test_mr219_monitoring_guards.py b/tests/compliance_vectors/test_mr219_monitoring_guards.py
@@ -81,43 +81,57 @@ def test_pgwatch_metrics_yml_pg_stat_statements_has_top_n_filter():
 
 
 def test_pgwatch_stat_views_use_topn_and_other_bucket():
-    """High-cardinality per-relation metrics must port the pattern from
-    pgwatch2 postgres.ai edition (gitlab.com/postgres-ai/pgwatch2, our
-    fork of Cybertec's pgwatch2 used as the previous generation of our
-    monitoring stack): read pg_stat_user_*/pg_statio_user_* (so pg_catalog,
-    information_schema and pg_toast are excluded by the Postgres view
-    itself, no hand-curated nspname pattern), keep the top 100 by relevance,
-    and aggregate the tail into a single `'other'` tag row so dashboard
-    totals stay correct under a hard cardinality cap. Hand-rolled nspname
-    LIKE filters or LIMIT-only truncation silently drop the tail and break
-    sums on extension-heavy or schema-heavy databases.
+    """High-cardinality per-relation metrics must bound cardinality by
+    RANKING, not by IDENTITY. Read pg_stat_all_*/pg_statio_all_* directly
+    (NOT the pg_stat_user_*/pg_statio_user_* views, which silently exclude
+    pg_catalog/pg_toast and would hide bloat or hot scans in those
+    relations), keep the top 100 by relevance, and aggregate the tail into
+    a single `'other'` tag row so dashboard totals stay correct.
+
+    The principle: a bloated pg_toast or a heavy _timescaledb_internal
+    chunk should appear in the top-N when its activity/size warrants it.
+    Schema-name filtering (`pg_stat_user_*` views, `NOT LIKE 'pg_toast%'`,
+    `NOT LIKE '_timescaledb%'`) makes those issues invisible. Hand-rolled
+    nspname LIKE filters or LIMIT-only truncation likewise silently drop
+    the tail and break sums on extension-heavy or schema-heavy databases.
     """
     metrics = yaml.safe_load(
         (PROJECT_ROOT / "config/pgwatch-prometheus/metrics.yml").read_text()
     )
     expectations = {
-        "pg_stat_all_indexes": "pg_stat_user_indexes",
-        "pg_stat_all_tables": "pg_stat_user_tables",
-        "pg_statio_all_tables": "pg_statio_user_tables",
-        "pg_statio_all_indexes": "pg_statio_user_indexes",
+        "pg_stat_all_indexes": "pg_stat_all_indexes",
+        "pg_stat_all_tables": "pg_stat_all_tables",
+        "pg_statio_all_tables": "pg_statio_all_tables",
+        "pg_statio_all_indexes": "pg_statio_all_indexes",
     }
     for metric_name, base_view in expectations.items():
         for sql in metrics["metrics"][metric_name]["sqls"].values():
             compact_sql = _compact_sql(sql)
-            assert base_view in compact_sql, metric_name
+            # Reads the _all_ view, not the _user_ view — keeps catalog/toast/timescale visible.
+            assert f"from {base_view}" in compact_sql, metric_name
+            user_view = base_view.replace("_all_", "_user_")
+            assert user_view not in compact_sql, metric_name
             # Top-N window + tail aggregation
             assert "row_number() over" in compact_sql, metric_name
             assert "rownum <= 100" in compact_sql, metric_name
             assert "rownum > 100" in compact_sql, metric_name
             assert "'other'" in compact_sql, metric_name
             # No unfiltered LIMIT-only truncation left in place
             assert "limit 5000" not in compact_sql, metric_name
+            # No identity-based schema exclusions sneaking back in.
+            assert "schemaname like" not in compact_sql, metric_name
+            assert "nspname like" not in compact_sql, metric_name
+            assert "'pg_toast'" not in compact_sql, metric_name
+            assert "'pg_catalog'" not in compact_sql, metric_name
+            assert "_timescaledb" not in compact_sql, metric_name
 
 
 def test_pgwatch_statio_skips_zero_activity_rows():
-    """pg_statio_user_* tail is mostly zero-I/O rows on schema-heavy DBs.
-    Filtering them out (pgwatch2 behavior) cuts cardinality before the
-    top-N cap is even reached and keeps the `'other'` bucket meaningful.
+    """pg_statio tail is mostly zero-I/O rows on schema-heavy DBs. Skipping
+    them cuts cardinality before the top-N cap is even reached and keeps
+    the `'other'` bucket meaningful. This is NOT identity-based filtering:
+    a row with every counter zero literally carries no information and
+    cannot mask any issue.
     """
     metrics = yaml.safe_load(
         (PROJECT_ROOT / "config/pgwatch-prometheus/metrics.yml").read_text()