fix: limit queryid cardinality in postgres sink and query-info scrape

claude · claude · commit b858689e4d39 · 2026-02-13T04:53:37.000Z
The pg_stat_statements WHERE filter was applied only on the Prometheus sink path. The postgres sink query (pgss_queryid_queries) had NO LIMIT, so every 30s it dumped all queryids into the sink table. The flask backend then exported all of them as pgwatch_query_info Prometheus metrics, and the query-info scrape job had no sample_limit safety net. This was the second, unfiltered path feeding 100K queryids into VictoriaMetrics.

Changes:
- Add LIMIT 100 to pgss_queryid_queries (postgres sink), matching the Prometheus sink's top-100-by-exec-time cap
- Add sample_limit: 500 to the query-info scrape job in prometheus.yml

https://claude.ai/code/session_01SzJxzZNQjDQphaHyaX3RU7
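As a rough illustration of the postgres-sink cap, here is a minimal Python sketch that mimics `order by total_exec_time desc limit 100` over in-memory rows. `PgssRow` and `top_queryids` are hypothetical names; the real cap runs inside PostgreSQL, and this only models its effect on cardinality (100K candidate queryids in, at most 100 out per sink query).

```python
import heapq
from typing import List, NamedTuple


class PgssRow(NamedTuple):
    """A pg_stat_statements row reduced to the fields the cap uses (hypothetical type)."""
    queryid: int
    total_exec_time: float  # milliseconds, as reported by pg_stat_statements


def top_queryids(rows: List[PgssRow], limit: int = 100) -> List[int]:
    """Mimic `order by total_exec_time desc limit 100`: keep the `limit`
    most expensive queryids, most expensive first, and drop the rest."""
    top = heapq.nlargest(limit, rows, key=lambda r: r.total_exec_time)
    return [r.queryid for r in top]


if __name__ == "__main__":
    # Hypothetical workload at the 100K-queryid scale from the commit message.
    rows = [PgssRow(queryid=i, total_exec_time=float(i % 997)) for i in range(100_000)]
    print(len(top_queryids(rows)))  # 100
```

`heapq.nlargest` avoids sorting all 100K rows just to keep 100; inside PostgreSQL the planner can similarly use a bounded top-N sort for `ORDER BY ... LIMIT`.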
diff --git a/config/pgwatch-postgres/metrics.yml b/config/pgwatch-postgres/metrics.yml
@@ -12,6 +12,7 @@ metrics:
           queryid is not null
           and dbid = (select oid from pg_database where datname = current_database())
         order by total_exec_time desc
+        limit 100
     gauges:
       - '*' 
   
diff --git a/config/prometheus/prometheus.yml b/config/prometheus/prometheus.yml
@@ -49,4 +49,5 @@ scrape_configs:
       - targets: ['monitoring_flask_backend:8000']
     scrape_interval: 300s     # 5 minutes - query texts rarely change
     scrape_timeout: 30s
-    metrics_path: /query_info_metrics
+    metrics_path: /query_info_metrics
+    sample_limit: 500         # Safety net: reject if flask exports too many queryids
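A minimal Python model of what `sample_limit: 500` does to the query-info scrape, assuming Prometheus semantics: when a scrape yields more samples than the limit (counted after metric relabeling), the whole scrape is treated as failed rather than truncated, and a limit of 0 means no limit. `ScrapeResult` and `apply_sample_limit` are hypothetical names, and the sample strings are placeholders for whatever the flask backend actually emits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScrapeResult:
    """Outcome of one scrape after the sample_limit check (hypothetical type)."""
    ok: bool
    ingested: int                # samples actually stored from this scrape
    error: Optional[str] = None  # illustrative message, not Prometheus's exact text


def apply_sample_limit(samples: List[str], sample_limit: int) -> ScrapeResult:
    """Model Prometheus `sample_limit`: more samples than the limit fails the
    entire scrape and ingests nothing; a limit of 0 disables the check."""
    if sample_limit > 0 and len(samples) > sample_limit:
        return ScrapeResult(
            ok=False,
            ingested=0,
            error=f"sample limit exceeded: {len(samples)} > {sample_limit}",
        )
    return ScrapeResult(ok=True, ingested=len(samples))


if __name__ == "__main__":
    # Hypothetical exports: 100 capped queryids vs. the old unfiltered 100K.
    capped = [f'pgwatch_query_info{{queryid="{i}"}}' for i in range(100)]
    flood = [f'pgwatch_query_info{{queryid="{i}"}}' for i in range(100_000)]
    print(apply_sample_limit(capped, 500).ok)  # True
    print(apply_sample_limit(flood, 500).ok)   # False
```

Because the check is all-or-nothing, an oversized export surfaces as a failed scrape instead of a silent flood of new series. Assuming one pgwatch_query_info sample per queryid (an assumption; the commit does not say), 500 leaves 5x headroom over a single database's 100-row cap.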