[pull] master from DataDog:master #535

Merged: pull[bot] merged 2 commits into ConnectionMaster:master from DataDog:master on May 13, 2026
pull[bot] commented May 13, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

ethanperez and others added 2 commits May 13, 2026 02:50
* [Postgres] Add column statistics collection

Add a new column stats collector that queries pg_stats via the
datadog.column_stats() function and submits column-level statistics
(avg_width, n_distinct, null_frac) to the dbm-column-stats pipeline.

Key features:
- Streams results via cursor with configurable statement_timeout
- Chunks payloads at 5,000 columns to limit memory footprint
- Flushes at database boundaries to prevent cross-database accumulation
- Multi-database support via autodiscovery with per-database error isolation
- Health events for missing function and insufficient privileges
- Recovery detection when errors resolve
- Table include/exclude filtering pushed into SQL
- APM tracing via @tracked_method
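
The chunking and database-boundary flushing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `chunk_rows`, `payloads_for_databases`, and the payload keys are hypothetical names, not the collector's actual code. Only the 5,000-column chunk size comes from the commit message.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

CHUNK_SIZE = 5000  # chunk size stated in the commit message


def chunk_rows(rows: Iterable[Any], chunk_size: int = CHUNK_SIZE) -> Iterator[List[Any]]:
    """Yield lists of at most chunk_size rows without materializing the full result."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def payloads_for_databases(
    rows_by_db: Iterable[Tuple[str, Iterable[Any]]],
    chunk_size: int = CHUNK_SIZE,
) -> Iterator[Dict[str, Any]]:
    """Build one payload per chunk (hypothetical payload shape).

    Chunking restarts for every database, so a partially filled chunk is
    flushed at the database boundary and no payload mixes two databases.
    """
    for dbname, rows in rows_by_db:
        for chunk in chunk_rows(rows, chunk_size):
            yield {"database": dbname, "columns": chunk}
```

Because `chunk_rows` consumes an iterator lazily, it could be fed directly from a streaming cursor, so at most one chunk of rows is held in memory at a time.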

* [Postgres] Move database_monitoring_column_stats to PostgreSql

Column statistics is currently a postgres-only feature, so the EvP
submission helper does not belong on the shared DatabaseCheck base.
Define it on the PostgreSql class instead. If MySQL or SQL Server
later add column-stats collectors, the method can be promoted up to
db.py without caller-side changes.

This also avoids cross-package release sequencing: a postgres PR
importing a new datadog_checks_base API would have to wait on a base
release before merging.

* [Postgres] Add regression tests for schema collector include filters

Existing schemas tests covered the exclude_databases / exclude_schemas /
exclude_tables paths but never include_*. Add coverage for the include
paths and for include + exclude combinations across all three filter
levels.

These tests serve as a baseline before refactoring the filter-clause
construction into a shared helper used by both the schemas and
column_stats collectors.

* [Postgres] Share filter clause helpers between schemas and column_stats

Extract two small helpers — regex_exclude_clauses and regex_include_clause
— into a new filters module, and use them from both the schemas and
column_stats collectors. The schemas methods (_get_schemas_query,
_get_tables_query) keep the same shape and behavior; column_stats's
_build_filters now composes both schema and table filters via the same
helpers.

While we are here, bring column_stats's filter set up to parity with
schemas by adding include_databases / exclude_databases (applied in
Python against the autodiscovered list) and include_schemas /
exclude_schemas (applied in SQL via the shared helpers). All four are
empty by default.
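
As a rough sketch of applying include_databases / exclude_databases in Python against an autodiscovered list: the function name `filter_databases` is hypothetical, and the semantics shown here (an empty include list keeps everything, exclude takes precedence over include, patterns matched with `re.search`) are assumptions rather than confirmed behavior of the collector.

```python
import re
from typing import Iterable, List, Sequence


def filter_databases(
    discovered: Iterable[str],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Return the discovered database names that pass the regex filters.

    Assumed semantics: an empty include list means "include all", and a
    name matching any exclude pattern is dropped even if it was included.
    """
    selected = []
    for name in discovered:
        if include and not any(re.search(p, name) for p in include):
            continue  # not matched by any include pattern
        if any(re.search(p, name) for p in exclude):
            continue  # explicitly excluded
        selected.append(name)
    return selected
```

With both lists empty, which the commit message says is the default, every discovered database passes through unchanged.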

Coverage:
- Unit tests for the helpers in test_filters.py (12 cases).
- Integration tests in test_column_stats.py for include_schemas,
  exclude_schemas, include_and_exclude_schemas, include_databases,
  exclude_databases, include_and_exclude_databases.
- test_config_defaults updated for the four new knobs.

* [Postgres] Default collect_column_stats.enabled to false

Customers must opt in by setting collect_column_stats.enabled: true in
their instance config. Matches the default-off posture used by
collect_schemas.

* [Postgres] Update test_config feature checks for default-off column stats

test_initialize_features_enabled_and_disabled relied on collect_column_stats
defaulting to true; now that it defaults to false, enable it explicitly
alongside the other features. Also assert it is disabled in the
disabled-by-default test.

* [Postgres] Parameterize regex include/exclude filters

Address review feedback to switch from f-string interpolation to psycopg
parameterized queries for the regex-based include/exclude filters.

Helpers in filters.py now produce only `%s` placeholders, and the
collectors thread the pattern values through cursor.execute(query, params).
Mirrors the pattern already used by schemas._get_databases().

Changes:
- filters.py: helpers emit `%s` placeholders; pattern values no longer
  appear in the SQL string.
- column_stats._build_filters returns (sql, params); _collect_for_database
  passes params to cursor.execute.
- schemas._get_schemas_query / _get_tables_query / get_rows_query now
  return (sql, params); _get_cursor unpacks; _get_databases refactored
  to use the helpers.
- SCHEMA_QUERY: escape literal % as %% in LIKE clauses now that the
  query goes through parameterized execute.
- test_filters: assertions now check placeholder shape, not pattern values.
- test_column_stats: split special-chars test into two clearer tests
  (one for valid patterns containing quotes, one for invalid regex).
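
The placeholder-only approach described in this commit might look something like the sketch below. The helper names `include_clause`, `exclude_clauses`, and `build_filters` are hypothetical stand-ins for the helpers in filters.py, and their signatures, the `tablename` column, and the exact SQL shape are assumptions. The point illustrated is that pattern values travel in a separate params list and never appear in the SQL string.

```python
from typing import List, Sequence, Tuple


def include_clause(column: str, patterns: Sequence[str]) -> Tuple[str, List[str]]:
    """Return an ' AND (col ~ %s OR ...)' fragment plus its parameters."""
    if not patterns:
        return "", []
    ors = " OR ".join(f"{column} ~ %s" for _ in patterns)
    return f" AND ({ors})", list(patterns)


def exclude_clauses(column: str, patterns: Sequence[str]) -> Tuple[str, List[str]]:
    """Return one ' AND NOT col ~ %s' fragment per pattern plus parameters."""
    sql = "".join(f" AND NOT {column} ~ %s" for _ in patterns)
    return sql, list(patterns)


def build_filters(
    include_tables: Sequence[str], exclude_tables: Sequence[str]
) -> Tuple[str, List[str]]:
    """Compose include and exclude fragments into (sql, params).

    The column name is a code-controlled identifier, so interpolating it
    is safe; only user-supplied patterns go through placeholders.
    """
    inc_sql, inc_params = include_clause("tablename", include_tables)
    exc_sql, exc_params = exclude_clauses("tablename", exclude_tables)
    return inc_sql + exc_sql, inc_params + exc_params


# Once params are passed to execute(), a literal % in the SQL text must be
# doubled, which is why a LIKE pattern is written with %% here.
BASE_QUERY = "SELECT tablename FROM pg_tables WHERE schemaname NOT LIKE 'pg_%%'"

filter_sql, filter_params = build_filters(["^orders"], ["_tmp$"])
full_query = BASE_QUERY + filter_sql
# A caller would then run: cursor.execute(full_query, filter_params)
```

Keeping the parameter order identical to the placeholder order is the one invariant the caller relies on, so the fragments and their parameter lists are always concatenated in the same sequence.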

* [Postgres] Rename column_stats to column_statistics throughout

Use the full word 'column_statistics' instead of the abbreviated
'column_stats' for the new collector. Renames symbols, config keys, file
names, metric names, the SECURITY DEFINER function, the EvP track type,
and the dbm_type payload field.

Companion agent forwarder PR and backend dbm-metrics-intake registration
will be updated to match in a follow-up.

* [Postgres] Column statistics: tighten defaults, GCD scheduling, add diagnose probe

* [Postgres] Release column statistics collector on cancel

* [Postgres] Collect inherited, correlation, most_common_freqs in column statistics

* [Postgres] Capture column statistics collector before cancel in default-config test

* [Postgres] Address column statistics review feedback

* [Postgres] Drop column_statistics.max_query_duration config

pull[bot] locked and limited conversation to collaborators May 13, 2026
pull[bot] added the ⤵️ pull label May 13, 2026
pull[bot] merged commit 6614e9b into ConnectionMaster:master May 13, 2026
5 of 7 checks passed
2 participants