docs: import CloudNativePG v1.29.1

cnpg-bot · cnpg-bot · commit 6a6b8e2788a1 · 2026-05-08T14:32:47.000Z
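
Note on the timing described in the new "Detection of node-level failures" section of `failover.md`: the window is the grace period plus at most one monitor poll, with `.spec.failoverDelay` added on top. The Python sketch below only illustrates that arithmetic. The function and field names are hypothetical, the failover delay is assumed to be additive and to default to `0`, and API watch latency and operator reconcile time are ignored, so the result is a rough estimate rather than a guarantee.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """Estimated bounds, in seconds, counted from the node becoming unreachable."""
    earliest_s: int
    latest_s: int


def failover_start_window(
    grace_period_s: int,
    monitor_period_s: int = 5,   # --node-monitor-period default quoted in the docs
    failover_delay_s: int = 0,   # .spec.failoverDelay, assumed additive (hypothetical model)
) -> Window:
    """Estimate when the operator may start failover after a node outage.

    The node lifecycle controller acts once the grace period has elapsed,
    but only on its next monitoring pass, which adds up to one poll period.
    """
    if min(grace_period_s, monitor_period_s, failover_delay_s) < 0:
        raise ValueError("durations must be non-negative")
    earliest = grace_period_s + failover_delay_s
    latest = earliest + monitor_period_s
    return Window(earliest, latest)


if __name__ == "__main__":
    # Grace period defaults as stated in the imported documentation.
    for label, grace in (("Kubernetes 1.29-1.31", 40), ("Kubernetes 1.32+", 50)):
        w = failover_start_window(grace)
        print(f"{label}: {w.earliest_s}s to {w.latest_s}s")
```

Clusters whose provider tunes `--node-monitor-grace-period` or `--node-monitor-period` would pass those values instead of the defaults.
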
diff --git a/website/versioned_docs/version-1.29/failover.md b/website/versioned_docs/version-1.29/failover.md
@@ -103,6 +103,49 @@ expected outage.
 Enabling a new configuration option to delay failover provides a mechanism to
 prevent premature failover for short-lived network or node instability.
 
+## Detection of node-level failures
+
+When the node hosting the primary becomes unreachable (for example, due to a
+kubelet crash or a network partition between the node and the Kubernetes API
+server), the operator relies on the pod's `Ready` condition to decide that the
+primary is no longer serviceable. While the node is healthy, the kubelet keeps
+that condition up to date from the readiness probe; once the node stops
+reporting, the Kubernetes node lifecycle controller flips the pod condition to
+`False` as soon as it marks the node's own `Ready` condition `Unknown`.
+
+With stock kube-controller-manager settings, the transition is governed by
+`--node-monitor-grace-period` (default `40s` on Kubernetes 1.29-1.31, raised
+to `50s` in 1.32 and later): after that window the controller marks the node
+`Unknown` and, in the same monitoring pass, issues a patch per pod on that
+node to flip the `Ready` condition. In practice the operator observes the
+primary as unready roughly **40 to 45 seconds** (Kubernetes 1.29-1.31) or
+**50 to 55 seconds** (1.32 and later) after the node becomes unreachable:
+the grace period plus up to one `--node-monitor-period` poll (default `5s`).
+Managed Kubernetes distributions (GKE, EKS, AKS) may tune these values;
+consult the provider's documentation if the observed timing does not match.
+Once the primary is unready, failover starts (gated by `.spec.failoverDelay`).
+
+The `Ready` condition flip is not subject to the rate limiters that throttle
+pod *eviction* during partial-zonal or large-cluster disruptions
+(`--node-eviction-rate`, `--secondary-node-eviction-rate`,
+`--unhealthy-zone-threshold`). The operator reacts to the condition flip as
+soon as the controller emits the patch, regardless of the zone or cluster-wide
+health state.
+
+Pod *eviction* (actual deletion from the unreachable node) is a separate
+mechanism, driven by `tolerationSeconds` on the
+`node.kubernetes.io/unreachable` `NoExecute` taint (`300s` by default). That
+timer does not hold up the operator's failover decision; CloudNativePG
+promotes a new primary as soon as the `Ready` condition flips. By that point
+the kubelet on the isolated node has already stopped the old PostgreSQL
+container locally: with the default
+`.spec.probes.liveness.isolationCheck.enabled: true`, the instance manager
+fails its own liveness probe once it can reach neither the API server nor
+the rest of the cluster, and the kubelet kills the container within
+approximately three probe periods (`~30s`). Full high availability
+(recreation of the old primary on a healthy node by the operator) is still
+gated on the taint-based eviction actually deleting the pod.
+
 ## Failover Quorum (Quorum-based Failover)
 
 Failover quorum is a mechanism that enhances data durability and safety during
diff --git a/website/versioned_docs/version-1.29/installation_upgrade.md b/website/versioned_docs/version-1.29/installation_upgrade.md
@@ -14,12 +14,12 @@ title: Installation and upgrades
 The operator can be installed like any other resource in Kubernetes,
 through a YAML manifest applied via `kubectl`.
 
-You can install the [latest operator manifest](https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.0.yaml)
+You can install the [latest operator manifest](https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.1.yaml)
 for this minor release as follows:
 
 ```sh
 kubectl apply --server-side -f \
-  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.0.yaml
+  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.1.yaml
 ```
 
 You can verify that with:
@@ -267,6 +267,22 @@ removed before installing the new one. This won't affect user data but
 only the operator itself.
 
 
+### Upgrading to 1.29.1 or 1.28.3
+
+Versions 1.29.1 and 1.28.3 ship the fix for `CVE-2026-44477` /
+`GHSA-423p-g724-fr39`. The metrics exporter now authenticates as a
+dedicated `cnpg_metrics_exporter` role with `pg_monitor` privileges
+only, instead of the `postgres` superuser.
+
+Custom monitoring queries that read user-owned tables, or that use
+`target_databases: '*'` against databases where `CONNECT` has been
+revoked from `PUBLIC`, need explicit grants to
+`cnpg_metrics_exporter`. See ["Custom query privileges and
+safety"](monitoring.md#custom-query-privileges-and-safety) and ["Manually creating
+the metrics exporter
+role"](monitoring.md#manually-creating-the-metrics-exporter-role) in
+the monitoring documentation.
+
 ### Upgrading to 1.29.0 or 1.28.x
 
 :::info[Important]
diff --git a/website/versioned_docs/version-1.29/kubectl-plugin.md b/website/versioned_docs/version-1.29/kubectl-plugin.md
@@ -38,11 +38,11 @@ them in your systems.
 
 #### Debian packages
 
-For example, let's install the 1.29.0 release of the plugin, for an Intel based
+For example, let's install the 1.29.1 release of the plugin, for an Intel based
 64 bit server. First, we download the right `.deb` file.
 
 ```sh
-wget https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.0/kubectl-cnpg_1.29.0_linux_x86_64.deb \
+wget https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.1/kubectl-cnpg_1.29.1_linux_x86_64.deb \
   --output-document kube-plugin.deb
 ```
 
@@ -53,17 +53,17 @@ $ sudo dpkg -i kube-plugin.deb
 Selecting previously unselected package cnpg.
 (Reading database ... 6688 files and directories currently installed.)
 Preparing to unpack kube-plugin.deb ...
-Unpacking cnpg (1.29.0) ...
-Setting up cnpg (1.29.0) ...
+Unpacking cnpg (1.29.1) ...
+Setting up cnpg (1.29.1) ...
 ```
 
 #### RPM packages
 
-As in the example for `.rpm` packages, let's install the 1.29.0 release for an
+As in the example for `.rpm` packages, let's install the 1.29.1 release for an
 Intel 64 bit machine. Note the `--output` flag to provide a file name.
 
 ```sh
-curl -L https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.0/kubectl-cnpg_1.29.0_linux_x86_64.rpm \
+curl -L https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.1/kubectl-cnpg_1.29.1_linux_x86_64.rpm \
   --output kube-plugin.rpm
 ```
 
@@ -77,7 +77,7 @@ Dependencies resolved.
  Package            Architecture         Version                   Repository                  Size
 ====================================================================================================
 Installing:
- cnpg               x86_64               1.29.0-1                  @commandline                20 M
+ cnpg               x86_64               1.29.1-1                  @commandline                20 M
 
 Transaction Summary
 ====================================================================================================
@@ -306,9 +306,9 @@ sandbox-3  0/604DE38  0/604DE38  0/604DE38  0/604DE38   00:00:00   00:00:00   00
 Instances status
 Name       Current LSN  Replication role  Status  QoS         Manager Version  Node
 ----       -----------  ----------------  ------  ---         ---------------  ----
-sandbox-1  0/604DE38    Primary           OK      BestEffort  1.29.0           k8s-eu-worker
-sandbox-2  0/604DE38    Standby (async)   OK      BestEffort  1.29.0           k8s-eu-worker2
-sandbox-3  0/604DE38    Standby (async)   OK      BestEffort  1.29.0           k8s-eu-worker
+sandbox-1  0/604DE38    Primary           OK      BestEffort  1.29.1           k8s-eu-worker
+sandbox-2  0/604DE38    Standby (async)   OK      BestEffort  1.29.1           k8s-eu-worker2
+sandbox-3  0/604DE38    Standby (async)   OK      BestEffort  1.29.1           k8s-eu-worker
 ```
 
 If you require more detailed status information, use the `--verbose` option (or
@@ -362,9 +362,9 @@ sandbox-primary  primary  1              1                1
 Instances status
 Name       Current LSN  Replication role  Status  QoS         Manager Version  Node
 ----       -----------  ----------------  ------  ---         ---------------  ----
-sandbox-1  0/6053720    Primary           OK      BestEffort  1.29.0           k8s-eu-worker
-sandbox-2  0/6053720    Standby (async)   OK      BestEffort  1.29.0           k8s-eu-worker2
-sandbox-3  0/6053720    Standby (async)   OK      BestEffort  1.29.0           k8s-eu-worker
+sandbox-1  0/6053720    Primary           OK      BestEffort  1.29.1           k8s-eu-worker
+sandbox-2  0/6053720    Standby (async)   OK      BestEffort  1.29.1           k8s-eu-worker2
+sandbox-3  0/6053720    Standby (async)   OK      BestEffort  1.29.1           k8s-eu-worker
 ```
 
 With an additional `-v` (e.g. `kubectl cnpg status sandbox -v -v`), you can
@@ -640,12 +640,12 @@ Archive:  report_operator_<TIMESTAMP>.zip
 
 ```output
 ====== Beginning of Previous Log =====
-2023-03-28T12:56:41.251711811Z {"level":"info","ts":"2023-03-28T12:56:41Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.0","build":{"Version":"1.29.0+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
+2023-03-28T12:56:41.251711811Z {"level":"info","ts":"2023-03-28T12:56:41Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.1","build":{"Version":"1.29.1+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
 2023-03-28T12:56:41.251851909Z {"level":"info","ts":"2023-03-28T12:56:41Z","logger":"setup","msg":"Starting pprof HTTP server","addr":"0.0.0.0:6060"}
   <snipped …>
 
 ====== End of Previous Log =====
-2023-03-28T12:57:09.854306024Z {"level":"info","ts":"2023-03-28T12:57:09Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.0","build":{"Version":"1.29.0+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
+2023-03-28T12:57:09.854306024Z {"level":"info","ts":"2023-03-28T12:57:09Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.1","build":{"Version":"1.29.1+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
 2023-03-28T12:57:09.854363943Z {"level":"info","ts":"2023-03-28T12:57:09Z","logger":"setup","msg":"Starting pprof HTTP server","addr":"0.0.0.0:6060"}
 ```
 
diff --git a/website/versioned_docs/version-1.29/labels_annotations.md b/website/versioned_docs/version-1.29/labels_annotations.md
@@ -102,15 +102,21 @@ This label is available only on `VolumeSnapshot` resources.
   default users created by CloudNativePG (typically `postgres` and `app`).
 
 `role` - **deprecated**
-:  Whether the instance running in a pod is a `primary` or a `replica`.
-   This label is deprecated, you should use `cnpg.io/instanceRole` instead.
+:  Role of the instance running in a pod: `primary`, `replica`, or
+   `unhealthy`. The `unhealthy` value is transient: the operator sets
+   it on the old primary during a failover or switchover and clears it
+   automatically once the transition completes. This label is deprecated;
+   use `cnpg.io/instanceRole` instead.
 
 `cnpg.io/scheduled-backup`
 :  When available, name of the `ScheduledBackup` resource that created a given
    `Backup` object.
 
 `cnpg.io/instanceRole`
-: Whether the instance running in a pod is a `primary` or a `replica`.
+: Role of the instance running in a pod: `primary`, `replica`, or
+  `unhealthy`. The `unhealthy` value is transient: the operator sets
+  it on the old primary during a failover or switchover and clears it
+  automatically once the transition completes.
 
 `app.kubernetes.io/managed-by`
 : Name of the manager. It will always be `cloudnative-pg`.
diff --git a/website/versioned_docs/version-1.29/monitoring.md b/website/versioned_docs/version-1.29/monitoring.md
@@ -36,10 +36,17 @@ more `ConfigMap` or `Secret` resources (see the
 
 All monitoring queries that are performed on PostgreSQL are:
 
-- atomic (one transaction per query)
-- executed with the `pg_monitor` role
+- atomic (one read-only transaction per query)
+- executed as the `cnpg_metrics_exporter` role (a member of `pg_monitor`)
 - executed with `application_name` set to `cnpg_metrics_exporter`
-- executed as user `postgres`
+
+The connection uses peer authentication on the pod-local Unix socket;
+because `session_user` is never a superuser, the monitoring session
+cannot escalate via `RESET ROLE` or `RESET SESSION AUTHORIZATION`. Do
+not grant additional privileges or role memberships to
+`cnpg_metrics_exporter` beyond `pg_monitor` and the table-level grants
+required by your custom queries: any extra membership flows into the
+scrape session via inheritance and weakens this property.
 
 Please refer to the "Predefined Roles" section in PostgreSQL
 [documentation](https://www.postgresql.org/docs/current/predefined-roles.html)
@@ -494,6 +501,42 @@ Take care that the referred resources have to be created **in the same namespace
     and a key `queryName` containing the overwritten query name.
 :::
 
+#### Custom query privileges and safety
+
+:::warning
+    Custom queries run as the `cnpg_metrics_exporter` role, which inherits
+    `pg_monitor`. Queries within `pg_monitor`'s scope (catalog reads,
+    `pg_stat_*` views, configuration parameters) work without modification.
+    Queries that read user-owned tables or superuser-only catalogs (e.g.
+    `pg_authid`, `pg_subscription`) need explicit grants. Reading a table
+    also requires USAGE on its schema:
+
+    ```sql
+    GRANT USAGE ON SCHEMA myschema TO cnpg_metrics_exporter;
+    GRANT SELECT ON TABLE myschema.mytable TO cnpg_metrics_exporter;
+    ```
+
+    Every database in `target_databases` must allow
+    `cnpg_metrics_exporter` to `CONNECT`. On clusters that have
+    revoked `CONNECT` from `PUBLIC` for a database, grant it
+    explicitly to that role:
+
+    ```sql
+    GRANT CONNECT ON DATABASE domainapp TO cnpg_metrics_exporter;
+    ```
+
+    Prefer an explicit list of trusted databases (e.g.
+    `target_databases: ["domainapp"]`) over the `"*"` wildcard. The
+    wildcard scrapes every database the role can connect to and
+    silently skips the rest, so an explicit list makes a missing grant
+    easier to notice. Use `"*"` only when the query is meant to
+    collect per-database metrics across the whole cluster.
+
+    Schema-qualify catalog references (`pg_catalog.now()`,
+    `pg_catalog.current_database()`) to prevent `search_path` shadowing
+    by user-owned objects.
+:::
+
 #### Example of a user defined metric
 
 Here you can see an example of a `ConfigMap` containing a single custom query,
@@ -510,14 +553,14 @@ metadata:
 data:
   custom-queries: |
     pg_replication:
-      query: "SELECT CASE WHEN NOT pg_is_in_recovery()
+      query: "SELECT CASE WHEN NOT pg_catalog.pg_is_in_recovery()
               THEN 0
               ELSE GREATEST (0,
-                EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
+                EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) pg_catalog.pg_last_xact_replay_timestamp())))
               END AS lag,
-              pg_is_in_recovery() AS in_recovery,
-              EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
-              (SELECT count(*) FROM pg_stat_replication) AS streaming_replicas"
+              pg_catalog.pg_is_in_recovery() AS in_recovery,
+              EXISTS (TABLE pg_catalog.pg_stat_wal_receiver) AS is_wal_receiver_up,
+              (SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
 
       metrics:
         - lag:
@@ -553,7 +596,7 @@ some_query: |
     FROM some_table
   query: |
     SELECT
-     count(*) as rows
+     pg_catalog.count(*) as rows
     FROM some_table
   metrics:
     - rows:
@@ -570,24 +613,29 @@ Database auto-discovery can be enabled for a specific query by specifying a
 *shell-like pattern* (i.e., containing `*`, `?` or `[]`) in the list of
 `target_databases`. If provided, the operator will expand the list of target
 databases by adding all the databases returned by the execution of `SELECT
-datname FROM pg_database WHERE datallowconn AND NOT datistemplate` and matching
-the pattern according to [path.Match()](https://pkg.go.dev/path#Match) rules.
+datname FROM pg_catalog.pg_database WHERE datallowconn AND NOT datistemplate
+AND pg_catalog.has_database_privilege(datname, 'CONNECT')` and matching the
+pattern according to [path.Match()](https://pkg.go.dev/path#Match) rules.
+Databases on which `cnpg_metrics_exporter` lacks the `CONNECT` privilege are
+silently skipped; if you want a database with revoked `PUBLIC` access to be
+scraped, grant `CONNECT` explicitly (see "Custom query privileges and safety"
+above).
 
 :::note
     The `*` character has a [special meaning](https://yaml.org/spec/1.2/spec.html#id2786448) in yaml,
     so you need to quote (`"*"`) the `target_databases` value when it includes such a pattern.
 :::
 
 It is recommended that you always include the name of the database
-in the returned labels, for example using the `current_database()` function
-as in the following example:
+in the returned labels, for example using the `pg_catalog.current_database()`
+function as in the following example:
 
 ```yaml
 some_query: |
   query: |
     SELECT
-     current_database() as datname,
-     count(*) as rows
+     pg_catalog.current_database() as datname,
+     pg_catalog.count(*) as rows
     FROM some_table
   metrics:
     - datname:
@@ -618,8 +666,8 @@ aforementioned query):
 some_query: |
   query: |
     SELECT
-     current_database() as datname,
-     count(*) as rows
+     pg_catalog.current_database() as datname,
+     pg_catalog.count(*) as rows
     FROM some_table
   metrics:
     - datname:
@@ -757,6 +805,33 @@ CloudNativePG is inspired by the PostgreSQL Prometheus Exporter, but
 presents some differences. In particular, the `cache_seconds` field is not implemented
 in CloudNativePG's exporter.
 
+### Manually creating the metrics exporter role
+
+The operator creates the `cnpg_metrics_exporter` PostgreSQL role on the
+primary during reconciliation; it then propagates to standbys and
+replica clusters via streaming replication.
+
+If the role is missing (replica cluster upgraded before its primary,
+restore from a backup that predates the role, accidental removal),
+connect as a superuser to the writable primary of the replication
+chain (the source primary, not a designated primary of a replica
+cluster) and recreate the role:
+
+```sql
+CREATE ROLE cnpg_metrics_exporter WITH LOGIN NOSUPERUSER NOCREATEDB
+    NOCREATEROLE NOREPLICATION NOBYPASSRLS INHERIT;
+GRANT pg_monitor TO cnpg_metrics_exporter;
+```
+
+If your custom monitoring queries need access to objects outside
+`pg_monitor`'s scope, grant the necessary privileges explicitly. `SELECT`
+on a table also requires `USAGE` on its schema:
+
+```sql
+GRANT USAGE ON SCHEMA myschema TO cnpg_metrics_exporter;
+GRANT SELECT ON TABLE myschema.mytable TO cnpg_metrics_exporter;
+```
+
 ## Monitoring the CloudNativePG operator
 
 The operator internally exposes [Prometheus](https://prometheus.io/) metrics
diff --git a/website/versioned_docs/version-1.29/operator_conf.md b/website/versioned_docs/version-1.29/operator_conf.md
@@ -169,7 +169,7 @@ Add `--pprof-server=true` to the args list, for example:
       containers:
       - args:
         - controller
-        - --enable-leader-election
+        - --leader-elect
         - --config-map-name=cnpg-controller-manager-config
         - --secret-name=cnpg-controller-manager-config
         - --log-level=info
diff --git a/website/versioned_docs/version-1.29/release_notes/v1.28.md b/website/versioned_docs/version-1.29/release_notes/v1.28.md
diff --git a/website/versioned_docs/version-1.29/release_notes/v1.29.md b/website/versioned_docs/version-1.29/release_notes/v1.29.md
diff --git a/website/versioned_docs/version-1.29/samples/cluster-example-monitoring.yaml b/website/versioned_docs/version-1.29/samples/cluster-example-monitoring.yaml