
Commit 6a6b8e2

docs: import CloudNativePG v1.29.1
1 parent 1679246 commit 6a6b8e2

9 files changed

Lines changed: 424 additions & 44 deletions


website/versioned_docs/version-1.29/failover.md

Lines changed: 43 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -103,6 +103,49 @@ expected outage.
103103
Enabling a new configuration option to delay failover provides a mechanism to
104104
prevent premature failover for short-lived network or node instability.
105105

106+
## Detection of node-level failures
107+
108+
When the node hosting the primary becomes unreachable (for example, due to a
109+
kubelet crash or a network partition between the node and the Kubernetes API
110+
server), the operator relies on the pod's `Ready` condition to decide that the
111+
primary is no longer serviceable. While the node is healthy the kubelet keeps
112+
that condition up to date from the readiness probe; once the node stops
113+
reporting, the Kubernetes node lifecycle controller is the one that flips the
114+
condition to `False` as soon as it declares the node `Unknown`.
115+
116+
With stock kube-controller-manager settings, the transition is governed by
117+
`--node-monitor-grace-period` (default `40s` on Kubernetes 1.29-1.31, raised
118+
to `50s` in 1.32 and later): after that window the controller marks the node
119+
`Unknown` and, in the same monitoring pass, issues a patch per pod on that
120+
node to flip the `Ready` condition. In practice the operator observes the
121+
primary as unready about **40 to 55 seconds** after the node becomes
122+
unreachable (the grace period plus up to one `--node-monitor-period` poll,
123+
default `5s`). Managed Kubernetes distributions (GKE, EKS, AKS) may tune
124+
these values; consult the provider's documentation if the observed timing
125+
does not match. After that, the failover procedure starts (further gated by
126+
`.spec.failoverDelay`).
127+
128+
The `Ready` condition flip is not subject to the rate limiters that throttle
129+
pod *eviction* during partial-zonal or large-cluster disruptions
130+
(`--node-eviction-rate`, `--secondary-node-eviction-rate`,
131+
`--unhealthy-zone-threshold`). The operator reacts to the condition flip as
132+
soon as the controller emits the patch, regardless of the zone or cluster-wide
133+
health state.
134+
135+
Pod *eviction* (actual deletion from the unreachable node) is a separate
136+
mechanism, driven by `tolerationSeconds` on the
137+
`node.kubernetes.io/unreachable` `NoExecute` taint (`300s` by default). That
138+
timer does not hold up the operator's failover decision; CloudNativePG
139+
promotes a new primary as soon as the `Ready` condition flips. By that point
140+
the kubelet on the isolated node has already stopped the old PostgreSQL
141+
container locally: with the default
142+
`.spec.probes.liveness.isolationCheck.enabled: true`, the instance manager
143+
fails its own liveness probe once it can reach neither the API server nor
144+
the rest of the cluster, and the kubelet kills the container within
145+
approximately three probe periods (`~30s`). Full high availability
146+
(recreation of the old primary on a healthy node by the operator) is still
147+
gated on the taint-based eviction actually deleting the pod.
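
The timing described above can be checked with a little arithmetic. The following is a minimal sketch, not operator code: the function names are hypothetical, the defaults are the `--node-monitor-grace-period` and `--node-monitor-period` values quoted in this section, and the `10s` liveness probe period is an assumption inferred from "three probe periods (`~30s`)".

```python
def ready_flip_window(grace_period_s: int = 40, monitor_period_s: int = 5) -> tuple[int, int]:
    """Return the (earliest, latest) delay in seconds before the node
    lifecycle controller flips the pod's Ready condition.

    The flip happens after the grace period, plus up to one monitoring
    poll, as described in the text above.
    """
    return grace_period_s, grace_period_s + monitor_period_s


def isolation_kill_estimate(probe_period_s: int = 10, failed_probes: int = 3) -> int:
    """Approximate time in seconds before the kubelet kills an isolated primary.

    Assumes a 10s liveness probe period (hypothetical default, inferred
    from the "~30s" figure) and three consecutive failed probes.
    """
    return probe_period_s * failed_probes


# Kubernetes 1.29-1.31 defaults, then the 1.32+ grace period of 50s.
print(ready_flip_window())
print(ready_flip_window(grace_period_s=50))
print(isolation_kill_estimate())
```

Taken together, the two default grace periods span the "40 to 55 seconds" range quoted above, and the isolation check normally completes before the `Ready` condition flips. Neither figure includes `.spec.failoverDelay`, which is applied afterwards.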
148+
106149
## Failover Quorum (Quorum-based Failover)
107150

108151
Failover quorum is a mechanism that enhances data durability and safety during

website/versioned_docs/version-1.29/installation_upgrade.md

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,12 +14,12 @@ title: Installation and upgrades
1414
The operator can be installed like any other resource in Kubernetes,
1515
through a YAML manifest applied via `kubectl`.
1616

17-
You can install the [latest operator manifest](https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.0.yaml)
17+
You can install the [latest operator manifest](https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.1.yaml)
1818
for this minor release as follows:
1919

2020
```sh
2121
kubectl apply --server-side -f \
22-
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.0.yaml
22+
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.1.yaml
2323
```
2424

2525
You can verify that with:
@@ -267,6 +267,22 @@ removed before installing the new one. This won't affect user data but
267267
only the operator itself.
268268

269269

270+
### Upgrading to 1.29.1 or 1.28.3
271+
272+
Versions 1.29.1 and 1.28.3 ship the fix for `CVE-2026-44477` /
273+
`GHSA-423p-g724-fr39`. The metrics exporter now authenticates as a
274+
dedicated `cnpg_metrics_exporter` role with `pg_monitor` privileges
275+
only, instead of the `postgres` superuser.
276+
277+
Custom monitoring queries that read user-owned tables, or use
278+
`target_databases: '*'` against databases where `PUBLIC` `CONNECT`
279+
has been revoked, need explicit `GRANT` statements to
280+
`cnpg_metrics_exporter`. See ["Custom query privileges and
281+
safety"](monitoring.md#custom-query-privileges-and-safety) and ["Manually creating
282+
the metrics exporter
283+
role"](monitoring.md#manually-creating-the-metrics-exporter-role) in
284+
the monitoring documentation.
285+
270286
### Upgrading to 1.29.0 or 1.28.x
271287

272288
:::info[Important]

website/versioned_docs/version-1.29/kubectl-plugin.md

Lines changed: 15 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -38,11 +38,11 @@ them in your systems.
3838

3939
#### Debian packages
4040

41-
For example, let's install the 1.29.0 release of the plugin, for an Intel based
41+
For example, let's install the 1.29.1 release of the plugin, for an Intel based
4242
64 bit server. First, we download the right `.deb` file.
4343

4444
```sh
45-
wget https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.0/kubectl-cnpg_1.29.0_linux_x86_64.deb \
45+
wget https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.1/kubectl-cnpg_1.29.1_linux_x86_64.deb \
4646
--output-document kube-plugin.deb
4747
```
4848

@@ -53,17 +53,17 @@ $ sudo dpkg -i kube-plugin.deb
5353
Selecting previously unselected package cnpg.
5454
(Reading database ... 6688 files and directories currently installed.)
5555
Preparing to unpack kube-plugin.deb ...
56-
Unpacking cnpg (1.29.0) ...
57-
Setting up cnpg (1.29.0) ...
56+
Unpacking cnpg (1.29.1) ...
57+
Setting up cnpg (1.29.1) ...
5858
```
5959

6060
#### RPM packages
6161

62-
As in the example for `.rpm` packages, let's install the 1.29.0 release for an
62+
As in the example for `.rpm` packages, let's install the 1.29.1 release for an
6363
Intel 64 bit machine. Note the `--output` flag to provide a file name.
6464

6565
```sh
66-
curl -L https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.0/kubectl-cnpg_1.29.0_linux_x86_64.rpm \
66+
curl -L https://github.com/cloudnative-pg/cloudnative-pg/releases/download/v1.29.1/kubectl-cnpg_1.29.1_linux_x86_64.rpm \
6767
--output kube-plugin.rpm
6868
```
6969

@@ -77,7 +77,7 @@ Dependencies resolved.
7777
Package Architecture Version Repository Size
7878
====================================================================================================
7979
Installing:
80-
cnpg x86_64 1.29.0-1 @commandline 20 M
80+
cnpg x86_64 1.29.1-1 @commandline 20 M
8181

8282
Transaction Summary
8383
====================================================================================================
@@ -306,9 +306,9 @@ sandbox-3 0/604DE38 0/604DE38 0/604DE38 0/604DE38 00:00:00 00:00:00 00
306306
Instances status
307307
Name Current LSN Replication role Status QoS Manager Version Node
308308
---- ----------- ---------------- ------ --- --------------- ----
309-
sandbox-1 0/604DE38 Primary OK BestEffort 1.29.0 k8s-eu-worker
310-
sandbox-2 0/604DE38 Standby (async) OK BestEffort 1.29.0 k8s-eu-worker2
311-
sandbox-3 0/604DE38 Standby (async) OK BestEffort 1.29.0 k8s-eu-worker
309+
sandbox-1 0/604DE38 Primary OK BestEffort 1.29.1 k8s-eu-worker
310+
sandbox-2 0/604DE38 Standby (async) OK BestEffort 1.29.1 k8s-eu-worker2
311+
sandbox-3 0/604DE38 Standby (async) OK BestEffort 1.29.1 k8s-eu-worker
312312
```
313313

314314
If you require more detailed status information, use the `--verbose` option (or
@@ -362,9 +362,9 @@ sandbox-primary primary 1 1 1
362362
Instances status
363363
Name Current LSN Replication role Status QoS Manager Version Node
364364
---- ----------- ---------------- ------ --- --------------- ----
365-
sandbox-1 0/6053720 Primary OK BestEffort 1.29.0 k8s-eu-worker
366-
sandbox-2 0/6053720 Standby (async) OK BestEffort 1.29.0 k8s-eu-worker2
367-
sandbox-3 0/6053720 Standby (async) OK BestEffort 1.29.0 k8s-eu-worker
365+
sandbox-1 0/6053720 Primary OK BestEffort 1.29.1 k8s-eu-worker
366+
sandbox-2 0/6053720 Standby (async) OK BestEffort 1.29.1 k8s-eu-worker2
367+
sandbox-3 0/6053720 Standby (async) OK BestEffort 1.29.1 k8s-eu-worker
368368
```
369369

370370
With an additional `-v` (e.g. `kubectl cnpg status sandbox -v -v`), you can
@@ -640,12 +640,12 @@ Archive: report_operator_<TIMESTAMP>.zip
640640

641641
```output
642642
====== Beginning of Previous Log =====
643-
2023-03-28T12:56:41.251711811Z {"level":"info","ts":"2023-03-28T12:56:41Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.0","build":{"Version":"1.29.0+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
643+
2023-03-28T12:56:41.251711811Z {"level":"info","ts":"2023-03-28T12:56:41Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.1","build":{"Version":"1.29.1+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
644644
2023-03-28T12:56:41.251851909Z {"level":"info","ts":"2023-03-28T12:56:41Z","logger":"setup","msg":"Starting pprof HTTP server","addr":"0.0.0.0:6060"}
645645
<snipped …>
646646
647647
====== End of Previous Log =====
648-
2023-03-28T12:57:09.854306024Z {"level":"info","ts":"2023-03-28T12:57:09Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.0","build":{"Version":"1.29.0+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
648+
2023-03-28T12:57:09.854306024Z {"level":"info","ts":"2023-03-28T12:57:09Z","logger":"setup","msg":"Starting CloudNativePG Operator","version":"1.29.1","build":{"Version":"1.29.1+dev107","Commit":"cc9bab17","Date":"2023-03-28"}}
649649
2023-03-28T12:57:09.854363943Z {"level":"info","ts":"2023-03-28T12:57:09Z","logger":"setup","msg":"Starting pprof HTTP server","addr":"0.0.0.0:6060"}
650650
```
651651

website/versioned_docs/version-1.29/labels_annotations.md

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -102,15 +102,21 @@ This label is available only on `VolumeSnapshot` resources.
102102
default users created by CloudNativePG (typically `postgres` and `app`).
103103

104104
`role` - **deprecated**
105-
: Whether the instance running in a pod is a `primary` or a `replica`.
106-
This label is deprecated, you should use `cnpg.io/instanceRole` instead.
105+
: Role of the instance running in a pod: `primary`, `replica`, or
106+
`unhealthy`. The `unhealthy` value is transient: the operator sets
107+
it on the old primary during a failover or switchover and clears it
108+
automatically once the transition completes. This label is deprecated;
109+
use `cnpg.io/instanceRole` instead.
107110

108111
`cnpg.io/scheduled-backup`
109112
: When available, name of the `ScheduledBackup` resource that created a given
110113
`Backup` object.
111114

112115
`cnpg.io/instanceRole`
113-
: Whether the instance running in a pod is a `primary` or a `replica`.
116+
: Role of the instance running in a pod: `primary`, `replica`, or
117+
`unhealthy`. The `unhealthy` value is transient: the operator sets
118+
it on the old primary during a failover or switchover and clears it
119+
automatically once the transition completes.
114120

115121
`app.kubernetes.io/managed-by`
116122
: Name of the manager. It will always be `cloudnative-pg`.

website/versioned_docs/version-1.29/monitoring.md

Lines changed: 92 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -36,10 +36,17 @@ more `ConfigMap` or `Secret` resources (see the
3636

3737
All monitoring queries that are performed on PostgreSQL are:
3838

39-
- atomic (one transaction per query)
40-
- executed with the `pg_monitor` role
39+
- atomic (one read-only transaction per query)
40+
- executed as the `cnpg_metrics_exporter` role (a member of `pg_monitor`)
4141
- executed with `application_name` set to `cnpg_metrics_exporter`
42-
- executed as user `postgres`
42+
43+
The connection uses peer authentication on the pod-local Unix socket;
44+
because `session_user` is never a superuser, the monitoring session
45+
cannot escalate via `RESET ROLE` or `RESET SESSION AUTHORIZATION`. Do
46+
not grant additional privileges or role memberships to
47+
`cnpg_metrics_exporter` beyond `pg_monitor` and the table-level grants
48+
required by your custom queries: any extra membership flows into the
49+
scrape session via inheritance and weakens this property.
4350

4451
Please refer to the "Predefined Roles" section in PostgreSQL
4552
[documentation](https://www.postgresql.org/docs/current/predefined-roles.html)
@@ -494,6 +501,42 @@ Take care that the referred resources have to be created **in the same namespace
494501
and a key `queryName` containing the overwritten query name.
495502
:::
496503

504+
#### Custom query privileges and safety
505+
506+
:::warning
507+
Custom queries run as the `cnpg_metrics_exporter` role, which inherits
508+
`pg_monitor`. Queries within `pg_monitor`'s scope (catalog reads,
509+
`pg_stat_*` views, configuration parameters) work without modification.
510+
Queries that read user-owned tables or superuser-only catalogs (e.g.
511+
`pg_authid`, `pg_subscription`) need explicit grants. Reading a table
512+
also requires USAGE on its schema:
513+
514+
```sql
515+
GRANT USAGE ON SCHEMA myschema TO cnpg_metrics_exporter;
516+
GRANT SELECT ON TABLE myschema.mytable TO cnpg_metrics_exporter;
517+
```
518+
519+
Every database in `target_databases` must allow
520+
`cnpg_metrics_exporter` to `CONNECT`. On clusters that have
521+
revoked `CONNECT` from `PUBLIC` for a database, grant it
522+
explicitly to that role:
523+
524+
```sql
525+
GRANT CONNECT ON DATABASE domainapp TO cnpg_metrics_exporter;
526+
```
527+
528+
Prefer an explicit list of trusted databases (e.g.
529+
`target_databases: ["domainapp"]`) over the `"*"` wildcard. The
530+
wildcard scrapes every database the role can connect to and
531+
silently skips the rest, so an explicit list makes a missing grant
532+
easier to notice. Use `"*"` only when the query is meant to
533+
collect per-database metrics across the whole cluster.
534+
535+
Schema-qualify catalog references (`pg_catalog.now()`,
536+
`pg_catalog.current_database()`) to prevent `search_path` shadowing
537+
by user-owned objects.
538+
:::
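
When several custom queries read user-owned tables, the grants described above can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of CloudNativePG: `quote_ident` and `exporter_grants` are illustrative names, and the output is meant to be reviewed and then run as a superuser on the primary.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def exporter_grants(tables, role: str = "cnpg_metrics_exporter") -> list[str]:
    """Build GRANT statements for (schema, table) pairs.

    Emits one USAGE grant per distinct schema, followed by a SELECT
    grant for each table, mirroring the two statements shown above.
    """
    statements = []
    seen_schemas = set()
    for schema, table in tables:
        if schema not in seen_schemas:
            seen_schemas.add(schema)
            statements.append(
                f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {quote_ident(role)};"
            )
        statements.append(
            f"GRANT SELECT ON TABLE {quote_ident(schema)}.{quote_ident(table)} "
            f"TO {quote_ident(role)};"
        )
    return statements


for statement in exporter_grants([("myschema", "mytable")]):
    print(statement)
```

Quoting every identifier keeps the generated SQL valid for mixed-case or otherwise unusual schema and table names.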
539+
497540
#### Example of a user defined metric
498541

499542
Here you can see an example of a `ConfigMap` containing a single custom query,
@@ -510,14 +553,14 @@ metadata:
510553
data:
511554
custom-queries: |
512555
pg_replication:
513-
query: "SELECT CASE WHEN NOT pg_is_in_recovery()
556+
query: "SELECT CASE WHEN NOT pg_catalog.pg_is_in_recovery()
514557
THEN 0
515558
ELSE GREATEST (0,
516-
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
559+
EXTRACT(EPOCH FROM (pg_catalog.now() OPERATOR(pg_catalog.-) pg_catalog.pg_last_xact_replay_timestamp())))
517560
END AS lag,
518-
pg_is_in_recovery() AS in_recovery,
519-
EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
520-
(SELECT count(*) FROM pg_stat_replication) AS streaming_replicas"
561+
pg_catalog.pg_is_in_recovery() AS in_recovery,
562+
EXISTS (TABLE pg_catalog.pg_stat_wal_receiver) AS is_wal_receiver_up,
563+
(SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS streaming_replicas"
521564
522565
metrics:
523566
- lag:
@@ -553,7 +596,7 @@ some_query: |
553596
FROM some_table
554597
query: |
555598
SELECT
556-
count(*) as rows
599+
pg_catalog.count(*) as rows
557600
FROM some_table
558601
metrics:
559602
- rows:
@@ -570,24 +613,29 @@ Database auto-discovery can be enabled for a specific query by specifying a
570613
*shell-like pattern* (i.e., containing `*`, `?` or `[]`) in the list of
571614
`target_databases`. If provided, the operator will expand the list of target
572615
databases by adding all the databases returned by the execution of `SELECT
573-
datname FROM pg_database WHERE datallowconn AND NOT datistemplate` and matching
574-
the pattern according to [path.Match()](https://pkg.go.dev/path#Match) rules.
616+
datname FROM pg_catalog.pg_database WHERE datallowconn AND NOT datistemplate
617+
AND pg_catalog.has_database_privilege(datname, 'CONNECT')` and matching the
618+
pattern according to [path.Match()](https://pkg.go.dev/path#Match) rules.
619+
Databases on which `cnpg_metrics_exporter` lacks the `CONNECT` privilege are
620+
silently skipped; if you want a database with revoked `PUBLIC` access to be
621+
scraped, grant `CONNECT` explicitly (see "Custom query privileges and safety"
622+
above).
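
The expansion described above can be approximated as follows. This is a rough sketch under stated assumptions, not the operator's implementation: the operator uses Go's `path.Match`, while this example uses Python's `fnmatch.fnmatchcase`, which behaves the same for simple `*` and `?` patterns but differs in details such as character-class negation. The tuple layout for each database and the rule that literal names are passed through unchanged are also assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def is_pattern(target: str) -> bool:
    """A target is treated as a shell-like pattern if it contains *, ? or [."""
    return any(ch in target for ch in "*?[")


def expand_targets(targets, databases) -> list[str]:
    """Expand target_databases entries against the discovered databases.

    `databases` is an iterable of (name, allow_conn, is_template, can_connect)
    tuples, standing in for the rows of pg_database plus the result of
    has_database_privilege(datname, 'CONNECT').
    """
    # Databases without CONNECT are dropped here, which is why they are
    # "silently skipped" when a pattern is used.
    candidates = [
        name
        for name, allow_conn, is_template, can_connect in databases
        if allow_conn and not is_template and can_connect
    ]
    expanded = []
    for target in targets:
        if is_pattern(target):
            matches = [name for name in candidates if fnmatchcase(name, target)]
        else:
            matches = [target]  # assumption: literal names are kept as given
        for name in matches:
            if name not in expanded:
                expanded.append(name)
    return expanded


databases = [
    ("postgres", True, False, True),
    ("app", True, False, True),
    ("template1", True, True, True),
    ("locked", True, False, False),  # CONNECT revoked from PUBLIC
]
print(expand_targets(["*"], databases))
```

In this sketch the `locked` database never appears in the wildcard result, whereas listing it explicitly keeps it in the target list, so a missing `CONNECT` grant is easier to spot.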
575623

576624
:::note
577625
The `*` character has a [special meaning](https://yaml.org/spec/1.2/spec.html#id2786448) in YAML,
578626
so you need to quote (`"*"`) the `target_databases` value when it includes such a pattern.
579627
:::
580628

581629
It is recommended that you always include the name of the database
582-
in the returned labels, for example using the `current_database()` function
583-
as in the following example:
630+
in the returned labels, for example using the `pg_catalog.current_database()`
631+
function as in the following example:
584632

585633
```yaml
586634
some_query: |
587635
query: |
588636
SELECT
589-
current_database() as datname,
590-
count(*) as rows
637+
pg_catalog.current_database() as datname,
638+
pg_catalog.count(*) as rows
591639
FROM some_table
592640
metrics:
593641
- datname:
@@ -618,8 +666,8 @@ aforementioned query):
618666
some_query: |
619667
query: |
620668
SELECT
621-
current_database() as datname,
622-
count(*) as rows
669+
pg_catalog.current_database() as datname,
670+
pg_catalog.count(*) as rows
623671
FROM some_table
624672
metrics:
625673
- datname:
@@ -757,6 +805,33 @@ CloudNativePG is inspired by the PostgreSQL Prometheus Exporter, but
757805
presents some differences. In particular, the `cache_seconds` field is not implemented
758806
in CloudNativePG's exporter.
759807

808+
### Manually creating the metrics exporter role
809+
810+
The operator creates the `cnpg_metrics_exporter` PostgreSQL role on the
811+
primary during reconciliation; it then propagates to standbys and
812+
replica clusters via streaming replication.
813+
814+
If the role is missing (replica cluster upgraded before its primary,
815+
restore from a backup that predates the role, accidental removal),
816+
recreate it as a superuser on the writable primary of the replication
817+
chain (the source primary, not a designated primary of a replica
818+
cluster):
819+
820+
```sql
821+
CREATE ROLE cnpg_metrics_exporter WITH LOGIN NOSUPERUSER NOCREATEDB
822+
NOCREATEROLE NOREPLICATION NOBYPASSRLS INHERIT;
823+
GRANT pg_monitor TO cnpg_metrics_exporter;
824+
```
825+
826+
If your custom monitoring queries need access to objects outside
827+
`pg_monitor`'s scope, grant the necessary privileges explicitly. SELECT
828+
on a table also requires USAGE on its schema:
829+
830+
```sql
831+
GRANT USAGE ON SCHEMA myschema TO cnpg_metrics_exporter;
832+
GRANT SELECT ON TABLE myschema.mytable TO cnpg_metrics_exporter;
833+
```
834+
760835
## Monitoring the CloudNativePG operator
761836
762837
The operator internally exposes [Prometheus](https://prometheus.io/) metrics

website/versioned_docs/version-1.29/operator_conf.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -169,7 +169,7 @@ Add `--pprof-server=true` to the args list, for example:
169169
containers:
170170
- args:
171171
- controller
172-
- --enable-leader-election
172+
- --leader-elect
173173
- --config-map-name=cnpg-controller-manager-config
174174
- --secret-name=cnpg-controller-manager-config
175175
- --log-level=info
