14 changes: 8 additions & 6 deletions charts/cluster/docs/Getting Started.md
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@ helm upgrade --install cnpg \
## Creating a cluster configuration

Once you have the operator installed, the next step is to prepare the cluster configuration. Whether this will be managed
via a GitOps solution or directly via Helm is up to you. The following sections outlines the important steps in both cases.
via a GitOps solution or directly via Helm is up to you. The following sections outline the important steps in both cases.

### Choosing the database type

@@ -88,15 +88,17 @@ There are several important cluster options. Here are the most important ones:
`cluster.affinity.topologyKey` - The chart sets it to `topology.kubernetes.io/zone` by default which is useful if you are
running a production cluster in a multi AZ cluster (highly recommended). If you are running a single AZ cluster, you may
want to change that to `kubernetes.io/hostname` to ensure that cluster instances are not provisioned on the same node.
`cluster.postgresql` - Allows you to override PostgreSQL configuration parameters example:
`cluster.postgresql.parameters` - Allows you to override PostgreSQL configuration parameters, for example:
```yaml
cluster:
postgresql:
max_connections: "200"
shared_buffers: "2GB"
parameters:
max_connections: "200"
shared_buffers: "2GB"
```
`cluster.initSQL` - Allows you to run custom SQL queries during the cluster initialization. This is useful for creating
extensions, schemas and databases. Note that these are as a superuser.
`cluster.initdb.postInitSQL` - Allows you to run custom SQL queries during cluster initialization. This is useful for creating
extensions, schemas, and databases. Use `cluster.initdb.postInitApplicationSQL` and `cluster.initdb.postInitTemplateSQL` when
you need application-database or template-database specific initialization.

For a full list - refer to the Helm chart [configuration options](../README.md#Configuration-options).

Original file line number Diff line number Diff line change
@@ -25,26 +25,30 @@ The `CNPGClusterLogicalReplicationErrors` alert indicates that a logical replica
# Connect to the subscriber and check subscription status
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
subname,
subenabled,
apply_error_count,
sync_error_count,
stats_reset
FROM pg_stat_subscription
WHERE apply_error_count > 0 OR sync_error_count > 0;
s.subname,
s.subenabled,
COALESCE(sss.apply_error_count, 0) AS apply_error_count,
COALESCE(sss.sync_error_count, 0) AS sync_error_count,
sss.stats_reset
FROM pg_subscription s
LEFT JOIN pg_stat_subscription_stats sss ON s.oid = sss.subid
WHERE COALESCE(sss.apply_error_count, 0) > 0 OR COALESCE(sss.sync_error_count, 0) > 0;
"

# Check the last error message
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
subname,
last_msg_receipt_time,
latest_end_time,
s.subname,
ss.last_msg_receipt_time,
ss.latest_end_time,
CASE
WHEN apply_error_count > 0 THEN 'Apply errors detected'
WHEN sync_error_count > 0 THEN 'Sync errors detected'
WHEN COALESCE(sss.apply_error_count, 0) > 0 THEN 'Apply errors detected'
WHEN COALESCE(sss.sync_error_count, 0) > 0 THEN 'Sync errors detected'
ELSE 'No errors detected'
END as error_type
FROM pg_stat_subscription;
FROM pg_subscription s
LEFT JOIN pg_stat_subscription ss ON s.oid = ss.subid
LEFT JOIN pg_stat_subscription_stats sss ON s.oid = sss.subid;
"
```

@@ -96,21 +100,21 @@ FROM pg_publication;
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
subname,
srconninfo,
srschema,
srslotname,
srsynccommit
subconninfo,
subslotname,
subsynccommit,
subpublications
FROM pg_subscription;
"

# Check which tables are being replicated
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
relid::regclass as table_name,
srsubstate as state
FROM pg_subscription_rel
JOIN pg_class ON relid = oid
WHERE srsubstate NOT IN ('r', 's'); -- Not ready or synchronizing
sr.srrelid::regclass as table_name,
sr.srsubstate as state
FROM pg_subscription_rel sr
JOIN pg_class c ON sr.srrelid = c.oid
WHERE sr.srsubstate NOT IN ('r', 's'); -- Exclude tables that are ready ('r') or synchronized ('s')
"
```

@@ -378,4 +382,4 @@ ALTER TABLE table_name ENABLE TRIGGER trigger_name;
- You encounter frequent constraint violations
- The schema cannot be synchronized
- You need to skip transactions repeatedly
- Error rate is increasing despite fixes
- Error rate is increasing despite fixes
Original file line number Diff line number Diff line change
@@ -28,17 +28,19 @@ Connect to the subscriber and check the current state:
```bash
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
subname,
enabled,
EXTRACT(EPOCH FROM (NOW() - last_msg_receipt_time)) as receipt_lag_seconds,
EXTRACT(EPOCH FROM (NOW() - latest_end_time)) as apply_lag_seconds,
pg_wal_lsn_diff(received_lsn, latest_end_lsn) as pending_bytes,
s.subname,
s.subenabled AS enabled,
EXTRACT(EPOCH FROM (NOW() - ss.last_msg_receipt_time)) AS receipt_lag_seconds,
EXTRACT(EPOCH FROM (NOW() - ss.latest_end_time)) AS apply_lag_seconds,
COALESCE(pg_wal_lsn_diff(ss.received_lsn, ss.latest_end_lsn), 0) AS pending_bytes,
CASE
WHEN EXTRACT(EPOCH FROM (NOW() - last_msg_receipt_time)) > 60 THEN 'High receipt lag'
WHEN EXTRACT(EPOCH FROM (NOW() - latest_end_time)) > 60 THEN 'High apply lag'
WHEN pg_wal_lsn_diff(received_lsn, latest_end_lsn) > 1024^3 THEN 'High LSN distance'
WHEN EXTRACT(EPOCH FROM (NOW() - ss.last_msg_receipt_time)) > 60 THEN 'High receipt lag'
WHEN EXTRACT(EPOCH FROM (NOW() - ss.latest_end_time)) > 60 THEN 'High apply lag'
WHEN COALESCE(pg_wal_lsn_diff(ss.received_lsn, ss.latest_end_lsn), 0) > 1024^3 THEN 'High LSN distance'
ELSE 'Healthy'
END as primary_issue
FROM pg_stat_subscription;
FROM pg_subscription s
LEFT JOIN pg_stat_subscription ss ON s.oid = ss.subid;
"
```

@@ -230,4 +232,4 @@ kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "\dRs+"
- Lag continues to increase despite optimization
- Network issues persist between clusters
- Resource utilization is at maximum but lag continues
- You experience frequent replication failures
- You experience frequent replication failures
Original file line number Diff line number Diff line change
@@ -25,18 +25,18 @@ The `CNPGClusterLogicalReplicationStopped` alert indicates that a logical replic
# Check all subscriptions and their status
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
pg_subscription.subname,
pg_subscription.enabled,
s.subname,
s.subenabled AS enabled,
CASE
WHEN pg_subscription.enabled = false THEN 'Explicitly disabled'
WHEN pid IS NULL AND buffered_lag_bytes > 0 THEN 'Stuck (no worker)'
WHEN pid IS NOT NULL THEN 'Active'
WHEN NOT s.subenabled THEN 'Explicitly disabled'
WHEN ss.pid IS NULL AND COALESCE(pg_wal_lsn_diff(ss.received_lsn, ss.latest_end_lsn), 0) > 0 THEN 'Stuck (no worker)'
WHEN ss.pid IS NOT NULL THEN 'Active'
ELSE 'Unknown'
END as status,
pg_wal_lsn_diff(received_lsn, latest_end_lsn) as pending_bytes,
pid IS NOT NULL as has_worker
FROM pg_subscription
LEFT JOIN pg_stat_subscription ON pg_subscription.oid = pg_stat_subscription.subid;
COALESCE(pg_wal_lsn_diff(ss.received_lsn, ss.latest_end_lsn), 0) AS pending_bytes,
ss.pid IS NOT NULL AS has_worker
FROM pg_subscription s
LEFT JOIN pg_stat_subscription ss ON s.oid = ss.subid;
"
```

@@ -63,10 +63,10 @@ WHERE application_name LIKE '%subscription%' OR backend_type = 'logical replicat
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT
subname,
srconninfo,
srsynccommit,
srslotname,
srsyncstate as sync_state
subconninfo,
subsynccommit,
subslotname,
subpublications
FROM pg_subscription;
"
```
@@ -86,7 +86,7 @@ kubectl logs -n NAMESPACE $POD --tail=200 | grep -i "subscription\|replication\|
```bash
# Extract connection info from subscription
kubectl exec -it svc/SUBSCRIBER-CLUSTER-rw -n NAMESPACE -- psql -c "
SELECT srconninfo FROM pg_subscription WHERE subname = 'your_subscription_name';
SELECT subconninfo FROM pg_subscription WHERE subname = 'your_subscription_name';
" | grep -o "host=[^ ]*" | cut -d= -f2

# Test connection
@@ -333,4 +333,4 @@ kubectl exec -it svc/CLUSTER-rw -n NS -- psql -c "SELECT * FROM pg_stat_activity
- Workers fail to start despite adequate resources
- WAL retention issues prevent catch-up
- Frequent disconnections occur
- Data cannot be resynchronized successfully
- Data cannot be resynchronized successfully
4 changes: 2 additions & 2 deletions charts/cluster/templates/console-statefulset.yaml
Original file line number Diff line number Diff line change
@@ -55,10 +55,10 @@ spec:
apt install -y screen curl wget jq unzip gzip nano vim util-linux less htop
cat <<EOF > /root/.bashrc
echo -e "\nHere are some examples for connecting and running queries on the cluster:"
echo ' nohup psql \$DB_SUPERUSER_URI"/DB_NAME" -c "SELECT 1;" 2>&1 > command.log &'
echo ' nohup psql "\$DB_SUPERUSER_URI/<db-name>" -c "SELECT 1;" > command.log 2>&1 &'
echo -e "\nTo check up on the command, use:"
echo " tail -f command.log"
echo -e "\nYou can also use 'screen' for an interactive session. See https://github.com/paradedb/charts/blob/dev/charts/paradedb/docs/long-running-tasks.md for examples."
echo -e "\nYou can also use 'screen' for an interactive session. See https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/Console.md for examples."
echo -e "\n"
EOF
sleep infinity
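
A note on the redirection change in this hunk: the old example wrote `2>&1 > command.log`, which duplicates stderr onto whatever stdout pointed to at that moment and only then sends stdout to the file, so error output never reaches `command.log`. The new order, `> command.log 2>&1`, captures both streams. The sketch below is only an illustration of that behavior; `emit_both` and the temporary file names are hypothetical stand-ins for the `psql` command and are not part of the chart.

```bash
#!/usr/bin/env bash
# Sketch: why "> file 2>&1" captures stderr but "2>&1 > file" does not.
# emit_both is a hypothetical stand-in for the psql command in the template.
set -u

workdir="$(mktemp -d)"

emit_both() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Old order: stderr is duplicated onto the current stdout (here, the pipe)
# before stdout is redirected, so only the stdout line reaches the file.
emit_both 2>&1 > "$workdir/old.log" | cat > "$workdir/old.leaked"

# New order: stdout is pointed at the file first, then stderr follows it.
emit_both > "$workdir/new.log" 2>&1

# Count captured lines (tr strips the padding some wc implementations add).
old_lines="$(wc -l < "$workdir/old.log" | tr -d ' ')"
new_lines="$(wc -l < "$workdir/new.log" | tr -d ' ')"
echo "old order captured $old_lines line(s); new order captured $new_lines line(s)"
```

With the old order, the stderr line ends up in `old.leaked` (the stand-in for the terminal) rather than in the log, which is why `tail -f command.log` would have missed `psql` errors.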