Skip to content

K8SPG-911: pg_tde improvements/fixes#1502

Closed
egegunes wants to merge 2 commits into
mainfrom
K8SPG-911
Closed

K8SPG-911: pg_tde improvements/fixes#1502
egegunes wants to merge 2 commits into
mainfrom
K8SPG-911

Conversation

@egegunes
Copy link
Copy Markdown
Contributor

CHANGE DESCRIPTION

This PR includes:

  • add pg version validation
  • explicitly disable wal encryption
  • enable pg_tde in restore job
  • [e2e] read from all pods after restore
  • use pg_tde binaries in patroni
  • fix vault provider change

All items except the last is straightforward. Fixing the vault provider change required a lot of changes.

The problem with changing the Vault token in pg_tde was that pg_tde requires both the new and the old token at the same time to perform the change. This is not trivial to achieve on K8s, since operator needs to mount the new secret to the pods and somehow needs the keep the old secret mounted.

To achieve this, operator performs provider change in two phases:

  1. In the first phase, operator keeps the old secret mounted in the pod and prevents restart. Then it fetches the new secret contents and stores them in temporary files in /pgdata directory. Then, operator runs pg_tde_change_global_key_provider_vault_v2.
  2. In the second phase, operator mounts the new secret and restarts the pods. Then it runs pg_tde_change_global_key_provider_vault_v2 with standard credential paths. At the end of this phase, temporary files are cleaned up.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

- add pg version validation
- explicitly disable wal encryption
- enable pg_tde in restore job
- [e2e] read from all pods after restore
- use pg_tde binaries in patroni
- fix vault provider change

All items except the last is straightforward. Fixing the vault provider
change, required a lot of changes.

The problem with changing the Vault token in pg_tde was that pg_tde
requires both the new and the old token at the same time to perform the
change. This is not trivial to achieve on K8s, since operator needs to
mount the new secret to the pods and somehow needs the keep the old
secret mounted.

To achieve this, operator performs provider change in two phases:

1. In the first phase, operator keeps the old secret mounted in the pod
   and prevents restart. Then it fetches the new secret contents and
   stores them in temporary files in `/pgdata` directory. Then, operator
   runs pg_tde_change_global_key_provider_vault_v2.
2. In the second phase, operator mounts the new secret and restarts the
   pods. Then it runs pg_tde_change_global_key_provider_vault_v2 with
   standard credential paths. At the end of this phase, temporary files
   are cleaned up.
@egegunes egegunes added this to the v2.9.0 milestone Mar 17, 2026
Copilot AI review requested due to automatic review settings March 17, 2026 10:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves pg_tde support across the operator by adding CRD-level Postgres version validation, enforcing required runtime parameters, enabling TDE support during restore and Patroni operations, and implementing a two-phase Vault credential rotation workflow that preserves old secrets long enough for pg_tde to perform provider changes safely.

Changes:

  • Add CRD validation that pg_tde can only be enabled on PostgreSQL 17+ and extend controller validation tests.
  • Ensure pg_tde runtime config is applied consistently (explicitly disable WAL encryption) and enable pg_tde during restore jobs and Patroni operations.
  • Implement and validate (via E2E steps) a two-phase Vault provider credential rotation using temporary files under /pgdata, plus logic to prevent premature pod restarts.

Reviewed changes

Copilot reviewed 30 out of 33 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go Adds CRD XValidation gating pg_tde enablement to PG17+.
percona/controller/pgcluster/controller_test.go Adds envtest coverage for pg_tde enablement/transition validation scenarios.
internal/postgres/reconcile.go Factors pg_tde projected secret volume creation into a reusable helper.
internal/pgtde/postgres.go Adds temp credential paths, parameter changes (wal_encrypt=off), and path-plumbing for Vault provider ops.
internal/pgtde/postgres_test.go Updates unit tests for the new mandatory parameter and Vault provider function signatures.
internal/pgbackrest/config.go Extends restore command generation to optionally preload pg_tde.
internal/pgbackrest/config_test.go Updates tests for restore command signature change.
internal/patroni/config.go Adds Patroni postgresql.bin_name overrides for pg_tde binaries when enabled.
internal/patroni/config_test.go Adds tests asserting bin_name presence/absence depending on pg_tde enablement.
internal/controller/postgrescluster/postgres.go Implements the two-phase Vault provider rotation + secret-to-temp-file plumbing and cleanup.
internal/controller/postgrescluster/instance.go Adds logic to preserve old pg_tde volume during phase-1 to prevent premature restarts + revision hashing helper.
internal/controller/postgrescluster/pgbackrest.go Enables pg_tde in restore jobs (command + mounts/volume for Vault secrets).
e2e-tests/tests/pg-tde/06-write-data.yaml Adds a write step before restore-related checks.
e2e-tests/tests/pg-tde/07-create-restore.yaml Adds restore CR creation step for the suite.
e2e-tests/tests/pg-tde/07-assert.yaml Updates assertions to wait for restore completion and cluster readiness.
e2e-tests/tests/pg-tde/08-read-data.yaml Reads restored data from primary and replicas and stores in ConfigMaps.
e2e-tests/tests/pg-tde/08-assert.yaml Asserts restored data reads from primary + replicas via ConfigMaps.
e2e-tests/tests/pg-tde/09-change-vault-provider.yaml Adds Vault token rotation and triggers provider change via CR update.
e2e-tests/tests/pg-tde/09-assert.yaml Updates assertions for provider-change outcomes and pg_tde mounts/revision.
e2e-tests/tests/pg-tde/10-verify-after-change.yaml Verifies provider change works and temp credential files are cleaned up.
e2e-tests/tests/pg-tde/10-assert.yaml Asserts data read after provider change.
e2e-tests/tests/pg-tde/11-disable-pgtde.yaml Adjusts disable step to use rotated Vault secret and adds a CHECKPOINT.
e2e-tests/tests/pg-tde/11-assert.yaml Adds assertions for disabled pg_tde + expected Vault secret projection.
e2e-tests/tests/pg-tde/12-remove-pgtde-config.yaml Removes pg_tde spec block as a final transition test.
e2e-tests/tests/pg-tde/12-assert.yaml Asserts operator state after removing pg_tde config.
deploy/cw-bundle.yaml Regenerates CRD bundle with the new XValidation rule.
deploy/crd.yaml Regenerates CRD bundle with the new XValidation rule.
deploy/bundle.yaml Regenerates CRD bundle with the new XValidation rule.
config/crd/bases/pgv2.percona.com_perconapgclusters.yaml Regenerates base CRD with the new XValidation rule.
build/crd/percona/generated/pgv2.percona.com_perconapgclusters.yaml Regenerates generated CRD with the new XValidation rule.

Comment on lines +576 to +579
var stdout, stderr bytes.Buffer
_ = podExec(ctx, pod.Namespace, pod.Name, container,
nil, &stdout, &stderr,
"bash", "-c", fmt.Sprintf("rm -f %s", path))
Comment on lines +190 to +192
if tdeEnabled {
ps.Add("shared_preload_libraries", "pg_tde")
}
@JNKPercona
Copy link
Copy Markdown
Collaborator

Test Name Result Time
backup-enable-disable passed 00:00:00
builtin-extensions passed 00:06:46
cert-manager-tls passed 00:00:00
custom-envs passed 00:00:00
custom-extensions passed 00:00:00
custom-tls passed 00:00:00
database-init-sql passed 00:00:00
demand-backup passed 00:00:00
demand-backup-offline-snapshot passed 00:00:00
dynamic-configuration passed 00:00:00
finalizers passed 00:00:00
init-deploy passed 00:00:00
huge-pages passed 00:00:00
ldap passed 00:00:00
ldap-tls passed 00:00:00
monitoring passed 00:00:00
monitoring-pmm3 passed 00:00:00
one-pod passed 00:00:00
operator-self-healing passed 00:00:00
pg-tde passed 00:00:00
pitr passed 00:00:00
scaling passed 00:00:00
scheduled-backup passed 00:00:00
self-healing passed 00:00:00
sidecars passed 00:00:00
standby-pgbackrest passed 00:00:00
standby-streaming passed 00:00:00
start-from-backup passed 00:00:00
tablespaces passed 00:00:00
telemetry-transfer passed 00:00:00
upgrade-consistency passed 00:00:00
upgrade-minor passed 00:00:00
users passed 00:00:00
Summary Value
Tests Run 33/33
Job Duration 00:23:02
Total Test Time 00:06:46

commit: ab319a1
image: perconalab/percona-postgresql-operator:PR-1502-ab319a176

@egegunes
Copy link
Copy Markdown
Contributor Author

This will be merged in another PR.

@egegunes egegunes closed this Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants