K8SPSMDB-1602: support Workload Identity for GCS backup storage #2315

Merged
hors merged 19 commits into percona:main from TineoC:feat/gcs-workload-identity
Jun 22, 2026

Conversation

@TineoC

@TineoC TineoC commented Apr 16, 2026

Contributor

Summary

  • Add credentials.workloadIdentity field to BackupStorageGCSSpec enabling GCS backups via GKE Workload Identity without exported service account keys
  • Modify PBM's newGoogleClient to fall back to Application Default Credentials (ADC) when no explicit credentials are provided
  • Make credentialsSecret optional when workload identity is enabled

Motivation

GKE environments using Workload Identity (Google's recommended approach) currently cannot use GCS backups because:

  1. The operator always reads credentialsSecret and passes clientEmail/privateKey to PBM
  2. PBM's newGoogleClient requires these fields and errors with: "clientEmail and privateKey are required for GCS credentials"
  3. Even when the pod's KSA is federated to a GSA with roles/storage.objectUser, PBM never tries ADC

This is blocking for IL4/FedRAMP environments where exporting service account JSON keys is not permitted.

Closes #2314

Changes

Operator API (pkg/apis/psmdb/v1/psmdb_types.go)

  • Add GCSCredentials struct with WorkloadIdentity bool field
  • Add Credentials *GCSCredentials to BackupStorageGCSSpec
  • Make CredentialsSecret tag omitempty

Operator Backup Logic (pkg/psmdb/backup/pbm.go)

  • When credentials.workloadIdentity: true, skip reading credentialsSecret
  • PBM receives empty credentials, triggering ADC fallback

PBM GCS Client (pbm/storage/gcs/google_client.go)

  • When ClientEmail and PrivateKey are both empty, use storagegcs.NewClient(ctx) (ADC) instead of erroring
  • Preserves backward compatibility: explicit credentials still take priority
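
The credential selection described for the PBM GCS client can be sketched as a small decision function. This is a minimal illustration, not PBM's actual code: `authMode`, `selectAuthMode`, and the error returned for half-filled credentials are hypothetical. Only the rule "both fields empty means ADC, explicit credentials take priority" comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// authMode is a hypothetical enum naming the two ways the client could authenticate.
type authMode int

const (
	authExplicitKey authMode = iota // service account e-mail plus private key
	authADC                         // Application Default Credentials
)

// selectAuthMode applies the rule from the PR description:
// explicit credentials win, and ADC is used only when both fields are empty.
func selectAuthMode(clientEmail, privateKey string) (authMode, error) {
	switch {
	case clientEmail != "" && privateKey != "":
		return authExplicitKey, nil
	case clientEmail == "" && privateKey == "":
		return authADC, nil
	default:
		// Assumption: a half-filled pair is treated as a configuration error.
		return 0, errors.New("clientEmail and privateKey must be set together")
	}
}

func main() {
	mode, err := selectAuthMode("", "")
	fmt.Println("uses ADC:", mode == authADC, "error:", err)
}
```

In a real client, the `authADC` branch is where a call such as `storage.NewClient(ctx)` without explicit credential options would go.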

Usage

spec:
  backup:
    storages:
      gcs-backup:
        type: gcs
        gcs:
          bucket: my-backup-bucket
          prefix: mongodb-backup
          credentials:
            workloadIdentity: true

Test plan

  • Existing unit tests pass (go test ./pkg/...)
  • GCS backup with explicit credentialsSecret still works (backward compat)
  • GCS backup with credentials.workloadIdentity: true on GKE with WI succeeds
  • GCS backup with no credentials and no WI annotation fails with clear error

🤖 Generated with Claude Code

@CLAassistant

CLAassistant commented Apr 16, 2026

CLA assistant check
All committers have signed the CLA.

@pull-request-size pull-request-size Bot added size/XXL 1000+ lines size/S 10-29 lines and removed size/XXL 1000+ lines labels Apr 16, 2026

@mayankshah1607 mayankshah1607 left a comment

Member

Hi, thanks for your PR. I believe we also need to add E2E tests for this

Comment thread pkg/psmdb/backup/pbm.go Outdated
Comment on lines +505 to +509
// When WorkloadIdentity is enabled, skip credential secret loading entirely.
// PBM will use Application Default Credentials (ADC) provided by GKE Workload Identity.
useWorkloadIdentity := stg.GCS.Credentials != nil && stg.GCS.Credentials.WorkloadIdentity

if !useWorkloadIdentity && stg.GCS.CredentialsSecret != "" {

Member

Please add coverage for this in pbm_test.go
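
The condition in the snippet above lends itself to a table-driven check. A minimal sketch, assuming trimmed stand-in types (`gcsSpec`, `gcsCredentials`) and a hypothetical helper `shouldLoadSecret`; the operator's real types and the layout of pbm_test.go may differ, and the `workloadIdentity` field itself was removed later in this PR:

```go
package main

import "fmt"

// gcsCredentials and gcsSpec are hypothetical, trimmed copies of the API types under review.
type gcsCredentials struct {
	WorkloadIdentity bool
}

type gcsSpec struct {
	CredentialsSecret string
	Credentials       *gcsCredentials
}

// useWorkloadIdentity is nil-safe: a missing credentials block means "not enabled".
func useWorkloadIdentity(s gcsSpec) bool {
	return s.Credentials != nil && s.Credentials.WorkloadIdentity
}

// shouldLoadSecret mirrors the reviewed condition: the secret is read only
// when workload identity is off and a secret name is present.
func shouldLoadSecret(s gcsSpec) bool {
	return !useWorkloadIdentity(s) && s.CredentialsSecret != ""
}

func main() {
	cases := []struct {
		name string
		spec gcsSpec
		want bool
	}{
		{"secret only", gcsSpec{CredentialsSecret: "gcs-secret"}, true},
		{"workload identity only", gcsSpec{Credentials: &gcsCredentials{WorkloadIdentity: true}}, false},
		{"both set", gcsSpec{CredentialsSecret: "gcs-secret", Credentials: &gcsCredentials{WorkloadIdentity: true}}, false},
		{"neither set", gcsSpec{}, false},
	}
	for _, tc := range cases {
		got := shouldLoadSecret(tc.spec)
		fmt.Printf("%-24s got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```

The "both set" row documents a design choice in that revision: the flag overrides a configured secret rather than the other way around.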

Comment thread pkg/apis/psmdb/v1/psmdb_types.go Outdated
// When WorkloadIdentity is true, PBM uses Application Default Credentials (ADC)
// provided by GKE Workload Identity instead of a credentialsSecret.
type GCSCredentials struct {
	WorkloadIdentity bool `json:"workloadIdentity,omitempty"`

Member

Why does it need to be embedded inside GCSCredentials object?

@egegunes egegunes changed the title feat(gcs): support Workload Identity for GCS backup storage K8SPSMDB-1602: support Workload Identity for GCS backup storage Apr 20, 2026

@hors hors left a comment

Collaborator

@TineoC, could you please check the comments?

@pull-request-size pull-request-size Bot added size/M 30-99 lines and removed size/S 10-29 lines labels Apr 30, 2026
@TineoC

TineoC commented Apr 30, 2026

Contributor Author

Ready

@pull-request-size pull-request-size Bot added size/XXL 1000+ lines and removed size/M 30-99 lines labels May 1, 2026
Comment on lines +51 to +53
if err := r.disableBalancer(ctx, cluster); err != nil {
	return status, errors.Wrap(err, "disable balancer")
}

Contributor

What was the rationale behind this change? PBM already disables the balancer before a restore.

Comment thread coverage.html Outdated

Contributor

is this needed?

- Add WorkloadIdentity bool field to BackupStorageGCSSpec
- Make CredentialsSecret omitempty (not required when using workload identity)
- Skip credential secret loading when WorkloadIdentity is true, allowing PBM to use GKE Application Default Credentials
- Add unit test for GCS workload identity storage config
- Update CRD schemas for all three resource types
@TineoC TineoC force-pushed the feat/gcs-workload-identity branch from 26c0506 to e547891 Compare May 4, 2026 06:49
@pull-request-size pull-request-size Bot added size/M 30-99 lines and removed size/XXL 1000+ lines labels May 4, 2026

@hors hors left a comment

Collaborator

@TineoC @egegunes, why do we need to have this new option "workloadIdentity: true"? Can we have a rule that if the user does not provide a secret, the operator must try WI? As I remember, we have such logic for AWS.

@TineoC

TineoC commented Jun 8, 2026

Contributor Author

> @TineoC @egegunes, why do we need to have this new option "workloadIdentity: true"? Can we have a rule that if the user does not provide a secret, the operator must try WI? As I remember, we have such logic for AWS.

That approach assumes the user is on Workload Identity, but what if they set neither credentialsSecret nor workloadIdentity?

The user would be able to create the database without a problem, and the error would not be caught until runtime.

I would prefer that it fail fast for the user instead.

@egegunes egegunes left a comment

Contributor

> This approach assumes someone would be using Workload Identity, but what if they didn't set either credentialsSecret or workloadIdentity?
>
> The user would be able to create the database without problem, and the error wouldn't be caught until runtime.
>
> I prefer that it fails fast to the user instead.

@TineoC we discussed this internally. Although most of us think explicitly enabling workloadIdentity is a better design, we don't have this behavior for S3 storages for the same feature. I think consistency across features is worth preserving.

Could you please remove the workloadIdentity field and check whether credentialsSecret is empty, like we do for S3 storages?

@egegunes egegunes added this to the v1.23.0 milestone Jun 11, 2026
@egegunes

Contributor

@TineoC friendly ping

TineoC added 2 commits June 16, 2026 08:38
- Remove explicit workloadIdentity field from API
- Follow AWS S3 pattern: empty credentialsSecret triggers ADC fallback (hors feedback)
- Remove workloadIdentity from all CRD YAMLs via make generate manifests
- Add E2E test: demand-backup-gcs-workload-identity (mayankshah1607 feedback)
- Keep PBM-side ADC fallback for when credentials are not provided
@TineoC TineoC dismissed stale reviews from gkech and mayankshah1607 via ff17396 June 16, 2026 12:45
egegunes
egegunes previously approved these changes Jun 17, 2026
gkech
gkech previously approved these changes Jun 17, 2026
pooknull
pooknull previously approved these changes Jun 17, 2026

@mayankshah1607 mayankshah1607 left a comment

Member

I am not sure this works. Don't we need to set WorkloadIdentity: true in the PBM storage conf for the fallback to happen? @TineoC did you have a chance to test this?

Copilot AI left a comment

Contributor

Pull request overview

This pull request aims to enable GCS backups without providing a credentialsSecret (intended for GKE Workload Identity / ADC-based auth) by relaxing the Operator API + CRD requirements and adding unit/e2e coverage around “GCS with no explicit credentials”.

Changes:

  • Make spec.backup.storages[].gcs.credentialsSecret optional in the Go API types and in generated CRD schemas.
  • Add a unit test case asserting PBM storage config generation for GCS works with empty credentials (no secret).
  • Add a new e2e test scenario and wire it into PR/release e2e run lists.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

File | Description
--- | ---
pkg/psmdb/backup/pbm_test.go | Adds a unit test case for GCS config generation without credentials.
pkg/apis/psmdb/v1/psmdb_types.go | Makes BackupStorageGCSSpec.CredentialsSecret omitempty (optional in JSON).
config/crd/bases/psmdb.percona.com_perconaservermongodbs.yaml | Removes credentialsSecret from required fields for GCS storage in the main CRD base.
config/crd/bases/psmdb.percona.com_perconaservermongodbbackups.yaml | Removes credentialsSecret from required fields for GCS storage in the Backup CRD base.
config/crd/bases/psmdb.percona.com_perconaservermongodbrestores.yaml | Removes credentialsSecret from required fields for GCS storage in the Restore CRD base.
deploy/crd.yaml | Regenerated CRD manifest reflecting credentialsSecret no longer required.
deploy/bundle.yaml | Regenerated bundle reflecting credentialsSecret no longer required.
deploy/cw-bundle.yaml | Regenerated cw-bundle reflecting credentialsSecret no longer required.
e2e-tests/version-service/conf/crd.yaml | Updates version-service CRD copy to not require credentialsSecret.
e2e-tests/run-release.csv | Adds the new WI/ADC GCS e2e test to release runs.
e2e-tests/run-pr.csv | Adds the new WI/ADC GCS e2e test to PR runs.
e2e-tests/demand-backup-gcs-workload-identity/run | New e2e test runner script for GCS WI/ADC flow.
e2e-tests/demand-backup-gcs-workload-identity/conf/some-name.yml | New PSMDB cluster manifest for the WI/ADC e2e test.
e2e-tests/demand-backup-gcs-workload-identity/conf/restore.yml | Restore manifest template used by the new e2e test.
e2e-tests/demand-backup-gcs-workload-identity/conf/backup-gcs-wi.yml | Backup manifest template used by the new e2e test.

Comment on lines 1320 to 1326
 type BackupStorageGCSSpec struct {
 	Bucket            string      `json:"bucket"`
 	Prefix            string      `json:"prefix,omitempty"`
-	CredentialsSecret string      `json:"credentialsSecret"`
+	CredentialsSecret string      `json:"credentialsSecret,omitempty"`
 	ChunkSize         int         `json:"chunkSize,omitempty"`
 	Retryer           *GCSRetryer `json:"retryer,omitempty"`
 }
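
The effect of the `omitempty` tag on the serialized spec can be checked with the standard library alone. A minimal sketch, assuming a trimmed stand-in struct (`gcsSpec`) and a hypothetical helper `marshalSpec`; it only demonstrates the JSON side and says nothing about how the CRD `required` list is generated:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gcsSpec is a hypothetical, trimmed copy of the struct quoted above.
type gcsSpec struct {
	Bucket            string `json:"bucket"`
	Prefix            string `json:"prefix,omitempty"`
	CredentialsSecret string `json:"credentialsSecret,omitempty"`
}

// marshalSpec returns the JSON form of the spec, or the error text on failure.
func marshalSpec(s gcsSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// With omitempty, an unset secret name is dropped from the output entirely.
	fmt.Println(marshalSpec(gcsSpec{Bucket: "my-backup-bucket"}))
	// prints {"bucket":"my-backup-bucket"}

	fmt.Println(marshalSpec(gcsSpec{Bucket: "my-backup-bucket", CredentialsSecret: "gcs-secret"}))
	// prints {"bucket":"my-backup-bucket","credentialsSecret":"gcs-secret"}
}
```
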
Comment on lines 364 to 366
required:
- bucket
- credentialsSecret
type: object
Comment on lines +27 to +30
kubectl_bin annotate serviceaccount \
	--namespace "${namespace}" \
	"${namespace}-psmdb-db" \
	"iam.gke.io/gcp-service-account=${GCS_WI_SERVICE_ACCOUNT}"
Comment on lines +15 to +20
storages:
  gcs-wi:
    type: gcs
    gcs:
      bucket: operator-testing
      prefix: psmdb-demand-backup-gcs-wi
Comment on lines 100 to 102
required:
- bucket
- credentialsSecret
type: object
Comment on lines 142 to 144
required:
- bucket
- credentialsSecret
type: object
- Set WorkloadIdentity: true in GCS credentials when credentialsSecret
  is empty so PBM uses ADC instead of erroring
- Update unit test to expect WorkloadIdentity: true in the no-credentials
  GCS config
@TineoC TineoC dismissed stale reviews from pooknull, gkech, and egegunes via fa59213 June 17, 2026 09:35
@JNKPercona

Collaborator
Test Name Result Time
arbiter passed 00:11:29
balancer passed 00:18:06
cert-management-policy passed 00:09:00
clustersync passed 00:14:31
cross-site-sharded passed 00:18:26
custom-replset-name passed 00:10:10
custom-tls passed 00:14:24
custom-users-roles passed 00:10:49
custom-users-roles-sharded passed 00:12:05
data-at-rest-encryption passed 00:13:14
data-sharded passed 00:20:25
demand-backup passed 00:22:13
demand-backup-eks-credentials-irsa passed 00:00:08
demand-backup-fs passed 00:23:33
demand-backup-if-unhealthy passed 00:08:11
demand-backup-incremental-aws passed 00:12:11
demand-backup-incremental-azure passed 00:11:57
demand-backup-incremental-gcp-native passed 00:11:12
demand-backup-incremental-gcp-s3 passed 00:11:22
demand-backup-incremental-minio passed 00:25:55
demand-backup-incremental-sharded-aws passed 00:18:07
demand-backup-incremental-sharded-azure passed 00:17:37
demand-backup-incremental-sharded-gcp-native passed 00:17:36
demand-backup-incremental-sharded-gcp-s3 passed 00:17:42
demand-backup-incremental-sharded-minio passed 00:27:47
demand-backup-logical-minio-native-tls passed 00:09:13
demand-backup-physical-parallel passed 00:08:56
demand-backup-physical-aws passed 00:12:39
demand-backup-physical-azure passed 00:12:39
demand-backup-physical-gcp-s3 passed 00:12:29
demand-backup-gcs-workload-identity passed 00:00:07
demand-backup-physical-gcp-native passed 00:12:19
demand-backup-physical-minio passed 00:21:28
demand-backup-physical-minio-native passed 00:26:55
demand-backup-physical-minio-native-tls passed 00:19:52
demand-backup-physical-sharded-parallel passed 00:11:30
demand-backup-physical-sharded-aws passed 00:18:20
demand-backup-physical-sharded-azure passed 00:18:31
demand-backup-physical-sharded-gcp-native passed 00:17:52
demand-backup-physical-sharded-minio passed 00:17:35
demand-backup-physical-sharded-minio-native passed 00:17:46
demand-backup-sharded passed 00:25:36
demand-backup-snapshot passed 00:39:59
demand-backup-snapshot-vault passed 00:18:31
disabled-auth passed 00:16:38
expose-sharded passed 00:34:31
finalizer passed 00:10:17
ignore-labels-annotations passed 00:08:08
init-deploy passed 00:13:47
ldap passed 00:09:15
ldap-tls passed 00:12:54
limits passed 00:06:42
liveness passed 00:09:10
mongod-major-upgrade passed 00:12:33
mongod-major-upgrade-sharded passed 00:21:23
monitoring-2-0 passed 00:25:19
monitoring-pmm3 passed 00:44:55
multi-cluster-service passed 00:14:20
multi-storage passed 00:19:45
non-voting-and-hidden passed 00:16:55
one-pod passed 00:07:55
operator-self-healing-chaos passed 00:13:25
pitr passed 00:38:15
pitr-physical passed 01:08:28
pitr-sharded passed 00:22:56
pitr-to-new-cluster passed 00:25:52
pitr-physical-backup-source passed 00:52:12
preinit-updates passed 00:05:25
pvc-auto-resize passed 00:14:54
pvc-resize passed 00:17:14
recover-no-primary passed 00:27:21
replset-overrides passed 00:19:24
replset-remapping passed 00:17:54
replset-remapping-sharded passed 00:17:16
rs-shard-migration passed 00:15:49
scaling passed 00:11:36
scheduled-backup passed 00:18:39
security-context passed 00:07:09
self-healing-chaos passed 00:15:32
service-per-pod passed 00:19:16
serviceless-external-nodes passed 00:07:46
smart-update passed 00:08:18
split-horizon passed 00:13:48
split-horizon-manual-tls passed 00:13:33
stable-resource-version passed 00:04:54
storage passed 00:07:36
tls-issue-cert-manager passed 00:30:42
unsafe-psa passed 00:08:01
upgrade passed 00:10:33
upgrade-consistency passed 00:08:03
upgrade-consistency-sharded-tls passed 01:01:50
upgrade-sharded passed 00:18:39
upgrade-partial-backup passed 00:16:32
users failure 00:10:37
users-vault passed 00:14:36
vector-search passed 00:00:07
vector-search-sharded passed 00:00:08
version-service passed 00:27:35

Summary Value
Tests Run 98/98
Job Duration 03:37:33
Total Test Time 27:57:41

commit: 13ff352
image: perconalab/percona-server-mongodb-operator:PR-2315-13ff35222

@hors hors merged commit 7386243 into percona:main Jun 22, 2026
2 checks passed
@hors

hors commented Jun 22, 2026

Collaborator

@TineoC thank you for your contribution!

Development

Successfully merging this pull request may close these issues.

Support GCS Workload Identity in CRD (credentials.workloadIdentity) for backup storage

9 participants