
fix(shield): add PSA-compatible securityContext to GKE allowlist waiter Job#2626

Open
EdwardArchive wants to merge 1 commit into sysdiglabs:main from EdwardArchive:fix/shield-gke-allowlist-waiter-psa-2623

Conversation

@EdwardArchive
Contributor

Fixes #2623

Summary

The wait-for-allowlist Job (a Helm pre-install/pre-upgrade hook added in shield 1.37.x) had no Pod- or container-level securityContext, so it was rejected on any cluster enforcing the Kubernetes PodSecurity "restricted" profile or the OpenShift restricted-v2 SCC. Because it is a Helm hook, the admission failure aborts the whole install/upgrade, and the waiter is only used on GKE Autopilot, which itself enforces a hardened profile.

Reproduction (live, vanilla k8s 1.34)

kubectl create ns shield-psa-test
kubectl label ns shield-psa-test \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest

# Render unmodified 1.37.1 waiter Job and apply
helm template testrel charts/shield \
  -n shield-psa-test \
  -f charts/shield/tests/values/gke-autopilot.yaml \
  --set gke_autopilot.allowlist_waiter.enabled=true \
  | yq 'select(.metadata.name == "testrel-shield-host-allowlist-waiter")' \
  | kubectl apply -n shield-psa-test -f -

Result:

Warning  FailedCreate  Error creating: pods "..." is forbidden: violates PodSecurity
"restricted:latest": allowPrivilegeEscalation != false (container
"wait-for-allowlist" must set securityContext.allowPrivilegeEscalation=false),
unrestricted capabilities (container must set securityContext.capabilities.drop=["ALL"]),
runAsNonRoot != true, seccompProfile (must set to RuntimeDefault or Localhost)

The Job loops in FailedCreate forever; in the Helm context, the install/upgrade hangs and ultimately fails.

After this patch the same manifest is admitted and the Pod runs (it exits non-zero only because the AllowlistSynchronizer CRD is not installed on the test cluster — expected, as the waiter only makes sense on GKE Autopilot).

Changes

  • templates/host/gke-allowlist-waiter-job.yaml — add Pod-level and container-level securityContext, both sourced from values.
  • values.yaml — new gke_autopilot.allowlist_waiter.pod_security_context and .security_context. Defaults satisfy the PodSecurity restricted profile and OpenShift restricted-v2 SCC:
    • Pod: runAsNonRoot: true, runAsUser/Group: 65532, seccompProfile.type: RuntimeDefault
    • Container: allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities.drop: [ALL]
  • tests/host/gke-allowlist-waiter-job_test.yaml — new test suite with 7 cases (default disabled, enabled rendering, pod context defaults, container context defaults, pod override, container override, hook annotations).
  • README.md — new rows for the two configuration keys.
  • Chart.yaml — bump to 1.37.2.
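
Based on the key names and defaults listed above, the new values.yaml section would look roughly like this (a sketch of the documented defaults, not the exact diff):

```yaml
gke_autopilot:
  allowlist_waiter:
    # Pod-level securityContext; satisfies PodSecurity "restricted"
    # and OpenShift restricted-v2
    pod_security_context:
      runAsNonRoot: true
      runAsUser: 65532
      runAsGroup: 65532
      seccompProfile:
        type: RuntimeDefault
    # Container-level securityContext
    security_context:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
```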

Why these defaults are safe

The container only executes kubectl wait against an AllowlistSynchronizer CRD. It does not need root, privilege escalation, Linux capabilities, or a writable rootfs. Users on permissive clusters can override either context via values if needed.
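
Sourcing both contexts from values presumably wires into the Job template along these lines (an illustrative excerpt, not the actual diff; the `with` blocks let users clear either key entirely on permissive clusters):

```yaml
# templates/host/gke-allowlist-waiter-job.yaml (illustrative excerpt)
spec:
  template:
    spec:
      {{- with .Values.gke_autopilot.allowlist_waiter.pod_security_context }}
      securityContext:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: wait-for-allowlist
          {{- with .Values.gke_autopilot.allowlist_waiter.security_context }}
          securityContext:
            {{- toYaml . | nindent 12 }}
          {{- end }}
```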

Test plan

  • helm unittest --strict -f tests/host/gke-allowlist-waiter-job_test.yaml charts/shield → 7/7 passed
  • Full chart test suite: helm unittest --strict -f tests/**/*_test.yaml charts/shield → 482/482 passed
  • Live reproduction on PSA Restricted namespace: before-fix FailedCreate loop, after-fix Scheduled and runs

Checklist

  • Title starts with type and scope (fix(shield):)
  • Chart version bumped (1.37.1 → 1.37.2)
  • Tests added in tests/ with _test suffix
  • README updated with new variables

🤖 Generated with Claude Code

fix(shield): add PSA-compatible securityContext to GKE allowlist waiter Job (sysdiglabs#2623)

The wait-for-allowlist Job had no Pod or container securityContext, so
the pre-install/pre-upgrade hook was rejected on any cluster enforcing
the Kubernetes PodSecurity "restricted" profile or OpenShift
restricted-v2 SCC. Because it is a Helm hook, admission failure aborts
the entire install/upgrade — and the waiter is only used on GKE
Autopilot, which itself enforces a hardened profile.

Reproduction on a vanilla k8s 1.34 namespace with
`pod-security.kubernetes.io/enforce=restricted`:

  Error creating: pods "...-allowlist-waiter-xxx" is forbidden:
  violates PodSecurity "restricted:latest": allowPrivilegeEscalation
  != false, unrestricted capabilities, runAsNonRoot != true,
  seccompProfile

After this fix the Pod is admitted and the script runs.

The Pod only executes `kubectl wait` against an AllowlistSynchronizer
CRD, so it does not need root, escalation, capabilities, or a writable
rootfs. Defaults are configurable via
.Values.gke_autopilot.allowlist_waiter.pod_security_context and
.security_context for users who need to relax them on legacy clusters.

Refs: sysdiglabs#2623
@EdwardArchive EdwardArchive requested a review from a team as a code owner May 13, 2026 03:15
@github-actions
Contributor

Hi @EdwardArchive. Thanks for your PR.

After inspecting your changes someone with write access to this repo needs
to approve and run the workflow.


Development

Successfully merging this pull request may close these issues.

[Security] gke-allowlist-waiter Job has no securityContext; may be rejected by Restricted PSA / hardened OpenShift
