
Fix: gke-deploy init container image updates and PDB maxUnavailable readiness#1117

Open
64johnlee wants to merge 7 commits into GoogleCloudPlatform:master from 64johnlee:fix/gke-deploy-resource-handling

Conversation

@64johnlee


Summary

Two related fixes to Kubernetes resource handling in gke-deploy, covering issues that block production deployments.


Fix 1: Init Container Images Not Updated (Fixes #573)

Problem

UpdateMatchingContainerImage iterated only spec.containers, ignoring spec.initContainers entirely. Init containers have separate image fields and are commonly used for pre-start setup tasks (DB migrations, config fetching, secret injection) whose images need to be updated alongside regular containers.

Root Cause

// Before: only one field path per kind
nestedFields = []string{"spec", "template", "spec", "containers"}  // initContainers never visited

Fix

Restructure the loop to iterate both field paths for each workload kind:

// After: both containers and initContainers processed
for _, nestedFields := range [][]string{containersFields, initContainersFields} { ... }

Covers all affected kinds: Deployment, DaemonSet, Job, ReplicaSet, ReplicationController, StatefulSet, CronJob, Pod
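
The loop restructuring described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration on a plain `map[string]interface{}`, not the actual gke-deploy code: the helper names `nestedSlice`, `imageName`, `updateMatchingImage`, and `exampleDeployment` are hypothetical, and only the Deployment-style field path is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical field paths for a Deployment-like workload. Other kinds
// (Pod, CronJob, ...) would need their own paths.
var (
	containersFields     = []string{"spec", "template", "spec", "containers"}
	initContainersFields = []string{"spec", "template", "spec", "initContainers"}
)

// nestedSlice walks obj along fields and returns the list found there, if any.
func nestedSlice(obj map[string]interface{}, fields ...string) ([]interface{}, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = m[f]; !ok {
			return nil, false
		}
	}
	list, ok := cur.([]interface{})
	return list, ok
}

// imageName strips a digest ("@sha256:...") or tag (":v1") from an image
// reference, leaving a registry port such as "localhost:5000" intact.
func imageName(image string) string {
	if i := strings.Index(image, "@"); i >= 0 {
		image = image[:i]
	}
	if i := strings.LastIndex(image, ":"); i > strings.LastIndex(image, "/") {
		image = image[:i]
	}
	return image
}

// updateMatchingImage replaces the image of every container whose image name
// equals name, visiting both containers and initContainers.
// It returns the number of containers updated.
func updateMatchingImage(obj map[string]interface{}, name, replace string) int {
	updated := 0
	for _, fields := range [][]string{containersFields, initContainersFields} {
		list, ok := nestedSlice(obj, fields...)
		if !ok {
			continue // e.g. a workload without initContainers
		}
		for _, item := range list {
			c, ok := item.(map[string]interface{})
			if !ok {
				continue
			}
			if img, _ := c["image"].(string); imageName(img) == name {
				c["image"] = replace
				updated++
			}
		}
	}
	return updated
}

// exampleDeployment builds a made-up workload with one matching container,
// one unrelated sidecar, and one matching init container.
func exampleDeployment() map[string]interface{} {
	podSpec := map[string]interface{}{
		"containers": []interface{}{
			map[string]interface{}{"name": "app", "image": "gcr.io/my-project/app:v1"},
			map[string]interface{}{"name": "sidecar", "image": "envoyproxy/envoy:v1.30"},
		},
		"initContainers": []interface{}{
			map[string]interface{}{"name": "migrate", "image": "gcr.io/my-project/app:v1"},
		},
	}
	return map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{"spec": podSpec},
		},
	}
}

func main() {
	obj := exampleDeployment()
	n := updateMatchingImage(obj, "gcr.io/my-project/app", "gcr.io/my-project/app:v2")
	fmt.Println("containers updated:", n)
}
```

Skipping a missing path with `continue` is what keeps workloads without initContainers unaffected in this sketch.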


Fix 2: PodDisruptionBudget Ignores spec.maxUnavailable (Fixes #725)

Problem

podDisruptionBudgetIsReady assumed status.desiredHealthy > 0, which is only guaranteed for PDBs using spec.minAvailable. Since Kubernetes 1.7, PDBs can use spec.maxUnavailable instead, which can result in desiredHealthy == 0, causing the readiness check to always return false and block deployments indefinitely.

Root Cause

// Before: ignored the ok bool, assumed desiredHealthy is always meaningful
if !ok || currentHealthy < desiredHealthy {  // false when desiredHealthy == 0
    return false, nil
}

Fix

Detect the minAvailable vs maxUnavailable path:

  • desiredHealthy > 0 → minAvailable mode → check currentHealthy >= desiredHealthy
  • desiredHealthy == 0 → maxUnavailable mode → confirm readiness by the presence of status.disruptionsAllowed, which the disruption controller populates once the PDB has been reconciled
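
As a rough illustration of the two paths listed above, here is a minimal sketch that works on a decoded object as a plain `map[string]interface{}`. It is not the code in ready.go: `statusInt` and `pdbIsReady` are hypothetical names, and treating a missing desiredHealthy the same as 0 is an assumption made for this example.

```go
package main

import "fmt"

// statusInt reads status.<field> as an integer. It reports false when the
// field is absent or not numeric. (Hypothetical helper for this sketch.)
func statusInt(obj map[string]interface{}, field string) (int64, bool) {
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return 0, false
	}
	switch v := status[field].(type) {
	case int64:
		return v, true
	case int:
		return int64(v), true
	case float64: // numbers decoded from JSON arrive as float64
		return int64(v), true
	}
	return 0, false
}

// pdbIsReady sketches the split readiness check.
// Assumption: a missing desiredHealthy is handled like desiredHealthy == 0.
func pdbIsReady(obj map[string]interface{}) bool {
	desired, _ := statusInt(obj, "desiredHealthy")
	if desired > 0 {
		// minAvailable path: enough healthy pods must be present.
		current, ok := statusInt(obj, "currentHealthy")
		return ok && current >= desired
	}
	// maxUnavailable path: ready once disruptionsAllowed has been populated.
	_, ok := statusInt(obj, "disruptionsAllowed")
	return ok
}

func main() {
	minAvailable := map[string]interface{}{
		"status": map[string]interface{}{"desiredHealthy": 2, "currentHealthy": 2},
	}
	maxUnavailable := map[string]interface{}{
		"status": map[string]interface{}{"desiredHealthy": 0, "disruptionsAllowed": 1},
	}
	fmt.Println(pdbIsReady(minAvailable), pdbIsReady(maxUnavailable))
}
```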

Files Changed

  • gke-deploy/core/resource/resource.go — init container field paths added to UpdateMatchingContainerImage
  • gke-deploy/core/resource/ready.go — PDB readiness logic split into minAvailable / maxUnavailable paths

Backward Compatibility

Both fixes are purely additive:

  • Init container update: only affects workloads that have initContainers; others unchanged
  • PDB fix: minAvailable path logic unchanged; maxUnavailable path is new

🤖 Generated with Claude Code

64johnlee and others added 7 commits June 6, 2026 23:17
The google-cloud-sdk installer requires the python command to be available.
The Dockerfile.appengine was installing python3 but not the python package,
which is needed as a symlink/alias for SDK compatibility.

Fixes GoogleCloudPlatform#1056

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The docker builder was installing both docker-compose (v1.29.2, legacy) and
docker-compose-plugin (v2.x+, modern), causing version conflicts and unpredictable
behavior. The legacy v1 package is no longer maintained and should not be used.

This change removes docker-compose and keeps only docker-compose-plugin,
which is actively maintained by Docker and compatible with all supported
Docker versions (19.03, 20.10, 24.0).

Fixes GoogleCloudPlatform#1042

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The gcr.io/cloud-builders/go builder is no longer actively maintained and does
not support current Go versions. This adds a prominent deprecation notice at the
top of the README directing users to migrate to the official golang image, which
is actively maintained by the Docker community.

Fixes GoogleCloudPlatform#1067

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When deploying Kubernetes resources with custom API groups (e.g., Traefik CRDs
with apiVersion: traefik.io/v1alpha1), gke-deploy was only using the Kind field
and discarding the API group. This caused kubectl to resolve to the server's
preferred version instead of the declared version, resulting in deployment
verification failures for custom resources.

Changes:
- Added ObjectGroupVersionKind() function in resource.go that returns the
  full "kind.group" format needed for kubectl commands
- Updated deployer.Apply() to use ObjectGroupVersionKind when calling
  kubectl for custom resources with API groups
- Core API resources (without groups) continue to work unchanged
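
A minimal sketch of how a "kind.group" name could be derived from a manifest's apiVersion and kind. This is an illustration only, not the ObjectGroupVersionKind() implementation from this commit; `kindWithGroup` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// kindWithGroup returns "Kind.group" for resources in a named API group and
// the bare kind for core resources such as apiVersion "v1".
func kindWithGroup(apiVersion, kind string) string {
	i := strings.Index(apiVersion, "/")
	if i <= 0 {
		return kind // core API: no group component
	}
	return kind + "." + apiVersion[:i]
}

func main() {
	fmt.Println(kindWithGroup("traefik.io/v1alpha1", "IngressRoute"))
	fmt.Println(kindWithGroup("v1", "Namespace"))
}
```

kubectl also accepts the longer "kind.version.group" form when a specific version has to be pinned.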

Fixes GoogleCloudPlatform#962

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes a bug where gke-deploy would only apply Namespace objects if they
didn't exist. After initial creation, any updates to namespace labels or
annotations were silently ignored.

Root cause: The kubectl apply call was inside an 'if !exists' conditional block,
preventing updates to existing namespaces. This broke GitOps workflows where
namespace metadata evolves over time.

Solution: Always call kubectl apply for namespace manifests, not just on creation.
The warning message for missing namespaces is preserved. kubectl apply's
idempotent behavior and strategic merge patching automatically handles:
- Creating missing namespaces
- Updating labels/annotations on existing namespaces
- Leaving other resources in the namespace (quotas, network policies, etc.) untouched

Fixes GoogleCloudPlatform#873

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…orkspace isolation

Addresses issue GoogleCloudPlatform#927 by modernizing the Bazel builder to support current Bazel
versions and fix workspace/permission issues. Key improvements:

1. **Base OS Upgrade**: Ubuntu 20.04 → 22.04 LTS
   - Longer support window (standard support until April 2027)
   - More modern tooling and security patches
   - Better compatibility with current build tools

2. **Java Version Upgrade**: OpenJDK 8 → OpenJDK 11
   - Java 8 is EOL, no longer receiving security updates
   - Bazel 7.x+ works better with Java 11+
   - Required for modern Bazel compatibility

3. **Installation Method**: Deprecated APT repo → GitHub binary releases
   - Old APT repository (jdk1.8) is no longer reliable
   - Binary releases from GitHub are the official distribution method
   - Fixes dependency on outdated package repository

4. **Dynamic Workspace Configuration**: Static ~/.bazelrc → Dynamic /tmp/.bazelrc
   - Eliminates permission conflicts when Bazel runs in different contexts
   - Properly handles workspace isolation across multiple Cloud Build steps
   - Fixes issues where bazel-bin/bazel-genfiles symlinks become inaccessible
   - Supports BAZEL_OUTPUT_BASE and BAZEL_HOME environment variables

5. **Improved HOME Directory Handling**
   - Sets HOME=/tmp to avoid conflicts with root's home directory
   - Prevents user-level .bazelrc from interfering
   - Uses --nohome_rc flag for explicit control

Changes:
- bazel/Dockerfile (48 lines): modernized base image, updated Java and Bazel install
- bazel/bazel.sh (78 lines): added setup_bazelrc() and setup_workspace() functions

Backward Compatibility:
- Bazel command line syntax unchanged (wrapper is transparent)
- Existing Bazel versions (5.4.0+) still supported
- Invocation UUID output format preserved
- Examples and test scripts continue to work

Fixes GoogleCloudPlatform#927

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eadiness

Two related fixes to Kubernetes resource handling in gke-deploy:

**Issue GoogleCloudPlatform#573: Init container images not updated**
UpdateMatchingContainerImage only iterated spec.containers, completely ignoring
spec.initContainers. Init containers require separate image pulling and are
commonly used for setup tasks (migrations, config fetching) whose images need
to be updated alongside regular containers.

Fix: Restructure the loop to iterate both containers and initContainers field
paths for each workload kind (Deployment, DaemonSet, Job, Pod, CronJob, etc.).

**Issue GoogleCloudPlatform#725: PodDisruptionBudget ignores spec.maxUnavailable**
podDisruptionBudgetIsReady assumed desiredHealthy > 0, which is only true for
PDBs using spec.minAvailable. PDBs using spec.maxUnavailable (supported since
Kubernetes 1.17) can have desiredHealthy == 0, causing the readiness check to
always return false and block deployments indefinitely.

Fix: Detect the minAvailable vs maxUnavailable path by checking desiredHealthy.
When desiredHealthy > 0 use the existing currentHealthy >= desiredHealthy check.
When desiredHealthy == 0 (maxUnavailable mode) confirm readiness by checking that
status.disruptionsAllowed is present, which the disruption controller populates
once the PDB has been reconciled.

Fixes GoogleCloudPlatform#573
Fixes GoogleCloudPlatform#725

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@google-cla

google-cla Bot commented Jun 6, 2026


Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


Development

Successfully merging this pull request may close these issues.

[BUG] gke-deploy / PodDisruptionBudget
gke-deploy should replace tags for init containers
