Fix: gke-deploy init container image updates and PDB maxUnavailable readiness #1117
Open
64johnlee wants to merge 7 commits into
The google-cloud-sdk installer requires the python command to be available. The Dockerfile.appengine was installing python3 but not the python package, which is needed as a symlink/alias for SDK compatibility.

Fixes GoogleCloudPlatform#1056
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The docker builder was installing both docker-compose (v1.29.2, legacy) and docker-compose-plugin (v2.x+, modern), causing version conflicts and unpredictable behavior. The legacy v1 package is no longer maintained and should not be used. This change removes docker-compose and keeps only docker-compose-plugin, which is actively maintained by Docker and compatible with all supported Docker versions (19.03, 20.10, 24.0).

Fixes GoogleCloudPlatform#1042
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The gcr.io/cloud-builders/go builder is no longer actively maintained and does not support current Go versions. This adds a prominent deprecation notice at the top of the README directing users to migrate to the official golang image, which is actively maintained by the Docker community.

Fixes GoogleCloudPlatform#1067
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When deploying Kubernetes resources with custom API groups (e.g., Traefik CRDs with apiVersion: traefik.io/v1alpha1), gke-deploy used only the Kind field and discarded the API group. This let kubectl resolve the bare Kind to an API group and version other than the declared ones, resulting in deployment verification failures for custom resources.

Changes:
- Added ObjectGroupVersionKind() function in resource.go that returns the full "kind.group" format needed for kubectl commands
- Updated deployer.Apply() to use ObjectGroupVersionKind when calling kubectl for custom resources with API groups
- Core API resources (without groups) continue to work unchanged

Fixes GoogleCloudPlatform#962
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes a bug where gke-deploy applied Namespace objects only if they did not already exist. After initial creation, any updates to namespace labels or annotations were silently ignored.

Root cause: The kubectl apply call was inside an 'if !exists' conditional block, preventing updates to existing namespaces. This broke GitOps workflows where namespace metadata evolves over time.

Solution: Always call kubectl apply for namespace manifests, not just on creation. The warning message for missing namespaces is preserved. kubectl apply's idempotent behavior and strategic merge patching automatically handle:
- Creating missing namespaces
- Updating labels/annotations on existing namespaces
- Preserving namespace fields not set in the manifest, and leaving objects inside the namespace (ResourceQuotas, NetworkPolicies, etc.) untouched

Fixes GoogleCloudPlatform#873
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…orkspace isolation

Addresses issue GoogleCloudPlatform#927 by modernizing the Bazel builder to support current Bazel versions and fix workspace/permission issues.

Key improvements:

1. **Base OS Upgrade**: Ubuntu 20.04 → 22.04 LTS
   - Longer support window (standard support runs into 2027)
   - More modern tooling and security patches
   - Better compatibility with current build tools
2. **Java Version Upgrade**: OpenJDK 8 → OpenJDK 11
   - Java 8 is a legacy release with shrinking security-update coverage
   - Bazel 7.x+ works better with Java 11+
   - Required for modern Bazel compatibility
3. **Installation Method**: Deprecated APT repo → GitHub binary releases
   - Old APT repository (jdk1.8) is no longer reliable
   - Binary releases from GitHub are an official distribution method
   - Fixes dependency on outdated package repository
4. **Dynamic Workspace Configuration**: Static ~/.bazelrc → Dynamic /tmp/.bazelrc
   - Eliminates permission conflicts when Bazel runs in different contexts
   - Properly handles workspace isolation across multiple Cloud Build steps
   - Fixes issues where bazel-bin/bazel-genfiles symlinks become inaccessible
   - Supports BAZEL_OUTPUT_BASE and BAZEL_HOME environment variables
5. **Improved HOME Directory Handling**
   - Sets HOME=/tmp to avoid conflicts with root's home directory
   - Prevents user-level .bazelrc from interfering
   - Uses --nohome_rc flag for explicit control

Changes:
- bazel/Dockerfile (48 lines): modernized base image, updated Java and Bazel install
- bazel/bazel.sh (78 lines): added setup_bazelrc() and setup_workspace() functions

Backward Compatibility:
- Bazel command line syntax unchanged (wrapper is transparent)
- Existing Bazel versions (5.4.0+) still supported
- Invocation UUID output format preserved
- Examples and test scripts continue to work

Fixes GoogleCloudPlatform#927
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eadiness

Two related fixes to Kubernetes resource handling in gke-deploy:

**Issue GoogleCloudPlatform#573: Init container images not updated**

UpdateMatchingContainerImage only iterated spec.containers, completely ignoring spec.initContainers. Init containers have their own image fields and are commonly used for setup tasks (migrations, config fetching) whose images need to be updated alongside regular containers.

Fix: Restructure the loop to iterate both the containers and initContainers field paths for each workload kind (Deployment, DaemonSet, Job, Pod, CronJob, etc.).

**Issue GoogleCloudPlatform#725: PodDisruptionBudget ignores spec.maxUnavailable**

podDisruptionBudgetIsReady assumed desiredHealthy > 0, which generally holds only for PDBs using spec.minAvailable. PDBs using spec.maxUnavailable (supported since Kubernetes 1.7) can have desiredHealthy == 0, causing the readiness check to always return false and block deployments indefinitely.

Fix: Detect the minAvailable vs maxUnavailable path by checking desiredHealthy. When desiredHealthy > 0, use the existing currentHealthy >= desiredHealthy check. When desiredHealthy == 0 (maxUnavailable mode), confirm readiness by checking that status.disruptionsAllowed is present, which the disruption controller populates once it has reconciled the PDB.

Fixes GoogleCloudPlatform#573
Fixes GoogleCloudPlatform#725
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
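The "kind.group" string described in the custom-API-group commit above can be derived from an object's apiVersion and Kind. The Go sketch below shows one way to do it; `kindWithGroup` is a hypothetical helper, not the `ObjectGroupVersionKind()` function added in `resource.go`, whose signature and edge-case handling may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// kindWithGroup is a hypothetical helper that builds a "kind.group" string.
// apiVersion is "group/version" for grouped resources (e.g. "apps/v1") and a
// bare "version" (e.g. "v1") for the core API group.
func kindWithGroup(apiVersion, kind string) string {
	group := ""
	if i := strings.Index(apiVersion, "/"); i >= 0 {
		group = apiVersion[:i]
	}
	if group == "" {
		// Core resources such as Pod or Namespace have no group: keep the bare Kind.
		return kind
	}
	return kind + "." + group
}

func main() {
	fmt.Println(kindWithGroup("traefik.io/v1alpha1", "IngressRoute")) // IngressRoute.traefik.io
	fmt.Println(kindWithGroup("apps/v1", "Deployment"))              // Deployment.apps
	fmt.Println(kindWithGroup("v1", "Namespace"))                    // Namespace
}
```

Only the group is appended, not the version, which matches the "kind.group" format the commit describes; core resources fall through unchanged.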
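The namespace commit above is essentially a control-flow change: the warning stays conditional, but apply always runs. The sketch below illustrates that shape with a hypothetical `ensureNamespace` function and an `applyFunc` stand-in for the real kubectl call; the warning text is a placeholder, not gke-deploy's actual message.

```go
package main

import "fmt"

// applyFunc stands in for a `kubectl apply` invocation (hypothetical type).
type applyFunc func(namespace string) error

// ensureNamespace sketches the fixed logic: warn when the namespace is
// missing, but call apply whether or not it already exists.
func ensureNamespace(namespace string, exists bool, apply applyFunc) error {
	if !exists {
		// Placeholder text; the real warning message may differ.
		fmt.Printf("WARNING: namespace %q does not exist and will be created\n", namespace)
	}
	// Before the fix, this call sat inside the !exists branch, so label and
	// annotation changes to existing namespaces were never applied.
	return apply(namespace)
}

func main() {
	var applied []string
	record := func(ns string) error {
		applied = append(applied, ns)
		return nil
	}
	_ = ensureNamespace("team-a", true, record)  // existing namespace: still applied
	_ = ensureNamespace("team-b", false, record) // missing namespace: warned, then applied
	fmt.Println(applied)                         // [team-a team-b]
}
```

Because kubectl apply is idempotent, calling it on every deploy costs little and lets metadata changes flow through.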
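The PodDisruptionBudget readiness rule in the last commit above reduces to a two-branch check on the PDB's status. The Go sketch below restates it over a plain status map; `pdbIsReady` is a hypothetical name, not the `podDisruptionBudgetIsReady` implementation in `ready.go`, and it assumes integer fields are stored as int64, as Kubernetes unstructured objects typically represent whole numbers.

```go
package main

import "fmt"

// pdbIsReady is a hypothetical restatement of the described readiness rule.
// status holds the PDB's .status fields.
func pdbIsReady(status map[string]interface{}) bool {
	desired, ok := status["desiredHealthy"].(int64)
	if !ok {
		// Status not populated yet.
		return false
	}
	if desired > 0 {
		// minAvailable path: enough healthy pods must be running.
		current, ok := status["currentHealthy"].(int64)
		return ok && current >= desired
	}
	// desiredHealthy == 0 is treated as the maxUnavailable path: ready once
	// status.disruptionsAllowed is present, whatever its value.
	_, ok = status["disruptionsAllowed"].(int64)
	return ok
}

func main() {
	minAvailable := map[string]interface{}{"desiredHealthy": int64(2), "currentHealthy": int64(3)}
	maxUnavailable := map[string]interface{}{"desiredHealthy": int64(0), "disruptionsAllowed": int64(1)}
	fmt.Println(pdbIsReady(minAvailable), pdbIsReady(maxUnavailable)) // true true
}
```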
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Summary
Two related fixes to Kubernetes resource handling in gke-deploy, covering issues that block production deployments.
Fix 1: Init Container Images Not Updated (Fixes #573)
Problem
`UpdateMatchingContainerImage` iterated only `spec.containers`, ignoring `spec.initContainers` entirely. Init containers have separate image fields and are commonly used for pre-start setup tasks (DB migrations, config fetching, secret injection) whose images need to be updated alongside regular containers.
Root Cause
Fix
Restructure the loop to iterate both the `containers` and `initContainers` field paths for each workload kind.
This covers all affected kinds: Deployment, DaemonSet, Job, ReplicaSet, ReplicationController, StatefulSet, CronJob, Pod.
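To make the restructured loop concrete, here is a minimal Go sketch that walks an unstructured object (plain maps, as decoded from YAML/JSON) and updates images in both `containers` and `initContainers`. The names `podSpecPath`, `imageRepo`, and `updateMatchingImages` are hypothetical, and matching on the image repository (ignoring tag and digest) is an assumption about how `UpdateMatchingContainerImage` selects containers; this is not the code in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// podSpecPath maps a workload kind to the location of its pod spec, following
// the Kubernetes object layouts. Illustrative only.
func podSpecPath(kind string) ([]string, bool) {
	switch kind {
	case "Pod":
		return []string{"spec"}, true
	case "CronJob":
		return []string{"spec", "jobTemplate", "spec", "template", "spec"}, true
	case "Deployment", "DaemonSet", "Job", "ReplicaSet", "ReplicationController", "StatefulSet":
		return []string{"spec", "template", "spec"}, true
	}
	return nil, false
}

// imageRepo strips any digest (@sha256:...) or tag (:v1) from an image reference.
func imageRepo(ref string) string {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// A colon after the last slash is a tag; one before it is a registry port.
	if colon := strings.LastIndex(ref, ":"); colon > strings.LastIndex(ref, "/") {
		ref = ref[:colon]
	}
	return ref
}

// updateMatchingImages replaces the image of every container whose repository
// equals repo, scanning both "containers" and "initContainers".
// It returns the number of containers updated.
func updateMatchingImages(obj map[string]interface{}, kind, repo, newImage string) int {
	path, ok := podSpecPath(kind)
	if !ok {
		return 0
	}
	var cur interface{} = obj
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return 0
		}
		cur = m[key]
	}
	podSpec, ok := cur.(map[string]interface{})
	if !ok {
		return 0
	}
	updated := 0
	for _, field := range []string{"containers", "initContainers"} {
		list, _ := podSpec[field].([]interface{})
		for _, item := range list {
			c, ok := item.(map[string]interface{})
			if !ok {
				continue
			}
			if img, _ := c["image"].(string); imageRepo(img) == repo {
				c["image"] = newImage
				updated++
			}
		}
	}
	return updated
}

func main() {
	deployment := map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"spec": map[string]interface{}{
					"initContainers": []interface{}{
						map[string]interface{}{"name": "migrate", "image": "gcr.io/my-project/app:v1"},
					},
					"containers": []interface{}{
						map[string]interface{}{"name": "app", "image": "gcr.io/my-project/app:v1"},
					},
				},
			},
		},
	}
	n := updateMatchingImages(deployment, "Deployment", "gcr.io/my-project/app", "gcr.io/my-project/app:v2")
	fmt.Println("updated containers:", n) // updated containers: 2
}
```

Keeping the two field names in one slice means each kind needs only a single pod-spec path, rather than a separate loop for init containers.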
Fix 2: PodDisruptionBudget Ignores spec.maxUnavailable (Fixes #725)
Problem
`podDisruptionBudgetIsReady` assumed `status.desiredHealthy > 0`, which generally holds only for PDBs using `spec.minAvailable`. Since Kubernetes 1.7, PDBs can use `spec.maxUnavailable` instead, which can leave `desiredHealthy == 0`; in that case the readiness check always returns false and blocks deployments indefinitely.
Root Cause
Fix
Detect the minAvailable vs maxUnavailable path:
- `desiredHealthy > 0` → minAvailable mode → check `currentHealthy >= desiredHealthy`
- `desiredHealthy == 0` → maxUnavailable mode → confirm readiness by the presence of `status.disruptionsAllowed`, which the disruption controller populates once it has reconciled the PDB
Files Changed
- `gke-deploy/core/resource/resource.go`: init container field paths added to `UpdateMatchingContainerImage`
- `gke-deploy/core/resource/ready.go`: PDB readiness logic split into minAvailable / maxUnavailable paths
Backward Compatibility
Both fixes are purely additive:
🤖 Generated with Claude Code