feat(ci): parallelize base and RBAC helm deployments#4679
feat(ci): parallelize base and RBAC helm deployments#4679gustavolira wants to merge 5 commits intoredhat-developer:mainfrom
Conversation
Code Review by Qodo
1.
|
Review Summary by QodoParallelize base and RBAC helm deployments for CI optimization
WalkthroughsDescription• Parallelize base and RBAC helm deployments in CI pipelines • New _run_parallel_deployments helper runs deployments concurrently • Captures exit codes and propagates failures individually • Saves ~3-5 minutes of wall-clock time on CI Diagramflowchart LR
A["initiate_deployments<br/>initiate_deployments_osd_gcp"] -->|calls| B["_run_parallel_deployments"]
B -->|spawns| C["base_deployment<br/>base_deployment_osd_gcp"]
B -->|spawns| D["rbac_deployment<br/>rbac_deployment_osd_gcp"]
C -->|parallel| E["Helm upgrades<br/>+ readiness checks"]
D -->|parallel| E
E -->|waits for both| F["Return success/failure<br/>with individual reporting"]
File Changes1. .ci/pipelines/utils.sh
|
…eployments Address code review findings from PR redhat-developer#4679: 1. **Namespace validation**: Add defensive check to ensure NAME_SPACE and NAME_SPACE_RBAC are different and non-empty before starting parallel deployments. Prevents subtle race conditions if misconfigured. 2. **Improved logging**: - Log both namespace names at the start for debugging - Add synchronization point logs ("Waiting for parallel deployments..." and "Parallel deployments finished") to make log flow clearer when output from both background jobs is interleaved 3. **Documentation**: Add "Requires" section to function header documenting the namespace disjointness requirement These changes improve debuggability and fail-fast behavior without changing the core parallelization logic.
Run base_deployment and rbac_deployment concurrently in initiate_deployments() and initiate_deployments_osd_gcp(). Both target disjoint namespaces (NAME_SPACE vs NAME_SPACE_RBAC + NAME_SPACE_POSTGRES_DB), so there is no resource conflict. Overlapping the two helm upgrades and their internal readiness waits saves ~3-5 min of wall-clock time on CI. The _run_parallel_deployments helper captures each background job's exit code via `wait "$pid" || rc=$?` and returns 1 if either fails, so failures always propagate and are individually reported in the logs. Tests continue to run sequentially downstream (parallel Playwright would OOM given CI memory limits). Resolves: RHIDP-12296
|
/review |
…eployments Address code review findings from PR redhat-developer#4679: 1. **Namespace validation**: Add defensive check to ensure NAME_SPACE and NAME_SPACE_RBAC are different and non-empty before starting parallel deployments. Prevents subtle race conditions if misconfigured. 2. **Improved logging**: - Log both namespace names at the start for debugging - Add synchronization point logs ("Waiting for parallel deployments..." and "Parallel deployments finished") to make log flow clearer when output from both background jobs is interleaved 3. **Documentation**: Add "Requires" section to function header documenting the namespace disjointness requirement These changes improve debuggability and fail-fast behavior without changing the core parallelization logic.
Address 3 critical race conditions identified by Qodo code review: **1. Shared temp files in helm::merge_values (Bug #1)** - BEFORE: Used fixed paths /tmp/step-without-plugins.yaml and /tmp/step-only-plugins.yaml - AFTER: Generate unique temp files per invocation via mktemp - WHY: Concurrent helm::merge_values calls would overwrite each other's intermediate files, corrupting merged values **2. Kubeconfig global state mutation (Bug #2)** - BEFORE: namespace::configure and apply_yaml_files both ran `oc config set-context --current --namespace=...` - AFTER: Removed all oc config set-context calls; added explicit --namespace flag to all oc/kubectl apply commands that lacked it - WHY: Parallel deployments racing on current context would apply resources to nondeterministic namespaces **3. In-place YAML file edits (Bug redhat-developer#3)** - BEFORE: apply_yaml_files used sed -i to modify shared YAML templates directly in the repo - AFTER: Copy templates to per-deployment tmpdir, patch copies, apply from tmpdir - WHY: Concurrent sed -i edits would clobber each other and apply manifests with wrong namespace fields All three bugs would have caused nondeterministic deployment failures when base_deployment and rbac_deployment run in parallel. Changes: - .ci/pipelines/lib/helm.sh: mktemp + trap cleanup for intermediate files - .ci/pipelines/lib/namespace.sh: Remove oc config set-context - .ci/pipelines/utils.sh: Tmpdir for YAML patches, explicit --namespace on all oc apply calls
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
1. postgres-cred.yaml in-place edit race: configure_external_postgres_db
now patches a temp copy instead of editing the shared template via
sed_inplace. Same class of bug fixed in apply_yaml_files.
2. Unquoted ${NAME_SPACE} in namespace::configure calls: quote both
occurrences (lines 553, 721) to prevent word-splitting.
3. mktemp in helm::merge_values overwrite path: move mktemp + trap
inside the "merge" branch so the "overwrite" path doesn't create
and immediately discard two unused temp files.
4. Remove redundant "Waiting for parallel deployments" log::section
that fires right before the blocking wait calls — the bookend
sections are sufficient.
a7771f4 to
9403307
Compare
trap RETURN leaks to the calling function in bash < 4.4, causing unbound variable errors when step_1_file/tmpdir are expanded in the caller's scope under set -o nounset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|



Summary
Run
base_deploymentandrbac_deploymentconcurrently insideinitiate_deployments()andinitiate_deployments_osd_gcp(). Both target disjoint namespaces (NAME_SPACEvsNAME_SPACE_RBAC+NAME_SPACE_POSTGRES_DB) so there is no resource conflict, and overlapping the two helm upgrades plus their internal readiness waits saves ~3-5 min of wall-clock time on CI.Tests continue to run sequentially downstream — parallelizing Playwright on the same CI pod caused OOMKilled in prior experiments (see #4469 for context).
Design
Implementation notes
_run_parallel_deploymentscaptures each background job's exit code viawait "$pid" || rc=$?and returns1if either deployment fails, so failures always propagate and are individually reported in the logs. This addresses the exit-code propagation concern raised on the previous attempt (feat(ci): parallelize deployments and readiness checks for showcase and showcase-rbac #4469).ocp-pull.sh,ocp-nightly.sh) — the helper is a drop-in replacement inside the existinginitiate_deployments*functions.Background
This is a fresh, rebased, and narrowed version of #4469, which was stale, drifted significantly from
main, and mixed the deployment parallelization with test-parallelization changes that later had to be reverted. This PR ships only the safe, high-value part.Test plan
ocp-pull) runs both helm deployments in parallel and logs show overlappinghelm upgradeoutputocp-nightly) runs both deployments in parallel on OCP and OSD-GCP pathsResolves: RHIDP-12296
🤖 Generated with Claude Code