Skip to content

Commit 52e62b5

Browse files
gustavoliraclaude
andauthored
feat(ci): parallelize base and RBAC helm deployments (#4679)
* feat(ci): parallelize base and RBAC helm deployments Run base_deployment and rbac_deployment concurrently in initiate_deployments() and initiate_deployments_osd_gcp(). Both target disjoint namespaces (NAME_SPACE vs NAME_SPACE_RBAC + NAME_SPACE_POSTGRES_DB), so there is no resource conflict. Overlapping the two helm upgrades and their internal readiness waits saves ~3-5 min of wall-clock time on CI. The _run_parallel_deployments helper captures each background job's exit code via `wait "$pid" || rc=$?` and returns 1 if either fails, so failures always propagate and are individually reported in the logs. Tests continue to run sequentially downstream (parallel Playwright would OOM given CI memory limits). Resolves: RHIDP-12296 * refactor(ci): add defensive checks and improved logging to parallel deployments Address code review findings from PR #4679: 1. **Namespace validation**: Add defensive check to ensure NAME_SPACE and NAME_SPACE_RBAC are different and non-empty before starting parallel deployments. Prevents subtle race conditions if misconfigured. 2. **Improved logging**: - Log both namespace names at the start for debugging - Add synchronization point logs ("Waiting for parallel deployments..." and "Parallel deployments finished") to make log flow clearer when output from both background jobs is interleaved 3. **Documentation**: Add "Requires" section to function header documenting the namespace disjointness requirement These changes improve debuggability and fail-fast behavior without changing the core parallelization logic. * fix(ci): eliminate race conditions for parallel deployments Address 3 critical race conditions identified by Qodo code review: **1. Shared temp files in helm::merge_values (Bug #1)** - BEFORE: Used fixed paths /tmp/step-without-plugins.yaml and /tmp/step-only-plugins.yaml - AFTER: Generate unique temp files per invocation via mktemp - WHY: Concurrent helm::merge_values calls would overwrite each other's intermediate files, corrupting merged values **2. Kubeconfig global state mutation (Bug #2)** - BEFORE: namespace::configure and apply_yaml_files both ran `oc config set-context --current --namespace=...` - AFTER: Removed all oc config set-context calls; added explicit --namespace flag to all oc/kubectl apply commands that lacked it - WHY: Parallel deployments racing on current context would apply resources to nondeterministic namespaces **3. In-place YAML file edits (Bug #3)** - BEFORE: apply_yaml_files used sed -i to modify shared YAML templates directly in the repo - AFTER: Copy templates to per-deployment tmpdir, patch copies, apply from tmpdir - WHY: Concurrent sed -i edits would clobber each other and apply manifests with wrong namespace fields All three bugs would have caused nondeterministic deployment failures when base_deployment and rbac_deployment run in parallel. Changes: - .ci/pipelines/lib/helm.sh: mktemp + trap cleanup for intermediate files - .ci/pipelines/lib/namespace.sh: Remove oc config set-context - .ci/pipelines/utils.sh: Tmpdir for YAML patches, explicit --namespace on all oc apply calls * fix(ci): address code review findings for parallel deployments 1. postgres-cred.yaml in-place edit race: configure_external_postgres_db now patches a temp copy instead of editing the shared template via sed_inplace. Same class of bug fixed in apply_yaml_files. 2. Unquoted ${NAME_SPACE} in namespace::configure calls: quote both occurrences (lines 553, 721) to prevent word-splitting. 3. mktemp in helm::merge_values overwrite path: move mktemp + trap inside the "merge" branch so the "overwrite" path doesn't create and immediately discard two unused temp files. 4. Remove redundant "Waiting for parallel deployments" log::section that fires right before the blocking wait calls — the bookend sections are sufficient. * fix(ci): replace trap RETURN with explicit cleanup trap RETURN leaks to the calling function in bash < 4.4, causing unbound variable errors when step_1_file/tmpdir are expanded in the caller's scope under set -o nounset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1b57c01 commit 52e62b5

3 files changed

Lines changed: 98 additions & 23 deletions

File tree

.ci/pipelines/lib/helm.sh

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,8 +32,6 @@ helm::merge_values() {
3232
local base_file=$2
3333
local diff_file=$3
3434
local final_file=$4
35-
local step_1_file="/tmp/step-without-plugins.yaml"
36-
local step_2_file="/tmp/step-only-plugins.yaml"
3735

3836
if [[ -z "$plugin_operation" || -z "$base_file" || -z "$diff_file" || -z "$final_file" ]]; then
3937
log::error "Missing required parameters"
@@ -42,6 +40,10 @@ helm::merge_values() {
4240
fi
4341

4442
if [[ "$plugin_operation" == "merge" ]]; then
43+
local step_1_file step_2_file
44+
step_1_file=$(mktemp "${TMPDIR:-/tmp}/helm-merge-step1-XXXXXX.yaml")
45+
step_2_file=$(mktemp "${TMPDIR:-/tmp}/helm-merge-step2-XXXXXX.yaml")
46+
4547
# Step 1: Merge files, excluding the .global.dynamic.plugins key
4648
# Values from `diff_file` override those in `base_file`
4749
yq eval-all '
@@ -62,6 +64,8 @@ helm::merge_values() {
6264
select(fileIndex == 0) * select(fileIndex == 1) | del(.. | select(. == null))
6365
' "${step_2_file}" "${step_1_file}" > "${final_file}"
6466

67+
rm -f "${step_1_file}" "${step_2_file}"
68+
6569
elif [[ "$plugin_operation" == "overwrite" ]]; then
6670
yq eval-all '
6771
select(fileIndex == 0) * select(fileIndex == 1)

.ci/pipelines/lib/namespace.sh

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -83,12 +83,15 @@ namespace::setup_image_pull_secret() {
8383
# ==============================================================================
8484

8585
# Function: namespace::configure
86-
# Description: Deletes and recreates a namespace, then sets it as current context
86+
# Description: Deletes and recreates a namespace
8787
# Arguments:
8888
# $1 - project: The namespace/project name
8989
# Returns:
9090
# 0 - Success
9191
# Exits on failure
92+
# Notes:
93+
# Does not set kubeconfig current context to support parallel deployments.
94+
# All downstream oc/kubectl commands must use explicit --namespace flag.
9295
namespace::configure() {
9396
local project=$1
9497
log::warn "Deleting and recreating namespace: $project"
@@ -98,10 +101,6 @@ namespace::configure() {
98101
log::error "Error: Failed to create namespace ${project}" >&2
99102
exit 1
100103
fi
101-
if ! oc config set-context --current --namespace="${project}"; then
102-
log::error "Error: Failed to set context for namespace ${project}" >&2
103-
exit 1
104-
fi
105104

106105
echo "Namespace ${project} is ready."
107106
return 0

.ci/pipelines/utils.sh

Lines changed: 88 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -321,28 +321,34 @@ apply_yaml_files() {
321321
local rhdh_base_url=$3
322322
log::info "Applying YAML files to namespace ${project}"
323323

324-
oc config set-context --current --namespace="${project}"
324+
# Create temporary directory for namespace-patched YAMLs (parallel-deployment safe)
325+
local tmpdir
326+
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/apply-yaml-${project}-XXXXXX")
325327

328+
# Copy and patch YAML files that need namespace substitution
326329
local files=(
327330
"$dir/resources/service_account/service-account-rhdh.yaml"
328331
"$dir/resources/cluster_role_binding/cluster-role-binding-k8s.yaml"
329332
"$dir/resources/cluster_role/cluster-role-k8s.yaml"
330333
)
331334

332335
for file in "${files[@]}"; do
333-
common::sed_inplace "s/namespace:.*/namespace: ${project}/g" "$file"
336+
local basename
337+
basename=$(basename "$file")
338+
sed "s/namespace:.*/namespace: ${project}/g" "$file" > "${tmpdir}/${basename}"
334339
done
335340

336341
DH_TARGET_URL=$(common::base64_encode "test-backstage-customization-provider-${project}.${K8S_CLUSTER_ROUTER_BASE}")
337342
RHDH_BASE_URL=$(common::base64_encode "$rhdh_base_url")
338343
RHDH_BASE_URL_HTTP=$(common::base64_encode "${rhdh_base_url/https/http}")
339344
export DH_TARGET_URL RHDH_BASE_URL RHDH_BASE_URL_HTTP
340345

341-
oc apply -f "$dir/resources/service_account/service-account-rhdh.yaml" --namespace="${project}"
346+
# Apply YAMLs from tmpdir (patched) or original location (no patch needed)
347+
oc apply -f "${tmpdir}/service-account-rhdh.yaml" --namespace="${project}"
342348
oc apply -f "$dir/auth/service-account-rhdh-secret.yaml" --namespace="${project}"
343349

344-
oc apply -f "$dir/resources/cluster_role/cluster-role-k8s.yaml" --namespace="${project}"
345-
oc apply -f "$dir/resources/cluster_role_binding/cluster-role-binding-k8s.yaml" --namespace="${project}"
350+
oc apply -f "${tmpdir}/cluster-role-k8s.yaml" --namespace="${project}"
351+
oc apply -f "${tmpdir}/cluster-role-binding-k8s.yaml" --namespace="${project}"
346352

347353
envsubst < "${DIR}/auth/secrets-rhdh-secrets.yaml" | oc apply --namespace="${project}" -f -
348354

@@ -370,19 +376,21 @@ apply_yaml_files() {
370376
# Tekton tests are not executed in showcase-k8s or showcase-rbac-k8s projects
371377
if [[ "$JOB_NAME" != *"aks"* && "$JOB_NAME" != *"eks"* && "$JOB_NAME" != *"gke"* ]]; then
372378
# Create Pipeline run for tekton test case.
373-
oc apply -f "$dir/resources/pipeline-run/hello-world-pipeline.yaml"
374-
oc apply -f "$dir/resources/pipeline-run/hello-world-pipeline-run.yaml"
379+
oc apply -f "$dir/resources/pipeline-run/hello-world-pipeline.yaml" --namespace="${project}"
380+
oc apply -f "$dir/resources/pipeline-run/hello-world-pipeline-run.yaml" --namespace="${project}"
375381

376382
# Create Deployment and Pipeline for Topology test.
377-
oc apply -f "$dir/resources/topology_test/topology-test.yaml"
383+
oc apply -f "$dir/resources/topology_test/topology-test.yaml" --namespace="${project}"
378384
if [[ -z "${IS_OPENSHIFT}" || "${IS_OPENSHIFT}" == "false" ]]; then
379-
kubectl apply -f "$dir/resources/topology_test/topology-test-ingress.yaml"
385+
kubectl apply -f "$dir/resources/topology_test/topology-test-ingress.yaml" --namespace="${project}"
380386
else
381-
oc apply -f "$dir/resources/topology_test/topology-test-route.yaml"
387+
oc apply -f "$dir/resources/topology_test/topology-test-route.yaml" --namespace="${project}"
382388
fi
383389
else
384390
log::info "Skipping Tekton Pipeline and Topology resources for K8s deployment (${JOB_NAME})"
385391
fi
392+
393+
rm -rf "${tmpdir}"
386394
}
387395

388396
deploy_test_backstage_customization_provider() {
@@ -564,7 +572,7 @@ base_deployment() {
564572
common::require_vars "RELEASE_NAME" "TAG_NAME" "IMAGE_REGISTRY" "IMAGE_REPO" "K8S_CLUSTER_ROUTER_BASE" || return 1
565573
local artifacts_subdir=$1
566574

567-
namespace::configure ${NAME_SPACE}
575+
namespace::configure "${NAME_SPACE}"
568576

569577
deploy_redis_cache "${NAME_SPACE}"
570578

@@ -654,21 +662,84 @@ rbac_deployment() {
654662
fi
655663
}
656664

665+
# Run base_deployment and rbac_deployment in parallel.
666+
# Both target disjoint namespaces (NAME_SPACE vs NAME_SPACE_RBAC + NAME_SPACE_POSTGRES_DB)
667+
# so there is no resource conflict. Overlapping the two helm upgrades and their
668+
# internal readiness waits saves ~3-5 min of wall-clock time on CI.
669+
#
670+
# Args:
671+
# $1 - base_fn: function name for the base deployment
672+
# $2 - base_arg: artifacts_subdir passed to base_fn
673+
# $3 - rbac_fn: function name for the RBAC deployment
674+
# $4 - rbac_arg: artifacts_subdir passed to rbac_fn
675+
#
676+
# Returns:
677+
# 0 if both deployments succeed
678+
# 1 if either (or both) fail — failures are always reported individually
679+
#
680+
# Requires:
681+
# NAME_SPACE and NAME_SPACE_RBAC must be different to avoid resource conflicts
682+
_run_parallel_deployments() {
683+
local base_fn=$1
684+
local base_arg=$2
685+
local rbac_fn=$3
686+
local rbac_arg=$4
687+
688+
# Validate that namespaces are disjoint to prevent race conditions
689+
if [[ "${NAME_SPACE:-}" == "${NAME_SPACE_RBAC:-}" ]] || [[ -z "${NAME_SPACE:-}" ]] || [[ -z "${NAME_SPACE_RBAC:-}" ]]; then
690+
log::error "NAME_SPACE ('${NAME_SPACE:-}') and NAME_SPACE_RBAC ('${NAME_SPACE_RBAC:-}') must be different and non-empty for parallel deployment"
691+
return 1
692+
fi
693+
694+
log::section "Starting parallel deployments: base + RBAC"
695+
log::info "Base namespace: ${NAME_SPACE}, RBAC namespace: ${NAME_SPACE_RBAC}"
696+
697+
"${base_fn}" "${base_arg}" &
698+
local base_pid=$!
699+
log::info "Base deployment started in background (PID: ${base_pid})"
700+
701+
"${rbac_fn}" "${rbac_arg}" &
702+
local rbac_pid=$!
703+
log::info "RBAC deployment started in background (PID: ${rbac_pid})"
704+
705+
local base_rc=0 rbac_rc=0
706+
wait "${base_pid}" || base_rc=$?
707+
wait "${rbac_pid}" || rbac_rc=$?
708+
log::section "Parallel deployments finished — evaluating results"
709+
710+
if [[ ${base_rc} -eq 0 ]]; then
711+
log::success "Base deployment completed"
712+
else
713+
log::error "Base deployment failed (exit code: ${base_rc})"
714+
fi
715+
if [[ ${rbac_rc} -eq 0 ]]; then
716+
log::success "RBAC deployment completed"
717+
else
718+
log::error "RBAC deployment failed (exit code: ${rbac_rc})"
719+
fi
720+
721+
if [[ ${base_rc} -ne 0 || ${rbac_rc} -ne 0 ]]; then
722+
return 1
723+
fi
724+
return 0
725+
}
726+
657727
initiate_deployments() {
658728
local base_artifacts_subdir=$1
659729
local rbac_artifacts_subdir=$2
660730

661731
cd "${DIR}"
662-
base_deployment "${base_artifacts_subdir}"
663-
rbac_deployment "${rbac_artifacts_subdir}"
732+
_run_parallel_deployments \
733+
base_deployment "${base_artifacts_subdir}" \
734+
rbac_deployment "${rbac_artifacts_subdir}"
664735
}
665736

666737
# OSD-GCP specific deployment functions that merge diff files and skip orchestrator workflows
667738
base_deployment_osd_gcp() {
668739
common::require_vars "RELEASE_NAME" "TAG_NAME" "IMAGE_REGISTRY" "IMAGE_REPO" "K8S_CLUSTER_ROUTER_BASE" || return 1
669740
local artifacts_subdir=$1
670741

671-
namespace::configure ${NAME_SPACE}
742+
namespace::configure "${NAME_SPACE}"
672743

673744
deploy_redis_cache "${NAME_SPACE}"
674745

@@ -727,8 +798,9 @@ initiate_deployments_osd_gcp() {
727798
local rbac_artifacts_subdir=$2
728799

729800
cd "${DIR}"
730-
base_deployment_osd_gcp "${base_artifacts_subdir}"
731-
rbac_deployment_osd_gcp "${rbac_artifacts_subdir}"
801+
_run_parallel_deployments \
802+
base_deployment_osd_gcp "${base_artifacts_subdir}" \
803+
rbac_deployment_osd_gcp "${rbac_artifacts_subdir}"
732804
}
733805

734806
# install base RHDH deployment before upgrade

0 commit comments

Comments
 (0)