Harden production promotion image copy#755
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughA GitHub Actions workflow now sets ChangesStaging-to-Production Promotion Workflow
Sequence Diagram(s)sequenceDiagram
participant GitHubActions
participant cpln
participant cpflow
participant cpln_workload
GitHubActions->>cpln: CPLN_UPSTREAM_TOKEN + cpln image get (verify staging image)
GitHubActions->>cpflow: cpflow copy-image-from-upstream (retry loop using COPY_IMAGE_RETRIES/COPY_IMAGE_RETRY_INTERVAL)
cpflow-->>GitHubActions: copy success or failure code
GitHubActions->>cpln_workload: cpln workload update with spec.containers.<name>.image (rollback)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🚀 Quick Review App CommandsWelcome! Here are the commands you can use in this PR:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c45e3ac6c8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| break | ||
| fi | ||
|
|
||
| copy_status=$? |
There was a problem hiding this comment.
Preserve the failed copy exit status before leaving the if
When cpflow copy-image-from-upstream fails, this assignment records the exit status of the preceding if compound command, not the failed cpflow command; in bash, an if whose then branch does not run exits with status 0. In the scenario where every image-copy attempt fails, copy_status is reset to 0 on each failure, the final error block is skipped, and the workflow proceeds to deploy-image even though the staging image was never copied.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR hardens the production promotion workflow by adding a staging image preflight check, a configurable retry loop around the image copy step, and a fix to the rollback logic that switches from unsupported array-index paths (
Confidence Score: 4/5Safe to merge; all three changes address documented real failures and the core copy/rollback logic is correct. The preflight check, retry loop, and rollback path fix are all well-formed. The only thing worth a second look is that the per-attempt warning is suppressed on the final retry, leaving a small observability gap in the logs — but this has no effect on correctness or production safety. No files require special attention beyond the minor logging gap on the last retry attempt in the copy step. Important Files Changed
Sequence DiagramsequenceDiagram
participant GH as GitHub Actions
participant CPLN_S as CPLN Staging API
participant CPFLOW as cpflow CLI
participant CPLN_P as CPLN Production API
GH->>CPLN_S: cpln image get STAGING_IMAGE (preflight)
alt Image not found
CPLN_S-->>GH: error → step fails early
else Image exists
CPLN_S-->>GH: 200 OK
loop Retry up to COPY_IMAGE_RETRIES times
GH->>CPFLOW: copy-image-from-upstream
CPFLOW->>CPLN_S: pull image
CPFLOW->>CPLN_P: push image
alt Copy succeeds
CPLN_P-->>GH: success → break
else "Copy fails (attempt < max)"
GH->>GH: warning + sleep COPY_IMAGE_RETRY_INTERVAL
end
end
alt All retries exhausted
GH->>GH: error + exit copy_status
else Copy succeeded
GH->>CPLN_P: deploy-image
alt Deploy healthy
CPLN_P-->>GH: health check pass
else Deploy unhealthy
GH->>CPLN_P: cpln workload update spec.containers.NAME.image (rollback)
GH->>GH: wait for rollback readiness
end
end
end
Reviews (1): Last reviewed commit: "Harden production promotion image copy" | Re-trigger Greptile |
| copy_status=$? | ||
| if [[ "${attempt}" -lt "${COPY_IMAGE_RETRIES}" ]]; then | ||
| echo "::warning::Image copy attempt ${attempt}/${COPY_IMAGE_RETRIES} failed with exit ${copy_status}; retrying in ${COPY_IMAGE_RETRY_INTERVAL}s." | ||
| sleep "${COPY_IMAGE_RETRY_INTERVAL}" | ||
| fi |
There was a problem hiding this comment.
The last failed attempt is never explicitly logged before the final error message. When all retries are exhausted the loop exits silently on the last attempt (the inner
if [[ "${attempt}" -lt "${COPY_IMAGE_RETRIES}" ]] guard skips both the warning and the sleep), so log consumers see only the summary error with no per-attempt context for the final try. Emitting the warning unconditionally and guarding only the sleep makes every failure visible while still skipping the redundant sleep.
| copy_status=$? | |
| if [[ "${attempt}" -lt "${COPY_IMAGE_RETRIES}" ]]; then | |
| echo "::warning::Image copy attempt ${attempt}/${COPY_IMAGE_RETRIES} failed with exit ${copy_status}; retrying in ${COPY_IMAGE_RETRY_INTERVAL}s." | |
| sleep "${COPY_IMAGE_RETRY_INTERVAL}" | |
| fi | |
| copy_status=$? | |
| echo "::warning::Image copy attempt ${attempt}/${COPY_IMAGE_RETRIES} failed with exit ${copy_status}." | |
| if [[ "${attempt}" -lt "${COPY_IMAGE_RETRIES}" ]]; then | |
| echo "Retrying in ${COPY_IMAGE_RETRY_INTERVAL}s..." | |
| sleep "${COPY_IMAGE_RETRY_INTERVAL}" | |
| fi |
Code ReviewOverall: Solid defensive hardening. The preflight check and retry logic are well-structured, and the rollback path fix is a clear correctness improvement. A few things worth addressing before merge: Correctness
Edge Cases
Minor
|
| set -euo pipefail | ||
| cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}" | ||
|
|
||
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null |
There was a problem hiding this comment.
The preflight failure has no custom error message — the only output will be whatever cpln prints. The copy-failure path (a few lines below) has a clear ::error:: annotation; suggest matching that here so the Actions log is immediately actionable:
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null | |
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null \ | |
| || { echo "::error::Staging image '${STAGING_IMAGE}' not found in org '${CPLN_ORG_STAGING}'; aborting promotion." >&2; exit 1; } |
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null | ||
|
|
||
| copy_status=1 | ||
| for attempt in $(seq 1 "${COPY_IMAGE_RETRIES}"); do |
There was a problem hiding this comment.
If COPY_IMAGE_RETRIES is set to 0, seq 1 0 produces no output and the loop body never runs. copy_status stays at 1 (its initial value) and the job exits with "after 0 attempt(s)" — misleading since no attempt was made. Consider clamping to a minimum of 1:
| for attempt in $(seq 1 "${COPY_IMAGE_RETRIES}"); do | |
| for attempt in $(seq 1 "$(( COPY_IMAGE_RETRIES > 0 ? COPY_IMAGE_RETRIES : 1 ))"); do |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/cpflow-promote-staging-to-production.yml:
- Around line 353-369: The retry loop is incorrectly capturing the exit status
of the entire if compound (so copy_status can stay 0); modify the loop around
the cpflow command used with COPY_IMAGE_RETRIES so you record the command's exit
code immediately after running cpflow copy-image-from-upstream (store it into
copy_status or a temp var), then test that recorded code to decide success (set
copy_status=0 and break on 0, otherwise keep the non‑zero value and
retry/sleep); update the references to copy_status in the subsequent error/exit
logic accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b09ccd22-3cfc-4325-b009-ca8639ea68a6
📒 Files selected for processing (1)
.github/workflows/cpflow-promote-staging-to-production.yml
Code Review: Harden production promotion image copyOverviewThis PR makes three targeted improvements to the staging→production promotion workflow:
The motivation is well-justified by the linked failed production run, and each change is narrowly scoped. What works well
Issues / Suggestions1. Preflight failure is silent beyond
|
| set -euo pipefail | ||
| cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}" | ||
|
|
||
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null |
There was a problem hiding this comment.
The preflight is good, but if cpln image get fails the step exits immediately via set -e with only whatever cpln itself prints. Consider attaching an explicit ::error:: annotation so the failure is immediately obvious in the Actions summary without scrolling through raw cpln output:
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null | |
| CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln image get "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" -o json >/dev/null \ | |
| || { echo "::error::Staging image '${STAGING_IMAGE}' not found in org '${CPLN_ORG_STAGING}'; aborting promotion."; exit 1; } |
Minor UX improvement — not a blocker.
| copy_status=1 | ||
| for attempt in $(seq 1 "${COPY_IMAGE_RETRIES}"); do |
There was a problem hiding this comment.
Edge case: if COPY_IMAGE_RETRIES=0 (user misconfiguration), seq 1 0 yields nothing so the loop body never runs. copy_status stays at 1 and the step fails with "after 0 attempt(s)" — correct but confusing. A guard here makes the intent explicit:
| copy_status=1 | |
| for attempt in $(seq 1 "${COPY_IMAGE_RETRIES}"); do | |
| copy_status=1 | |
| [[ "${COPY_IMAGE_RETRIES}" -ge 1 ]] || { echo "::error::COPY_IMAGE_RETRIES must be >= 1 (got '${COPY_IMAGE_RETRIES}')."; exit 1; } | |
| for attempt in $(seq 1 "${COPY_IMAGE_RETRIES}"); do |
Low priority — the default of 3 protects the common case.
Code ReviewOverall: LGTM with minor nits. This is a solid reliability improvement that directly addresses a real production failure. The logic is correct, the shell scripting is careful, and the rollback fix is a meaningful simplification. What the PR does
Positives
Issues / suggestionsMinor (line 383): The "no attempts remain" message uses Nit (lines 341–342): See inline comments for specifics. |
| echo "::warning::Image copy attempt ${attempt}/${copy_image_attempts} failed with exit ${copy_status}; retrying in ${copy_image_retry_interval}s." | ||
| sleep "${copy_image_retry_interval}" | ||
| else | ||
| echo "::warning::Image copy attempt ${attempt}/${copy_image_attempts} failed with exit ${copy_status}; no attempts remain." |
There was a problem hiding this comment.
The final retry message uses ::warning:: even though the copy has definitively failed at this point (no attempts remain). Consider upgrading to ::error:: so it's visually grouped with the failure annotation on line 388 rather than looking like one of the transient retry warnings above it.
| echo "::warning::Image copy attempt ${attempt}/${copy_image_attempts} failed with exit ${copy_status}; no attempts remain." | |
| echo "::error::Image copy attempt ${attempt}/${copy_image_attempts} failed with exit ${copy_status}; no attempts remain." |
| CPLN_TOKEN_STAGING: ${{ secrets.CPLN_TOKEN_STAGING }} | ||
| CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }} |
There was a problem hiding this comment.
Both env vars resolve to the same secret (secrets.CPLN_TOKEN_STAGING). The duplication is intentional – CPLN_TOKEN_STAGING is used explicitly for the cpln image get preflight (CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln ...), while CPLN_UPSTREAM_TOKEN is consumed by cpflow copy-image-from-upstream internally. A short comment here would make this less surprising for future readers:
| CPLN_TOKEN_STAGING: ${{ secrets.CPLN_TOKEN_STAGING }} | |
| CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }} | |
| # CPLN_TOKEN_STAGING: used explicitly for the staging preflight (cpln image get). | |
| # CPLN_UPSTREAM_TOKEN: consumed internally by cpflow copy-image-from-upstream. | |
| CPLN_TOKEN_STAGING: ${{ secrets.CPLN_TOKEN_STAGING }} | |
| CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }} |
✅ Review App DeletedReview app for PR #755 is deleted |
Summary
Context
Follow-up to failed production promotion run: https://github.com/shakacode/react-webpack-rails-tutorial/actions/runs/26808757348
Verification
Note
Medium Risk
Touches production promotion and rollback paths in CI; misconfiguration could still block promotion or leave rollback ineffective, but changes reduce transient copy failures and fix broken rollback updates.
Overview
Hardens the staging → production promotion workflow so image copy and rollback are more reliable.
The Copy image from staging step now validates
COPY_IMAGE_RETRIES/COPY_IMAGE_RETRY_INTERVAL, preflights the staging image withcpln image getbefore copying, and retriescpflow copy-image-from-upstreamwith warnings between attempts instead of failing on the first error.On rollback, workload updates now set images via
spec.containers.<container_name>.image(name-based--setpaths) instead of array-index paths that the CPLN CLI did not support correctly.Reviewed by Cursor Bugbot for commit 3fc0314. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit