Use native Control Plane image copy in production promotion #756

Merged
justin808 merged 2 commits into master from jg-codex/use-native-production-image-copy
Jun 2, 2026

@justin808 justin808 commented Jun 2, 2026

Summary

  • replace the production promotion copy step with native cpln image copy
  • create an explicit temporary staging profile from CPLN_TOKEN_STAGING, authenticate Docker to the staging registry, and inspect the source manifest before copying
  • preserve retry handling, rollback behavior, and release summary output while reporting the copied production image tag

Why

Fresh runs after #754/#755 prove that CPLN_TOKEN_PRODUCTION is now visible and rollback works, but cpflow copy-image-from-upstream still fails opaquely during the staging registry pull. Control Plane docs recommend cpln image copy with --to-profile for cross-org copies with different credentials, which matches this promotion flow and gives Actions a clearer auth/copy boundary.

Observed failed runs:

Verification

  • git diff --check HEAD~1..HEAD
  • bin/conductor-exec bin/test-cpflow-github-flow

Note

Medium Risk
Changes only the production promotion workflow but alters cross-org registry auth, image naming, and what operators see as the deployed tag; a mis-copy or tag collision could block or mislabel production releases.

Overview
The Copy image from staging step no longer calls cpflow copy-image-from-upstream. It now builds a new production image name (next numeric tag under PRODUCTION_APP_NAME: plus the commit suffix from the staging tag), uses a throwaway staging cpln profile and docker manifest inspect on the staging registry ref, then copies with cpln image copy into the production org (--to-profile default). Retries and failure handling stay the same; the step exposes copy-image.outputs.image for downstream reporting.

The Promotion summary reports the copied production tag when the health check passes, and the previous production image when promotion fails—instead of echoing the staging image name as “deployed.”
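The naming rule described in the Overview can be sketched as follows. This is a minimal sketch, not the workflow's actual code: `next_production_image` is a hypothetical helper, and the `<number>_<commit>` tag layout with an underscore separator is an assumption inferred from the `_<commit>` suffix mentioned in this thread.

```shell
# Hypothetical helper: build the next production image name.
# Assumes tags look like "<number>_<commit>" (an assumption, not confirmed by the PR).
next_production_image() {
  local app="$1" latest="$2" staging_image="$3"
  local commit="${staging_image##*_}"
  # No underscore at all means there is no commit suffix to carry over.
  if [ "$commit" = "$staging_image" ] || [ -z "$commit" ]; then
    echo "staging image '$staging_image' has no _<commit> suffix" >&2
    return 1
  fi
  printf '%s:%s_%s\n' "$app" "$((latest + 1))" "$commit"
}
```

For example, `next_production_image my-app 41 my-app-staging:17_abc1234` prints `my-app:42_abc1234`, while a staging image tagged `latest` is rejected.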

Reviewed by Cursor Bugbot for commit 316555d. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

  • Chores
    • Improved production promotion workflow: stronger validation of staging images, automatic computation and assignment of the new production image tag, and more reliable image-copying with preserved retry behavior.
    • Ensures downstream steps receive the computed production image tag and cleans up temporary credentials/profiles created during promotion.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77ac868c-282b-4bca-8631-d08447883f9e

📥 Commits

Reviewing files that changed from the base of the PR and between 52f535e and 316555d.

📒 Files selected for processing (1)
  • .github/workflows/cpflow-promote-staging-to-production.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/cpflow-promote-staging-to-production.yml

Walkthrough

This PR refactors the production promotion workflow: the "Copy image from staging" step now validates staging tags, computes an incremented production image tag, authenticates with a temporary cpln upstream profile, copies the manifest into production with retries, and outputs the computed production image for reporting.

Changes

Production Image Promotion Flow

Layer / File(s) | Summary
Image copy step environment and logic (.github/workflows/cpflow-promote-staging-to-production.yml) | Environment now supplies only CPLN_TOKEN_STAGING. Copy logic validates and extracts the _commit suffix from the staging image, queries production for the latest numeric tag, computes production_image as latest_number+1 plus staging_commit, creates a temporary cpln upstream profile scoped to the run, performs docker login and docker manifest inspect, and retries cpln image copy to the computed production tag until success. The step emits image=${production_image}.
Deployment summary reporting (.github/workflows/cpflow-promote-staging-to-production.yml) | Promotion summary step now sources the deployed image from steps.copy-image.outputs.image (the computed production image) when the deployment health check succeeds.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hop, hop—labels and tags I tally,
From staging suffix to production rally.
I count the latest, add one with cheer,
Make a temp profile, then copy it here.
The pipeline sings: the deployed image is clear.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Use native Control Plane image copy in production promotion' directly matches the main change: replacing the image copy mechanism with native Control Plane CLI commands in the production promotion workflow.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@github-actions

github-actions Bot commented Jun 2, 2026

🚀 Quick Review App Commands

Welcome! Here are the commands you can use in this PR:
They require the repository to have cpflow review apps configured, including the CPLN_TOKEN_STAGING secret.

+review-app-deploy

Deploy your PR branch for testing.

+review-app-delete

Remove the review app when done.

+review-app-help

Show detailed instructions, environment setup, and configuration options.

Comment +review-app-help for full setup details.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52f535e099

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

fi

latest_number="$(
cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |

P2: Query all images before assigning the next tag

When the production org has more than the CLI's default result window of matching images, this query can miss the current highest numbered tag and compute a duplicate or stale production_image. I checked the Control Plane CLI common options, and query commands inherit --max with a default of 50; since production image retention is not configured in this repo, long-lived production orgs can exceed that and make promotions fail during cpln image copy (or deploy an unexpected older sequence). Add an explicit unbounded/large max when deriving latest_number.
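The hazard described here can be reproduced without the CLI. The sketch below is a pure-shell stand-in for the jq filter, not the workflow's code: `latest_tag_number` and `all_names` are hypothetical helpers, the tag layout is assumed to start with a number, and the order in which the real query returns items is not stated in this thread.

```shell
# Hypothetical helper: highest leading numeric tag among "<app>:<number>..." names on stdin.
latest_tag_number() {
  local app="$1" max=0 name tag num
  while IFS= read -r name; do
    case "$name" in
      "$app":*) tag="${name#"$app":}" ;;
      *) continue ;;
    esac
    num="${tag%%[!0-9]*}"        # keep only the leading digits
    [ -n "$num" ] || continue    # skip tags such as "latest"
    if [ "$num" -gt "$max" ]; then
      max="$num"
    fi
  done
  printf '%s\n' "$max"
}

# Simulated listing of 60 production images, oldest first (ordering is an assumption).
all_names() {
  i=1
  while [ "$i" -le 60 ]; do
    echo "my-app:${i}_abc"
    i=$((i + 1))
  done
}

all_names | latest_tag_number my-app                # full listing
all_names | head -n 50 | latest_tag_number my-app   # listing cut to a 50-item window
```

With the full listing the helper reports 60; with the 50-item window it reports 50, so the next computed tag would collide with an existing image. That is the failure mode the comment asks to rule out by raising the result limit.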

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/cpflow-promote-staging-to-production.yml:
- Line 390: Remove the argv-based secret by dropping the --token
"${CPLN_TOKEN_STAGING}" argument from the cpln profile create invocation and
instead set the CPLN_TOKEN environment variable to CPLN_TOKEN_STAGING for that
step (e.g., export or prefix the command with CPLN_TOKEN=${CPLN_TOKEN_STAGING})
so cpln profile create reads the token from CPLN_TOKEN rather than the command
line.
- Line 588: The workflow currently sets DEPLOYED_IMAGE to
steps.copy-image.outputs.image unconditionally, which misreports deployed state
after a failed promotion/rollback; change the workflow so DEPLOYED_IMAGE is set
only on the successful promotion path (e.g., when the promote step/job indicates
success) and on failure set DEPLOYED_IMAGE to the restored PREVIOUS_IMAGE (or
omit the deployed-image output) instead; update the assignment that references
DEPLOYED_IMAGE and steps.copy-image.outputs.image so it is gated by the
promotion success condition or overwritten with PREVIOUS_IMAGE in the
rollback/failure branch (refer to DEPLOYED_IMAGE,
steps.copy-image.outputs.image, and PREVIOUS_IMAGE to locate and adjust the
logic).
- Around line 391-392: The manifest probe using
`CPLN_PROFILE="${upstream_profile}" docker manifest inspect
"${source_image_ref}"` is outside the retry loop so transient registry/auth
errors can abort promotion before the copy retries run; move or duplicate the
`docker manifest inspect` step into the same retry loop used for the copy (use
the same `CPLN_PROFILE="${upstream_profile}"` context and exit-code handling) so
the manifest check is retried alongside the preserved-copy operations, ensuring
transient failures are recovered by the existing retry logic.
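The first finding (keep the secret out of argv) relies on the shell's per-command environment prefix. A minimal sketch, assuming a hypothetical wrapper named `run_with_token`; that cpln reads the token from CPLN_TOKEN is taken from the comment above, not verified here.

```shell
# Hypothetical wrapper: run a command with CPLN_TOKEN set only in that command's
# environment, so the secret never appears in the argument list.
run_with_token() {
  local token="$1"
  shift
  CPLN_TOKEN="$token" "$@"
}

# The child process sees the variable but receives no extra arguments.
run_with_token example-token sh -c 'echo "args=$# token_set=${CPLN_TOKEN:+yes}"'
```

Because the assignment is scoped to the one external command, the calling shell's own CPLN_TOKEN (if any) is left untouched afterwards.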

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c888c9b-69c8-4ade-ae36-4e38152a6b64

📥 Commits

Reviewing files that changed from the base of the PR and between 8a2f155 and 52f535e.

📒 Files selected for processing (1)
  • .github/workflows/cpflow-promote-staging-to-production.yml

@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR replaces cpflow copy-image-from-upstream with a native cpln image copy call in the staging-to-production promotion workflow, adding explicit credential separation via a temporary cpln profile and a Docker manifest pre-flight check.

  • Creates a short-lived upstream-<run>-<attempt> cpln profile backed by CPLN_TOKEN_STAGING, authenticates Docker to the staging registry through it, inspects the source manifest, then runs cpln image copy with --profile/--to-profile default to keep staging and production credentials cleanly separated; the profile is deleted via an EXIT trap regardless of outcome.
  • Derives a new production image tag by querying the highest existing image number in the production org and incrementing it, then exposes the result as steps.copy-image.outputs.image; the Promotion summary step is updated to surface this production-side tag rather than the staging tag.
  • Adds a guard that validates the _<commit> suffix on the staging image name before constructing the destination tag; removes the now-unused CPLN_UPSTREAM_TOKEN env var.
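The step output mentioned in the second bullet uses the standard GitHub Actions mechanism of appending `name=value` lines to the file named by GITHUB_OUTPUT. A minimal sketch, assuming a hypothetical helper `emit_step_output`; the workflow itself writes the line directly, as the sequence diagram in this summary shows.

```shell
# Hypothetical helper: publish a single-line step output by appending "name=value"
# to the file that GitHub Actions exposes through GITHUB_OUTPUT.
emit_step_output() {
  local name="$1" value="$2"
  if [ -z "${GITHUB_OUTPUT:-}" ]; then
    echo "GITHUB_OUTPUT is not set; not running under GitHub Actions?" >&2
    return 1
  fi
  printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
}
```

A later step can then read the value as `${{ steps.copy-image.outputs.image }}`, provided the emitting step has `id: copy-image`.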

Confidence Score: 4/5

Safe to merge; the cross-org credential separation is correctly structured and credential cleanup is handled by an EXIT trap. Two narrow edge cases with the --cleanup retry interaction and underscore-in-app-name extraction are unlikely to surface in normal operation.

The credential model is sound: staging token goes through a temporary profile, production uses the default profile set up earlier, and the trap ensures cleanup regardless of outcome. The --cleanup flag inside the retry loop relies on an undocumented assumption that cpln only deletes the source image after a successful copy — if that assumption is wrong, retries would silently fail with a different error. The ##*_ extraction also has a latent issue for app names with underscores, but CPLN app naming conventions make this a theoretical concern.

.github/workflows/cpflow-promote-staging-to-production.yml — specifically the retry loop around cpln image copy --cleanup and the staging_commit extraction logic.

Important Files Changed

Filename Overview
.github/workflows/cpflow-promote-staging-to-production.yml Replaces cpflow copy-image-from-upstream with cpln image copy; adds temporary staging profile creation, Docker login, manifest pre-flight, image number derivation, and correctly updates the DEPLOYED_IMAGE output reference to the new production image tag

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant CPLN_S as cpln (staging profile)
    participant DockerS as Docker (staging registry)
    participant CPLN_P as cpln (production/default profile)

    GHA->>CPLN_S: cpln image get staging image (verify exists)
    GHA->>CPLN_P: cpln image query production (get latest number)
    GHA->>CPLN_S: cpln profile create upstream-run-attempt
    GHA->>CPLN_S: cpln image docker-login --org staging
    CPLN_S-->>DockerS: writes Docker credentials
    GHA->>DockerS: docker manifest inspect source_image_ref (pre-flight)
    DockerS-->>GHA: manifest OK

    loop Up to copy_image_attempts
        GHA->>CPLN_S: cpln image copy STAGING_IMAGE --profile upstream --org staging
        CPLN_S->>CPLN_P: copy image cross-org to production --to-name production_image
        CPLN_P-->>GHA: success / failure
    end

    GHA->>CPLN_S: cpln profile delete upstream-run-attempt (trap EXIT)
    GHA->>GHA: "echo image=production_image >> GITHUB_OUTPUT"

Comments Outside Diff (1)

  1. .github/workflows/cpflow-promote-staging-to-production.yml, line 394-415

    P2 --cleanup flag semantics inside the retry loop

    --cleanup is passed on every retry attempt. If cpln image copy --cleanup deletes the source staging image after a failed attempt (rather than only after a successful one), attempts 2 and 3 would fail with "image not found" rather than the original copy error, making the retry logic silently ineffective. If the cpln CLI follows the common convention of only cleaning up on success, this is fine — but the current code has no comment documenting that assumption, and no local cpln docs confirm this behavior. If --cleanup can trigger on failure, the retry loop may deceptively surface a different error than the root cause.

Reviews (1): Last reviewed commit: "Use native Control Plane image copy in p..."

Comment on lines +370 to +374
staging_commit="${STAGING_IMAGE##*_}"
if [[ "${staging_commit}" == "${STAGING_IMAGE}" || -z "${staging_commit}" ]]; then
  echo "::error::Staging image '${STAGING_IMAGE}' does not include the expected '_<commit>' suffix."
  exit 1
fi

P2 staging_commit extraction breaks for app names containing underscores

${STAGING_IMAGE##*_} strips the longest *_-prefixed match, meaning it operates on the entire string including the app-name portion before the colon. If the staging app name contains underscores (e.g., my_staging_app:3_abc1234), the extraction produces abc1234 (correct here), but for my_staging_app:latest it would produce app:latest — a value that passes the guard checks ("app:latest" != "my_staging_app:latest" and it is non-empty) but is an incorrect commit suffix embedded in production_image. CPLN app names conventionally use hyphens rather than underscores, so in practice this edge case is unlikely, but the extraction logic does not enforce this constraint.
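One way to close this gap is to apply the check to the tag only. The sketch below is an illustration under assumptions, not the merged fix: `extract_staging_commit` is a hypothetical helper, and it assumes staging tags follow a `<number>_<commit>` layout, which the thread implies but does not spell out.

```shell
# The expansion discussed above, applied to a tag without a commit suffix:
image="my_staging_app:latest"
echo "${image##*_}"    # prints "app:latest", which passes the existing guard

# Hypothetical stricter extraction: look at the tag only (text after the last ':')
# and require the assumed "<number>_<commit>" layout.
extract_staging_commit() {
  local image="$1" tag number commit
  case "$image" in
    *:*) tag="${image##*:}" ;;
    *) return 1 ;;
  esac
  number="${tag%%_*}"
  commit="${tag#*_}"
  # Reject tags without an underscore, or with an empty or non-numeric number part.
  [ "$number" != "$tag" ] || return 1
  case "$number" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ -n "$commit" ] || return 1
  printf '%s\n' "$commit"
}
```

With this helper, `my_staging_app:3_abc1234` yields `abc1234` and `my_staging_app:latest` is rejected, regardless of underscores in the app name.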

Comment on lines +396 to +402
if cpln image copy "${STAGING_IMAGE}" \
  --profile "${upstream_profile}" \
  --org "${CPLN_ORG_STAGING}" \
  --to-profile default \
  --to-org "${CPLN_ORG_PRODUCTION}" \
  --to-name "${production_image}" \
  --cleanup; then

Risk: --cleanup in a retry loop can destroy the source before retries complete.

If the first copy attempt starts but fails mid-transfer, --cleanup may delete the source staging image. Every subsequent retry will then fail with a "source not found" error rather than a transient network error, defeating the retry logic entirely.

Consider removing --cleanup from the retry invocation and adding an explicit cleanup after a confirmed success:

Suggested change
- if cpln image copy "${STAGING_IMAGE}" \
-   --profile "${upstream_profile}" \
-   --org "${CPLN_ORG_STAGING}" \
-   --to-profile default \
-   --to-org "${CPLN_ORG_PRODUCTION}" \
-   --to-name "${production_image}" \
-   --cleanup; then
+ if cpln image copy "${STAGING_IMAGE}" \
+   --profile "${upstream_profile}" \
+   --org "${CPLN_ORG_STAGING}" \
+   --to-profile default \
+   --to-org "${CPLN_ORG_PRODUCTION}" \
+   --to-name "${production_image}"; then
+   copy_status=0
+   # Clean up source image only after confirmed success
+   cpln image delete "${STAGING_IMAGE}" --org "${CPLN_ORG_STAGING}" --profile "${upstream_profile}" >/dev/null 2>&1 || true
+   break

If --cleanup means something other than deleting the source (e.g., removing temp layers only), a brief comment explaining its semantics would help reviewers.
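The cleanup-after-success pattern proposed in this comment can be exercised in isolation. A minimal sketch with placeholders: `copy_then_cleanup` is hypothetical, and `copy_cmd` and `cleanup_cmd` stand in for the real copy and delete commands, whose exact semantics this thread leaves open.

```shell
# Hypothetical retry loop: "copy_cmd" and "cleanup_cmd" are placeholders for the real commands.
copy_then_cleanup() {
  local attempts="$1" copy_cmd="$2" cleanup_cmd="$3" attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    if "$copy_cmd"; then
      # Source cleanup is best-effort and runs only once the copy is confirmed.
      "$cleanup_cmd" || true
      return 0
    fi
    echo "copy attempt ${attempt}/${attempts} failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

Since the cleanup command is never reached on a failed attempt, every retry still sees the source image, whatever the copy command's own cleanup flag would have done.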

}
trap cleanup_upstream_profile EXIT

cpln profile create "${upstream_profile}" --token "${CPLN_TOKEN_STAGING}" >/dev/null

Potential failure if a stale profile exists from a crashed prior attempt.

cpln profile create will fail if a profile with this name already exists. If a runner is killed before the trap cleanup_upstream_profile EXIT fires (e.g., a SIGKILL, timeout, or infrastructure failure), the next re-run of the same attempt will hit RUN_ID + RUN_ATTEMPT collision.

Add a pre-create delete to make this idempotent:

Suggested change
- cpln profile create "${upstream_profile}" --token "${CPLN_TOKEN_STAGING}" >/dev/null
+ cpln profile delete "${upstream_profile}" >/dev/null 2>&1 || true
+ cpln profile create "${upstream_profile}" --token "${CPLN_TOKEN_STAGING}" >/dev/null
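The delete-then-create pattern combined with an EXIT trap can be tried without cpln. In this stand-in demo a "profile" is modelled as a file in a temporary directory; `create_profile`, `delete_profile`, and `run_with_temp_profile` are hypothetical names, and the fail-if-exists behavior mirrors what this comment asserts about `cpln profile create` rather than something verified here.

```shell
# Stand-in demo: a "profile" is modelled as a file so the pattern runs without cpln.
profile_dir="$(mktemp -d)"

create_profile() {            # fails if the profile already exists
  [ ! -e "$profile_dir/$1" ] || return 1
  : > "$profile_dir/$1"
}
delete_profile() { rm -f "$profile_dir/$1"; }

run_with_temp_profile() {
  (
    name="$1"; shift
    trap 'delete_profile "$name"' EXIT   # runs on normal exit and on errors, not on SIGKILL
    delete_profile "$name"               # clear any stale profile first
    create_profile "$name" || exit 1
    "$@"
  )
}
```

The subshell keeps the trap local to one invocation. The pre-create delete covers the case the trap cannot: a runner killed before the trap had a chance to fire.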

fi

latest_number="$(
cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |

PRODUCTION_APP_NAME is interpolated directly into a regex filter.

The name~ operator likely interprets its value as a regular expression. If PRODUCTION_APP_NAME contains regex metacharacters (., +, [, etc.) the filter could match unintended images or cause a query error. Since you already have $prefix in the jq side for the exact startswith check, the server-side filter is just an optimistic pre-filter — consider using a literal prefix operator if the API supports one (e.g., name=^ or name~^${PRODUCTION_APP_NAME}:), or note that PRODUCTION_APP_NAME is expected to be alphanumeric-only.
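If the filter does turn out to be regex-based (this thread does not confirm it), the name could be escaped before interpolation. A minimal sketch using sed; `escape_regex` is a hypothetical helper and is not part of the workflow.

```shell
# Hypothetical helper: backslash-escape regex metacharacters before interpolating
# a name into a pattern. Only useful if the filter really is regex-based (unconfirmed).
escape_regex() {
  printf '%s' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}

escape_regex 'my.app+1'
```

Names made only of letters, digits, and hyphens pass through unchanged, so the helper is harmless for the common case.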

@claude

claude Bot commented Jun 2, 2026

Code Review

Overview

This PR replaces cpflow copy-image-from-upstream with native cpln image copy for cross-org promotion, adds explicit production image tagging with a sequential counter, and introduces a run-scoped temporary staging profile for auth isolation. The approach is directionally correct and well-motivated by the observed opaque failures.

Positives

  • Removing the duplicate CPLN_UPSTREAM_TOKEN env var is a clean-up.
  • Run-scoped profile naming (upstream-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}) avoids collisions across concurrent runs.
  • Pre-flight docker manifest inspect catches auth/availability failures before entering the retry loop.
  • Setting id: copy-image and emitting production_image (not the staging name) in the output is semantically correct — the summary now reflects what actually lives in production.
  • The DEPLOYED_IMAGE switch from steps.staging-image.outputs.image to steps.copy-image.outputs.image is the right fix for the summary accuracy issue.

Issues (see inline comments for details)

Severity | Location | Issue
Medium | cpln image copy … --cleanup inside retry loop | --cleanup may delete the source image on a partial first-attempt failure, making all retries fail with "source not found" instead of the original transient error.
Medium | cpln profile create | No pre-create delete: a stale profile from a killed runner (same RUN_ID+RUN_ATTEMPT) will cause profile create to fail and abort the step.
Low | --prop "name~${PRODUCTION_APP_NAME}:" | PRODUCTION_APP_NAME is interpolated into a regex filter; metacharacters could match unintended images. The jq startswith guard mitigates incorrect results, but a bad regex could still cause a query error.

Minor Notes

  • --to-profile default hardcodes the production-side profile name. This is fine if default is a firm convention here, but a comment or an env var reference would make it clearer to future maintainers.
  • The source_image_ref variable (used only for docker manifest inspect) is correctly separate from the cpln image copy invocation — no issue, but it reads a little ambiguously at first glance since inspect output is suppressed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 316555de33


cleanup_upstream_profile
CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln profile create "${upstream_profile}" >/dev/null
CPLN_PROFILE="${upstream_profile}" cpln image docker-login --org "${CPLN_ORG_STAGING}" >/dev/null

P1: Override CPLN_TOKEN when using the staging profile

In the normal promotion job, the earlier setup action leaves the production secret in CPLN_TOKEN for later steps, and Control Plane's auth docs say token precedence is --token, then CPLN_TOKEN, then the profile token (https://docs.controlplane.com/cli-reference/get-started/authentication#token-precedence). With separate staging/production credentials, setting only CPLN_PROFILE/--profile here still authenticates staging registry operations with the production token, so docker-login/image copy fails against CPLN_ORG_STAGING; unset CPLN_TOKEN or set it to CPLN_TOKEN_STAGING for the source-profile commands.
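The precedence quoted in this comment can be modelled in a few lines to show why a profile alone is not enough. `effective_token` is a hypothetical illustration of that rule, not part of the cpln CLI, and the token values are placeholders.

```shell
# Model of the precedence quoted above: --token flag, then CPLN_TOKEN, then the profile token.
# effective_token is a hypothetical illustration, not part of the cpln CLI.
effective_token() {
  local flag_token="$1" env_token="$2" profile_token="$3"
  if [ -n "$flag_token" ]; then
    printf '%s\n' "$flag_token"
  elif [ -n "$env_token" ]; then
    printf '%s\n' "$env_token"
  else
    printf '%s\n' "$profile_token"
  fi
}

effective_token "" "prod-token" "staging-token"   # CPLN_TOKEN left over from setup wins
effective_token "" "" "staging-token"             # with CPLN_TOKEN cleared, the profile wins
```

Under this model, selecting the staging profile has no effect while a production CPLN_TOKEN is still exported, which is why the comment asks to unset or override it for the source-side commands.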

@claude

claude Bot commented Jun 2, 2026

Code Review

Overview

This PR replaces the opaque cpflow copy-image-from-upstream call with a multi-step sequence using cpln image copy, adding explicit staging-registry auth, commit-suffix validation, auto-incremented production tag computation, and a step output for the copied image. The motivation (upstream command failing silently) is sound, and the general approach is solid.

Positives

  • Trap-based cleanup of the temporary cpln profile is correct and robust — the pre-deletion before creation also handles stale profiles from previously crashed runs.
  • Commit-suffix validation (staging_commit check) adds a useful guard that surfaces naming-convention violations early.
  • Tag auto-increment via jq correctly handles first-ever promotions (max // 0) and non-conforming existing tags (try capture(...) catch empty).
  • Summary improvement — reporting the actual production tag on success and the rollback target on failure is clearer than echoing the staging image name.
  • Removing the redundant CPLN_UPSTREAM_TOKEN alias is a clean-up.

Issues

Medium — docker manifest inspect runs inside every retry iteration

The manifest inspect is used as a Docker-auth pre-flight check. However, if Docker auth is broken all retries will fail on the inspect before the copy is even attempted, making the retry loop useless for the common transient-network case. Moving the inspect (and the Docker login) outside the loop means broken auth is caught once and clearly; transient copy failures can then be retried independently.
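The fail-fast structure argued for here can be sketched with placeholders. `preflight_then_retry` is hypothetical; `preflight_cmd` and `copy_cmd` stand in for the manifest inspect and the image copy.

```shell
# Hypothetical structure: "preflight_cmd" and "copy_cmd" stand in for the manifest
# inspect and the image copy. Returns 2 when the pre-flight fails, 1 when retries run out.
preflight_then_retry() {
  local attempts="$1" preflight_cmd="$2" copy_cmd="$3" attempt=1
  if ! "$preflight_cmd"; then
    echo "pre-flight failed; not retrying" >&2
    return 2
  fi
  while [ "$attempt" -le "$attempts" ]; do
    if "$copy_cmd"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

The distinct return code lets the caller tell an auth problem from exhausted retries. An earlier review in this thread took the opposite view, that the manifest probe should be retried because registry errors can be transient; which structure fits depends on how often the probe fails for recoverable reasons.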

Minor — CPLN_PROFILE prefix on docker manifest inspect is a no-op

Docker does not read CPLN_PROFILE. Credentials for the staging registry were already stored by cpln image docker-login. The env-var prefix is misleading and should be dropped.

Minor — cpln image query pipeline has no diagnostic error message

If the query or jq step fails (API outage, unexpected JSON shape), set -e will abort the script but the runner log will only show the raw tool error. An explicit error message would make CI failures easier to diagnose.

Minor — Empty Deployed image in summary when copy step fails

If copy-image exits before writing its output, COPIED_IMAGE is empty. In the success branch, this produces Deployed image: `` (empty backticks). A fallback like ${COPIED_IMAGE:-unknown} would be cleaner.

for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"; then
if CPLN_PROFILE="${upstream_profile}" docker manifest inspect "${source_image_ref}" >/dev/null &&
cpln image copy "${STAGING_IMAGE}" \

CPLN_PROFILE is not read by Docker — it's a cpln-specific env var. Docker uses the credential store already populated by cpln image docker-login above. The prefix is a no-op here and should be removed to avoid confusion.

Suggested change
cpln image copy "${STAGING_IMAGE}" \
if docker manifest inspect "${source_image_ref}" >/dev/null &&

Comment on lines 395 to +397
for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"; then
if CPLN_PROFILE="${upstream_profile}" docker manifest inspect "${source_image_ref}" >/dev/null &&
cpln image copy "${STAGING_IMAGE}" \

The docker manifest inspect check runs on every retry attempt, but it only validates Docker auth — not something a retry would fix. If Docker auth is broken, every attempt fails on the inspect before the actual cpln image copy even runs, burning all retries for an error that cannot be recovered by retrying.

Consider moving the auth check (and the docker-login) outside the loop so it fails fast on auth errors while retries are preserved for transient copy failures:

Suggested change
for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"; then
if CPLN_PROFILE="${upstream_profile}" docker manifest inspect "${source_image_ref}" >/dev/null &&
cpln image copy "${STAGING_IMAGE}" \
copy_status=1
if ! docker manifest inspect "${source_image_ref}" >/dev/null; then
echo "::error::Cannot reach staging image '${source_image_ref}' via Docker; check Docker login credentials."
exit 1
fi
for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpln image copy "${STAGING_IMAGE}" \

Comment on lines +376 to +380
latest_number="$(
  cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |
    jq -r --arg prefix "${PRODUCTION_APP_NAME}:" \
      '[.items[].name | select(startswith($prefix)) | (try capture("^[^:]+:(?<number>[0-9]+)") catch empty) | .number | tonumber] | max // 0'
)"

If cpln image query fails (API outage, rate limit, unexpected response format), set -e aborts here but the runner log will only show the raw cpln/jq error — no GitHub Actions annotation and no context about what was being attempted. Consider wrapping with an explicit error message:

Suggested change
- latest_number="$(
-   cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |
-     jq -r --arg prefix "${PRODUCTION_APP_NAME}:" \
-       '[.items[].name | select(startswith($prefix)) | (try capture("^[^:]+:(?<number>[0-9]+)") catch empty) | .number | tonumber] | max // 0'
- )"
+ if ! latest_number="$(
+   cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |
+     jq -r --arg prefix "${PRODUCTION_APP_NAME}:" \
+       '[.items[].name | select(startswith($prefix)) | (try capture("^[^:]+:(?<number>[0-9]+)") catch empty) | .number | tonumber] | max // 0'
+ )"; then
+   echo "::error::Failed to query production image list to compute next tag number."
+   exit 1
+ fi
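One caveat with wrapping a pipeline in `if !`: the status tested is that of the last command in the pipeline unless `pipefail` is active, and this thread does not show whether it is for this step. A sketch that avoids relying on it by checking each stage separately; `capture_filtered` and the stub commands are hypothetical.

```shell
# Hypothetical helper: run a producer command, then a filter, checking each status
# separately so a producer failure is reported even when pipefail is off.
capture_filtered() {
  local label="$1" producer="$2" filter="$3" raw out
  if ! raw="$("$producer")"; then
    echo "${label}: query failed" >&2
    return 1
  fi
  if ! out="$(printf '%s\n' "$raw" | "$filter")"; then
    echo "${label}: could not parse query output" >&2
    return 1
  fi
  printf '%s\n' "$out"
}

# Stubs standing in for the image query and the jq filter.
ok_query() { echo "3 7 5"; }
max_filter() { tr ' ' '\n' | sort -n | tail -n 1; }
capture_filtered "production tags" ok_query max_filter
```

In the real step the error messages would carry the `::error::` prefix used elsewhere in this workflow so that they show up as annotations.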

echo "✅ Status: deployment successful"
deployed_image="${COPIED_IMAGE}"
else
echo "❌ Status: deployment failed"

If the copy-image step exits before emitting its output (e.g., fails during the cpln image query), COPIED_IMAGE will be empty. In that scenario HEALTHY is also not "true", so deployed_image would be set to PREVIOUS_IMAGE via the else branch, which is correct. However, if there's ever a path where HEALTHY == "true" but the output wasn't set, the summary would show Deployed image: `` (empty backticks). A defensive fallback is cheap:

Suggested change
echo "❌ Status: deployment failed"
deployed_image="${COPIED_IMAGE:-unknown}"
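The branch logic and the fallback can be checked together in a small sketch. `summary_image` is a hypothetical helper mirroring the summary behavior described in this thread; applying the same `unknown` fallback to the previous image is an addition for symmetry, not something the review asked for.

```shell
# Hypothetical helper mirroring the summary logic described in the thread.
summary_image() {
  local healthy="$1" copied="$2" previous="$3"
  if [ "$healthy" = "true" ]; then
    printf '%s\n' "${copied:-unknown}"     # ":-" also covers a set-but-empty value
  else
    printf '%s\n' "${previous:-unknown}"   # fallback here is an assumption
  fi
}
```

`${VAR:-unknown}` substitutes the default when the variable is unset or empty, which is the case that matters here because a missing step output arrives as an empty string.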

@justin808 justin808 merged commit 7e04fed into master Jun 2, 2026
18 checks passed
@justin808 justin808 deleted the jg-codex/use-native-production-image-copy branch June 2, 2026 21:29
@github-actions

github-actions Bot commented Jun 2, 2026

✅ Review App Deleted

Review app for PR #756 is deleted

🎮 Control Plane Console
📋 View Workflow Logs
