42 changes: 38 additions & 4 deletions .github/workflows/cpflow-promote-staging-to-production.yml
@@ -336,10 +336,10 @@ jobs:
echo "image=${staging_image}" >> "$GITHUB_OUTPUT"

- name: Copy image from staging
id: copy-image
env:
# Pass the upstream token via env rather than `-t` so it doesn't appear in /proc/<pid>/cmdline.
CPLN_TOKEN_STAGING: ${{ secrets.CPLN_TOKEN_STAGING }}
CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
CPLN_ORG_STAGING: ${{ vars.CPLN_ORG_STAGING }}
CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
@@ -367,9 +367,39 @@ jobs:
exit 1
fi

staging_commit="${STAGING_IMAGE##*_}"
if [[ "${staging_commit}" == "${STAGING_IMAGE}" || -z "${staging_commit}" ]]; then
echo "::error::Staging image '${STAGING_IMAGE}' does not include the expected '_<commit>' suffix."
exit 1
fi
Comment on lines +370 to +374


P2 staging_commit extraction breaks for app names containing underscores

${STAGING_IMAGE##*_} removes the longest prefix matching *_, so it operates on the entire string, including the app-name portion before the colon. If the staging app name contains underscores, an image such as my_staging_app:3_abc1234 still yields abc1234 (correct), but my_staging_app:latest yields app:latest. That value passes both guard checks ("app:latest" != "my_staging_app:latest", and it is non-empty), yet it is not a commit suffix and would be embedded in production_image. CPLN app names conventionally use hyphens rather than underscores, so this edge case is unlikely in practice, but the extraction logic does not enforce that constraint.
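
The edge case above can be reproduced in plain bash. The sketch below is one possible guard, not the workflow's actual code: the helper name extract_commit is hypothetical, and it assumes the image reference has the form <name>:<number>_<commit>. It isolates the tag after the last colon before looking for the underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the commit suffix from the tag portion only,
# so underscores in the app name cannot leak into the result.
extract_commit() {
  local image="$1"
  local tag="${image##*:}"   # text after the last colon

  # Reject references without a colon, or tags without an underscore.
  [[ "${tag}" != "${image}" && "${tag}" == *_* ]] || return 1

  local commit="${tag##*_}"  # text after the last underscore in the tag
  [[ -n "${commit}" ]] || return 1
  printf '%s\n' "${commit}"
}

# The unguarded expansion accepts a bogus value:
image="my_staging_app:latest"
echo "${image##*_}"          # prints "app:latest"

# The guarded helper rejects it and still handles the normal case:
extract_commit "my_staging_app:latest" || echo "rejected"   # prints "rejected"
extract_commit "my_staging_app:3_abc1234"                   # prints "abc1234"
```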


latest_number="$(
cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |


P2 Query all images before assigning the next tag

When the production org has more than the CLI's default result window of matching images, this query can miss the current highest numbered tag and compute a duplicate or stale production_image. I checked the Control Plane CLI common options, and query commands inherit --max with a default of 50; since production image retention is not configured in this repo, long-lived production orgs can exceed that and make promotions fail during cpln image copy (or deploy an unexpected older sequence). Add an explicit unbounded/large max when deriving latest_number.
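
To make the failure mode concrete, here is a minimal pure-bash sketch of the next-number computation run against a full and a truncated list. The helper name next_tag_number and the sample image names are hypothetical; the real workflow derives the number with cpln image query and jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read image names on stdin and print the next
# sequence number for names that start with the given "<app>:" prefix.
next_tag_number() {
  local prefix="$1" max=0 name tag number
  while IFS= read -r name || [[ -n "${name}" ]]; do
    [[ "${name}" == "${prefix}"* ]] || continue
    tag="${name#"${prefix}"}"
    if [[ "${tag}" =~ ^([0-9]+) ]]; then
      number=$((10#${BASH_REMATCH[1]}))   # force base 10 for tags like "08"
      if (( number > max )); then
        max="${number}"
      fi
    fi
  done
  echo $((max + 1))
}

# Complete listing: the next tag is 4.
printf '%s\n' 'prod-app:1_aaa' 'prod-app:2_bbb' 'prod-app:3_ccc' |
  next_tag_number 'prod-app:'

# Truncated listing (the newest image fell outside the result window):
# the helper proposes 3, which already exists.
printf '%s\n' 'prod-app:1_aaa' 'prod-app:2_bbb' |
  next_tag_number 'prod-app:'
```

The second call shows how a capped result window would produce a number that collides with an existing image.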



PRODUCTION_APP_NAME is interpolated directly into a regex filter.

The name~ operator likely interprets its value as a regular expression. If PRODUCTION_APP_NAME contains regex metacharacters (., +, [, etc.) the filter could match unintended images or cause a query error. Since you already have $prefix in the jq side for the exact startswith check, the server-side filter is just an optimistic pre-filter — consider using a literal prefix operator if the API supports one (e.g., name=^ or name~^${PRODUCTION_APP_NAME}:), or note that PRODUCTION_APP_NAME is expected to be alphanumeric-only.
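
If the filter value is in fact treated as a regular expression (an assumption, as noted above), one option is to escape metacharacters before interpolating the name. The helper escape_regex below is hypothetical and written in pure bash; whether cpln accepts the escaped form is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: backslash-escape common regex metacharacters so a
# literal name can be embedded in a pattern.
escape_regex() {
  local input="$1" output="" char i
  for (( i = 0; i < ${#input}; i++ )); do
    char="${input:i:1}"
    case "${char}" in
      '\'|'.'|'^'|'$'|'*'|'+'|'?'|'('|')'|'['|']'|'{'|'}'|'|')
        output+="\\${char}" ;;
      *)
        output+="${char}" ;;
    esac
  done
  printf '%s\n' "${output}"
}

escape_regex 'my.app+v1'   # prints "my\.app\+v1"
escape_regex 'plain-app'   # prints "plain-app"
```

The jq startswith check remains the authoritative filter either way, so escaping only narrows what the server returns.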

jq -r --arg prefix "${PRODUCTION_APP_NAME}:" \
'[.items[].name | select(startswith($prefix)) | (try capture("^[^:]+:(?<number>[0-9]+)") catch empty) | .number | tonumber] | max // 0'
)"
Comment on lines +376 to +380


If cpln image query fails (API outage, rate limit, unexpected response format), set -e aborts here but the runner log will only show the raw cpln/jq error — no GitHub Actions annotation and no context about what was being attempted. Consider wrapping with an explicit error message:

Suggested change
latest_number="$(
cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |
jq -r --arg prefix "${PRODUCTION_APP_NAME}:" \
'[.items[].name | select(startswith($prefix)) | (try capture("^[^:]+:(?<number>[0-9]+)") catch empty) | .number | tonumber] | max // 0'
)"
if ! latest_number="$(
cpln image query --org "${CPLN_ORG_PRODUCTION}" --prop "name~${PRODUCTION_APP_NAME}:" -o json |
jq -r --arg prefix "${PRODUCTION_APP_NAME}:" \
'[.items[].name | select(startswith($prefix)) | (try capture("^[^:]+:(?<number>[0-9]+)") catch empty) | .number | tonumber] | max // 0'
)"; then
echo "::error::Failed to query production image list to compute next tag number."
exit 1
fi
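
One caveat on the suggested wrapper: the exit status of a command substitution that contains a pipeline is the status of the last command (jq here) unless pipefail is active. GitHub Actions enables pipefail when a step declares shell: bash, so the check below only catches a failing cpln call under that setting. A minimal sketch, with false | cat standing in for the cpln | jq pipeline:

```shell
#!/usr/bin/env bash
# Stand-in for `cpln image query ... | jq ...`: the first command fails,
# the second succeeds.
check_pipeline() {
  local result
  if ! result="$(false | cat)"; then
    echo "failure detected"
  else
    echo "failure hidden"
  fi
}

set +o pipefail
check_pipeline   # prints "failure hidden"

set -o pipefail
check_pipeline   # prints "failure detected"
```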

production_image="${PRODUCTION_APP_NAME}:$((latest_number + 1))_${staging_commit}"
source_image_ref="${CPLN_ORG_STAGING}.registry.cpln.io/${STAGING_IMAGE}"

upstream_profile="upstream-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
cleanup_upstream_profile() {
cpln profile delete "${upstream_profile}" >/dev/null 2>&1 || true
}
trap cleanup_upstream_profile EXIT

cleanup_upstream_profile
CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln profile create "${upstream_profile}" >/dev/null
CPLN_PROFILE="${upstream_profile}" cpln image docker-login --org "${CPLN_ORG_STAGING}" >/dev/null


P1 Override CPLN_TOKEN when using the staging profile

In the normal promotion job, the earlier setup action leaves the production secret in CPLN_TOKEN for later steps. Control Plane's auth docs say token precedence is --token, then CPLN_TOKEN, then the profile token (https://docs.controlplane.com/cli-reference/get-started/authentication#token-precedence). With separate staging/production credentials, setting only CPLN_PROFILE/--profile here therefore still authenticates staging registry operations with the production token, so docker-login/image copy fails against CPLN_ORG_STAGING. Unset CPLN_TOKEN or set it to CPLN_TOKEN_STAGING for the source-profile commands.
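
A per-command override is one way to apply this without disturbing later production steps. The sketch below uses placeholder token values and printenv as a stand-in for cpln; the wrapper name run_as_staging is hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder values; in the workflow these come from secrets.
export CPLN_TOKEN="production-token"
CPLN_TOKEN_STAGING="staging-token"

# Hypothetical wrapper: run one command with CPLN_TOKEN swapped to the
# staging token. The override applies to that command only.
run_as_staging() {
  CPLN_TOKEN="${CPLN_TOKEN_STAGING}" "$@"
}

run_as_staging printenv CPLN_TOKEN   # prints "staging-token"
printenv CPLN_TOKEN                  # prints "production-token"

# In the workflow this might wrap the source-side call, e.g.:
#   CPLN_PROFILE="${upstream_profile}" run_as_staging \
#     cpln image docker-login --org "${CPLN_ORG_STAGING}"
```

Scoping the override to a single command keeps the production token in place for the destination side of the copy.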



copy_status=1
for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"; then
if CPLN_PROFILE="${upstream_profile}" docker manifest inspect "${source_image_ref}" >/dev/null &&
cpln image copy "${STAGING_IMAGE}" \


CPLN_PROFILE is not read by Docker — it's a cpln-specific env var. Docker uses the credential store already populated by cpln image docker-login above. The prefix is a no-op here and should be removed to avoid confusion.

Suggested change
cpln image copy "${STAGING_IMAGE}" \
if docker manifest inspect "${source_image_ref}" >/dev/null &&

Comment on lines 395 to +397


The docker manifest inspect check runs on every retry attempt, but it only validates Docker auth — not something a retry would fix. If Docker auth is broken, every attempt fails on the inspect before the actual cpln image copy even runs, burning all retries for an error that cannot be recovered by retrying.

Consider moving the auth check (and the docker-login) outside the loop so it fails fast on auth errors while retries are preserved for transient copy failures:

Suggested change
for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" --image "${STAGING_IMAGE}"; then
if CPLN_PROFILE="${upstream_profile}" docker manifest inspect "${source_image_ref}" >/dev/null &&
cpln image copy "${STAGING_IMAGE}" \
copy_status=1
if ! docker manifest inspect "${source_image_ref}" >/dev/null; then
echo "::error::Cannot reach staging image '${source_image_ref}' via Docker; check Docker login credentials."
exit 1
fi
for attempt in $(seq 1 "${copy_image_attempts}"); do
if cpln image copy "${STAGING_IMAGE}" \

--profile "${upstream_profile}" \
--org "${CPLN_ORG_STAGING}" \
--to-profile default \
--to-org "${CPLN_ORG_PRODUCTION}" \
--to-name "${production_image}"; then
copy_status=0
break
else
@@ -389,6 +419,8 @@ jobs:
exit "${copy_status}"
fi

echo "image=${production_image}" >> "$GITHUB_OUTPUT"
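
The review's point about failing fast on auth errors while retrying only transient failures can be sketched generically. The retry helper below is hypothetical (name, argument order, and delay handling are assumptions), and the commented usage mirrors the commands in this step rather than replacing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between failed attempts. Returns 0 on first success.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < attempts )); then
      echo "Attempt ${attempt}/${attempts} failed; retrying in ${delay}s" >&2
      sleep "${delay}"
    fi
  done
  return 1
}

# Usage sketch (not executed here): check auth once, then retry the copy.
#   if ! docker manifest inspect "${source_image_ref}" >/dev/null; then
#     echo "::error::Cannot reach staging image '${source_image_ref}'."
#     exit 1
#   fi
#   retry "${copy_image_attempts}" 5 cpln image copy "${STAGING_IMAGE}" ...
```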

- name: Deploy image to production
env:
PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
@@ -553,21 +585,23 @@ jobs:
HEALTHY: ${{ steps.health-check.outputs.healthy }}
PREVIOUS_IMAGE: ${{ steps.capture-current.outputs.current_image }}
PREVIOUS_VERSION: ${{ steps.capture-current.outputs.current_version }}
DEPLOYED_IMAGE: ${{ steps.staging-image.outputs.image }}
COPIED_IMAGE: ${{ steps.copy-image.outputs.image }}
shell: bash
run: |
{
echo "## Promotion Summary"
echo
if [[ "${HEALTHY}" == "true" ]]; then
echo "✅ Status: deployment successful"
deployed_image="${COPIED_IMAGE}"
else
echo "❌ Status: deployment failed"


If the copy-image step exits before emitting its output (e.g., fails during the cpln image query), COPIED_IMAGE will be empty. In that scenario HEALTHY is also not "true", so deployed_image is set to PREVIOUS_IMAGE via the else branch, which is correct. However, if there is ever a path where HEALTHY == "true" but the output was not set, the summary would show "Deployed image:" followed by an empty pair of backticks. A defensive fallback is cheap:

Suggested change
echo "❌ Status: deployment failed"
deployed_image="${COPIED_IMAGE:-unknown}"

deployed_image="${PREVIOUS_IMAGE}"
fi
echo
echo "Previous image: \`${PREVIOUS_IMAGE}\`"
echo "Previous version: ${PREVIOUS_VERSION}"
echo "Deployed image: \`${DEPLOYED_IMAGE}\`"
echo "Deployed image: \`${deployed_image}\`"
} >> "$GITHUB_STEP_SUMMARY"
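
The defensive fallback raised in the review relies on the ${VAR:-default} expansion, which substitutes the default when the variable is unset or empty. A minimal sketch, with a hypothetical helper summary_image standing in for the branch logic of the summary step:

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the image name to report, falling back to
# "unknown" when the chosen value is unset or empty.
summary_image() {
  local healthy="$1" copied="$2" previous="$3"
  if [[ "${healthy}" == "true" ]]; then
    printf '%s\n' "${copied:-unknown}"
  else
    printf '%s\n' "${previous:-unknown}"
  fi
}

summary_image "true"  "prod-app:4_abc1234" "prod-app:3_def5678"  # prints "prod-app:4_abc1234"
summary_image "true"  ""                   "prod-app:3_def5678"  # prints "unknown"
summary_image "false" ""                   "prod-app:3_def5678"  # prints "prod-app:3_def5678"
```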

create-github-release: