Commit 2ddc125

fix(cicd): allow skipped Deploy-to-Morpheus-P-Node to satisfy C-Node deploy needs (#716)
## Summary

Promotes the #715 hotfix from `dev` → `test` so we can retry the v7 test deploy and actually exercise the new drain / hold / re-register sequence end-to-end.

## What's in this PR

Single-commit delta ahead of `test`:

- **`2b14c58`: fix(cicd): allow skipped Deploy-to-Morpheus-P-Node to satisfy C-Node deploy needs** (merged via #715)

Everything else in `dev` is already on `test` from #714.

## Background

First v7 test run ([Actions run #1447 / run id 24796856145](https://github.com/MorpheusAIs/Morpheus-Lumerin-Node/actions/runs/24796856145)) revealed a correctness gap in the workflow from #713/#714:

- `Drain-Morpheus-C-Node` ran and successfully drained the dev C-Node targets from both NLB target groups.
- `Deploy-to-Morpheus-P-Node` was correctly skipped (main-only).
- `Deploy-to-Morpheus-C-Node` was **silently skipped** because the implicit `success()` guard propagated the skip from the P-Node job. GitHub Actions treats skipped `needs` as non-success, contrary to the inline comment in the previous PR.
- Dev C-Node remained functional in ECS but without LB membership until manually re-registered.

## Fix (carried by this PR)

Explicit `if` guard on `Deploy-to-Morpheus-C-Node` that:

- Requires real success from `GHCR-Build-and-Push` and `Drain-Morpheus-C-Node`.
- Accepts either `success` or `skipped` as the outcome for `Deploy-to-Morpheus-P-Node`.
- Wraps everything in `!cancelled()` to short-circuit on manual cancel.

The misleading comment is replaced with an accurate explanation referencing this incident.

## Expected behavior after merge to test

On push to `test` the sequence becomes:

1. Build + test + GHCR push (~10–20 min).
2. `Drain-Morpheus-C-Node` ✅ deregisters the existing dev C-Node targets.
3. `Deploy-to-Morpheus-P-Node` skipped (expected).
4. `Deploy-to-Morpheus-C-Node` ✅ **runs this time** because the new `if` accepts the P-Node skip:
   - `update-service` with `--health-check-grace-period-seconds 600`.
   - Poll new task ENI IP.
   - Deregister IP from TGs.
   - Sleep 90s (`cnode_rehydration_wait_secs`).
   - Re-register IP, `wait target-in-service`.
   - Public `/healthcheck` version match on `router.dev.mor.org:8082`.
5. `Deploy-to-Titan` + `Deploy-TEE-SecretVM-test` run in parallel with the deploy sequence (unchanged).

On dev the rehydration hold is effectively a no-op because EFS-backed BadgerDB is persistent there. That makes dev a good first validation of the workflow plumbing without depending on rehydration correctness.

## Review focus

- The new `if` block on `Deploy-to-Morpheus-C-Node` (lines ~1298–1320 of `build.yml`).
- Reasoning captured in the inline comment.
- No changes to deploy logic, drain logic, or sequencing; only the guard.

## Test plan (after you merge to test)

1. Merge triggers the deploy workflow against dev.
2. Confirm `Deploy-to-Morpheus-C-Node` enters `in_progress` (not skipped) after the drain completes.
3. Watch the controlled-traffic sequence:
   - New task gets an ENI IP.
   - `deregister-targets` runs.
   - 90s hold.
   - `register-targets` + `wait target-in-service` succeed.
   - `/healthcheck` version-match passes on the new image tag.
4. If green, proceed with `test` → `main` for the first prd exercise (existing open PR).

## Related

- Previous merges on the new CICD flow: #713, #714, #715.
- Companion IAM + planning doc already applied in `Morpheus-Infra`.

Made with [Cursor](https://cursor.com)
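
The controlled-traffic sequence in step 4 could be sketched as a single workflow step along the following lines. This is an illustration under stated assumptions, not the repository's actual deploy job: the workflow name, the `vars.*` repository variables, the `expected_version` input, the use of `--force-new-deployment`, the `http` scheme on the healthcheck URL, and the assumption that the healthcheck body contains the version string are all hypothetical. It also assumes AWS credentials were configured earlier in the job and that targets use each target group's default port.

```yaml
# Hypothetical sketch of the drain / hold / re-register sequence.
# Not the repository's real job; all names below are placeholders.
name: cnode-deploy-sketch
on:
  workflow_dispatch:
    inputs:
      expected_version:
        description: Version string the healthcheck is assumed to report
        required: true

jobs:
  deploy-sketch:
    runs-on: ubuntu-latest
    env:
      CLUSTER: ${{ vars.ECS_CLUSTER }}      # hypothetical repository variable
      SERVICE: ${{ vars.ECS_SERVICE }}      # hypothetical repository variable
      TG_ARNS: ${{ vars.CNODE_TG_ARNS }}    # hypothetical: space-separated target group ARNs
      HOLD_SECS: "90"                       # mirrors cnode_rehydration_wait_secs
      EXPECTED_VERSION: ${{ inputs.expected_version }}
      HEALTH_URL: http://router.dev.mor.org:8082/healthcheck  # scheme is an assumption
    steps:
      - name: Controlled-traffic deploy
        shell: bash
        run: |
          set -euo pipefail

          # 1. Start a new deployment with a long health-check grace period.
          aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" \
            --force-new-deployment \
            --health-check-grace-period-seconds 600 > /dev/null

          # 2. Poll until a task from the new (PRIMARY) deployment has an ENI IP.
          DEPLOYMENT_ID=$(aws ecs describe-services --cluster "$CLUSTER" --services "$SERVICE" \
            --query "services[0].deployments[?status=='PRIMARY'].id | [0]" --output text)
          IP="None"
          for attempt in $(seq 1 60); do
            TASK_ARN=$(aws ecs list-tasks --cluster "$CLUSTER" --started-by "$DEPLOYMENT_ID" \
              --query 'taskArns[0]' --output text)
            if [ "$TASK_ARN" != "None" ]; then
              IP=$(aws ecs describe-tasks --cluster "$CLUSTER" --tasks "$TASK_ARN" \
                --query "tasks[0].attachments[0].details[?name=='privateIPv4Address'].value | [0]" \
                --output text)
              if [ "$IP" != "None" ]; then break; fi
            fi
            sleep 10
          done
          if [ "$IP" = "None" ]; then
            echo "No ENI IP found for the new task" >&2
            exit 1
          fi

          # 3. Pull the new task out of every target group before it takes traffic.
          for TG in $TG_ARNS; do
            aws elbv2 deregister-targets --target-group-arn "$TG" --targets "Id=$IP"
          done

          # 4. Hold so the node can rehydrate, then re-register and wait for health.
          sleep "$HOLD_SECS"
          for TG in $TG_ARNS; do
            aws elbv2 register-targets --target-group-arn "$TG" --targets "Id=$IP"
            aws elbv2 wait target-in-service --target-group-arn "$TG" --targets "Id=$IP"
          done

          # 5. Public healthcheck must report the expected version.
          curl -fsS "$HEALTH_URL" | grep -qF "$EXPECTED_VERSION"
```

Keeping the whole sequence in one step means a failure at any stage fails the job before later stages run, which matches the ordering the list above depends on.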
2 parents 1a3b07d + 2b14c58 commit 2ddc125

1 file changed

Lines changed: 17 additions & 3 deletions

.github/workflows/build.yml

@@ -1297,18 +1297,32 @@ jobs:
 
   Deploy-to-Morpheus-C-Node:
     name: Deploy to Morpheus Consumer via GitHub
+    # NOTE on `needs` + skipped dependencies:
+    # GitHub Actions' default job condition is implicit `success()`, which returns
+    # false when ANY needed job was skipped. P-Node is main-only (see its own `if`),
+    # so on a `test` deploy `Deploy-to-Morpheus-P-Node` is always skipped. Without
+    # an explicit `if` that tolerates that skip, the C-Node deploy gets skipped
+    # too — which is exactly what bit us on the first v7 test run: the drain job
+    # ran and deregistered the C-Node, then the actual deploy silently skipped,
+    # leaving dev without a replacement task.
+    #
+    # This `if` explicitly allows the skipped P-Node outcome, while still gating
+    # on drain + GHCR success and requiring we're on a deploy-eligible branch.
+    # `!cancelled()` keeps the guard working if a user cancels mid-run.
     if: |
+      !cancelled() &&
       github.repository == 'MorpheusAIs/Morpheus-Lumerin-Node' &&
+      needs.GHCR-Build-and-Push.result == 'success' &&
+      needs.Drain-Morpheus-C-Node.result == 'success' &&
+      (needs.Deploy-to-Morpheus-P-Node.result == 'success' || needs.Deploy-to-Morpheus-P-Node.result == 'skipped') &&
       (
-        (github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/test'))||
+        (github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/test')) ||
         (github.event_name == 'workflow_dispatch' && github.event.inputs.build_all_os == 'true' && github.event.inputs.create_deployment == 'true')
       )
     needs:
       - Generate-Tag
       - GHCR-Build-and-Push
       - Drain-Morpheus-C-Node
-      # P-Node is main-only; on test this job is skipped and the `needs` is still
-      # satisfied (skipped jobs are treated as successful for dependency resolution).
       - Deploy-to-Morpheus-P-Node
     runs-on: ubuntu-latest
     environment: ${{ github.ref_name }}
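
For reviewers who want to see the skip propagation in isolation, a minimal workflow such as the one below should reproduce it. This is a sketch, not part of the commit: the workflow name and the job names `build`, `main-only`, `deploy-implicit`, and `deploy-guarded` are hypothetical. On a push to `test`, `main-only` is skipped, `deploy-implicit` is expected to be skipped along with it, and `deploy-guarded` is expected to run because its `if` accepts the `skipped` result.

```yaml
# Hypothetical minimal reproduction; all job names are illustrative.
name: skipped-needs-demo
on:
  push:
    branches: [main, test]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build"

  main-only:
    # Skipped on every branch except main.
    if: github.ref == 'refs/heads/main'
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "main-only deploy"

  deploy-implicit:
    # No `if`: the implicit success() check applies, so a skipped
    # `main-only` causes this job to be skipped as well.
    needs: [build, main-only]
    runs-on: ubuntu-latest
    steps:
      - run: echo "only runs when main-only succeeded"

  deploy-guarded:
    # Explicit guard: require a real build success, tolerate a skipped
    # `main-only`, and still stop when the run is cancelled.
    if: |
      !cancelled() &&
      needs.build.result == 'success' &&
      (needs.main-only.result == 'success' || needs.main-only.result == 'skipped')
    needs: [build, main-only]
    runs-on: ubuntu-latest
    steps:
      - name: Show dependency results
        env:
          NEEDS_JSON: ${{ toJSON(needs) }}
        run: echo "$NEEDS_JSON"
```

Printing `toJSON(needs)` in the guarded job gives a quick record of each dependency's `result`, which helps confirm that a skip (rather than a failure) is what the guard let through.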
