fix(cicd): allow skipped Deploy-to-Morpheus-P-Node to satisfy C-Node deploy needs #716
Merged
…deploy needs
Root cause of the first v7 test-branch run (GH Actions run 24796856145):
the drain job ran and deregistered the dev C-Node from both NLB target
groups, then Deploy-to-Morpheus-C-Node was silently skipped because its
implicit success() guard propagated the skip from Deploy-to-Morpheus-P-Node
(which is main-only and intentionally skips on test pushes). That left
dev with the old C-Node task running in ECS but with no load-balancer
membership until we manually re-registered it.
Fix:
- Add an explicit `if` to Deploy-to-Morpheus-C-Node that:
  - Requires GHCR build + drain success.
  - Accepts Deploy-to-Morpheus-P-Node.result in {success, skipped}.
  - Wraps in `!cancelled()` so manual cancels still short-circuit.
- Rewrite the inline comment that previously (incorrectly) claimed
skipped jobs are treated as successful for dependency resolution;
they are not.
No change to the deploy logic itself, the drain job, or any other
workflow sequencing. This is a pure `needs` / guard correctness fix.
Made-with: Cursor
…deploy needs (#715)

## Summary

Hotfix for the first v7 `test`-branch deployment attempt ([Actions run #1447 / run id 24796856145](https://github.com/MorpheusAIs/Morpheus-Lumerin-Node/actions/runs/24796856145)).

The previous PR (#714 → #713) introduced a new job sequence:

```
Drain-Morpheus-C-Node → Deploy-to-Morpheus-P-Node → Deploy-to-Morpheus-C-Node
```

On push to `test`, `Deploy-to-Morpheus-P-Node` is correctly **skipped** (its own `if` restricts it to `main`; no dev P-Node exists). That skip then propagated onto `Deploy-to-Morpheus-C-Node` through the implicit `success()` guard that every job has when no explicit `if` tolerates a skipped dependency.

Net effect:

- `Drain-Morpheus-C-Node` ✅ removed the running dev C-Node task from both NLB target groups.
- `Deploy-to-Morpheus-C-Node` ❌ was silently skipped: no new task was registered, and the existing task was not re-registered.
- Dev C-Node endpoints (`router.dev.mor.org:8082`/`:8545`) stopped serving traffic until we manually re-registered the old task to the TGs.

## What this PR changes

`.github/workflows/build.yml`, `Deploy-to-Morpheus-C-Node` job only:

### Explicit `if` guard that tolerates the P-Node skip

```yaml
if: |
  !cancelled() &&
  github.repository == 'MorpheusAIs/Morpheus-Lumerin-Node' &&
  needs.GHCR-Build-and-Push.result == 'success' &&
  needs.Drain-Morpheus-C-Node.result == 'success' &&
  (needs.Deploy-to-Morpheus-P-Node.result == 'success' || needs.Deploy-to-Morpheus-P-Node.result == 'skipped') &&
  (
    (github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/test')) ||
    (github.event_name == 'workflow_dispatch' && github.event.inputs.build_all_os == 'true' && github.event.inputs.create_deployment == 'true')
  )
```

- Still requires real success from the drain + GHCR jobs.
- Explicitly accepts `success` OR `skipped` for the P-Node dependency.
- `!cancelled()` short-circuits the job if someone cancels the run, so we don't try to redeploy a half-cancelled sequence.
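To check the guard's logic offline, the `if` expression above can be mirrored as a plain Python predicate. This is a minimal sketch, not how the Actions runner evaluates expressions: the `c_node_should_run` name and the `ctx` dictionary layout are hypothetical, and the `cancelled` flag stands in for the `cancelled()` status function.

```python
from typing import Any, Mapping

REPO = "MorpheusAIs/Morpheus-Lumerin-Node"


def c_node_should_run(ctx: Mapping[str, Any], needs: Mapping[str, str],
                      cancelled: bool = False) -> bool:
    """Hypothetical mirror of the Deploy-to-Morpheus-C-Node `if` guard."""
    if cancelled:  # !cancelled()
        return False
    if ctx["repository"] != REPO:
        return False
    # Hard dependencies: both must have actually succeeded.
    if needs["GHCR-Build-and-Push"] != "success":
        return False
    if needs["Drain-Morpheus-C-Node"] != "success":
        return False
    # Soft dependency: the main-only P-Node job may succeed or be skipped.
    if needs["Deploy-to-Morpheus-P-Node"] not in ("success", "skipped"):
        return False
    event = ctx["event_name"]
    if event == "push":
        return ctx["ref"] in ("refs/heads/main", "refs/heads/test")
    if event == "workflow_dispatch":
        # The guard compares these inputs to the string 'true'.
        inputs = ctx.get("inputs", {})
        return inputs.get("build_all_os") == "true" and inputs.get("create_deployment") == "true"
    return False


# Usage: the push-to-test case from the incident.
push_to_test = {"repository": REPO, "event_name": "push", "ref": "refs/heads/test"}
test_needs = {
    "GHCR-Build-and-Push": "success",
    "Drain-Morpheus-C-Node": "success",
    "Deploy-to-Morpheus-P-Node": "skipped",
}
print(c_node_should_run(push_to_test, test_needs))  # True
```

Keeping the hard dependencies (GHCR, drain) and the soft one (P-Node) as separate checks makes it obvious which outcomes each `needs` entry is allowed to have.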
### Comment rewrite

The previous inline comment claimed skipped jobs are treated as successful for dependency resolution. That's the opposite of how GitHub Actions actually behaves. The note is replaced with an accurate explanation and a reference to this incident so that future-us (or other maintainers) won't step on the same rake.

## What this PR does NOT change

- No change to the drain job or the controlled-traffic C-Node deploy sequence (register → dereg → hold → rereg → wait).
- No change to the P-Node job or its `if` gating.
- No change to any other workflow or deploy script.

This is a pure guard-correctness fix.

## Test plan

- [x] YAML syntax validated (`yaml.safe_load`).
- [x] Manually walked through `needs` outcomes:
  - On push to `test`: P-Node `skipped`, drain + GHCR `success` → C-Node **runs**. ✅
  - On push to `main`: all three `success` → C-Node **runs**. ✅
  - Drain fails → C-Node **skipped** (won't redeploy while the TG state is ambiguous). ✅
  - GHCR fails → C-Node **skipped**. ✅
  - P-Node fails on main (not skipped) → C-Node **skipped** (we want to bail rather than leave prd with a version mismatch between providers and the consumer). ✅
- [ ] Merge to `dev`, promote to `test`, confirm `Deploy-to-Morpheus-Consumer` actually runs this time and completes the drain → hold → re-register → `/healthcheck` cycle end-to-end.
- [ ] Once green on test, cut `test` → `main` for the first prd exercise.

## Related

- Previous PR: #714 (dev → test) carrying #713 (initial drain/sequence/hold).
- Companion IAM + planning doc already applied in `Morpheus-Infra`.

Made with [Cursor](https://cursor.com)
nomadicrogue approved these changes on Apr 22, 2026.
## Summary

Promotes the #715 hotfix from `dev` → `test` so we can retry the v7 test deploy and actually exercise the new drain / hold / re-register sequence end-to-end.

## What's in this PR
Single-commit delta ahead of `test`:

- `2b14c58`: fix(cicd): allow skipped Deploy-to-Morpheus-P-Node to satisfy C-Node deploy needs (merged via #715)

Everything else in `dev` is already on `test` from #714.

## Background
The first v7 test run (Actions run #1447 / run id 24796856145) revealed a correctness gap in the workflow from #713/#714:

- `Drain-Morpheus-C-Node` ran and successfully drained the dev C-Node targets from both NLB target groups.
- `Deploy-to-Morpheus-P-Node` was correctly skipped (main-only).
- `Deploy-to-Morpheus-C-Node` was silently skipped because the implicit `success()` guard propagated the skip from the P-Node job. GitHub Actions treats skipped `needs` as non-success, contrary to the inline comment in the previous PR.

## Fix (this PR carries)
Explicit `if` guard on `Deploy-to-Morpheus-C-Node` that:

- Requires success from `GHCR-Build-and-Push` and `Drain-Morpheus-C-Node`.
- Accepts `success` or `skipped` as the outcome for `Deploy-to-Morpheus-P-Node`.
- Wraps in `!cancelled()` to short-circuit on manual cancel.

The misleading comment is replaced with an accurate explanation referencing this incident.
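The `needs` walk-through from the #715 test plan can be replayed mechanically to contrast the old implicit guard with the new explicit one. This minimal Python sketch assumes the implicit `success()` guard passes only when every listed dependency reports `success`; it considers only the three `needs` named here and ignores the trigger conditions. The helper names, and the downstream results chosen for the failure rows, are assumptions.

```python
from typing import Dict, List, Tuple

GHCR = "GHCR-Build-and-Push"
DRAIN = "Drain-Morpheus-C-Node"
P_NODE = "Deploy-to-Morpheus-P-Node"


def implicit_guard(needs: Dict[str, str]) -> bool:
    # Default with no tolerant `if`: any non-success dependency
    # (including `skipped`) causes this job to be skipped.
    return all(result == "success" for result in needs.values())


def explicit_guard(needs: Dict[str, str]) -> bool:
    # needs-portion of the new guard: GHCR and drain must succeed,
    # P-Node may succeed or be skipped, but not fail.
    return (
        needs[GHCR] == "success"
        and needs[DRAIN] == "success"
        and needs[P_NODE] in ("success", "skipped")
    )


# (scenario, assumed needs results, expected C-Node decision under the new guard)
SCENARIOS: List[Tuple[str, Dict[str, str], bool]] = [
    ("push to test", {GHCR: "success", DRAIN: "success", P_NODE: "skipped"}, True),
    ("push to main", {GHCR: "success", DRAIN: "success", P_NODE: "success"}, True),
    ("drain fails", {GHCR: "success", DRAIN: "failure", P_NODE: "skipped"}, False),
    ("GHCR fails", {GHCR: "failure", DRAIN: "skipped", P_NODE: "skipped"}, False),
    ("P-Node fails on main", {GHCR: "success", DRAIN: "success", P_NODE: "failure"}, False),
]

for scenario, needs, expected in SCENARIOS:
    assert explicit_guard(needs) is expected, scenario
    print(f"{scenario:<22} implicit={implicit_guard(needs)} explicit={explicit_guard(needs)}")
```

The first row reproduces the incident: the implicit guard returns `False` for a `test` push while the explicit guard returns `True`.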
## Expected behavior after merge to test

On push to `test` the sequence becomes:

- `Drain-Morpheus-C-Node` ✅ deregisters the existing dev C-Node targets.
- `Deploy-to-Morpheus-P-Node` is skipped (expected).
- `Deploy-to-Morpheus-C-Node` ✅ runs this time because the new `if` accepts the P-Node skip:
  - `update-service` with `--health-check-grace-period-seconds 600`.
  - Rehydration hold (`cnode_rehydration_wait_secs`).
  - `wait target-in-service`.
  - `/healthcheck` version match on `router.dev.mor.org:8082`.
- `Deploy-to-Titan` + `Deploy-TEE-SecretVM-test` run in parallel to the deploy sequence (unchanged).

On dev the rehydration hold is effectively a no-op because EFS-backed BadgerDB is persistent there, which makes this a good first validation of the workflow plumbing without depending on rehydration correctness.
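As a rough model of the push-to-`test` sequence, this Python sketch walks a simplified job graph in dependency order and records each job as `success` or `skipped`. The graph (GHCR → drain → P-Node → C-Node), the assumption that every job that runs succeeds, and all helper names are hypothetical simplifications; `Deploy-to-Titan`, `Deploy-TEE-SecretVM-test`, and the in-job deploy steps are left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A guard sees the results of the job's direct `needs` plus the pushed ref.
Guard = Callable[[Dict[str, str], str], bool]
Job = Tuple[str, List[str], Optional[Guard]]

GHCR = "GHCR-Build-and-Push"
DRAIN = "Drain-Morpheus-C-Node"
P_NODE = "Deploy-to-Morpheus-P-Node"
C_NODE = "Deploy-to-Morpheus-C-Node"


def all_success(deps: Dict[str, str]) -> bool:
    """Implicit success(): every dependency must have succeeded."""
    return all(result == "success" for result in deps.values())


def p_node_guard(deps: Dict[str, str], ref: str) -> bool:
    # main-only; an `if` with no status function still implies success()
    return ref == "refs/heads/main" and all_success(deps)


def c_node_guard(deps: Dict[str, str], ref: str) -> bool:
    # needs-portion of the new explicit guard (trigger checks omitted)
    return (
        deps[GHCR] == "success"
        and deps[DRAIN] == "success"
        and deps[P_NODE] in ("success", "skipped")
    )


def pipeline(c_guard: Optional[Guard]) -> List[Job]:
    # Hypothetical, simplified needs graph, listed in dependency order.
    return [
        (GHCR, [], None),
        (DRAIN, [GHCR], None),
        (P_NODE, [DRAIN], p_node_guard),
        (C_NODE, [GHCR, DRAIN, P_NODE], c_guard),
    ]


def simulate(jobs: List[Job], ref: str) -> Dict[str, str]:
    results: Dict[str, str] = {}
    for name, needs, guard in jobs:
        deps = {dep: results[dep] for dep in needs}
        # No explicit guard: fall back to the implicit success() check.
        runs = guard(deps, ref) if guard is not None else all_success(deps)
        results[name] = "success" if runs else "skipped"  # assume jobs that run succeed
    return results


if __name__ == "__main__":
    for label, guard in (("implicit guard", None), ("explicit guard", c_node_guard)):
        print(label, simulate(pipeline(guard), "refs/heads/test"))
```

Swapping only the C-Node guard flips its outcome on `test` from `skipped` to `success` while P-Node stays `skipped`, which is the behavior change this PR promotes.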
## Review focus

- The new `if` block on `Deploy-to-Morpheus-C-Node` (lines ~1298–1320 of `build.yml`).

## Test plan (after you merge to test)
- `Deploy-to-Morpheus-Consumer` enters `in_progress` (not skipped) after the drain completes.
- `deregister-targets` runs.
- `register-targets` + `wait target-in-service` succeed.
- `/healthcheck` version-match passes on the new image tag.
- Cut `test` → `main` for the first prd exercise (existing open PR).

## Related

- Companion IAM + planning doc already applied in `Morpheus-Infra`.

Made with Cursor