fix(flex): defer swarm follower wait until after runtime build #663

Merged: kojiromike merged 1 commit into openemr:master from Jmevorach:master on Apr 27, 2026
Jmevorach (Contributor) commented Apr 27, 2026

In SWARM_MODE, flex containers start the follower alongside the leader. The follower used to block on the leader before cloning and building OpenEMR, so its Docker health start_period was consumed first by the wait and then again by the npm/composer build. The combined time often exceeded the health check window and failed CI.

  • Introduce SWARM_WAIT_DEFERRED: followers skip wait_for_swarm_completion until after the local flex build block, then run wait_for_swarm_completion followed by prepare_swarm_leader.
  • Split wait vs leader prep into wait_for_swarm_completion and prepare_swarm_leader for a clear call order.
  • Fix leader takeover checks: try_become_leader always returns 0, so gate promotion on AUTHORITY=yes instead of treating the return code as success.
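
The call order and the AUTHORITY gate described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from openemr.sh: only the names SWARM_WAIT_DEFERRED, AUTHORITY, wait_for_swarm_completion, prepare_swarm_leader and try_become_leader come from this description. Every function body, plus `LEADER_SLOT_FREE`, `log_call`, `build_flex_runtime` and `start_node`, is a hypothetical stand-in, and the leader path is assumed to simply build.

```sh
#!/bin/sh
# Hypothetical sketch of the deferred follower wait. Function bodies are
# stand-ins that only record the order in which they are called.

CALL_LOG=""
log_call() { CALL_LOG="${CALL_LOG}${CALL_LOG:+ }$1"; }

try_become_leader() {
    # Stand-in: reports the outcome through AUTHORITY and always returns 0,
    # which is the behaviour the description attributes to the real function.
    if [ "${LEADER_SLOT_FREE:-no}" = "yes" ]; then
        AUTHORITY=yes
    else
        AUTHORITY=no
    fi
    return 0
}

wait_for_swarm_completion() { log_call "wait"; }
prepare_swarm_leader()      { log_call "prepare"; }
build_flex_runtime()        { log_call "build"; }   # stands in for clone + npm/composer

start_node() {
    CALL_LOG=""
    SWARM_WAIT_DEFERRED=no

    try_become_leader
    # "if try_become_leader; then ..." would always take the leader branch,
    # because the exit status is always 0. Gate on AUTHORITY instead.
    if [ "${AUTHORITY}" != "yes" ]; then
        SWARM_WAIT_DEFERRED=yes    # follower: do not block on the leader yet
    fi

    build_flex_runtime             # local build overlaps with the leader's setup

    if [ "${SWARM_WAIT_DEFERRED}" = "yes" ]; then
        wait_for_swarm_completion
        prepare_swarm_leader
    fi
}
```

In this sketch, calling `start_node` with `LEADER_SLOT_FREE=no` leaves `CALL_LOG` as `build wait prepare`, so the follower's wait no longer precedes its build. With `LEADER_SLOT_FREE=yes` the log is just `build`.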

Tests:

  • Bats: assert_script_syntax on flex openemr.sh; smoke checks for swarm deferral.
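
The body of assert_script_syntax is not shown in this conversation. As an assumption about what such a check might do, a syntax-only test can be built on the shell's `-n` flag, which parses a script without executing it. The helper name `check_script_syntax` below is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper, not the PR's assert_script_syntax.
# Returns 0 if the file parses cleanly, non-zero on a syntax error.
check_script_syntax() {
    # -n reads and parses commands but does not execute them.
    sh -n "$1" 2>/dev/null
}
```

A script written for bash rather than POSIX sh would need `bash -n` in place of `sh -n`.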

Validated with: bats tests/bats/flex/openemr.bats and ./utilities/container_benchmarking/test_functionality.sh flex --test swarm_mode

Closes #661

Jmevorach (Contributor, Author):

@kojiromike closes #661

kojiromike (Member) left a comment:

Clean refactor with a real bug fix (try_become_leader return-code gating) and a sensible deferred-wait pattern that lets flex followers parallelize their local build with the leader's shared-volume setup. CI green across the matrix. New assert_script_syntax test is a nice safety net.

kojiromike merged commit 53a7f8d into openemr:master on Apr 27, 2026
37 checks passed
Linked issue (may be closed by merging this pull request): CI flake: Production Docker (flex) sometimes fails with "application not healthy after 10m0s"