Commit d8d80a0

fix(deploy): add healthcheck, restart policies, and overlay driver to swarm stack
Backend healthcheck (fetch /api/health) lets Swarm detect when migrations are done — without it the worker can hit a pre-migration schema. Explicit restart_policy on every service replaces the implicit Swarm default; crash-looping services (worker, temporal) get max_attempts. Internal network gets driver: overlay for clarity.
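
For readability, the one-line `node -e` probe used in the healthcheck can be unpacked roughly as follows. This is only an illustrative sketch, not code from the commit: `probeExitCode` is a hypothetical helper name, and the injectable `fetchImpl` parameter is an assumption added so the logic can be exercised without a running backend. It assumes a Node.js version that ships a global `fetch` (18 or later).

```javascript
// Expanded sketch of the healthcheck one-liner. Resolves to the exit code a
// Docker healthcheck expects: 0 = healthy, 1 = unhealthy.
// probeExitCode and fetchImpl are hypothetical names used for illustration.
async function probeExitCode(url, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url);
    // res.ok is true only for 2xx responses.
    return res.ok ? 0 : 1;
  } catch {
    // Connection refused, timeout, DNS failure, etc. all count as unhealthy.
    return 1;
  }
}

// Equivalent of the command in the stack file (left commented out so the
// sketch has no side effects when loaded):
// probeExitCode('http://127.0.0.1:3001/api/health').then((code) => process.exit(code));
```

Any non-2xx response or network error maps to exit code 1, so the container is only reported healthy once `/api/health` actually answers successfully.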
1 parent a03b9a2 commit d8d80a0

1 file changed

Lines changed: 25 additions & 1 deletion

tools/deployment/ansible/deploy-application/main.yml

@@ -114,11 +114,20 @@
       TRUST_PROXY: 'true'
       RATE_LIMIT_EXECUTE_PER_MINUTE: "{{ lookup('env', 'RATE_LIMIT_EXECUTE_PER_MINUTE') or '10' }}"
       RATE_LIMIT_EXECUTE_PER_DAY: "{{ lookup('env', 'RATE_LIMIT_EXECUTE_PER_DAY') or '50' }}"
+    healthcheck:
+      test: ['CMD', 'node', '-e', "fetch('http://127.0.0.1:3001/api/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"]
+      interval: 10s
+      timeout: 5s
+      retries: 6
+      start_period: 15s
     networks:
       internal:
-        # the web image's nginx proxies to http://backend:3001
         aliases: [backend]
     deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 10
       placement:
         constraints:
           - node.role==worker
@@ -134,6 +143,10 @@
     networks:
       internal:
     deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 20
       placement:
         constraints:
           - node.role==worker
@@ -150,6 +163,9 @@
       internal:
         aliases: [app-db]
     deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
       placement:
         constraints:
           - node.labels.ai-studio-data==true
@@ -166,6 +182,9 @@
       internal:
         aliases: [temporal-db]
     deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
       placement:
         constraints:
           - node.labels.ai-studio-data==true
@@ -182,6 +201,10 @@
       internal:
         aliases: [temporal]
     deploy:
+      restart_policy:
+        condition: any
+        delay: 5s
+        max_attempts: 20
       placement:
         constraints:
           - node.role==worker
@@ -192,6 +215,7 @@
 
 networks:
   internal:
+    driver: overlay
   traefik-host-external:
     external: true
 
