
Commit 3fbe2da

fix(dotcom): lower Zero kill_timeout to Fly's 5m API cap (tldraw#8656)
To unblock production deploys, this PR lowers the Zero RM/VS `kill_timeout` from `10m` back to `5m`. Fly's Machines API rejects any value over 5 minutes, regardless of CPU kind:

```
Error: failed to update machine ...: invalid stop_config.timeout, cannot exceed 5 minutes
```

This contradicts Fly's [graceful VM exits guide](https://fly.io/blog/graceful-vm-exits-some-dials/), which suggests up to 24h on dedicated CPU. The 5m cap enforced by the API is the authoritative limit today. The drain budget is now half of the 10m that Rocicorp's CZ uses, but it is the ceiling Fly will accept.

Follow-up to tldraw#8627.

### Change type

- [x] `bugfix`

### Test plan

1. Merge to `production` → `deploy-dotcom.yml` should complete the VS/RM rolling update without the `invalid stop_config.timeout` error.
2. Verify that the generated `flyio-view-syncer.toml` and `flyio-replication-manager.toml` contain `kill_timeout = "5m"` at the top level.

### Code changes

| Section        | LOC change |
| -------------- | ---------- |
| Config/tooling | +4 / -3    |
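Step 2 of the test plan can be checked mechanically. The sketch below is a line-oriented approximation, not a full TOML parser, and it is not part of `deploy-dotcom.ts`; the function name `topLevelKillTimeout` is hypothetical. It treats the first `[table]` or `[[array]]` header as the end of the top-level section and returns the quoted value of a `kill_timeout` key found before that header.

```typescript
// Hypothetical helper (not from the tldraw repo): returns the value of a
// top-level `kill_timeout = "..."` entry in a fly.toml-style string, or
// undefined if the key only appears inside a table (or not at all).
// Line-oriented approximation: multi-line arrays and other TOML edge cases
// are not handled.
function topLevelKillTimeout(toml: string): string | undefined {
	for (const rawLine of toml.split('\n')) {
		// trim() also strips a trailing '\r' from CRLF files.
		const line = rawLine.trim()
		// The first table header ([x] or [[x]]) ends the top-level section.
		if (line.startsWith('[')) return undefined
		// Matches: kill_timeout = "5m"   (an inline # comment is allowed)
		const match = /^kill_timeout\s*=\s*"([^"]*)"\s*(#.*)?$/.exec(line)
		if (match) return match[1]
	}
	return undefined
}
```

Running it over the contents of `flyio-view-syncer.toml` and `flyio-replication-manager.toml` and comparing the result with `'5m'` would cover both files; reading the files (for example with Node's `fs.readFileSync`) is left out to keep the sketch self-contained.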
1 parent 5dfc426 commit 3fbe2da

1 file changed

Lines changed: 4 additions & 3 deletions


internal/scripts/deploy-dotcom.ts

```diff
@@ -289,8 +289,9 @@ const zeroQueryUrl = `${env.MULTIPLAYER_SERVER.replace(/^ws/, 'http')}/app/zero/
 // Production uses performance (dedicated) CPUs for both RM and VS; staging uses shared.
 // killTimeout: window between SIGTERM and SIGKILL on stop. Lets VS drain client
 // WebSockets and RM flush litestream / release /data handles before being killed.
-// Fly caps it at 5m on shared CPU and 24h on dedicated; production uses 10m to
-// match Rocicorp's CZ default, staging is pinned to the 5m shared-CPU ceiling.
+// Fly's API caps it at 5m regardless of CPU kind (the 24h dedicated-CPU figure
+// from their blog is not actually accepted by the Machines API today — deploys
+// fail with "invalid stop_config.timeout, cannot exceed 5 minutes").
 const zeroVmSizes = {
 	staging: {
 		rm: { cpus: 1, memory: '2gb', cpuKind: 'shared' },
@@ -304,7 +305,7 @@ const zeroVmSizes = {
 		vs: { cpus: 4, memory: '8gb', cpuKind: 'performance' },
 		volumeSize: '8gb',
 		vsMinMachines: 4,
-		killTimeout: '10m',
+		killTimeout: '5m',
 	},
 	preview: { single: { cpus: 2, memory: '2gb' } },
 } as const
```
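Because the Machines API only reports the cap when a machine update is attempted, a guard along these lines could fail a deploy before any machine is touched. This is a minimal sketch, not code from the tldraw repo: `durationToSeconds`, `assertKillTimeoutsWithinFlyCap` and `FLY_MAX_STOP_TIMEOUT_SECONDS` are hypothetical names, and the parser only accepts single-unit durations such as `30s`, `5m` or `24h`. The 5-minute limit is taken from the API error quoted in the commit message.

```typescript
// Hypothetical pre-deploy guard: mirrors the Machines API limit that rejected
// killTimeout: '10m' ("invalid stop_config.timeout, cannot exceed 5 minutes").
const FLY_MAX_STOP_TIMEOUT_SECONDS = 5 * 60

const UNIT_SECONDS = { s: 1, m: 60, h: 3600 } as const

// Parses simple single-unit durations such as '30s', '5m' or '24h'.
// Compound forms like '1m30s' are deliberately rejected.
function durationToSeconds(value: string): number {
	const match = /^(\d+)([smh])$/.exec(value)
	if (!match) throw new Error(`Unsupported duration: ${value}`)
	return Number(match[1]) * UNIT_SECONDS[match[2] as keyof typeof UNIT_SECONDS]
}

// Throws if any environment's killTimeout exceeds the 5m API cap.
// Environments without a killTimeout key are skipped.
function assertKillTimeoutsWithinFlyCap(
	sizes: Record<string, { readonly killTimeout?: string; readonly [key: string]: unknown }>
): void {
	for (const [envName, config] of Object.entries(sizes)) {
		if (config.killTimeout === undefined) continue
		const seconds = durationToSeconds(config.killTimeout)
		if (seconds > FLY_MAX_STOP_TIMEOUT_SECONDS) {
			throw new Error(
				`${envName}.killTimeout = '${config.killTimeout}' exceeds Fly's 5m stop_config.timeout cap`
			)
		}
	}
}
```

Calling `assertKillTimeoutsWithinFlyCap(zeroVmSizes)` near the top of the script would have rejected the previous `killTimeout: '10m'`, while entries without a `killTimeout`, such as `preview`, pass through unchanged.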
