Commit 572fc02
fix(cachewd): decouple scheduler context from signal context (#312)
The graceful shutdown PR (#307) passes the signal-notified context to
the scheduler, so SIGTERM immediately kills all workers even though
`server.Shutdown` keeps HTTP handlers alive for up to 150s. In-flight
jobs (snapshots, repacks, fetches) get context-cancelled mid-execution.
Gives the scheduler its own context via `context.WithoutCancel` (same
pattern already used for the HTTP server `BaseContext`) and cancels it
after `server.Shutdown` completes. Workers finish their current jobs
during the drain window instead of dying instantly. The drain is bounded
to 10s so long-running jobs don't block process exit past the pod's
`terminationGracePeriodSeconds`.
Production evidence from today's rolling deploys:
- Every SIGTERM produced 32+ simultaneous `Worker terminated` log lines
per pod
- Snapshot jobs killed mid-write: `tar failed: signal: killed`, `context
canceled`
- `cash-server` snapshot killed 50s in, `ios-register` mirror snapshot
killed 5m31s in
Shutdown order after this change:
1. SIGTERM → readiness flips to 503
2. `server.Shutdown` drains in-flight HTTP requests (up to 150s)
3. Scheduler context cancelled → workers finish current job and exit
4. Wait up to 10s for workers to drain, then exit

1 parent 26f531a commit 572fc02
2 files changed
Lines changed: 79 additions & 1 deletion
First changed file (file name and diff content not preserved):

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 114 | 114-118 | 1 line removed, 5 lines added |
| (none) | 170-172 | 3 lines added |
| (none) | 201-220 | 20 lines added |

Second changed file (file name and diff content not preserved):

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| (none) | 338-388 | 51 lines added |