Commit 57b88f4

add celery worker resilience for database connection timeouts
- close stale db connections before each task via task_prerun signal (mirrors what django does for http requests)
- add Restart=on-failure to worker and beat systemd services
- add --loglevel info to worker service for diagnostics
1 parent 71c97d3 commit 57b88f4

3 files changed

Lines changed: 19 additions & 0 deletions

etc/systemd/system/patchman-celery-beat.service

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@ After=network-online.target
 
 [Service]
 Type=simple
+Restart=on-failure
+RestartSec=10
 User=patchman
 Group=patchman
 Environment="REDIS_HOST=127.0.0.1"

etc/systemd/system/patchman-celery-worker@.service

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,8 @@ After=network-online.target
 
 [Service]
 Type=simple
+Restart=on-failure
+RestartSec=10
 User=patchman
 Group=patchman
 Environment="REDIS_HOST=127.0.0.1"
@@ -19,6 +21,7 @@ ExecStart=/usr/bin/celery \
 --task-events \
 --pool ${CELERY_POOL_TYPE} \
 --concurrency ${CELERY_CONCURRENCY} \
+--loglevel info \
 --hostname patchman-celery-worker%i@%%h
 
 [Install]

patchman/celery.py

Lines changed: 14 additions & 0 deletions
@@ -17,10 +17,24 @@
 import os
 
 from celery import Celery
+from celery.signals import task_prerun
 
 os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'patchman.settings') # noqa
 from django.conf import settings # noqa
 
 app = Celery('patchman')
 app.config_from_object('django.conf:settings', namespace='CELERY')
 app.autodiscover_tasks()
+
+
+@task_prerun.connect
+def close_stale_connections(**kwargs):
+    """Close stale DB connections before each task.
+
+    Django does this automatically for HTTP requests but not for Celery
+    tasks. Without this, long-lived workers hit 'server has gone away'
+    (MySQL) or 'server closed the connection unexpectedly' (PostgreSQL)
+    when the DB server drops idle connections.
+    """
+    from django import db
+    db.close_old_connections()
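
The hook above works because a connection that has outlived its allowed age is closed before the task runs, so the task reconnects instead of reusing a socket the server may already have dropped. The sketch below is a toy, Django-free model of that idea only, not Django's or Celery's implementation. `FakeClock`, `FakeConnection`, `run_task`, `simulate`, and the timeout values are all hypothetical names and numbers chosen for illustration, and Django's real check considers more conditions than connection age.

```python
class FakeClock:
    """Hypothetical manual clock so the simulation is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class FakeConnection:
    """Hypothetical stand-in for a DB connection held by a long-lived worker."""

    def __init__(self, clock, max_age, server_idle_timeout):
        self.clock = clock
        self.max_age = max_age                          # None means "reuse forever"
        self.server_idle_timeout = server_idle_timeout  # server drops idle sockets after this
        self.opened_at = None                           # None means "not connected"
        self.last_used = None

    def close_if_obsolete(self):
        # Age-based check only: drop the connection once it is older than max_age.
        if self.opened_at is not None and self.max_age is not None:
            if self.clock() - self.opened_at >= self.max_age:
                self.opened_at = None

    def query(self):
        now = self.clock()
        if self.opened_at is None:
            self.opened_at = now  # lazily (re)connect
        elif now - self.last_used > self.server_idle_timeout:
            # The server closed the socket while the worker sat idle.
            raise ConnectionError("server has gone away")
        self.last_used = now
        return "ok"


def run_task(conn, prerun_hooks):
    """Run every pre-run hook, then the task body (a single query here)."""
    for hook in prerun_hooks:
        hook()
    return conn.query()


def simulate(use_hook):
    """Run one task immediately and one after a long idle period."""
    clock = FakeClock()
    conn = FakeConnection(clock, max_age=60, server_idle_timeout=300)
    hooks = [conn.close_if_obsolete] if use_hook else []
    outcomes = []
    for idle_seconds in (0, 600):
        clock.advance(idle_seconds)
        try:
            outcomes.append(run_task(conn, hooks))
        except ConnectionError as exc:
            outcomes.append(f"error: {exc}")
    return outcomes


if __name__ == "__main__":
    print("without hook:", simulate(use_hook=False))
    print("with hook:   ", simulate(use_hook=True))
```

In this model the second task fails without the hook and succeeds with it. A connection created with `max_age=None` would never be recycled by this age-only check, so how much the real hook helps likely depends on the project's `CONN_MAX_AGE` setting and on Django's own handling of broken connections.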
