Commit 963c42d

cstockton (Chris Stockton) and samrose authored
fix: set restart limits to 0 to prevent being marked as failed (#1952)
* fix: set restart limits to 0 to prevent being marked as failed

  The systemd default is 10s / 5 for these values with a DefaultRestartUSec of 100ms. Most services set a RestartSec limit of 3, under most circumstances it takes 15s to restart 5 times so the limit of 10s is not exceeded. However if other system processes (salt, cloud init) restart it explicitly, or recovering system services within the --before chain trigger a restart the limit can be exceeded causing it to be marked as failed. Since no services mark gotrue.service as required it will remain offline until the next explicit restart is issued.

  Setting these values to 0 with Restart=always and RestartSec=3 will prevent gotrue from being marked as failed.

* chore: set StartLimits for persistent services.

  I've noticed all !oneshot services set a `RestartSec` of `3s` and we use the systemd defaults of `StartLimitBurst=5` and `StartLimitInterval=10s`. Together this forms a property that under typical conditions a service will be restarted indefinitely until it comes back up due to `(3s * 5) > 10s`, but it is still possible for a service to enter a failed state under some scenarios. This change defensively sets them to 0/0 to keep them in restart loops.

* chore: suffix to test

* chore: bump to release

---------

Co-authored-by: Chris Stockton <chris.stockton@supabase.io>
Co-authored-by: Sam Rose <samuel@supabase.io>
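The arithmetic in the commit message can be checked with a small simulation. The sketch below is a simplified model of a fixed-window start limit, not systemd's actual code: the function name `first_refused_start` and the example timelines are hypothetical, and the model assumes a window opens at the first start, admits up to `burst` starts, and resets once more than `interval` seconds have passed.

```python
def first_refused_start(start_times, interval=10.0, burst=5):
    """Return the time of the first refused start, or None if none is refused.

    Simplified model (an assumption, not systemd source): a window opens at
    the first start, allows up to `burst` starts, and resets once more than
    `interval` seconds have elapsed. A zero interval or burst disables the
    limit entirely.
    """
    if interval <= 0 or burst <= 0:
        return None  # limit disabled, as with the 0/0 values in this commit
    window_begin = None
    count = 0
    for t in sorted(start_times):
        if window_begin is None or t - window_begin > interval:
            window_begin = t  # open a fresh window at this start
            count = 0
        if count >= burst:
            return t  # this start would be refused; unit marked failed
        count += 1
    return None


# Crash loop paced only by RestartSec=3: never more than 4 starts per window.
print(first_refused_start([0, 3, 6, 9, 12, 15, 18]))
# Same loop plus two hypothetical explicit restarts at t=1 and t=2.
print(first_refused_start([0, 1, 2, 3, 6, 9]))
# Same timeline with the limit disabled.
print(first_refused_start([0, 1, 2, 3, 6, 9], interval=0, burst=0))
```

Under this model the first timeline is never refused, the second is refused on its sixth start, and the third is never refused because the limit is off.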
1 parent 6e50974 commit 963c42d

8 files changed

Lines changed: 33 additions & 9 deletions

ansible/files/adminapi.service.j2

Lines changed: 2 additions & 3 deletions
@@ -3,9 +3,8 @@ Description=AdminAPI
 Requires=network-online.target
 After=network-online.target
 
-# Move this to the Service section if on systemd >=250
-StartLimitIntervalSec=60
-StartLimitBurst=10
+StartLimitIntervalSec=0
+StartLimitBurst=0
 
 [Service]
 Type=simple

ansible/files/gotrue.service.j2

Lines changed: 13 additions & 3 deletions
@@ -40,9 +40,19 @@ After=network-online.target systemd-resolved.service
 Wants=postgresql.service
 After=postgresql.service
 
-# Lower start limit ival and burst to prevent the noisy flapping
-StartLimitIntervalSec=10
-StartLimitBurst=5
+# The systemd default is 10s / 5 for these values with a DefaultRestartUSec of
+# 100ms. Most services set a RestartSec limit of 3, under most circumstances it
+# takes 15s to restart 5 times so the limit of 10s is not exceeded. However if
+# other system processes (salt, cloud init) restart it explicitly, or recovering
+# system services within the --before chain trigger a restart the limit can be
+# exceeded causing it to be marked as failed. Since no services mark
+# gotrue.service as required it will remain offline until the next explicit
+# restart is issued.
+#
+# Setting these values to 0 with Restart=always and RestartSec=3 will prevent
+# gotrue from being marked as failed.
+StartLimitIntervalSec=0
+StartLimitBurst=0
 
 [Service]
 Type=exec

ansible/files/nginx.service.j2

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ Description=nginx server
 After=postgrest.service gotrue.service adminapi.service
 Wants=postgrest.service gotrue.service adminapi.service
 
+StartLimitIntervalSec=0
+StartLimitBurst=0
+
 [Service]
 Type=forking
 ExecStart=/usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf

ansible/files/pg_egress_collect.service.j2

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 [Unit]
 Description=Postgres Egress Collector
 
+StartLimitIntervalSec=0
+StartLimitBurst=0
+
 [Service]
 Type=simple
 ExecStart=/bin/bash -c "tcpdump -s 128 -Q out -nn -tt -vv -p -l 'tcp and (port 5432 or port 6543)' | perl /root/pg_egress_collect.pl"

ansible/files/postgres_exporter.service.j2

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 [Unit]
 Description=Postgres Exporter
 
+StartLimitIntervalSec=0
+StartLimitBurst=0
+
 [Service]
 Type=simple
 ExecStart=/opt/postgres_exporter/postgres_exporter --disable-settings-metrics --extend.query-path="/opt/postgres_exporter/queries.yml" --disable-default-metrics --no-collector.locks --no-collector.replication --no-collector.replication_slot --no-collector.stat_bgwriter --no-collector.stat_database --no-collector.stat_user_tables --no-collector.statio_user_tables --no-collector.wal {% if qemu_mode is defined and qemu_mode %}--no-collector.database {% endif %}

ansible/files/postgrest.service.j2

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ Description=PostgREST
 Requires=postgrest-optimizations.service
 After=postgrest-optimizations.service
 
+StartLimitIntervalSec=0
+StartLimitBurst=0
+
 [Service]
 Type=simple
 # We allow the base config (sent from the worker) to override the generated config

ansible/files/vector.service.j2

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ Documentation=https://vector.dev
 After=network-online.target
 Requires=network-online.target
 
+StartLimitIntervalSec=0
+StartLimitBurst=0
+
 [Service]
 User=vector
 Group=vector
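Since the same two settings are repeated across seven unit templates, a quick check can confirm each one carries them in its `[Unit]` section. The following is a minimal sketch and not part of this commit: the helper names `unit_section_settings` and `start_limit_disabled` are hypothetical, and the parser is deliberately line-based so that Jinja expressions and repeated keys in the templates do not trip it up.

```python
def unit_section_settings(unit_text):
    """Collect key=value pairs from the [Unit] section of a unit file.

    Line-based on purpose: templates may contain Jinja and repeated keys,
    which a strict INI parser could reject. Later duplicates overwrite
    earlier ones.
    """
    settings = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Unit" and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def start_limit_disabled(unit_text):
    """True if both start-limit settings are set to 0 under [Unit]."""
    s = unit_section_settings(unit_text)
    return s.get("StartLimitIntervalSec") == "0" and s.get("StartLimitBurst") == "0"


example = """[Unit]
Description=Postgres Exporter

StartLimitIntervalSec=0
StartLimitBurst=0

[Service]
Type=simple
"""
print(start_limit_disabled(example))
```

A settings pair placed under `[Service]` instead of `[Unit]` would be reported as missing by this check, which is the placement mistake it is meant to catch.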

ansible/vars.yml

Lines changed: 3 additions & 3 deletions
@@ -10,9 +10,9 @@ postgres_major:
 
 # Full version strings for each major version
 postgres_release:
-  postgresorioledb-17: "17.6.0.023-orioledb"
-  postgres17: "17.6.1.066"
-  postgres15: "15.14.1.066"
+  postgresorioledb-17: "17.6.0.024-orioledb"
+  postgres17: "17.6.1.067"
+  postgres15: "15.14.1.067"
 
 # Non Postgres Extensions
 pgbouncer_release: 1.19.0
