
Commit 0cbb9af

ralyodio and claude authored
fix(deploy): restart orphaned podcast worker + prevent start-limiter wedge (#115)
The bittorrented-podcast-worker systemd service is a separate long-running process that `next start` does not recreate. When it crash-looped during the June-7 mise/pnpm path drift it hit systemd's start limiter, went `failed`, and stayed dead for ~2 weeks: podcast episode ingestion stopped on 2026-06-07 because deploy-droplet.yml only restarts the main app + iptv worker, never the podcast worker, and its unit lacked StartLimitIntervalSec=0.

- deploy-droplet.yml: reset-failed + restart bittorrented-podcast-worker on every deploy (and reset-failed the iptv worker for parity); report its status.
- setup-server.sh: add StartLimitIntervalSec=0 to the podcast worker unit so a transient crash self-heals instead of wedging permanently.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 91c54dc commit 0cbb9af

2 files changed

Lines changed: 14 additions & 1 deletion


.github/workflows/deploy-droplet.yml

Lines changed: 11 additions & 1 deletion
@@ -193,9 +193,18 @@ jobs:
  sudo systemctl restart bittorrented || echo "Warning: Could not restart main service"
  echo "✓ Main service restart attempted"

+ sudo systemctl reset-failed bittorrented-iptv-worker || true
  sudo systemctl restart bittorrented-iptv-worker || echo "Warning: Could not restart IPTV worker"
  echo "✓ IPTV worker restart attempted"
-
+
+ # The podcast worker is a long-running process that is NOT recreated by
+ # `next start`; if it crash-loops (e.g. the mise/pnpm path drift) it hits
+ # systemd's start limiter and stays `failed` forever because nothing here
+ # restarts it. Reset the limiter and restart it on every deploy.
+ sudo systemctl reset-failed bittorrented-podcast-worker || true
+ sudo systemctl restart bittorrented-podcast-worker || echo "Warning: Could not restart podcast worker"
+ echo "✓ Podcast worker restart attempted"
+
  echo ""
  echo "=== Verifying deployment ==="
  # Give the unit a moment to either become active or crash so the
@@ -212,6 +221,7 @@ jobs:
  echo "failed" > "$STATUS_FILE"; exit 1
  fi
  systemctl is-active bittorrented-iptv-worker || echo "IPTV worker status unknown"
+ systemctl is-active bittorrented-podcast-worker || echo "Podcast worker status unknown"

  echo "Waiting for HTTP health check..."
  HEALTH_OK=false
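
The reset-failed-then-restart steps added for the two workers follow one pattern, which could be factored into a helper. A minimal sketch, assuming a hypothetical `restart_unit` function and a `SYSTEMCTL` override variable (neither exists in the repository) so the logic can be dry-run without systemd:

```shell
#!/usr/bin/env bash
# Sketch only: restart_unit and SYSTEMCTL are hypothetical names, not part of
# deploy-droplet.yml. SYSTEMCTL can be pointed at a stub for dry runs.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

restart_unit() {
  local unit="$1" label="$2"
  # Clear any failed/start-limit state first; ignore errors so a unit that
  # never failed does not abort the deploy.
  $SYSTEMCTL reset-failed "$unit" || true
  # A failed restart is reported as a warning but is not fatal.
  $SYSTEMCTL restart "$unit" || echo "Warning: Could not restart $label"
  echo "✓ $label restart attempted"
}

# Usage, mirroring the units touched by this commit:
# restart_unit bittorrented-iptv-worker "IPTV worker"
# restart_unit bittorrented-podcast-worker "Podcast worker"
```

Keeping `reset-failed` ahead of `restart` matches the order used in the hunk above.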

scripts/setup-server.sh

Lines changed: 3 additions & 0 deletions
@@ -933,6 +933,9 @@ WorkingDirectory=${DEPLOY_PATH}
  ExecStart=/bin/bash -c 'set -a; source ${DEPLOY_PATH}/.env; set +a; exec ${PNPM_HOME}/pnpm podcast-worker'
  Restart=on-failure
  RestartSec=30
+ # Never let crash-looping wedge the worker in systemd's start limiter; it must
+ # keep retrying so it self-heals after a transient failure (e.g. feed/network).
+ StartLimitIntervalSec=0
  Environment=NODE_ENV=production
  Environment=PATH=${PNPM_HOME}:/usr/local/bin:/usr/bin:/bin
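
One thing that may be worth verifying after this change: systemd.unit(5) documents `StartLimitIntervalSec=` as a `[Unit]` option, while the context lines around the addition (`ExecStart=`, `Restart=`, `RestartSec=`) are `[Service]` settings. If the new line ends up under `[Service]`, systemd may log an unknown-key warning and ignore it. A minimal sketch for checking which section the key landed in, assuming a hypothetical `key_section` helper (not part of setup-server.sh):

```shell
#!/usr/bin/env bash
# Sketch only: key_section is a hypothetical helper, not part of setup-server.sh.
# Prints the section name under which KEY is assigned in a unit file.
key_section() {
  local file="$1" key="$2"
  awk -v key="$key" '
    # Remember the most recent [Section] header.
    /^\[.*\]$/ { section = substr($0, 2, length($0) - 2); next }
    # Report the section when the line starts with KEY=.
    index($0, key "=") == 1 { print section }
  ' "$file"
}

# Usage (the unit file path is an assumption):
# key_section /etc/systemd/system/bittorrented-podcast-worker.service StartLimitIntervalSec
```

If this prints `Service` rather than `Unit`, running `systemd-analyze verify` on the unit should show whether systemd accepted the setting.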
