fix(systemd): increase restart burst limit and add health-start check (#3069) #3152

Merged
aegis-gh-agent[bot] merged 1 commit into develop from fix/3069-systemd-restart-limit
May 10, 2026
@OneStepAt4time

Owner

Summary

Fixes #3069

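The systemd unit's restart limit was too strict: StartLimitBurst allowed only 5 start attempts within a 300s window. When the server hit transient startup failures and exhausted that limit, systemd stopped restarting it and left the unit in a failed state until an operator manually ran systemctl reset-failed.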

Changes

  • StartLimitBurst: 5 → 10 (within the same 300s window), making the unit more tolerant of transient startup issues
  • ExecStartPost: Added a health endpoint check that polls localhost:9100/health for up to 15 seconds before the service is marked as active. If the endpoint is slow to respond, the check logs a warning instead of causing a hard failure.
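
The warn-but-don't-fail polling described above could look roughly like the sketch below. This is not the script from the PR: the function names `probe` and `wait_for_health`, the one-second spacing between attempts, and the use of curl are all assumptions made for illustration. Only the URL and the 15-second budget come from the description.

```shell
#!/bin/sh
# Hypothetical sketch of a "wait for health, warn but never fail" check.
# Names and the curl-based probe are assumptions, not the PR's actual code.

# probe URL: succeeds only if the endpoint answers with a non-error status.
probe() {
    curl -fsS -o /dev/null --max-time 1 "$1"
}

# wait_for_health URL MAX_ATTEMPTS
# Tries the endpoint up to MAX_ATTEMPTS times, sleeping 1s after each failure.
# Attempts are counted, not wall-clock time, so hung probes can stretch the
# total wait beyond MAX_ATTEMPTS seconds.
# Always returns 0 so a slow endpoint cannot fail the unit start.
wait_for_health() {
    url=$1
    max=$2
    attempt=0
    while [ "$attempt" -lt "$max" ]; do
        attempt=$((attempt + 1))
        if probe "$url"; then
            echo "health endpoint ready (attempt $attempt)"
            return 0
        fi
        sleep 1
    done
    echo "warning: $url not healthy after $max attempts; continuing" >&2
    return 0
}
```

A script run from ExecStartPost would end with a call such as `wait_for_health http://localhost:9100/health 15`. Returning 0 on timeout is what keeps a slow health endpoint from turning into a failed start, which would otherwise count against StartLimitBurst.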

Note

This is a config-only change (systemd unit template). No TypeScript changes — no build/test needed.

Operator Action Required

After merge, the deployed unit file needs to be updated on the host:

sudo cp deploy/systemd/aegis.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl reset-failed aegis
sudo systemctl start aegis
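
After these steps, it may be worth confirming that systemd actually loaded the new limit. A minimal sketch, assuming the unit name aegis from the commands above; the helper names `start_limit_burst` and `check_start_limit` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for reading back the loaded restart limit.

# Print the StartLimitBurst value systemd currently has loaded for a unit.
start_limit_burst() {
    systemctl show "$1" --property=StartLimitBurst --value
}

# check_start_limit UNIT EXPECTED
# Returns 0 when the loaded value equals EXPECTED, 1 otherwise.
check_start_limit() {
    unit=$1
    expected=$2
    actual=$(start_limit_burst "$unit")
    if [ "$actual" = "$expected" ]; then
        echo "$unit: StartLimitBurst=$actual"
        return 0
    fi
    echo "$unit: StartLimitBurst=$actual, expected $expected" >&2
    return 1
}
```

Running `check_start_limit aegis 10` after the deploy should succeed; a mismatch usually means the unit file was not copied or daemon-reload was skipped.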

@aegis-gh-agent aegis-gh-agent Bot left a comment

Contributor

Approved — all 9 merge gates passed.

Systemd restart burst 5→10 + ExecStartPost health check. All CI green.

👁️ Argus

…#3069)

- StartLimitBurst: 5 → 10 (within same 300s window)
- ExecStartPost: polls /health up to 15s before marking service active
  (warns but does not fail if health endpoint slow to come up)
- Prevents systemd from giving up permanently on transient startup failures
@OneStepAt4time OneStepAt4time force-pushed the fix/3069-systemd-restart-limit branch from 2e9de57 to f65a751 May 10, 2026 15:53
@aegis-gh-agent aegis-gh-agent Bot merged commit 7940f51 into develop May 10, 2026
18 checks passed
@aegis-gh-agent aegis-gh-agent Bot deleted the fix/3069-systemd-restart-limit branch May 10, 2026 15:58