feat(disk): ntfy alert when host disk fills up#92

Merged: max-tet merged 2 commits into main from feat/disk-usage-alert on Jun 22, 2026
@ClaydeCode

Owner

Why

clayde.net keeps hitting a full disk. Root cause this round: Claude Code leaks ~26 MB plugin-marketplace temp_* dirs on every refresh — 1132 of them, 13 GB, had piled up under clayde-claude/plugins/marketplaces/. Cleaned manually (disk 91% → 56%), but the failure mode is silent: a full disk breaks clones, builds, and this agent loop with no warning.

This PR makes Clayde warn before the disk fills, since Clayde already runs continuously on that host.

What

A best-effort disk guard (src/clayde/disk.py) called once per orchestrator tick, before any work — so it fires even when Claude is rate-limited (disk fills regardless of usage limits).

  • Reads usage via shutil.disk_usage(CLAYDE_DISK_ALERT_PATH). The default path, /data, is a bind-mount on the host root partition, so it reflects host disk fullness.
  • At/over CLAYDE_DISK_ALERT_THRESHOLD_PCT (default 85) → posts a warning to the existing ntfy topic.
  • Repeat alerts rate-limited by CLAYDE_DISK_ALERT_COOLDOWN_S (default 6h), persisted in /data/disk_alert_state.json, so the 5-min tick loop doesn't spam.
  • Fully best-effort: every error logged, never raised. Reuses existing ntfy config.
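The bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in src/clayde/disk.py: the function names, the `last_alert_ts` state key, the `notify` callback, and the used/total percentage formula are all assumptions made for the example.

```python
import json
import logging
import shutil
import time
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("disk_guard_sketch")


def percent_used(total: int, used: int) -> int:
    """Whole-percent fullness (assumed formula: used / total, truncated)."""
    return int(used * 100 / total) if total else 0


def should_alert(pct: int, threshold_pct: int, last_alert_ts: Optional[float],
                 now: float, cooldown_s: float) -> bool:
    """Alert at or over the threshold, unless a previous alert is still cooling down."""
    if pct < threshold_pct:
        return False
    if last_alert_ts is None:
        return True
    return now - last_alert_ts >= cooldown_s


def load_last_alert(state_file: Path) -> Optional[float]:
    """Read the last alert timestamp; a missing or corrupt file counts as 'never alerted'."""
    try:
        return float(json.loads(state_file.read_text())["last_alert_ts"])
    except (OSError, ValueError, KeyError, TypeError):
        return None


def save_last_alert(state_file: Path, ts: float) -> None:
    """Persist the alert timestamp; failures are logged, never raised."""
    try:
        state_file.write_text(json.dumps({"last_alert_ts": ts}))
    except OSError as exc:
        log.warning("could not persist disk alert state: %s", exc)


def check_disk(path: str, state_file: Path, notify: Callable[[str], None], *,
               threshold_pct: int = 85, cooldown_s: float = 6 * 3600,
               now: Optional[float] = None) -> bool:
    """Run one best-effort check; return True only if an alert was sent."""
    now = time.time() if now is None else now
    try:
        usage = shutil.disk_usage(path)
        pct = percent_used(usage.total, usage.used)
        last = load_last_alert(state_file)
        if not should_alert(pct, threshold_pct, last, now, cooldown_s):
            return False
        notify(f"disk {pct}% full")
        save_last_alert(state_file, now)
        return True
    except Exception as exc:  # best-effort: log and carry on with the tick
        log.warning("disk check failed: %s", exc)
        return False
```

Keeping the threshold and cooldown decision in a pure function (`should_alert`) makes the boundary cases easy to test without touching the filesystem.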

New config keys (all optional, sane defaults): CLAYDE_DISK_ALERT_ENABLED, CLAYDE_DISK_ALERT_THRESHOLD_PCT, CLAYDE_DISK_ALERT_PATH, CLAYDE_DISK_ALERT_COOLDOWN_S.
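For illustration, the four keys could be read from the environment along these lines. This is a sketch under assumptions: the `DiskAlertSettings` dataclass and `load_disk_alert_settings` function are hypothetical, the real settings live in src/clayde/config.py and may be loaded differently, and the default of `True` for the enabled flag is a guess (only the 85%, /data, and 6h defaults are stated in this PR).

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "CLAYDE_DISK_ALERT_"
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class DiskAlertSettings:
    enabled: bool = True          # assumption: default not stated in the PR
    threshold_pct: int = 85       # alert at or over this fullness
    path: str = "/data"           # partition to measure
    cooldown_s: int = 6 * 3600    # minimum gap between repeat alerts


def load_disk_alert_settings(env: Mapping[str, str] = os.environ) -> DiskAlertSettings:
    """Build settings from CLAYDE_DISK_ALERT_* variables, falling back to defaults."""
    defaults = DiskAlertSettings()
    raw_enabled = env.get(PREFIX + "ENABLED")
    enabled = defaults.enabled if raw_enabled is None else raw_enabled.strip().lower() in TRUTHY
    return DiskAlertSettings(
        enabled=enabled,
        threshold_pct=int(env.get(PREFIX + "THRESHOLD_PCT", defaults.threshold_pct)),
        path=env.get(PREFIX + "PATH", defaults.path),
        cooldown_s=int(env.get(PREFIX + "COOLDOWN_S", defaults.cooldown_s)),
    )
```

Passing the environment in as a mapping keeps the loader testable without mutating `os.environ`.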

Out of scope

Does not auto-delete anything — alert only. The leak itself is a Claude Code bug worth filing upstream; auto-sweeping leaked temp dirs is a separate decision.

Test

tests/test_disk.py — threshold boundary, cooldown suppression, re-alert after cooldown, disabled, and usage-error swallowing. Full suite: 348 passed.
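As a rough idea of how the threshold-boundary and error-swallowing cases can be exercised without a real full disk, `shutil.disk_usage` can be patched. The sketch below is not the contents of tests/test_disk.py: `safe_usage_pct` and `FakeUsage` are hypothetical stand-ins defined only for this example.

```python
import shutil
from collections import namedtuple
from typing import Optional
from unittest import mock

# Same field names as the tuple returned by shutil.disk_usage.
FakeUsage = namedtuple("FakeUsage", "total used free")


def safe_usage_pct(path: str) -> Optional[int]:
    """Hypothetical guard helper: fullness in percent, or None if usage cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return None  # swallowed on purpose: the guard is best-effort
    return int(usage.used * 100 / usage.total)


def test_threshold_boundary():
    # Exactly 85% used should be reported as 85, the default threshold.
    with mock.patch("shutil.disk_usage", return_value=FakeUsage(100, 85, 15)):
        assert safe_usage_pct("/data") == 85


def test_usage_error_is_swallowed():
    # A failing disk_usage call must not propagate out of the guard.
    with mock.patch("shutil.disk_usage", side_effect=OSError("boom")):
        assert safe_usage_pct("/data") is None
```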

Recommended reading order

  1. src/clayde/config.py — 4 new settings
  2. src/clayde/disk.py — the guard
  3. src/clayde/orchestrator.py — hook in main()
  4. tests/test_disk.py
  5. CLAUDE.md — config table + module doc

🤖 Generated with Claude Code

Adds a best-effort disk-usage guard that runs once per orchestrator tick
(before any work, so it fires even when Claude is rate-limited). When the
partition backing CLAYDE_DISK_ALERT_PATH (default /data, same volume as the
host root) reaches the threshold (default 85%), it posts a warning to the
existing ntfy topic. Repeat alerts are rate-limited by a cooldown (default
6h) persisted in /data/disk_alert_state.json so the 5-minute tick loop does
not spam the same warning.

Motivated by recurring full-disk incidents on clayde.net: Claude Code leaks
~26MB plugin-marketplace temp dirs per refresh (1100+ accumulated to 13GB).
This surfaces the condition early instead of failing silently.

Everything is configurable via CLAYDE_DISK_ALERT_* and best-effort — any
error is logged, never raised.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/clayde/disk.py Outdated
headers = {
"Title": f"clayde.net disk {usage_pct}% full",
"Priority": "5",
"Tags": "floppy_disk,warning",

Collaborator

floppy_disk? Seriously? 😀

Owner Author

Gone 😄 — reusing send_ntfy means the tag is set there (rotating_light for warnings). No more floppy_disk.

Comment thread src/clayde/disk.py
log.warning("could not persist disk alert state: %s", exc)


def _send(settings: Settings, *, usage_pct: int, free_gb: float) -> None:

Collaborator

I believe there should be a helper method for sending via ntfy somewhere. Use it if there is one.

Owner Author

Done — now delegates to webhook.notify.send_ntfy instead of the inline httpx POST. Added a test asserting the helper is called (success=False, right topic).

Addresses review: drop the ad-hoc httpx POST (and the floppy_disk tag) in
disk._send and delegate to webhook.notify.send_ntfy. success=False gives
warning styling (priority 5, rotating_light). Driven via asyncio.run from
the sync tick loop — safe since main() never runs inside an active loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
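
The delegation described in the commit message above might look roughly like this. It is a sketch only: the `send_ntfy` coroutine here is a hypothetical stand-in whose signature is assumed, not the real webhook.notify.send_ntfy, and `send_alert` is an illustrative name.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("disk_alert_sketch")


async def send_ntfy(topic: str, message: str, *, success: bool) -> None:
    """Hypothetical stand-in for an async ntfy helper; the real signature may differ."""
    await asyncio.sleep(0)  # placeholder for the actual HTTP request


def send_alert(topic: str, message: str,
               sender: Callable[..., Awaitable[None]] = send_ntfy) -> bool:
    """Drive the async helper from synchronous code; log and swallow any failure."""
    try:
        # asyncio.run starts a fresh event loop, so this only works when no
        # loop is already running in the current thread.
        asyncio.run(sender(topic, message, success=False))
        return True
    except Exception as exc:
        log.warning("could not send disk alert: %s", exc)
        return False
```

If the tick loop ever moved inside a running event loop, `asyncio.run` would raise `RuntimeError`; this sketch would log that and return `False` rather than crash the tick.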
@ClaydeCode ClaydeCode requested a review from max-tet June 22, 2026 07:30
@max-tet max-tet merged commit d0c9004 into main Jun 22, 2026
3 checks passed
@max-tet max-tet deleted the feat/disk-usage-alert branch June 22, 2026 07:36