Commit ae238b6

fix(skills/housekeep): surface and clean worktree disk bloat (#1214)
* fix(skills/housekeep): surface and clean worktree disk bloat

Worktrees accumulate per-worktree node_modules/target/dist build artifacts (~3GB each) but were invisible to /housekeep because the skill only flagged worktrees as stale by branch/merge criteria. Add:

- Phase 1a: always report total .claude/worktrees/ size, flag if >5GB
- Phase 1c: detect orphaned dirs (on disk but not in worktree list) and sub-agent agent-<hex> worktrees as stale candidates
- Phase 1d: detect bloated active worktrees (>500MB build artifacts) and offer to clean only the four regeneratable paths

Updates Phase 7 report to always include the worktree total and adds a rule forbidding deletion of source files in worktrees you don't own.

* fix(skills/housekeep): address Greptile review feedback (#1214)

- Replace GNU-only sort -h with portable du -sk | sort -n | awk pipeline for macOS BSD sort compatibility (P1)
- Require git status --short check on orphaned dirs before rm -rf to prevent data loss on dirs with uncommitted work (P1)
- Aggregate bloat detection per-worktree with subtotal + breakdown so the 500MB threshold is directly visible (P2)
- Measure only graph.db/graph.db-journal in .codegraph/ (matches what cleanup actually removes) to keep freed-space estimates accurate (P2)
1 parent 0664b19 commit ae238b6

1 file changed

Lines changed: 113 additions & 12 deletions

.claude/skills/housekeep/SKILL.md

@@ -27,42 +27,137 @@ Clean up the local repo: remove stale worktrees, delete dirt/temp files, sync wi
 4. Record current git status: `git status --short`
 5. Warn the user if there are uncommitted changes — housekeeping works best from a clean state
 
-## Phase 1 — Clean Stale Worktrees
+## Phase 1 — Audit & Clean Worktrees
 
-### 1a. List all worktrees
+> **Always report disk usage first.** Worktree bloat (per-worktree `node_modules/`, `target/`, `dist/`) is the single largest source of disk waste in this repo — a fresh worktree with `npm install` + a Rust build is ~3GB. Even when no worktree is technically "stale" by branch criteria, the disk footprint must be surfaced so the user can decide what to keep.
+
+### 1a. Total worktree disk usage
+
+Always print this, even on `--dry-run`. Use `du -sk` (kilobytes) so the pipeline is portable across BSD (macOS) and GNU (Linux) — `sort -h` is a GNU coreutils extension and is rejected by stock macOS `sort`.
+
+```bash
+du -sh .claude/worktrees 2>/dev/null
+# Portable per-worktree sort: kilobytes through sort -n, then format back to human-readable.
+du -sk .claude/worktrees/*/ 2>/dev/null | sort -n | awk '{
+  k=$1; $1=""; sub(/^ /, "");
+  if (k >= 1048576) printf "%.1fG\t%s\n", k/1048576, $0;
+  else if (k >= 1024) printf "%.1fM\t%s\n", k/1024, $0;
+  else printf "%dK\t%s\n", k, $0;
+}'
+```
+
+If the total exceeds **5GB**, raise it as a finding in the report regardless of whether any individual worktree is stale.
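
The 5GB check described in 1a could be sketched as below. This is an illustration only, not part of the commit: `THRESHOLD_KB`, `exceeds_threshold`, and `check_worktree_total` are hypothetical names, and the threshold is expressed in kilobytes so it can be compared directly with `du -sk` output.

```shell
# Hypothetical helpers (names are illustrative, not from the skill file).
THRESHOLD_KB=$((5 * 1024 * 1024))   # 5GB in kilobytes, the unit du -sk reports

# True (exit 0) when the given size in KB is strictly above the threshold.
exceeds_threshold() {
  [ "$1" -gt "$THRESHOLD_KB" ]
}

# Prints a finding line when the directory total is above the threshold.
check_worktree_total() {
  dir=${1:-.claude/worktrees}
  total_kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
  [ -n "$total_kb" ] || return 0    # directory missing: nothing to report
  if exceeds_threshold "$total_kb"; then
    echo "FINDING: $dir uses ${total_kb}K, above the 5GB threshold"
  fi
}
```

Calling `check_worktree_total` with no argument inspects `.claude/worktrees`; it prints nothing when the directory is absent or under the limit.
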
+
+### 1b. List git-tracked worktrees
 
 ```bash
 git worktree list
 ```
 
-### 1b. Identify stale worktrees
+Cross-reference against `.claude/worktrees/*` on disk — directories there that aren't in `git worktree list` are **orphaned** (prunable). Worktrees in the list whose directory is missing are also prunable.
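
One way to perform the cross-reference is sketched below. It is not part of the commit: `filter_orphans` and `find_orphaned_worktrees` are hypothetical names, and the sketch assumes the paths printed by `git worktree list --porcelain` match the physical paths returned by `pwd -P` textually.

```shell
# Reads registered worktree paths on stdin (one per line) and prints every
# candidate directory given as an argument that is not registered.
filter_orphans() {
  registered=$(cat)
  for d in "$@"; do
    d=${d%/}                         # ignore a trailing slash from globbing
    printf '%s\n' "$registered" | grep -Fxq -- "$d" || printf '%s\n' "$d"
  done
}

# Lists directories under the worktree root that git does not know about.
find_orphaned_worktrees() {
  root=${1:-.claude/worktrees}
  [ -d "$root" ] || return 0
  set --                             # collect absolute paths as positional args
  for d in "$root"/*/; do
    [ -d "$d" ] || continue
    set -- "$@" "$(cd "$d" && pwd -P)"
  done
  [ "$#" -gt 0 ] || return 0
  git worktree list --porcelain | sed -n 's/^worktree //p' | filter_orphans "$@"
}
```

Because the result is printed one path per line, it can be consumed with `while IFS= read -r dir`, which also copes with paths that contain spaces.
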
+
+### 1c. Identify stale worktrees
 
 A worktree is stale if:
-- Its directory no longer exists on disk (prunable)
+- Its directory no longer exists on disk, OR it exists on disk but is not in `git worktree list` (orphaned)
 - It has no uncommitted changes AND its branch has been merged to main
 - Its branch has no commits ahead of `origin/main` AND the branch's last commit is more than 7 days old
   (check: `git log -1 --format=%ci <branch>`; `git worktree list` does not expose creation timestamps)
+- It matches the sub-agent pattern `.claude/worktrees/agent-<hex>` AND has no uncommitted changes AND its branch has no commits ahead of `origin/main` (sub-agent worktrees are typically ephemeral and orphaned after the agent finishes)
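
The "no commits ahead AND older than 7 days" criterion could be checked roughly as follows. This is a sketch under assumptions, not the skill's own code: `is_older_than_7_days` and `branch_is_stale` are hypothetical names, `%ct` (epoch seconds) is used instead of the `%ci` shown above so the age can be computed arithmetically, and `date +%s` is assumed to be available (it is on GNU and BSD `date`).

```shell
SEVEN_DAYS=$((7 * 24 * 60 * 60))     # 7 days in seconds

# is_older_than_7_days <commit_epoch> <now_epoch>
# True when strictly more than 7 days separate the two timestamps.
is_older_than_7_days() {
  [ $(( $2 - $1 )) -gt "$SEVEN_DAYS" ]
}

# branch_is_stale <branch>
# True when the branch has no commits ahead of origin/main and its last
# commit is more than 7 days old. Any git failure counts as "not stale".
branch_is_stale() {
  ahead=$(git rev-list --count "origin/main..$1" 2>/dev/null) || return 1
  [ "$ahead" -eq 0 ] || return 1
  last=$(git log -1 --format=%ct "$1" 2>/dev/null) || return 1
  is_older_than_7_days "$last" "$(date +%s)"
}
```

Treating git errors as "not stale" keeps the check conservative: a branch that cannot be inspected is never offered for removal.
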
+
+### 1d. Identify bloated worktrees (NEW)
+
+A worktree is **bloated** if it is not stale (so we can't just remove it) but contains regeneratable build artifacts taking significant disk space. Check each non-stale worktree for:
+
+- `node_modules/` (typically ~1.8GB)
+- `target/` (Rust build cache, typically ~1.4GB)
+- `dist/` (compiled TS output)
+- `.codegraph/graph.db*` (rebuildable via `codegraph build`) — measure **only the `graph.db` and `graph.db-journal` files**, not the whole `.codegraph/` directory, because cleanup in §1e only removes those files. Measuring the whole directory would overstate the freed space.
+
+For each worktree, sum the artifact sizes and emit a per-worktree subtotal so the 500MB threshold can be evaluated without manually regrouping flat output. Uses `du -sk` (kilobytes) with `sort -n` for portability — `sort -h` is GNU-only and breaks on stock macOS.
+
+```bash
+for wt in .claude/worktrees/*/; do
+  total_kb=0
+  breakdown=""
+  for sub in node_modules target dist; do
+    if [ -d "$wt$sub" ]; then
+      sz=$(du -sk "$wt$sub" 2>/dev/null | awk '{print $1}')
+      [ -n "$sz" ] && total_kb=$((total_kb + sz)) && breakdown="$breakdown $sub=${sz}K"
+    fi
+  done
+  # .codegraph: only measure the two files we will actually remove
+  for f in "$wt.codegraph/graph.db" "$wt.codegraph/graph.db-journal"; do
+    if [ -f "$f" ]; then
+      sz=$(du -sk "$f" 2>/dev/null | awk '{print $1}')
+      [ -n "$sz" ] && total_kb=$((total_kb + sz)) && breakdown="$breakdown $(basename "$f")=${sz}K"
+    fi
+  done
+  [ "$total_kb" -gt 0 ] && printf "%d\t%s\t%s\n" "$total_kb" "$wt" "$breakdown"
+done | sort -n | awk -F'\t' '{
+  k=$1;
+  if (k >= 1048576) printf "%.1fG\t%s%s\n", k/1048576, $2, $3;
+  else if (k >= 1024) printf "%.1fM\t%s%s\n", k/1024, $2, $3;
+  else printf "%dK\t%s%s\n", k, $2, $3;
+}'
+```
+
+Flag any worktree whose combined build artifact size exceeds **500MB** (512000 kilobytes).
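
The 500MB flagging step could be applied to the tab-separated subtotals before they are formatted for display. A minimal sketch follows, assuming input rows of the form `kilobytes<TAB>path<TAB>breakdown` as produced by the loop above; `BLOAT_THRESHOLD_KB` and `flag_bloated` are hypothetical names.

```shell
BLOAT_THRESHOLD_KB=512000            # 500MB in kilobytes (500 * 1024)

# Reads "kilobytes<TAB>path<TAB>breakdown" rows on stdin and prints only the
# rows strictly above the threshold, prefixed with a BLOATED marker.
flag_bloated() {
  awk -F'\t' -v limit="$BLOAT_THRESHOLD_KB" \
    '$1 + 0 > limit + 0 { print "BLOATED\t" $0 }'
}
```

Filtering on the raw kilobyte column avoids having to parse the human-readable `G`/`M`/`K` suffixes afterwards.
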
+
+### 1e. Clean up
+
+**For orphaned directories** (on disk but not in `git worktree list`):
+
+> **Critical: orphaned directories may still contain uncommitted work.** A worktree's git registration can be dropped (failed `git worktree add`, manual `git worktree prune`, etc.) while the user's source edits remain on disk. `rm -rf` on such a directory is permanent data loss.
+
+Before offering removal, run `git -C <path> status --short` to check for uncommitted changes:
 
-Check `.claude/worktrees/` for Claude Code worktrees specifically.
+```bash
+for dir in $ORPHANED_DIRS; do
+  if [ -d "$dir/.git" ] || [ -f "$dir/.git" ]; then
+    changes=$(git -C "$dir" status --short 2>/dev/null)
+    if [ -n "$changes" ]; then
+      echo "SKIP $dir — has uncommitted changes:"
+      echo "$changes" | sed 's/^/ /'
+      continue
+    fi
+  fi
+  # Safe to offer removal — confirm with user first
+  echo "ORPHANED (clean): $dir"
+done
+```
 
-### 1c. Clean up
+Only after confirming the directory is clean (no uncommitted changes) AND the user has explicitly approved removal, run `rm -rf <path>`. Then run `git worktree prune` to clear any dangling refs. Apply the same "Never force-remove a worktree with uncommitted changes" rule that protects stale worktrees in `git worktree list` — orphaned dirs get the same guardrail.
 
-For prunable worktrees (missing directory):
+**For prunable worktrees** (in list but directory missing):
 ```bash
 git worktree prune
 ```
 
-For stale worktrees with merged branches:
-- List them and **always ask the user for confirmation before removing**, regardless of `--full`
+**For stale worktrees with merged branches:**
+- List them with their disk size and **always ask the user for confirmation before removing**, regardless of `--full`
 - If confirmed:
 ```bash
 git worktree remove <path>
 git branch -d <branch> # only if fully merged
 ```
 
-**If `DRY_RUN`:** Just list what would be removed, don't do it.
+**For bloated (non-stale) worktrees:**
+- List them with a per-artifact size breakdown
+- Ask the user whether to **clean build artifacts only** (keep the source) — these regenerate on the next `npm install` / `cargo build` / `codegraph build`
+- If confirmed, for each selected worktree:
+```bash
+rm -rf <worktree>/node_modules
+rm -rf <worktree>/target
+rm -rf <worktree>/dist
+rm -f <worktree>/.codegraph/graph.db <worktree>/.codegraph/graph.db-journal
+```
+- **Never run `npm install` / `cargo clean` inside the target worktree** — it may be in use by another Claude Code session
 
-> **Never force-remove** a worktree with uncommitted changes. List it as "has uncommitted work" and skip.
+**If `DRY_RUN`:** List everything that would be removed with sizes, don't do it.
+
+> **Never force-remove** a worktree with uncommitted changes. List it as "has uncommitted work" and skip — but still report its disk size so the user knows what it's costing.
+> **Never delete source files** in a bloated worktree — only delete the four regeneratable artifact paths above.
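
The artifact-only cleanup and the `DRY_RUN` rule could be combined in one helper, sketched below. This is an illustration rather than the skill's implementation: `clean_artifacts` is a hypothetical name, `DRY_RUN` is assumed to be a shell variable that is non-empty when dry-run mode is active, and `rm -rf` is used for all four paths for brevity (the diff uses `rm -f` for the two database files).

```shell
# clean_artifacts <worktree>
# Removes only the four regeneratable artifact paths; never touches sources.
# With DRY_RUN set to a non-empty value, it only reports what it would remove.
clean_artifacts() {
  wt=${1%/}
  [ -n "$wt" ] || return 1           # refuse an empty path or "/"
  for p in "$wt/node_modules" "$wt/target" "$wt/dist" \
           "$wt/.codegraph/graph.db" "$wt/.codegraph/graph.db-journal"; do
    [ -e "$p" ] || continue
    if [ -n "${DRY_RUN:-}" ]; then
      echo "[DRY RUN] would remove $p"
    else
      rm -rf -- "$p" && echo "removed $p"
    fi
  done
}
```

The fixed path list is the guardrail: anything outside those five entries, including source files, cannot be reached by this function.
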
 
 ## Phase 2 — Delete Dirt Files
 
@@ -252,7 +347,9 @@ Print a summary to the console (no file needed — this is a local maintenance t
 ```
 === Housekeeping Report ===
 
-Worktrees: removed 2 stale, 1 has uncommitted work (skipped)
+Worktrees: total .claude/worktrees/ size 57G (32 worktrees)
+           removed 2 stale (4.2G freed), 1 has uncommitted work (skipped)
+           cleaned build artifacts in 3 active worktrees (9.6G freed)
 Dirt files: cleaned 5 temp files (12KB), 1 large untracked flagged
 Branches: pruned 3 merged branches, 2 remote refs
 Main sync: up to date (or: 4 commits behind — merge suggested)
@@ -264,6 +361,8 @@ Git: OK
 Status: CLEAN ✓
 ```
 
+> **Always include the worktree total** at the top of the Worktrees line, even when no worktrees were removed. This is the metric that surfaces hidden disk bloat — without it, multi-GB worktree accumulations go invisible to the user.
+
 **If `DRY_RUN`:** prefix with `[DRY RUN]` and show what would happen without doing it.
 
 ## Rules
@@ -272,6 +371,8 @@ Status: CLEAN ✓
 - **Never rebase** — sync with main via merge only (per project rules)
 - **Never delete tracked files** — only clean untracked/ignored dirt
 - **Never delete worktrees with uncommitted changes** — warn and skip
+- **Always report worktree disk usage** — even when nothing is removed, the total must appear in the report. Worktree bloat is the #1 source of disk waste in this repo
+- **Bloated-but-active worktrees:** only delete the four regeneratable artifact paths (`node_modules/`, `target/`, `dist/`, `.codegraph/graph.db*`). Never touch source files in a worktree you don't own
 - **Ask before deleting large untracked files** — they might be intentional
 - **This is a local-only operation** — no pushes, no remote modifications, no PR creation
 - **Idempotent** — running twice should be safe (second run finds nothing to clean)
