<!-- SPDX-License-Identifier: AGPL-3.0-or-later -->
# Cruft Audit Playbook

A procedure for systematically removing dead text from prompts (AGENTS.md, playbooks, hook summaries) without trimming the content that's actually doing work. Cruft accumulates over time as transitions happen ("we used to X, now we Y"), workarounds outlive the bugs they worked around, and session-anchored references point at sessions readers can't access. The hard part is distinguishing dead text from prose that anchors a rule, supplies rationale, or teaches a hard-to-articulate skill.

## When to run

- **Periodic, ~quarterly.** Cruft accumulates silently — a calendar trigger surfaces it.
- **After a transition lands** that deprecates a workaround (e.g., a structural fix replacing a hand-cleanup procedure). Search the playbooks for "until X lands" / "until that ships" referring to the now-shipped X.
- **When a prompt feels stale.** Subjective signal but real — when reading a playbook produces "wait, is this still right?" more than once, audit it.
- **When the human asks "is there cruft we could remove?"** Common request after a season of fast change.

NOT a substitute for the conceptual-leak audit (`what-if-we-dont-know-what-the-fuck-we-are-talking-about-audit`). Cruft audit asks "is this text dead?". Concept audit asks "is this category coherent?". Different question, different mode.

## The methodology

Two complementary passes, **both** mediated by an interactive interview with the human:

### Pass 1: Syntactic grep

Cheap, surfaces obvious candidates:

```bash
# History markers
grep -rn -E "\b(deprecated|previously|no longer|originally|formerly|legacy|obsolete)\b" \
  .agent/agent_playbooks_protocols_sops_skills/ AGENTS.md

# Temporal qualifiers (require word boundaries — many real "now"s exist in prose)
grep -rn -E "\bnow\b" .agent/agent_playbooks_protocols_sops_skills/ AGENTS.md

# Session-anchored references
grep -rn -E "this (session|PR|investigation|run)|earlier today" \
  .agent/agent_playbooks_protocols_sops_skills/ AGENTS.md

# Stale "until X" workarounds
grep -rn -E "until [^.]*(ships|lands|merges|fixes)" \
  .agent/agent_playbooks_protocols_sops_skills/ AGENTS.md

# FIXMEs / TODOs in prompt text
grep -rn -E "\b(FIXME|TODO|XXX)\b" .agent/agent_playbooks_protocols_sops_skills/ AGENTS.md
```

Verify that referenced things still exist / are still in their claimed state:
- Files referenced (`docs/X.md`, `scripts/Y`) — `test -f`
- Tracker items cited as "until <ID> lands" — `scripts/tracker show <ID>` to confirm status
- Configuration knobs and worker module paths — confirm they exist and are still referenced from the code
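
The file-existence checks can be scripted. A minimal sketch — the helper name and demo paths are illustrative, not part of this repo; tracker-ID checks still need `scripts/tracker show <ID>` as above, since a filesystem test can't see tracker state:

```bash
# Report each referenced artifact as OK (still exists) or GONE
# (the prose citing it is a cruft candidate).
check_ref() {
  if [ -e "$1" ]; then
    echo "OK   $1"
  else
    echo "GONE $1"
  fi
}

# Demo against scratch paths; in a real run, feed the paths your grep found.
touch /tmp/cruft_audit_present.md
check_ref /tmp/cruft_audit_present.md
check_ref /tmp/cruft_audit_absent.md
```

A `GONE` line is only a candidate — the referencing prose might be a deliberate fallback (see "Not cruft" below), so it still goes through the interview.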

### Pass 2: Semantic read

What the grep won't catch — overexplanation, dead anecdotes, hypothetical cases that no longer happen, fallbacks for files that now always exist. Read each playbook section and ask:

1. **Reachability.** In this repo's actual configuration, is this branch / option / fallback ever entered?
2. **Recoverable referent.** When the prose mentions "this session" / "the user said" / "earlier", does the reader have the content (quote, link, extracted pattern), or is it self-citation pointing at a session they can't access?
3. **Teaching content.** Does the prose extract a recoverable pattern shape, supply rationale, or describe a hard-to-articulate generative skill? Or is it pure historical record of a specific past event?
4. **Rule already encoded.** Does the surrounding concrete rule already deliver the lesson this prose is restating?

### Critical: interactive interview mediates both passes

Neither pass alone produces accurate verdicts. The same word is sometimes cruft and sometimes load-bearing — `(deprecated)` next to a tracker status is anchoring a term that still appears in live data; `(deprecated)` next to a feature that's been removed entirely is dead text. You cannot tell from the regex hit alone.

**Do not auto-apply syntactic-pass hits.** Surface them as candidates with full context (file path, line range, **5 lines of surrounding context**, why-flagged tag) and have the human classify each.
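
One way to package candidates for the interview — a sketch, assuming GNU grep (`-C 5` supplies the 5-line context window); the wrapper name, tag names, and demo file are illustrative:

```bash
# Emit every hit with a why-flagged tag plus 5 lines of surrounding
# context, ready to present one candidate at a time.
audit_grep() {
  tag="$1"; pattern="$2"; shift 2
  grep -rn -C 5 -E "$pattern" "$@" 2>/dev/null | sed "s/^/[$tag] /"
}

# Demo corpus standing in for AGENTS.md / the playbooks directory.
printf 'Rule: keep transcripts.\nThis flag is deprecated.\nRationale: disk is cheap.\n' \
  > /tmp/demo_playbook.md

audit_grep history "\b(deprecated|formerly|obsolete)\b" /tmp/demo_playbook.md
audit_grep session "this (session|PR|investigation|run)" /tmp/demo_playbook.md
```

The second call prints nothing for this demo corpus — an empty section per tag is itself useful signal in the report.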

## Calibration loop

The taxonomy below is the synthesis of one such session. The session itself was a calibration loop:

1. **Round 1: 5-7 candidates, no opinion from auditor.** The human classifies (`cruft` / `not cruft` / `explain`) and supplies the reasoning when classifying. The auditor updates its mental model.
2. **Round 2: 5-7 candidates, with auditor opinion + reasoning.** The human confirms or corrects. Auditor refines.
3. **Round 3 (optional): 5 more, with opinions.** If the hit rate is high (≥4/5 confirmed), proceed.
4. **Autonomous pass.** Auditor produces the full report applying the calibrated taxonomy.
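
The round-3 gate is mechanical enough to sketch in shell — the function name and verdict labels are illustrative; the ≥4/5 threshold is the one stated above:

```bash
# Tally one calibration round: >=4 confirmed out of >=5 candidates
# means the auditor is calibrated enough for the autonomous pass.
round_gate() {
  confirmed=0; total=0
  for verdict in "$@"; do
    total=$((total + 1))
    if [ "$verdict" = confirmed ]; then
      confirmed=$((confirmed + 1))
    fi
  done
  echo "$confirmed/$total confirmed"
  if [ "$total" -ge 5 ] && [ "$confirmed" -ge 4 ]; then
    echo "proceed: autonomous pass"
  else
    echo "hold: run another calibration round"
  fi
}

round_gate confirmed confirmed corrected confirmed confirmed
```

The human's classifications are the input; the gate only decides proceed vs. another round — it never overrides a correction.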

Three procedural notes from the session:

- **Show 5 lines of surrounding context, not just the candidate.** A flagged sentence in isolation reads ambiguously; in context the verdict is often clear.
- **Read every word in the flagged area, not just the snippet.** In one round the auditor flagged a workaround for a fixed bug but missed the session-anchored prefix sitting two lines above the flagged content.
- **One candidate at a time during calibration.** Batches of 6 overwhelmed the human reviewer; one-at-a-time was the workable cadence.

## The calibrated taxonomy

### Cruft (delete)

- **Stale `until X lands` workarounds** where X has shipped. Verify by tracker lookup or by checking whether the bug being worked around can still be reproduced.
- **Workarounds for problems the tooling now handles automatically.** The skill being taught lives inside the code; the prose is redundant scaffolding.
- **Session-anchored references with no recoverable referent.** "The user's repeated pushback this session was a signal — every new item costs queue-management overhead." Drop the prefix; keep the lesson.
- **Pure historical records of specific past events with no extracted teaching shape.** "ADR-0025 and ADR-0026 were originally filed as ADRs in error and have been reclassified" — the renamed artifacts are referenced just above, the rule lives in another doc, the NOTE is pure history.

### Trim (one-word / short-phrase removal, surrounding content fine)

- **Stale temporal qualifiers** ("now", "currently", "still", "recently") that presuppose reader memory of a transition.
  - Example: "(`cycle` now includes reflect automatically; use `--skip-reflect` for fast iteration only.)" — drop "now"; the rest is current API affordance.
- **Session-anchor prefixes/parentheticals** when the surrounding sentence already extracts the recoverable pattern.
  - Example: "The most expensive mistake of this session was a hand-rolled `KNOWN_LANGS` set that omitted `jsonnet` and `rst`, producing 3,000+ false-flag invalid-language nodes." — the recoverable shape (KNOWN_LANGS / jsonnet+rst / 3,000+) is the teaching; the "of this session" qualifier is dead. Replace with de-anchored framing.
- **Truism reminders** that are redundant with concrete surrounding rules.
  - Example: a "Diminishing returns are real." sentence inside a budget rule that already says "good enough after 3 rounds, stop at 20 minutes."
  - Test: if the truism were removed, would the reader lose information? If not, trim. Truisms ARE valuable in isolation — they become trim candidates only when the concrete encoding is already present.

### Not cruft (keep)

- **Worked-example anchors** that survive future state changes. "first UAT, do not modify" remains accurate even after subsequent UAT campaigns ship.
- **Rationale or consequences beyond the rule.** "The full transcript lives on disk; you can search it freely; you never have to re-run the command" — repeats the rule reductively, but supplies *why* it matters and what an agent gains.
- **Concrete-situation illustrations.** Heuristic checklists where every bullet resolves to the same action are doing pattern-recognition work, not redundancy. "Will the output fit on one screen? If no, **redirect to a file**." The bullets prime the agent to recognize entry-shapes.
- **Live-data anchors.** Terms that still appear in extant tracker items, archives, or code — even when the term is "deprecated" in policy. The reader will encounter the term and need a referent.
- **Generative/teaching prose** for hard-to-articulate skills. "If `docs/blind-spots.md` does not yet exist, take 5 minutes to consider what the new frame *almost* assumes — what edge cases or alternative shapes the new structure makes harder to express." Even when the file currently exists, the fallback teaches *how* to do the activity de novo.
- **Fallbacks** for templatization / fork / fresh-clone scenarios — reachable under plausible future configurations.
- **Borderline cases default to keep.** When cost is small (one phrase) and the case could go either way, the default is keep. The audit revisits next cycle.

### Doc consistency issues (separate category — only flag when likely to derail)

A numbering mismatch (overview says "seven phases", body has eight) is a doc bug, not cruft. Only flag it if (a) the inconsistency would meaningfully derail an agent reading the doc AND (b) the fix has bounded ripple effects. Cost-benefit usually doesn't favor making it a finding.

## What this audit does NOT cover

- **Conceptual leaks** — a single field smuggling unrelated information. That's the fundamental-concept audit.
- **Stale rules in code/tooling** that the prose accurately describes — those are tooling work items, not prompt cruft.
- **Coverage gaps** in the playbook system — that's a different audit ("are we missing a playbook for X?").

## Output format

Produce an audit report (do not auto-apply). The report file lives at `~/hypergumbo_lab_notebook/cruft_audit_<MMDDYYYY_HHMM>.md`. Structure:

1. **Summary.** Coverage scope, counts by verdict (cruft / trim / borderline / consistency).
2. **Methodology.** Brief restatement of the taxonomy applied.
3. **Findings.** Per-finding sections with:
   - File path and line range
   - Current text (verbatim, with surrounding context)
   - Why this is the verdict it got
   - Action (delete / specific replacement / no change)
   - "After" text where useful
4. **Items considered and rejected.** Representative not-cruft cases with the reasoning. Transparency about why these did NOT make the report protects against over-trimming on future runs.
5. **Aggregate scale.** Total lines audited / lines changed.
6. **Apply / hold.** The human decides whether to apply.
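
A sketch of scaffolding an empty report at the expected path — the filename convention is this playbook's; the section skeleton mirrors the structure above, and the placeholder wording is illustrative:

```bash
# Scaffold an empty cruft-audit report with the playbook's
# cruft_audit_<MMDDYYYY_HHMM>.md filename convention.
report=~/hypergumbo_lab_notebook/cruft_audit_$(date +%m%d%Y_%H%M).md
mkdir -p "$(dirname "$report")"
cat > "$report" <<'EOF'
# Cruft Audit

## Summary
Scope: ... | cruft: 0 | trim: 0 | borderline: 0 | consistency: 0

## Methodology
Taxonomy applied: ...

## Findings

## Items considered and rejected

## Aggregate scale
Lines audited: 0 / lines changed: 0

## Apply / hold
EOF
echo "scaffolded $report"
```

Findings then get appended one section per candidate as the interview resolves them.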

When the human approves application, ship as a single PR touching only the playbooks involved. `.agent/**` is governance, so a governance approval is required for the apply step (the audit itself is read-only and does not need governance approval).

## Anti-patterns

- **Auto-applying syntactic-pass hits.** Same word is sometimes cruft, sometimes load-bearing. The grep surfaces candidates; only the interactive interview produces verdicts.
- **Skipping the calibration rounds.** Going straight to autonomous mode produces over-aggressive trimming. The first session used six examples to discover that "could be terser" is the wrong bar.
- **Reductive logical interpretation.** "These four bullets all resolve to the same action — they're redundant." In a prompt, redundancy that anchors pattern recognition is doing work the reductive read misses.
- **Trimming worked examples to abstract them.** "Specific repo names, dates, and PR numbers will rot" — true, but they're also what makes the rule recognizable. Strip the session anchor that points at unrecoverable context; keep the concrete shape.
- **Failing to verify referenced tracker IDs / file paths.** "Until WI-X lands" might be cruft if WI-X is closed, or load-bearing if it's open. You can't tell without checking.