Skill bloat: SKILL.md eats ~22k tokens per session — split into core + lazy-loaded references

## Problem

The embedded `internal/app/skill/SKILL.md` is **~22k tokens** (1939 lines). Because Claude Code loads the full skill body on activation, every flow-bound session pays this cost up front — before the user has typed anything.

Measured on a fresh `flow do --here <slug>` bootstrap (cl100k_base, ballpark):

| Source | Tokens |
|---|---|
| flow SKILL.md | ~22,413 |
| Other (CLAUDE.md, project/task briefs, harness reminders, tool boilerplate) | ~12k |
| **Baseline before first user prompt** | **~34k+** |

In practice the user observed the context jumping to ~90k tokens immediately on session start. The skill is the single biggest controllable contributor.

Most of SKILL.md is workflow detail that only fires for specific intents:
- §4.2 add-task interview (~200 lines)
- §4.7 mark done (~94 lines)
- §4.11 scope-creep detection (~88 lines)
- §4.13 playbook run (~140 lines)
- §7 brief format (~110 lines)
- §8 anti-patterns (~92 lines)
- §9 bootstrap contract (~85 lines)
- …and more

A typical session triggers 0–2 of these. The rest is dead weight in context.

## Proposed fix

Split SKILL.md into a small **core** + a `references/` subdir that the model reads on demand (Read tool), following the same pattern superpowers and other skills already use.

### Target layout

```
internal/app/skill/
  SKILL.md                          # core: intent triage, command cheat-sheet,
                                    # §4.10 scoop mode, §11 dispatch, pointer table
  references/
    add-task-interview.md           # §4.2
    add-project.md                  # §4.3
    work-dir.md                     # §6
    waiting.md                      # §4.6
    done.md                         # §4.7
    archive.md                      # §4.8
    weekly-review.md                # §4.9
    scope-creep.md                  # §4.11
    playbooks.md                    # §4.12 + §4.13
    substantive-unrelated.md        # §4.14
    upgrade.md                      # §4.15
    tagging.md                      # §4.16a
    bind-session.md                 # §4.16
    brief-format.md                 # §7
    anti-patterns.md                # §8
    bootstrap-contract.md           # §9
```

Core SKILL.md keeps only what fires every session: intent triage, the command cheat-sheet, scoop mode (§4.10, which is common), dispatch routing (§11), and a pointer table of the form *"for workflow X, read `references/X.md`"*.

### Code changes

1. `internal/app/skill.go:12` — change `//go:embed skill/SKILL.md` to `//go:embed all:skill` and store the embed as `embed.FS` instead of `[]byte`.
2. `skillInstall` — walk the embedded FS and write each file under `~/.claude/skills/flow/`, preserving the `references/` subdir.
3. `maybeAutoUpgradeSkill` — same walk on refresh. Before writing, remove the existing skill dir so stale single-file installs (and any obsolete reference files) get cleaned up.
4. `internal/app/skill_test.go` — assert `references/` is installed and that auto-upgrade replaces the whole tree, not just `SKILL.md`.

### Expected impact

- Core SKILL.md target: **~5–6k tokens** (down from ~22k).
- **Net savings: ~16k tokens per session.**
- References pulled only when their workflow triggers — usually 0–2 reads per session, each cheap relative to the baseline.
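
As a rough cross-check on "cheap relative to the baseline", the cost of a single reference can be estimated from the line counts listed under Problem. This sketch assumes tokens are spread evenly across lines, which is an approximation rather than a measurement; `estimateTokens` is an illustrative name.

```go
package main

// Measured figures quoted in this issue for the current SKILL.md.
const (
	skillTokens = 22413
	skillLines  = 1939
)

// estimateTokens scales a section's line count by the file-wide average
// tokens per line. Real sections vary in density, so treat the result
// as a ballpark only.
func estimateTokens(lines int) int {
	return lines * skillTokens / skillLines
}
```

Under that assumption the largest listed section, the ~200-line §4.2 interview, comes out at roughly 2.3k tokens, so even a session that loads two references stays well below the current 22k.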

### Trade-offs

- Reduced cohesion: workflow detail lives across multiple files. Mitigation: clear pointer table at the top of core SKILL.md so the model knows exactly which reference to load for each intent.
- Auto-upgrade must handle migration from the old single-file install. Detect via missing `references/` dir, remove the old install, write the new tree.
- One extra Read per workflow hit. Negligible vs the 22k baseline.

### Suggested rollout

1. Land the embed/install/upgrade code changes first with the *current* SKILL.md still in one piece (no behavior change, just `embed.FS` plumbing + tests).
2. Then split the content in a follow-up PR so the diff is reviewable as a pure content move.
