Commit 880710d

docs: correct COMMIT_STRATEGY warm-start claims to match the seeding behavior
The doc claimed a pkl cache miss 'just falls back to a cold LSP pass' with the PR 'still diffing correctly' — the assumption the incremental-fallback bug hid behind. The engine refuses cluster-driven incremental without the pkl's cluster baseline; document the seed step that now guarantees it.
1 parent 0705dae commit 880710d

1 file changed

Lines changed: 7 additions & 2 deletions

docs/COMMIT_STRATEGY.md

@@ -21,7 +21,7 @@ The engine writes these under `.codeboarding/`:
 - `health/health_report.json` — required for warnings in the extension/webview. Small text.

 **Do NOT commit (binary, bloat):**
-- `static_analysis.pkl` — binary, MB-scale, noisy diffs, repo bloat. It is a *rebuildable speed cache*, not display data. Keep it in **`actions/cache` keyed by the base SHA** (or a backend). A cache miss just falls back to a cold (full) LSP pass — slower but correct, and the committed `analysis.json` still drives the diagram.
+- `static_analysis.pkl` — binary, MB-scale, noisy diffs, repo bloat. It is a *rebuildable speed cache*, not display data. Keep it in **`actions/cache` keyed by the base SHA** (or a backend). On a cache miss the review action **seeds it deterministically** (LSP + clustering, no LLM calls) — the pkl is not optional for incremental: the engine refuses to run incrementally without the cluster baseline stored inside it.
 - `static_analysis.sha` — commit **only** if the pkl is kept reachable (cache/backend); on its own it's harmless but unused.

 > **Principle:** version-control the *source-of-truth display data* (text, small); *cache* the *rebuildable speed artifacts* (binary, large). This is exactly what keeps the repo clean — the thing that bloats (`.pkl`) never enters git.
@@ -40,7 +40,12 @@ The engine writes these under `.codeboarding/`:

 ## Warm-start tradeoff (the `.pkl`)

-The warm-start needs the pkl **and** its `.sha`. When the review action has to generate a base analysis, it saves that generated base artifact directory in `actions/cache` keyed by base SHA / depth / engine ref, then seeds the head analysis from that directory. When a committed `analysis.json` already exists but no matching cache exists, the PR still diffs correctly but may run a cold LSP pass. This keeps the repo clean; the cache improves speed but is not required for correctness.
+The warm-start — and the engine's incremental path itself — needs the pkl **and** its `.sha`: the cluster baseline that drives incremental lives only inside the pkl, so a committed `analysis.json` alone forces the head run into a full (LLM) fallback. The review action therefore guarantees the pair exists for the base SHA:
+
+- **No committed baseline:** the generated base analysis writes the pkl as a side effect; the artifact dir is saved in `actions/cache` keyed by base SHA / depth / engine ref.
+- **Committed baseline, cache miss:** the action *seeds* the pkl deterministically (`cb_engine.py seed`: LSP indexing + the same clustering call a full run makes — **no LLM calls**), then saves it to the same cache. Seeding is fail-open: if it fails, the head run falls back to a full analysis.
+
+Either way the head analysis is seeded from that directory and runs incrementally. This keeps the repo clean — the pkl never enters git — while the cache + seeding make incremental work from the first PR run.

 ## Summary
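
The warm-start rules added in the second hunk can be read as a small decision table. The sketch below is illustrative only and is not taken from the engine or the review action: `cache_key`, `plan_base_step`, `head_mode`, the enum names, and the key layout are all hypothetical. It assumes a cache hit simply restores the saved artifact directory, and that the head run is incremental only when both the pkl and its `.sha` are present.

```python
from enum import Enum


class BaseStep(Enum):
    RESTORE_CACHE = "restore the cached base artifact dir"
    GENERATE_BASE = "generate the base analysis (writes the pkl as a side effect)"
    SEED_PKL = "seed the pkl deterministically (LSP + clustering, no LLM calls)"


class HeadMode(Enum):
    INCREMENTAL = "incremental"
    FULL = "full"


def cache_key(base_sha: str, depth: int, engine_ref: str) -> str:
    # Hypothetical layout: the doc names only the three components
    # (base SHA / depth / engine ref), not how they are joined.
    return f"codeboarding-base-{base_sha}-d{depth}-{engine_ref}"


def plan_base_step(cache_hit: bool, committed_baseline: bool) -> BaseStep:
    # Assumed: a cache hit already holds the pkl + .sha pair, so nothing
    # needs to be rebuilt for the base SHA.
    if cache_hit:
        return BaseStep.RESTORE_CACHE
    # Committed analysis.json but no cached pkl: seed instead of regenerating.
    if committed_baseline:
        return BaseStep.SEED_PKL
    # Nothing committed and nothing cached: run the base analysis.
    return BaseStep.GENERATE_BASE


def head_mode(pkl_and_sha_present: bool) -> HeadMode:
    # Fail-open: without the cluster baseline in the pkl (for example after
    # a failed seed) the head run falls back to a full analysis.
    return HeadMode.INCREMENTAL if pkl_and_sha_present else HeadMode.FULL
```

For example, `plan_base_step(cache_hit=False, committed_baseline=True)` selects the seed step, and a failed seed corresponds to `head_mode(False)`, which yields the full fallback described in the diff.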
