Skip to content

Commit 6fb43d3

Browse files
authored
docs: snapshot & parallel-worktree audit (omnigraph comparison) (#999)
* docs: snapshot & parallel-worktree incremental-build audit Research pass comparing codegraph to omnigraph.dev. Documents three filed correctness bugs (#995 snapshot TOCTOU, #996 unlocked journal appends, #997 watcher never updates journal header) and records why a content-addressed parse cache was considered and rejected: parsing is ~10% of full-build time, so the cache would save ~10% on first- build-in-new-worktree and near-zero on incremental/watch. Also corrects the initial "structural failure in multi-worktree incremental updates" framing — each linked worktree gets its own DB (verified), and incremental builds within a worktree work correctly. The real gap is that first-build-in-new-worktree doesn't amortize work other worktrees already did, which is an optimization opportunity, not a structural failure. * docs: renumber recommendations to close R2 gap (#999) * docs: anchor enrich-context.sh reference in §2.4 (#999)
1 parent b075f5a commit 6fb43d3

1 file changed

Lines changed: 192 additions & 0 deletions

File tree

Lines changed: 192 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,192 @@
1+
# Snapshot & parallel-worktree incremental-build audit
2+
3+
**Date:** 2026-04-21
4+
**Framing:** comparison pass against [omnigraph.dev](https://www.omnigraph.dev/), focused on our snapshot design and incremental-build behavior when multiple Claude Code worktrees run against the same repo in parallel.
5+
**Outcome:** 3 correctness bugs filed (#995, #996, #997). One proposed optimization (content-addressed parse cache) evaluated and rejected against the current performance profile. No architectural rewrite justified.
6+
7+
---
8+
9+
## 0. TL;DR
10+
11+
- **Each worktree does get its own `.codegraph/graph.db`**`git rev-parse --show-toplevel` returns the *worktree* path in linked worktrees. Verified locally. So "two worktrees corrupt each other's DB" is not a real risk.
12+
- **The framing of "structural failure in incremental updates" was overblown.** Incremental builds within a single worktree work correctly. What's actually imperfect: concurrent writers to the same worktree's journal/snapshot files, and first-build-in-new-worktree doesn't amortize work other worktrees already did.
13+
- **3 real bugs found and filed** — all about non-atomic file operations:
14+
- #995 — snapshot save TOCTOU race
15+
- #996 — journal append isn't locked
16+
- #997 — journal header lags appended entries during watch → silent performance cliff
17+
- **Omnigraph's "commits are snapshots, branches are references" model does not fit us** — it couples graph freshness to git commits, which breaks the "graph tracks your working tree live" invariant that is codegraph's whole point.
18+
- **Content-addressed parse cache (R1 in §4)** was the most promising omnigraph-inspired optimization, but parsing is ~10% of build time. A cache would save ~10% on first-build-in-new-worktree and near-zero on incremental/watch. **Not worth the engineering cost.** Rejected.
19+
- **Honest scope limit:** omnigraph's public docs are thin on internals. Claims about their concurrency model and merge semantics come from marketing pages, not source. Calibrate the "what to learn" findings accordingly.
20+
21+
---
22+
23+
## 1. What I verified vs. what I inferred
24+
25+
| Claim | Source | Confidence |
26+
|---|---|---|
27+
| Linked worktree gets its own DB | Ran `git rev-parse --show-toplevel` in worktree → returned worktree path | **Verified** |
28+
| Worktree and main each have independent ~30 MB `.codegraph/` dirs | `ls -la` in both locations | **Verified** |
29+
| Advisory lock warns but never blocks | `src/db/connection.ts:112-130` — reads PID, calls `warn()`, writes own PID, proceeds | **Verified** |
30+
| Snapshots are `VACUUM INTO` to `.codegraph/snapshots/<name>.db` | `src/features/snapshot.ts:18-60` | **Verified** |
31+
| Snapshots have no git identity | Name regex `^[a-zA-Z0-9_-]+$`; no git calls in the file | **Verified** |
32+
| Journal header + append are separate writes, with no coordination | `src/domain/graph/journal.ts:62-83` vs `85-105` | **Verified** |
33+
| Parsing is ~10% of full-build time | User-provided benchmark | **Verified (external)** |
34+
| Omnigraph uses Lance + Arrow + DataFusion + Cedar | Omnigraph landing page | **Per marketing** |
35+
| Omnigraph branches are copy-on-write, "no locks" | Omnigraph landing page | **Per marketing — no source dive** |
36+
| Omnigraph merge conflict semantics || **Not documented publicly** |
37+
38+
---
39+
40+
## 2. What's actually in our code
41+
42+
### 2.1 Snapshot (`src/features/snapshot.ts`)
43+
44+
```text
45+
snapshotSave(name) → VACUUM INTO .codegraph/snapshots/<name>.db (lines 27-61)
46+
snapshotRestore(name) → copyFileSync over graph.db + wipe WAL/SHM (lines 67-88)
47+
snapshotList/Delete → flat dir listing (lines 97-131)
48+
```
49+
50+
Real correctness issue: `existsSync``unlinkSync``VACUUM INTO` is a three-step TOCTOU race. Filed as **#995**.
51+
52+
Separately, snapshot name is free-form — no HEAD/branch/dirty-state binding. Two worktrees running `snapshot save main` from different commits silently overwrite each other with `--force`. This is a UX gap, not a correctness bug.
53+
54+
### 2.2 Change journal (`src/domain/graph/journal.ts`)
55+
56+
Format:
57+
```
58+
# codegraph-journal v1 <timestamp>
59+
path/to/changed-file.ts
60+
DELETED path/to/gone.ts
61+
```
62+
63+
Two real issues:
64+
65+
- **#996**`appendJournalEntries()` uses unlocked `fs.appendFileSync`. Concurrent writers (e.g. watcher + manual build) can interleave lines, producing corrupt entries.
66+
- **#997** — Watcher appends entries but never updates the header timestamp; only builder's finalize does. Result: after a watch session, journal entries are newer than the header, Tier 0 journal-check bails, next build falls through to the expensive Tier 1/2 scan. Silent performance cliff.
67+
68+
### 2.3 Advisory "lock" (`src/db/connection.ts:112-130`)
69+
70+
```ts
71+
if (pid && pid !== process.pid && isProcessAlive(pid)) {
72+
warn(`Another process (PID ${pid}) may be using this database. Proceeding with caution.`);
73+
}
74+
// ... then writes own PID anyway
75+
```
76+
77+
Not a lock — a log line. SQLite's WAL `busy_timeout = 5000` (`connection.ts:169`) provides actual DB-level serialization. The informational lock file doesn't cover surrounding file operations (journal, snapshot directory). Discussed in #996.
78+
79+
### 2.4 Per-worktree isolation is correct — but each worktree starts cold
80+
81+
- Every linked worktree has its own `.codegraph/` (verified). DB isolation is fine. Our own context-enrichment hook already relies on this — `.claude/hooks/enrich-context.sh:32-34` derives the DB path from `git rev-parse --show-toplevel`, so each worktree's hook naturally reads that worktree's DB.
82+
- But every new worktree re-parses every file, re-resolves every import, re-hashes, and re-embeds — even though most files are byte-identical to another worktree's parse output.
83+
- For a 3-file branch off main, first build costs the same as indexing from scratch.
84+
- This is an amortization gap, not a structural failure. It is the thing omnigraph's "copy-on-write branches" would address if ported — see §3.
85+
86+
### 2.5 Registry (`src/infrastructure/registry.ts`)
87+
88+
Keyed by `path.basename(absRoot)` with `-2`/`-3` auto-suffix collision handling. Two worktrees register as two unrelated "repos" rather than one repo with two worktrees. UX imperfection, not a bug.
89+
90+
---
91+
92+
## 3. What omnigraph does differently
93+
94+
Sourced from [omnigraph.dev](https://www.omnigraph.dev/) and [ModernRelay/omnigraph README](https://github.com/ModernRelay/omnigraph). Deep internals not publicly documented; quotations below are from their marketing pages.
95+
96+
### 3.1 Storage model
97+
98+
- **Lance** immutable versioned columnar format — every write produces a new version, old versions stay addressable.
99+
- **Arrow** for in-memory columnar execution.
100+
- **DataFusion** for query planning.
101+
- Net effect: snapshots, versioning, time-travel are the storage format, not features layered on top.
102+
103+
### 3.2 Git-style semantics
104+
105+
```bash
106+
omnigraph branch create --from main feature-x ./repo.omni
107+
omnigraph branch merge feature-x --into main ./repo.omni
108+
```
109+
110+
- Branches are references to versions, not disk copies ("copy-on-write branching eliminates locks," per marketing — mechanism not public).
111+
- Every mutation is a commit.
112+
- Merge exists as a first-class operation (conflict semantics not documented).
113+
114+
### 3.3 Why the model doesn't fit us
115+
116+
Codegraph's core value prop is **"the graph tracks your on-disk working tree in real time, committed or not."** You edit a file, save, and `codegraph audit --quick <target>` reflects the change. You can run `fn-impact` on uncommitted code.
117+
118+
Omnigraph's model couples graph state to commits. To port it faithfully, we'd have to either:
119+
- Auto-commit on every save (bad — pollutes git history), or
120+
- Accept a gap between disk and graph state (defeats dogfooding).
121+
122+
So the full model is off the table. The question became: **can we take a single useful piece** without breaking the live-dev invariant? See next section.
123+
124+
---
125+
126+
## 4. Proposed R1 — and why it's rejected
127+
128+
### The idea
129+
130+
Content-addressed parse cache at `~/.codegraph/cache/symbols/<sha256>.json`:
131+
132+
1. Hash each file.
133+
2. Look up `sha256 + parser_version` in cache.
134+
3. Hit → skip tree-sitter, reuse cached symbols/edges.
135+
4. Miss → parse, extract, store in cache.
136+
137+
Key property: cache key is content hash of on-disk bytes, **not** a commit SHA. This preserves the "graph tracks working tree live" invariant — dirty files cache just as well as committed ones. No git coupling.
138+
139+
### Why I initially ranked it #1
140+
141+
The "first-build-in-new-worktree is cold" problem is real, and tree-sitter parses are content-pure (same bytes → same AST), so the cache is a correct-by-construction optimization with no invalidation complexity.
142+
143+
### Why it's rejected
144+
145+
**Parsing is ~10% of full-build time** (user-provided benchmark).
146+
147+
- First-build-in-new-worktree: cache saves ~10%. Real but modest.
148+
- Incremental builds: cache saves ~0% — incremental already skips unchanged files, so the parses it runs are genuine cache misses too.
149+
- Watch mode: ~0% — same reason.
150+
151+
A content-addressed cache needs: directory layout, hash scheme with parser-version keying, LRU/TTL for size management, cross-platform path handling, test coverage. Call it a few hundred LOC plus ongoing ops concerns. Shipping that for ≤10% on one narrow scenario is a bad trade.
152+
153+
### If first-build-in-new-worktree latency is the actual pain point
154+
155+
The remaining 90% lives in: import resolution (`domain/graph/resolve.ts`, 6-level priority resolver), DB writes (batch size, transaction boundaries, indexes), and embeddings (if enabled, often dominates). Profile one of those and there's a real target. Separate investigation from this audit.
156+
157+
---
158+
159+
## 5. Recommendations
160+
161+
| ID | Action | Status |
162+
|---|---|---|
163+
| **A1** | Fix #995 — snapshot save TOCTOU race. Use `<name>.db.tmp-<pid>` + atomic rename, or a real file lock. | **Do** |
164+
| **A2** | Fix #996 — lock journal mutations (or replace with a per-line append-only log). | **Do** |
165+
| **A3** | Fix #997 — watcher updates journal header in the same critical section as entry appends. | **Do** |
166+
| ~~R1~~ | ~~Content-addressed parse cache.~~ (evaluated and rejected in §4) | **Rejected** — 10% parse share |
167+
| R2 | Bind snapshot names to git identity + metadata + GC. | Nice-to-have. Not urgent. |
168+
| R3 | Share more than parses across worktrees (resolved edges, embeddings) by content hash. | Speculative — conditional on verifying content-purity. Not recommended now. |
169+
| R4 | Replace SQLite-as-materialized-graph with append-only versioned store. | Long-horizon, out of scope. |
170+
171+
---
172+
173+
## 6. Honest gaps
174+
175+
- I did not read omnigraph's source. Their concurrency model, merge semantics, and on-disk layout are guessed from marketing pages.
176+
- The 10% parse-time figure is one data point, not a profile. Embedding-enabled builds will skew differently.
177+
- I did not audit whether resolved edges are content-pure (R3 hinges on this). Probably not — resolution depends on sibling files.
178+
- The 3 filed bugs are individually small. Together they matter because the "multi-worktree + watcher + manual build" workflow is the documented use case (CLAUDE.md "Parallel Sessions"), and that's exactly when the races fire.
179+
180+
---
181+
182+
## Sources
183+
184+
- [Omnigraph landing page](https://www.omnigraph.dev/)
185+
- [Omnigraph docs index](https://www.omnigraph.dev/docs)
186+
- [ModernRelay/omnigraph on GitHub](https://github.com/ModernRelay/omnigraph)
187+
- Filed issues: [#995](https://github.com/optave/ops-codegraph-tool/issues/995), [#996](https://github.com/optave/ops-codegraph-tool/issues/996), [#997](https://github.com/optave/ops-codegraph-tool/issues/997)
188+
- Local code:
189+
- `src/features/snapshot.ts:18-131`
190+
- `src/domain/graph/journal.ts:5-105`
191+
- `src/db/connection.ts:102-141, 160-225, 262-299`
192+
- `.claude/hooks/enrich-context.sh:32-34`

0 commit comments

Comments
 (0)