|
| 1 | +# Snapshot & parallel-worktree incremental-build audit |
| 2 | + |
| 3 | +**Date:** 2026-04-21 |
| 4 | +**Framing:** comparison pass against [omnigraph.dev](https://www.omnigraph.dev/), focused on our snapshot design and incremental-build behavior when multiple Claude Code worktrees run against the same repo in parallel. |
| 5 | +**Outcome:** 3 correctness bugs filed (#995, #996, #997). One proposed optimization (content-addressed parse cache) evaluated and rejected against the current performance profile. No architectural rewrite justified. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 0. TL;DR |
| 10 | + |
| 11 | +- **Each worktree does get its own `.codegraph/graph.db`** — `git rev-parse --show-toplevel` returns the *worktree* path in linked worktrees. Verified locally. So "two worktrees corrupt each other's DB" is not a real risk. |
| 12 | +- **The framing of "structural failure in incremental updates" was overblown.** Incremental builds within a single worktree work correctly. What's actually imperfect: concurrent writers to the same worktree's journal/snapshot files, and first-build-in-new-worktree doesn't amortize work other worktrees already did. |
| 13 | +- **3 real bugs found and filed** — all about non-atomic file operations: |
| 14 | + - #995 — snapshot save TOCTOU race |
| 15 | + - #996 — journal append isn't locked |
| 16 | + - #997 — journal header lags appended entries during watch → silent performance cliff |
| 17 | +- **Omnigraph's "commits are snapshots, branches are references" model does not fit us** — it couples graph freshness to git commits, which breaks the "graph tracks your working tree live" invariant that is codegraph's whole point. |
| 18 | +- **Content-addressed parse cache (R1 in §4)** was the most promising omnigraph-inspired optimization, but parsing is ~10% of build time. A cache would save ~10% on first-build-in-new-worktree and near-zero on incremental/watch. **Not worth the engineering cost.** Rejected. |
| 19 | +- **Honest scope limit:** omnigraph's public docs are thin on internals. Claims about their concurrency model and merge semantics come from marketing pages, not source. Calibrate the "what to learn" findings accordingly. |
| 20 | + |
| 21 | +--- |
| 22 | + |
| 23 | +## 1. What I verified vs. what I inferred |
| 24 | + |
| 25 | +| Claim | Source | Confidence | |
| 26 | +|---|---|---| |
| 27 | +| Linked worktree gets its own DB | Ran `git rev-parse --show-toplevel` in worktree → returned worktree path | **Verified** | |
| 28 | +| Worktree and main each have independent ~30 MB `.codegraph/` dirs | `ls -la` in both locations | **Verified** | |
| 29 | +| Advisory lock warns but never blocks | `src/db/connection.ts:112-130` — reads PID, calls `warn()`, writes own PID, proceeds | **Verified** | |
| 30 | +| Snapshots are `VACUUM INTO` to `.codegraph/snapshots/<name>.db` | `src/features/snapshot.ts:18-60` | **Verified** | |
| 31 | +| Snapshots have no git identity | Name regex `^[a-zA-Z0-9_-]+$`; no git calls in the file | **Verified** | |
| 32 | +| Journal header + append are separate writes, with no coordination | `src/domain/graph/journal.ts:62-83` vs `85-105` | **Verified** | |
| 33 | +| Parsing is ~10% of full-build time | User-provided benchmark | **Verified (external)** | |
| 34 | +| Omnigraph uses Lance + Arrow + DataFusion + Cedar | Omnigraph landing page | **Per marketing** | |
| 35 | +| Omnigraph branches are copy-on-write, "no locks" | Omnigraph landing page | **Per marketing — no source dive** | |
| 36 | +| Omnigraph merge conflict semantics | — | **Not documented publicly** | |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +## 2. What's actually in our code |
| 41 | + |
| 42 | +### 2.1 Snapshot (`src/features/snapshot.ts`) |
| 43 | + |
| 44 | +```text |
| 45 | +snapshotSave(name) → VACUUM INTO .codegraph/snapshots/<name>.db (lines 27-61) |
| 46 | +snapshotRestore(name) → copyFileSync over graph.db + wipe WAL/SHM (lines 67-88) |
| 47 | +snapshotList/Delete → flat dir listing (lines 97-131) |
| 48 | +``` |
| 49 | + |
| 50 | +Real correctness issue: `existsSync` → `unlinkSync` → `VACUUM INTO` is a three-step TOCTOU race. Filed as **#995**. |
| 51 | + |
| 52 | +Separately, snapshot name is free-form — no HEAD/branch/dirty-state binding. Two worktrees running `snapshot save main` from different commits silently overwrite each other with `--force`. This is a UX gap, not a correctness bug. |
| 53 | + |
| 54 | +### 2.2 Change journal (`src/domain/graph/journal.ts`) |
| 55 | + |
| 56 | +Format: |
| 57 | +``` |
| 58 | +# codegraph-journal v1 <timestamp> |
| 59 | +path/to/changed-file.ts |
| 60 | +DELETED path/to/gone.ts |
| 61 | +``` |
| 62 | + |
| 63 | +Two real issues: |
| 64 | + |
| 65 | +- **#996** — `appendJournalEntries()` uses unlocked `fs.appendFileSync`. Concurrent writers (e.g. watcher + manual build) can interleave lines, producing corrupt entries. |
| 66 | +- **#997** — Watcher appends entries but never updates the header timestamp; only builder's finalize does. Result: after a watch session, journal entries are newer than the header, Tier 0 journal-check bails, next build falls through to the expensive Tier 1/2 scan. Silent performance cliff. |
| 67 | + |
| 68 | +### 2.3 Advisory "lock" (`src/db/connection.ts:112-130`) |
| 69 | + |
| 70 | +```ts |
| 71 | +if (pid && pid !== process.pid && isProcessAlive(pid)) { |
| 72 | + warn(`Another process (PID ${pid}) may be using this database. Proceeding with caution.`); |
| 73 | +} |
| 74 | +// ... then writes own PID anyway |
| 75 | +``` |
| 76 | + |
| 77 | +Not a lock — a log line. SQLite's WAL `busy_timeout = 5000` (`connection.ts:169`) provides actual DB-level serialization. The informational lock file doesn't cover surrounding file operations (journal, snapshot directory). Discussed in #996. |
| 78 | + |
| 79 | +### 2.4 Per-worktree isolation is correct — but each worktree starts cold |
| 80 | + |
| 81 | +- Every linked worktree has its own `.codegraph/` (verified). DB isolation is fine. Our own context-enrichment hook already relies on this — `.claude/hooks/enrich-context.sh:32-34` derives the DB path from `git rev-parse --show-toplevel`, so each worktree's hook naturally reads that worktree's DB. |
| 82 | +- But every new worktree re-parses every file, re-resolves every import, re-hashes, and re-embeds — even though most files are byte-identical to another worktree's parse output. |
| 83 | +- For a 3-file branch off main, first build costs the same as indexing from scratch. |
| 84 | +- This is an amortization gap, not a structural failure. It is the thing omnigraph's "copy-on-write branches" would address if ported — see §3. |
| 85 | + |
| 86 | +### 2.5 Registry (`src/infrastructure/registry.ts`) |
| 87 | + |
| 88 | +Keyed by `path.basename(absRoot)` with `-2`/`-3` auto-suffix collision handling. Two worktrees register as two unrelated "repos" rather than one repo with two worktrees. UX imperfection, not a bug. |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +## 3. What omnigraph does differently |
| 93 | + |
| 94 | +Sourced from [omnigraph.dev](https://www.omnigraph.dev/) and [ModernRelay/omnigraph README](https://github.com/ModernRelay/omnigraph). Deep internals not publicly documented; quotations below are from their marketing pages. |
| 95 | + |
| 96 | +### 3.1 Storage model |
| 97 | + |
| 98 | +- **Lance** immutable versioned columnar format — every write produces a new version, old versions stay addressable. |
| 99 | +- **Arrow** for in-memory columnar execution. |
| 100 | +- **DataFusion** for query planning. |
| 101 | +- Net effect: snapshots, versioning, time-travel are the storage format, not features layered on top. |
| 102 | + |
| 103 | +### 3.2 Git-style semantics |
| 104 | + |
| 105 | +```bash |
| 106 | +omnigraph branch create --from main feature-x ./repo.omni |
| 107 | +omnigraph branch merge feature-x --into main ./repo.omni |
| 108 | +``` |
| 109 | + |
| 110 | +- Branches are references to versions, not disk copies ("copy-on-write branching eliminates locks," per marketing — mechanism not public). |
| 111 | +- Every mutation is a commit. |
| 112 | +- Merge exists as a first-class operation (conflict semantics not documented). |
| 113 | + |
| 114 | +### 3.3 Why the model doesn't fit us |
| 115 | + |
| 116 | +Codegraph's core value prop is **"the graph tracks your on-disk working tree in real time, committed or not."** You edit a file, save, and `codegraph audit --quick <target>` reflects the change. You can run `fn-impact` on uncommitted code. |
| 117 | + |
| 118 | +Omnigraph's model couples graph state to commits. To port it faithfully, we'd have to either: |
| 119 | +- Auto-commit on every save (bad — pollutes git history), or |
| 120 | +- Accept a gap between disk and graph state (defeats dogfooding). |
| 121 | + |
| 122 | +So the full model is off the table. The question became: **can we take a single useful piece** without breaking the live-dev invariant? See next section. |
| 123 | + |
| 124 | +--- |
| 125 | + |
| 126 | +## 4. Proposed R1 — and why it's rejected |
| 127 | + |
| 128 | +### The idea |
| 129 | + |
| 130 | +Content-addressed parse cache at `~/.codegraph/cache/symbols/<sha256>.json`: |
| 131 | + |
| 132 | +1. Hash each file. |
| 133 | +2. Look up `sha256 + parser_version` in cache. |
| 134 | +3. Hit → skip tree-sitter, reuse cached symbols/edges. |
| 135 | +4. Miss → parse, extract, store in cache. |
| 136 | + |
| 137 | +Key property: cache key is content hash of on-disk bytes, **not** a commit SHA. This preserves the "graph tracks working tree live" invariant — dirty files cache just as well as committed ones. No git coupling. |
| 138 | + |
| 139 | +### Why I initially ranked it #1 |
| 140 | + |
| 141 | +The "first-build-in-new-worktree is cold" problem is real, and tree-sitter parses are content-pure (same bytes → same AST), so the cache is a correct-by-construction optimization with no invalidation complexity. |
| 142 | + |
| 143 | +### Why it's rejected |
| 144 | + |
| 145 | +**Parsing is ~10% of full-build time** (user-provided benchmark). |
| 146 | + |
| 147 | +- First-build-in-new-worktree: cache saves ~10%. Real but modest. |
| 148 | +- Incremental builds: cache saves ~0% — incremental already skips unchanged files, so the parses it runs are genuine cache misses too. |
| 149 | +- Watch mode: ~0% — same reason. |
| 150 | + |
| 151 | +A content-addressed cache needs: directory layout, hash scheme with parser-version keying, LRU/TTL for size management, cross-platform path handling, test coverage. Call it a few hundred LOC plus ongoing ops concerns. Shipping that for ≤10% on one narrow scenario is a bad trade. |
| 152 | + |
| 153 | +### If first-build-in-new-worktree latency is the actual pain point |
| 154 | + |
| 155 | +The remaining 90% lives in: import resolution (`domain/graph/resolve.ts`, 6-level priority resolver), DB writes (batch size, transaction boundaries, indexes), and embeddings (if enabled, often dominates). Profile one of those and there's a real target. Separate investigation from this audit. |
| 156 | + |
| 157 | +--- |
| 158 | + |
| 159 | +## 5. Recommendations |
| 160 | + |
| 161 | +| ID | Action | Status | |
| 162 | +|---|---|---| |
| 163 | +| **A1** | Fix #995 — snapshot save TOCTOU race. Use `<name>.db.tmp-<pid>` + atomic rename, or a real file lock. | **Do** | |
| 164 | +| **A2** | Fix #996 — lock journal mutations (or replace with a per-line append-only log). | **Do** | |
| 165 | +| **A3** | Fix #997 — watcher updates journal header in the same critical section as entry appends. | **Do** | |
| 166 | +| ~~R1~~ | ~~Content-addressed parse cache.~~ (evaluated and rejected in §4) | **Rejected** — 10% parse share | |
| 167 | +| R2 | Bind snapshot names to git identity + metadata + GC. | Nice-to-have. Not urgent. | |
| 168 | +| R3 | Share more than parses across worktrees (resolved edges, embeddings) by content hash. | Speculative — conditional on verifying content-purity. Not recommended now. | |
| 169 | +| R4 | Replace SQLite-as-materialized-graph with append-only versioned store. | Long-horizon, out of scope. | |
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +## 6. Honest gaps |
| 174 | + |
| 175 | +- I did not read omnigraph's source. Their concurrency model, merge semantics, and on-disk layout are guessed from marketing pages. |
| 176 | +- The 10% parse-time figure is one data point, not a profile. Embedding-enabled builds will skew differently. |
| 177 | +- I did not audit whether resolved edges are content-pure (R3 hinges on this). Probably not — resolution depends on sibling files. |
| 178 | +- The 3 filed bugs are individually small. Together they matter because the "multi-worktree + watcher + manual build" workflow is the documented use case (CLAUDE.md "Parallel Sessions"), and that's exactly when the races fire. |
| 179 | + |
| 180 | +--- |
| 181 | + |
| 182 | +## Sources |
| 183 | + |
| 184 | +- [Omnigraph landing page](https://www.omnigraph.dev/) |
| 185 | +- [Omnigraph docs index](https://www.omnigraph.dev/docs) |
| 186 | +- [ModernRelay/omnigraph on GitHub](https://github.com/ModernRelay/omnigraph) |
| 187 | +- Filed issues: [#995](https://github.com/optave/ops-codegraph-tool/issues/995), [#996](https://github.com/optave/ops-codegraph-tool/issues/996), [#997](https://github.com/optave/ops-codegraph-tool/issues/997) |
| 188 | +- Local code: |
| 189 | + - `src/features/snapshot.ts:18-131` |
| 190 | + - `src/domain/graph/journal.ts:5-105` |
| 191 | + - `src/db/connection.ts:102-141, 160-225, 262-299` |
| 192 | + - `.claude/hooks/enrich-context.sh:32-34` |
0 commit comments