Branch: spike/sqlite-single-file. Companion to SPIKE-SQLITE-GOAL.md.
This is the phased path from today's two-native-binding architecture to the zero-dep, one-file end state. Each phase is independently reviewable and leaves the tree green. The spike (this branch) has executed Phase 0 and the load-bearing slice of Phase 1.
Files added (storage package only — nothing else touched):
packages/storage/src/sqlite-adapter.ts—SqliteStore, the representative slice ofIGraphStore+ITemporalStoreover onenode:sqlitefile.packages/storage/src/sqlite-adapter.test.ts— twonode:testcases, both green.
Verification run (reproduce):
npx tsc -b packages/storage/tsconfig.json # 0 errors
node --test --experimental-sqlite \
./packages/storage/dist/sqlite-adapter.test.js # 2 pass, 0 failProven: graph round-trip from one on-disk file, exact-f32 embedding round-trip +
cosine ranking, recursive-CTE traversal (impact up / blast-radius down,
depth-bounded, path-tracked), WAL engaged on a real file, no .lbug/.duckdb
sidecars.
Note: tests run with --experimental-sqlite. On Node 24.17 node:sqlite is
behind that flag; Phase 1 must confirm the flag-free version on our shipping
Node (or set the flag in the CLI shebang / bin wrapper). This is the one
runtime assumption to nail down before committing to the migration.
GraphNode is a 37-member discriminated union. The lbug adapter uses a wide
polymorphic column set (NODE_COLUMNS). The spike instead uses one nodes
table: typed columns for the universal base (id, kind, name, file_path, start_line, end_line) plus a payload JSON-overflow column carrying the
kind-specific fields, rehydrated on read.
- Pro: trivial schema, no per-kind migration, new node kinds need no DDL.
- Con to validate: kind-filtered finders (
listNodesByKind,listDependencies,listRoutes,listFindings) must filter onkind+ occasionally reach into JSON (payload->>'$.ecosystem'). SQLite has good JSON operators, but the conformance/graphHashparity suite is the real judge — Phase 2 runs it.
Edges are one polymorphic edges table keyed by the (from,to,type,step) dedup
tuple, mirroring KnowledgeGraph's edgeDedupKey.
Prove node:sqlite can do graph + vectors + temporal in one WAL file behind the
existing interface seam. Output: the adapter + tests above.
Fill in every method the spike stubbed (NotImplementedError today):
- Graph finders:
listNodesByKind,listEdges,listEdgesByType,listFindings,listDependencies,listRoutes,getRepoNode,listNodesByName,listNodesByEntryPoint,countNodesByKind,countEdgesByType,listConsumerProducerEdges,search(BM25 — use SQLite FTS5, built in),traverseAncestors/traverseDescendants,setMeta,listEmbeddingHashes. - Temporal:
exec(the--sqlescape hatch — portsql-guard.ts/cypher-guardread-only enforcement),bulkLoadCochanges+ lookups,bulkLoadSymbolSummaries- lookups,
countSymbolSummaries.
- lookups,
- Honor the sentinel coercions (step-0 drop, empty
languageStats→NULL, Repo nullablenullnotundefined, deadness underscore↔hyphen) — required forgraphHashparity (seecolumn-encode.ts,interface.ts:24-62). - Pin down the
--experimental-sqliteflag question (above).
Exit: SqliteStore implements both interfaces with no stubs; unit tests per
method.
Run assertIGraphStoreConformance (@opencodehub/storage/test-utils) against
SqliteStore. This is the byte-identical graphHash round-trip the lbug adapter
passes. If the generic-node-table design loses any field or ordering, it fails
here. Fix until green. This phase is the real go/no-go on the design.
openStore(packages/storage/src/index.ts): return oneSqliteStoreinstance as bothgraphandtemporalviews over one<repo>/.codehub/store.sqlite. Delete the two-filecomposeArtifactPathsgraph.lbug/temporal.duckdb split and the ordered-close dance.- The MCP
sqltool exposes a Cypher arg today (routed to lbug). Decide: drop Cypher (SQL-only--sql), or keep a thin Cypher-ish shim. Recommend drop —dialectbecomes"sql"(widenGraphDialectininterface.ts:85), and CLAUDE.md / ADR 0016 get superseded by a new ADR. - Update
open-store.ts,doctor.ts,analyze.tscall sites.
exportEmbeddingsToParquet is DuckDB's one genuinely hard-to-replace feature
(it backs the byte-identical Parquet embeddings sidecar in
pack/embeddings-sidecar.ts). Decided: option (a). SqliteStore
.exportEmbeddingsToParquet() now lazily await import("./duckdb-adapter.js")
inside the method and delegates to a throwaway in-memory DuckDbStore for the
deterministic COPY … (FORMAT PARQUET, COMPRESSION ZSTD). DuckDB is therefore
off the install hot path — only an embeddings-pack invocation loads it;
analyze/query/impact and an embedding-free pack never do. The
pack/embeddings-sidecar.test.ts byte-identity test passes unchanged, and a
direct probe emits a valid PAR1 Parquet file (2 rows, version pinned).
- (b) Write Parquet in JS (
parquet-wasm/ hand-rolled) remains the fast-follow that kills the last native dep entirely. Deferred — it carries its own byte-identical-determinism contract and must not block the install win.
Remove @ladybugdb/core and @duckdb/node-api from all package.json (modulo
Phase-4 option (a)'s lazy DuckDB). Delete graphdb-adapter.ts,
graphdb-pool.ts, graphdb-schema.ts, duckdb-adapter.ts and their tests.
Net deletion should dwarf the addition. Update CHANGELOGs; write the superseding
ADR (0017?: "single-file SQLite storage; supersedes 0016").
On a clean machine / container with only Node 24: npm i -g @opencodehub/cli,
then codehub analyze a sample repo, then codehub query/impact/pack.
Confirm no native build, no Docker, no second process. Update README's install
section to the one-liner. This is the deliverable the whole spike exists for.
| Risk | Likelihood | Mitigation |
|---|---|---|
--experimental-sqlite flag required on shipping Node |
Med | Set flag in bin wrapper; or wait for unflagged (track Node release notes). Resolve in Phase 1. |
Generic node table fails graphHash parity |
Med | Phase 2 is the gate; payload JSON is canonical-sorted already via the existing canonicalJson. Fall back to wider typed columns if a field needs SQL-level filtering. |
| Brute-force KNN too slow on a giant monorepo | Low | sqlite-vec via loadExtension (seam proven). Repo-scale is fine without it. |
| Losing the Parquet sidecar breaks pack determinism | Med | Phase 4 option (a) keeps DuckDB lazily for export only. |
Concurrent writers (parallel analyze) |
Low | WAL gives one-writer/many-reader; OCH indexes single-writer per repo anyway. |
| Phase | State | Evidence |
|---|---|---|
| P0 de-risk | ✅ | spike adapter + 2 tests (commit 3663cd4) |
| "flag" | ✅ | node:sqlite is default-on at Node ≥24.15 — no flag needed to run; added a dependency-free guard that silences the one-shot ExperimentalWarning on stderr (matters for the MCP stdio channel). commit 8ee504b |
| P1 surface | ✅ | full IGraphStore+ITemporalStore, only exportEmbeddingsToParquet was stubbed (commit 1f8fbcd) |
| P2 graphHash gate | ✅ GREEN | sqlite-parity.test.ts: small+medium fixtures, all 4 sentinels, every edge kind, 2-store determinism. Verified SQLite is byte-correct against the lbug reference (commit 1f8fbcd) |
| P3 openStore rewire | ✅ | one SqliteStore as both views; 52 call sites unchanged; live analyze→query→impact on one store.sqlite; storage 178/0, mcp 209/0, monorepo tsc clean (commit 806e8e3) |
| P4 Parquet | ✅ option (a) | lazy DuckDB import at pack time only; sidecar test green; PAR1 file emitted |
| P5 rip bindings | ⛔ needs Laith | large irreversible deletion (~3k lines, ADR 0016 supersede) — left as a decision, not done autonomously |
| P6 clean-machine install | ⛔ pending P5 | — |
- bulkLoad ignored
opts.mode— always full-replaced.ingest-sarif(run insideanalyze) callsbulkLoad(graph,{mode:"upsert"})with an empty SARIF graph; the second call'sDELETE FROM nodeswiped the 15 real nodes. Unit + parity tests only exercised single-instance replace-mode, so they were green while the product was broken. Fixed: honormode; stampstore_metafrom actual post-write counts. Lesson: a passing parity test ≠ a working CLI; the analyze→query→impact loop is the real gate. - tsup
removeNodeProtocol:truestrippednode:sqlite→baresqlite, unresolvable at runtime.tscwas clean; only a livecodehub analyzesurfaced it. Fixed withremoveNodeProtocol:false.
- Greenlight P5? Ripping lbug + the graphdb adapters is the irreversible step and the moment ADR 0016 gets superseded. The thesis is fully proven; this is a "do you want to commit the architecture" call, not a technical unknown.
- Latent finding (separate from the spike): the existing
graphdb-roundtrip.test.tsall-kinds test passes only because its TEST-LOCAL rebuild helper re-attachesstep:0; through the PUBLICrebuildFromStoreharness,GraphDbStorebreaks on astep:0edge identically to SQLite, sincegraphHashemits"step":0butlistEdgesdrops it on every adapter. Ingestion only ever emitsstep≥1, so it's latent — but it's a real gap in the conformance contract worth closing (either rejectstep:0at ingest, or makegraphHashdrop it). Your call whether that's in-scope.