| 1 | +# SQLite Journal Mode (VFS v2) |
| 2 | + |
| 3 | +## Goals |
| 4 | + |
| 5 | +- One SQLite handle per actor, single-writer, no in-process reader-writer concurrency |
| 6 | +- Capture each commit as a single shippable artifact for storage in our KV (DELTA blob in FDB) and for future cold archival (PITR) |
| 7 | +- Avoid maintaining a local SQLite-on-disk file. The KV is the file. |
| 8 | +- Keep the VFS implementation as small as possible |
| 9 | + |
| 10 | +## Decision |
| 11 | + |
| 12 | +Use `journal_mode = DELETE` with `locking_mode = EXCLUSIVE`. |
| 13 | + |
| 14 | +Pragmas (set at open in `rivetkit-typescript/packages/sqlite-native/src/v2/vfs.rs`): |
| 15 | + |
| 16 | +- `PRAGMA journal_mode = DELETE` |
| 17 | +- `PRAGMA locking_mode = EXCLUSIVE` |
| 18 | +- `PRAGMA synchronous = NORMAL` |
| 19 | +- `PRAGMA temp_store = MEMORY` |
| 20 | +- `PRAGMA auto_vacuum = NONE` |
| 21 | +- `PRAGMA page_size = 4096` |
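
As a rough illustration of the pragma set above, the statements could be assembled into one batch at open. This is a minimal sketch, not the contents of `vfs.rs`: `OPEN_PRAGMAS` and `open_pragma_batch` are hypothetical names, and the ordering is an assumption of the sketch (`page_size` and `auto_vacuum` only take effect before a new database is first populated, so they are placed first).

```rust
/// Pragmas from the Decision section. The order is an assumption of this
/// sketch: page_size and auto_vacuum are listed first because SQLite only
/// honors them on a database that has not been populated yet.
const OPEN_PRAGMAS: [(&str, &str); 6] = [
    ("page_size", "4096"),
    ("auto_vacuum", "NONE"),
    ("locking_mode", "EXCLUSIVE"),
    ("journal_mode", "DELETE"),
    ("synchronous", "NORMAL"),
    ("temp_store", "MEMORY"),
];

/// Build a single SQL batch, one PRAGMA statement per line, suitable for
/// passing to an exec-style call on a freshly opened connection.
fn open_pragma_batch() -> String {
    OPEN_PRAGMAS
        .iter()
        .map(|(name, value)| format!("PRAGMA {name} = {value};"))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping the list in one table makes it easy to assert at open time that the connection actually reports the expected `journal_mode` and `locking_mode`.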
| 22 | + |
| 23 | +Capture dirty pages at xWrite/xSync and assemble one LTX V3 DELTA blob per commit, written atomically to FDB. |
| 24 | + |
| 25 | +We do **not** use `journal_mode = WAL`. We do not implement xShmMap/xShmLock and have no SQLite WAL stream to sniff. |
| 26 | + |
| 27 | +## Evaluation |
| 28 | + |
| 29 | +### journal_mode = WAL (rejected) |
| 30 | + |
| 31 | +WAL mode is the natural choice when SQLite operates against a real local file with concurrent readers and a writer. It is rejected for v2 because: |
| 32 | + |
| 33 | +- The substrate is a remote KV, not a filesystem. There is no `db.sqlite` or `db.sqlite-wal`; both would have to be virtualized over KV. |
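| 34 | +- Normally requires implementing `-shm` (shared-memory WAL index): xShmMap, xShmLock, xShmBarrier, xShmUnmap. Non-trivial over KV; unnecessary for a single-writer-per-actor model. (SQLite skips the shared-memory methods when `locking_mode = EXCLUSIVE` is set before the first WAL access, so this point alone is not decisive; the remaining points still apply.) |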
| 35 | +- Requires driving SQLite's checkpoint loop. Our SHARD set is already the checkpointed state, so checkpoint has no useful target to write into. |
| 36 | +- Requires WAL recovery on open. Our durability story is "the prior commit either reached FDB atomically or it did not"; takeover handles fencing. There is no half-written WAL state to recover. |
| 37 | +- WAL frames captured via VFS sniffing would still need to be re-packaged into a per-commit page-set blob for dense storage in FDB. The end artifact is identical to today's DELTA blob, reached via a longer path. |
| 38 | +- The headline benefit of WAL mode (concurrent readers + one writer without writer-blocks-readers) is moot because we hold an EXCLUSIVE lock per actor. |
| 39 | + |
| 40 | +The DELTA blob is therefore not a re-implementation of WAL. It is the commit page-set captured from a SQLite that is not in WAL mode. By construction, it is functionally equivalent to the WAL frames of a single commit with same-page overwrites collapsed. |
| 41 | + |
| 42 | +### journal_mode = DELETE (chosen) |
| 43 | + |
| 44 | +- No `-shm` file. No checkpoint loop. No WAL recovery semantics to virtualize. |
| 45 | +- xWrite captures dirty pages directly. xSync (or xLock state transitions) marks the commit boundary. |
| 46 | +- Rollback journal is short-lived per transaction; with our VFS it lives in memory only. |
| 47 | +- Maps cleanly to "one DELTA blob per commit, atomic FDB write." |
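
The capture path described in this list can be sketched as a small in-memory buffer. This is an illustrative sketch under stated assumptions, not the actual VFS code: `DirtyPages`, `record_write`, and `take_commit` are hypothetical names, main-database writes are assumed to arrive as whole, page-aligned 4096-byte pages, and LTX encoding and the FDB write are left out entirely.

```rust
use std::collections::BTreeMap;

/// Matches `PRAGMA page_size = 4096` from the Decision section.
const PAGE_SIZE: usize = 4096;

/// Hypothetical buffer of pages written since the last commit boundary.
#[derive(Default)]
struct DirtyPages {
    /// Keyed by 1-based SQLite page number; a later write to the same
    /// page replaces the earlier one, which collapses overwrites.
    pages: BTreeMap<u32, Vec<u8>>,
}

impl DirtyPages {
    /// Would be called from the main-database xWrite path.
    /// Assumption: every write is exactly one page at a page-aligned offset.
    fn record_write(&mut self, offset: u64, data: &[u8]) -> Result<(), String> {
        if offset % PAGE_SIZE as u64 != 0 || data.len() != PAGE_SIZE {
            return Err(format!("unaligned write: offset={offset} len={}", data.len()));
        }
        let pgno = (offset / PAGE_SIZE as u64) as u32 + 1;
        self.pages.insert(pgno, data.to_vec());
        Ok(())
    }

    /// Would be called at the commit boundary. Drains the buffer and
    /// returns the commit page-set sorted by page number.
    fn take_commit(&mut self) -> Vec<(u32, Vec<u8>)> {
        std::mem::take(&mut self.pages).into_iter().collect()
    }
}
```

Writing page 2, then page 1, then page 2 again yields a two-entry page-set in page order, with only the last image of page 2 retained. That page-set is the input a DELTA encoder would consume.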
| 48 | + |
| 49 | +### journal_mode = MEMORY / OFF (rejected) |
| 50 | + |
| 51 | +- MEMORY: rollback journal in RAM. Functionally similar to DELETE for our purposes, but if the process crashes mid-transaction SQLite has no journal to recover from. Not worth the deviation from defaults given DELETE already costs us nothing extra. |
| 52 | +- OFF: disables the rollback journal entirely. SQLite cannot roll back a failed transaction; partial writes can corrupt the DB. Unacceptable. |
| 53 | + |
| 54 | +### locking_mode = EXCLUSIVE |
| 55 | + |
| 56 | +- One SQLite handle per actor; only the actor's own process writes. |
| 57 | +- Avoids the overhead of repeated SHARED/RESERVED/EXCLUSIVE lock transitions per transaction. |
| 58 | +- Removes the need for any cross-process file locking primitives in the VFS. |
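
With a single handle per actor, the VFS lock methods can in principle be reduced to local bookkeeping. The sketch below illustrates that idea only; `LocalLock` and its methods are hypothetical names, and it assumes exclusion between actors is enforced elsewhere (the takeover fencing mentioned above), not by the VFS.

```rust
/// SQLite's file lock levels. The discriminants match the SQLITE_LOCK_*
/// constants in sqlite3.h.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum LockLevel {
    None = 0,
    Shared = 1,
    Reserved = 2,
    Pending = 3,
    Exclusive = 4,
}

/// Hypothetical in-process lock state for the one handle an actor owns.
/// No cross-process primitive is involved.
struct LocalLock {
    level: LockLevel,
}

impl LocalLock {
    fn new() -> Self {
        Self { level: LockLevel::None }
    }

    /// xLock analogue: only ever raises the held level.
    fn lock(&mut self, want: LockLevel) {
        if want > self.level {
            self.level = want;
        }
    }

    /// xUnlock analogue: only ever lowers the held level.
    fn unlock(&mut self, want: LockLevel) {
        if want < self.level {
            self.level = want;
        }
    }

    /// xCheckReservedLock analogue: with one handle, the only possible
    /// holder of RESERVED or higher is this handle itself.
    fn check_reserved(&self) -> bool {
        self.level >= LockLevel::Reserved
    }
}
```

Because the state is a plain field, each lock call costs a comparison and an assignment instead of a round trip to any shared lock service.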
| 59 | + |
| 60 | +## Implications for future work |
| 61 | + |
| 62 | +- **PITR**: The DELTA blob stream is our equivalent of LiteFS Cloud / Cloudflare DO WAL streams. Cold archival can ship DELTAs to object storage with the same semantics those systems use to ship WAL frames, without changing journal mode. |
| 63 | +- **A future v3 with a local-file hot tier** would re-open the WAL question. If we ever want microsecond local-disk reads instead of FDB-RTT reads, switching to WAL mode against a real local file becomes the natural choice and Cloudflare's hot-path design is the right reference. |