|
| 1 | +# SQLite Worker Executor Spec Review |
| 2 | + |
| 3 | +Source spec: `.agent/specs/sqlite-worker-executor-read-pool.md` |
| 4 | + |
| 5 | +## Critical |
| 6 | + |
| 7 | +1. **Reader handles must be recycled after writer commits.** |
| 8 | + - Current spec only calls out recycling after schema-changing writes. |
| 9 | + - Risk: SQLite readonly connections have their own pager/schema caches. Sharing the VFS `moka` page cache does not prove reused reader handles observe committed writer changes. |
| 10 | + - Needed decision: require closing/reopening all reader SQLite handles after every writer commit unless a tested SQLite lock/change-counter design proves reuse is safe. |
| 11 | + |
| 12 | +2. **Idle readwrite connection semantics are unsafe/unclear.** |
| 13 | + - Current spec implies a long-lived writer worker owns a readwrite `sqlite3*`. |
| 14 | + - Risk: current code uses `PRAGMA locking_mode = EXCLUSIVE` and VFS lock callbacks are no-ops. An idle open writer connection plus reader connections may violate SQLite/VFS assumptions. |
| 15 | + - Needed decision: close writer handle between jobs when read pooling is enabled, make idle writer block readers, or implement a real lock ladder. |
| 16 | + |
| 17 | +3. **Reader VFS reads must not see the dirty write buffer.** |
| 18 | + - Current spec says readers must not mutate the dirty buffer, but does not explicitly forbid reading it. |
| 19 | + - Risk: current VFS resolves `write_buffer.dirty` before committed cache pages. A reader overlapping a dirty writer path could observe uncommitted pages. |
| 20 | + - Needed decision: role-aware read resolution. Reader handles read committed state only; writer handles may read dirty buffer. |
| 21 | + |
| 22 | +4. **VFS unregister after close timeout can be unsafe.** |
| 23 | + - Current spec says active jobs finish or hit a bounded close timeout, then workers close and VFS unregisters. |
| 24 | + - Risk: without `sqlite3_interrupt`, timed-out workers may still be inside SQLite/VFS. Dropping or unregistering the VFS would risk use-after-unregister. |
| 25 | + - Needed decision: close timeout may return an error, but the VFS must remain alive until every worker thread has actually exited, or the design must intentionally leak/retain the VFS and mark it dead. |
| 26 | + |
| 27 | +5. **`SELECT` is not enough for reader routing.** |
| 28 | + - Current spec treats parser-proven `SELECT` as reader-candidate. |
| 29 | + - Risk: read-only SQL can be connection-affine. Examples: `last_insert_rowid()`, `changes()`, `total_changes()`, temp schema reads, or other session-state queries. |
| 30 | + - Needed decision: reject connection-affine functions and temp schema reads from the reader allowlist, or route all connection-affine/session-state SQL to writer. |
| 31 | + |
| 32 | +## High |
| 33 | + |
| 34 | +6. **VFS role tagging needs a concrete mechanism.** |
| 35 | + - Current spec says reader-owned and writer-owned file handles, but not how callbacks know the role. |
| 36 | + - Risk: role checks are aspirational unless `VfsFile` stores role and callbacks enforce it. |
| 37 | + - Needed decision: store role on `VfsFile` at `xOpen`, derived from open flags plus coordinator-issued capability/epoch, and check it in read/write/file-control/delete/truncate/sync paths. |
| 38 | + |
| 39 | +7. **Aux/temp file behavior is underspecified.** |
| 40 | + - Current spec shares the aux-file registry and says reader aux creation should fail if it implies writes. |
| 41 | + - Risk: readonly SELECTs can still require temp storage. Shared aux files are mutable VFS state. |
| 42 | + - Needed decision: require `PRAGMA temp_store = MEMORY` on readers and fail all reader aux writes, or make aux files per-connection/private. |
| 43 | + |
| 44 | +8. **Manual transaction routing wording can deadlock.** |
| 45 | + - Current spec says reads under writer pressure/manual transaction may “wait or route writer,” then later says manual transaction routes all work to writer. |
| 46 | + - Risk: `BEGIN; INSERT; SELECT; COMMIT` can block itself if the SELECT waits behind the transaction. |
| 47 | + - Needed decision: manual transaction pins all work to writer until autocommit returns. No waiting/read-lane routing inside the manual transaction. |
| 48 | + |
| 49 | +9. **Queued-read behavior under writer pressure is too loose.** |
| 50 | + - Current spec says pending writer stops new reader admission. |
| 51 | + - Risk: reads already in `pending_reads` could still drain ahead of a writer. |
| 52 | + - Needed decision: once `pending_writer_count > 0`, no queued read dispatches to reader workers until the writer completes. |
| 53 | + |
| 54 | +10. **Backpressure needs exact bounded-queue semantics.** |
| 55 | + - Current spec says bounded command queue and queue-full returns `actor.overloaded`. |
| 56 | + - Risk: awaiting channel capacity creates hidden backpressure instead of explicit overload. |
| 57 | + - Needed decision: public sends use `try_reserve`/`try_send`; internal pending queues and worker queues are bounded; full public or worker queues return `actor.overloaded`. |
| 58 | + |
| 59 | +11. **Cancellation must update coordinator state.** |
| 60 | + - Current spec says queued requests with dropped receivers can be dropped before dispatch. |
| 61 | + - Risk: cancelling a queued writer may leave `pending_writer_count` elevated, freezing readers. |
| 62 | + - Needed decision: cancellation decrements writer/read counters, frees queue capacity, wakes coordinator, and recomputes writer pressure. |
| 63 | + |
| 64 | +12. **Prepare errors must not become classifier mismatches.** |
| 65 | + - Current spec could classify all reader-worker classification disagreement as internal mismatch. |
| 66 | + - Risk: `sqlparser` can parse SQL that SQLite later rejects. That is a user SQL error, not an internal parser bug. |
| 67 | + - Needed decision: mismatch only when SQLite prepare succeeds but tail/readonly/authorizer says non-reader-eligible. Prepare errors remain normal SQLite/user errors. |
| 68 | + |
| 69 | +13. **Writer-pressure routing needs exact state rules.** |
| 70 | + - Current spec says “wait or route writer according to current semantics.” |
| 71 | + - Risk: ambiguous ordering and hidden behavior changes. |
| 72 | + - Needed decision: specify behavior for pending writer, active writer, idle writer, and manual transaction separately. |
| 73 | + |
| 74 | +14. **Migration from `NativeConnectionManager` needs a compatibility checklist.** |
| 75 | + - Current spec says keep old manager as fallback but not how `NativeDatabaseHandle` switches. |
| 76 | + - Risk: clone semantics, idempotent close, `take_last_kv_error`, preload hints, metrics, initialization, and test-only snapshots can regress. |
| 77 | + - Needed decision: require a backend enum/trait plus compatibility tests for old manager versus worker executor. |
| 78 | + |
| 79 | +## Medium |
| 80 | + |
| 81 | +15. **The `sqlparser` allowlist needs exact AST rules.** |
| 82 | + - Current spec says `Statement::Query`/SELECT in prose. |
| 83 | + - Risk: accepting broad query AST nodes admits unsupported or connection-affine SQL. |
| 84 | + - Needed decision: define exact accepted `sqlparser` AST variants, recursive checks for CTEs/subqueries/set ops/table factors/expressions, and route unknown AST nodes to writer. |
| 85 | + |
| 86 | +16. **Mismatch/error metadata is too thin.** |
| 87 | + - Current spec metadata has only parser route, readonly, tail, and authorizer write flag. |
| 88 | + - Risk: hard to distinguish parser drift, user SQL error, VFS role bug, readonly open failure, query-only failure, or authorizer denial. |
| 89 | + - Needed decision: include prepare status, SQLite code/message class, denied authorizer action, failure phase, and route decision metadata. |
| 90 | + |
| 91 | +17. **Rollout flag matrix is underspecified.** |
| 92 | + - Current spec adds `RIVETKIT_SQLITE_OPT_WORKER_EXECUTOR`, plus existing read-pool flags. |
| 93 | + - Risk: `worker_executor=1` with `read_pool_enabled=false` or `max_readers=0` could accidentally use legacy path or error. |
| 94 | + - Needed decision: define precedence. Likely: worker executor enabled with read pool disabled means writer-only worker executor. |
| 95 | + |
| 96 | +18. **Default-on gate is too broad.** |
| 97 | + - Current spec says native driver tests, depot-client VFS tests, wasm/remote tests, parity and stress tests. |
| 98 | + - Risk: important failure modes are not named as release blockers. |
| 99 | + - Needed decision: add explicit gates for fault/chaos tests, close during active read/write, close with queued work, close timeout behavior, schema change plus reader reuse/recycle, worker panic/channel close, metrics parity, and wasm dependency checks. |
| 100 | + |
| 101 | +19. **Worker thread implementation choice is still open.** |
| 102 | + - Current spec leaves OS threads versus stable `spawn_blocking` loops open. |
| 103 | + - Risk: cancellation, joining, panic handling, and runtime shutdown behavior differ. |
| 104 | + - Needed decision: pick one for v1 or define acceptance criteria for either. |
| 105 | + |
| 106 | +20. **Read-only PRAGMAs need a later allowlist decision.** |
| 107 | + - Current spec routes PRAGMAs to writer in v1. |
| 108 | + - Risk: conservative but may leave performance on table. |
| 109 | + - Needed decision: keep writer-only in v1; add a follow-up only after parser/SQLite/connection-affinity behavior is tested. |
| 110 | + |
| 111 | +21. **Cache invalidation on truncate needs explicit tests.** |
| 112 | + - Current spec asks whether `moka` invalidation on truncate is sufficient. |
| 113 | + - Risk: stale pages after truncate or file-size changes. |
| 114 | + - Needed decision: add truncate tests covering reader recycle and page-cache invalidation after writer work. |
| 115 | + |
| 116 | +22. **Reader open must be proven network-free.** |
| 117 | + - Current design assumes fresh reader opens are cheap because they reuse the shared VFS and cache. |
| 118 | + - Risk: SQLite open/setup could accidentally trigger depot/envoy transport through page fetch, schema preload, or VFS bootstrap behavior. |
| 119 | + - Needed decision: add a debug/test-only VFS no-network guard around readonly reader open and setup. Any `get_pages` or commit transport during that region should panic in tests or return an internal assertion error in debug builds. |
0 commit comments