Skip to content

Commit 8ffd1e8

Browse files
committed
chore(depot-client): migrate to worker thread
1 parent 6377945 commit 8ffd1e8

201 files changed

Lines changed: 21288 additions & 19926 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
# SQLite Worker Executor Spec Review
2+
3+
Source spec: `.agent/specs/sqlite-worker-executor-read-pool.md`
4+
5+
## Critical
6+
7+
1. **Reader handles must be recycled after writer commits.**
8+
- Current spec only calls out recycling after schema-changing writes.
9+
- Risk: SQLite readonly connections have their own pager/schema caches. Sharing the VFS `moka` page cache does not prove reused reader handles observe committed writer changes.
10+
- Needed decision: require closing/reopening all reader SQLite handles after every writer commit unless a tested SQLite lock/change-counter design proves reuse is safe.
11+
12+
2. **Idle readwrite connection semantics are unsafe/unclear.**
13+
- Current spec implies a long-lived writer worker owns a readwrite `sqlite3*`.
14+
- Risk: current code uses `PRAGMA locking_mode = EXCLUSIVE` and VFS lock callbacks are no-ops. An idle open writer connection plus reader connections may violate SQLite/VFS assumptions.
15+
- Needed decision: close writer handle between jobs when read pooling is enabled, make idle writer block readers, or implement a real lock ladder.
16+
17+
3. **Reader VFS reads must not see the dirty write buffer.**
18+
- Current spec says readers must not mutate the dirty buffer, but does not explicitly forbid reading it.
19+
- Risk: current VFS resolves `write_buffer.dirty` before committed cache pages. A reader overlapping a dirty writer path could observe uncommitted pages.
20+
- Needed decision: role-aware read resolution. Reader handles read committed state only; writer handles may read dirty buffer.
21+
22+
4. **VFS unregister after close timeout can be unsafe.**
23+
- Current spec says active jobs finish or hit a bounded close timeout, then workers close and VFS unregisters.
24+
- Risk: without `sqlite3_interrupt`, timed-out workers may still be inside SQLite/VFS. Dropping or unregistering the VFS would risk use-after-unregister.
25+
- Needed decision: close timeout may return an error, but the VFS must remain alive until every worker thread has actually exited, or the design must intentionally leak/retain the VFS and mark it dead.
26+
27+
5. **`SELECT` is not enough for reader routing.**
28+
- Current spec treats parser-proven `SELECT` as reader-candidate.
29+
- Risk: read-only SQL can be connection-affine. Examples: `last_insert_rowid()`, `changes()`, `total_changes()`, temp schema reads, or other session-state queries.
30+
- Needed decision: reject connection-affine functions and temp schema reads from the reader allowlist, or route all connection-affine/session-state SQL to writer.
31+
32+
## High
33+
34+
6. **VFS role tagging needs a concrete mechanism.**
35+
- Current spec says reader-owned and writer-owned file handles, but not how callbacks know the role.
36+
- Risk: role checks are aspirational unless `VfsFile` stores role and callbacks enforce it.
37+
- Needed decision: store role on `VfsFile` at `xOpen`, derived from open flags plus coordinator-issued capability/epoch, and check it in read/write/file-control/delete/truncate/sync paths.
38+
39+
7. **Aux/temp file behavior is underspecified.**
40+
- Current spec shares the aux-file registry and says reader aux creation should fail if it implies writes.
41+
- Risk: readonly SELECTs can still require temp storage. Shared aux files are mutable VFS state.
42+
- Needed decision: require `PRAGMA temp_store = MEMORY` on readers and fail all reader aux writes, or make aux files per-connection/private.
43+
44+
8. **Manual transaction routing wording can deadlock.**
45+
- Current spec says reads under writer pressure/manual transaction may “wait or route writer,” then later says manual transaction routes all work to writer.
46+
- Risk: `BEGIN; INSERT; SELECT; COMMIT` can block itself if the SELECT waits behind the transaction.
47+
- Needed decision: manual transaction pins all work to writer until autocommit returns. No waiting/read-lane routing inside the manual transaction.
48+
49+
9. **Queued-read behavior under writer pressure is too loose.**
50+
- Current spec says pending writer stops new reader admission.
51+
- Risk: reads already in `pending_reads` could still drain ahead of a writer.
52+
- Needed decision: once `pending_writer_count > 0`, no queued read dispatches to reader workers until the writer completes.
53+
54+
10. **Backpressure needs exact bounded-queue semantics.**
55+
- Current spec says bounded command queue and queue-full returns `actor.overloaded`.
56+
- Risk: awaiting channel capacity creates hidden backpressure instead of explicit overload.
57+
- Needed decision: public sends use `try_reserve`/`try_send`; internal pending queues and worker queues are bounded; full public or worker queues return `actor.overloaded`.
58+
59+
11. **Cancellation must update coordinator state.**
60+
- Current spec says queued requests with dropped receivers can be dropped before dispatch.
61+
- Risk: cancelling a queued writer may leave `pending_writer_count` elevated, freezing readers.
62+
- Needed decision: cancellation decrements writer/read counters, frees queue capacity, wakes coordinator, and recomputes writer pressure.
63+
64+
12. **Prepare errors must not become classifier mismatches.**
65+
- Current spec could classify all reader-worker classification disagreement as internal mismatch.
66+
- Risk: `sqlparser` can parse SQL that SQLite later rejects. That is a user SQL error, not an internal parser bug.
67+
- Needed decision: mismatch only when SQLite prepare succeeds but tail/readonly/authorizer says non-reader-eligible. Prepare errors remain normal SQLite/user errors.
68+
69+
13. **Writer-pressure routing needs exact state rules.**
70+
- Current spec says “wait or route writer according to current semantics.”
71+
- Risk: ambiguous ordering and hidden behavior changes.
72+
- Needed decision: specify behavior for pending writer, active writer, idle writer, and manual transaction separately.
73+
74+
14. **Migration from `NativeConnectionManager` needs a compatibility checklist.**
75+
- Current spec says keep old manager as fallback but not how `NativeDatabaseHandle` switches.
76+
- Risk: clone semantics, idempotent close, `take_last_kv_error`, preload hints, metrics, initialization, and test-only snapshots can regress.
77+
- Needed decision: require a backend enum/trait plus compatibility tests for old manager versus worker executor.
78+
79+
## Medium
80+
81+
15. **The `sqlparser` allowlist needs exact AST rules.**
82+
- Current spec says `Statement::Query`/SELECT in prose.
83+
- Risk: accepting broad query AST nodes admits unsupported or connection-affine SQL.
84+
- Needed decision: define exact accepted `sqlparser` AST variants, recursive checks for CTEs/subqueries/set ops/table factors/expressions, and route unknown AST nodes to writer.
85+
86+
16. **Mismatch/error metadata is too thin.**
87+
- Current spec metadata has only parser route, readonly, tail, and authorizer write flag.
88+
- Risk: hard to distinguish parser drift, user SQL error, VFS role bug, readonly open failure, query-only failure, or authorizer denial.
89+
- Needed decision: include prepare status, SQLite code/message class, denied authorizer action, failure phase, and route decision metadata.
90+
91+
17. **Rollout flag matrix is underspecified.**
92+
- Current spec adds `RIVETKIT_SQLITE_OPT_WORKER_EXECUTOR`, plus existing read-pool flags.
93+
- Risk: `worker_executor=1` with `read_pool_enabled=false` or `max_readers=0` could accidentally use legacy path or error.
94+
- Needed decision: define precedence. Likely: worker executor enabled with read pool disabled means writer-only worker executor.
95+
96+
18. **Default-on gate is too broad.**
97+
- Current spec says native driver tests, depot-client VFS tests, wasm/remote tests, parity and stress tests.
98+
- Risk: important failure modes are not named as release blockers.
99+
- Needed decision: add explicit gates for fault/chaos tests, close during active read/write, close with queued work, close timeout behavior, schema change plus reader reuse/recycle, worker panic/channel close, metrics parity, and wasm dependency checks.
100+
101+
19. **Worker thread implementation choice is still open.**
102+
- Current spec leaves OS threads versus stable `spawn_blocking` loops open.
103+
- Risk: cancellation, joining, panic handling, and runtime shutdown behavior differ.
104+
- Needed decision: pick one for v1 or define acceptance criteria for either.
105+
106+
20. **Read-only PRAGMAs need a later allowlist decision.**
107+
- Current spec routes PRAGMAs to writer in v1.
108+
- Risk: conservative but may leave performance on table.
109+
- Needed decision: keep writer-only in v1; add a follow-up only after parser/SQLite/connection-affinity behavior is tested.
110+
111+
21. **Cache invalidation on truncate needs explicit tests.**
112+
- Current spec asks whether `moka` invalidation on truncate is sufficient.
113+
- Risk: stale pages after truncate or file-size changes.
114+
- Needed decision: add truncate tests covering reader recycle and page-cache invalidation after writer work.
115+
116+
22. **Reader open must be proven network-free.**
117+
- Current design assumes fresh reader opens are cheap because they reuse the shared VFS and cache.
118+
- Risk: SQLite open/setup could accidentally trigger depot/envoy transport through page fetch, schema preload, or VFS bootstrap behavior.
119+
- Needed decision: add a debug/test-only VFS no-network guard around readonly reader open and setup. Any `get_pages` or commit transport during that region should panic in tests or return an internal assertion error in debug builds.

0 commit comments

Comments
 (0)