
[WIP][POC] Implement SQLite for Web#776

Draft
fabioh8010 wants to merge 25 commits into Expensify:main from callstack-internal:feature/sqlite-web

Conversation

@fabioh8010
Contributor

Successor of #733. I'm removing the testing/benchmark stuff, updating the branch to Onyx version 3.0.60, and implementing the new getAll storage function that was missing in the original PR.

Related Issues

Expensify/App#80245

Automated Tests

Manual Tests

Author Checklist

  • I linked the correct issue in the ### Related Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android / native
    • Android / Chrome
    • iOS / native
    • iOS / Safari
    • MacOS / Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • If we are not using the full Onyx data that we loaded, I've added the proper selector in order to ensure the component only re-renders when the data it is using changes
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR author checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari

roryabraham and others added 25 commits February 9, 2026 12:59
Add vitest, @vitest/browser, @vitest/browser-playwright, playwright,
and tinybench as dev dependencies. Configure vitest for browser-mode
benchmarks running in headless Chromium with real IndexedDB via the
IDBKeyValProvider storage backend.

Add npm scripts: bench, bench:compare, bench:save.

Co-authored-by: Cursor <cursoragent@cursor.com>
Create data generators that produce realistic Onyx store data modeled
after App's ONYXKEYS structure, with four configurable tiers:

- small:   50 reports, 500 actions, 50 txns (~1 MB)
- modest:  250 reports, 2.5k actions, 250 txns (~5 MB)
- heavy:   1k reports, 10k actions, 1k txns (~20 MB)
- extreme: 5k reports, 50k actions, 5k txns (~100 MB)

Includes factory functions for reports, report actions, transactions,
policies, and personal details with realistic field shapes. Also adds
shared setup/teardown helpers for initializing and seeding Onyx in
benchmark runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add tinybench-based benchmarks for all perf-sensitive Onyx methods,
each running across four data tiers (small/modest/heavy/extreme):

- set.bench.ts:     set(), multiSet(), setCollection()
- merge.bench.ts:   merge(), mergeCollection(), update() (mixed ops)
- connect.bench.ts: connect() registration, collection subscribers,
                    notification throughput
- init.bench.ts:    init() with initialKeyStates
- clear.bench.ts:   clear() at each scale

Add scripts/compareBenchmarks.sh which automates baseline-vs-current
branch comparison using vitest's --outputJson and --compare flags.

Co-authored-by: Cursor <cursoragent@cursor.com>
Document the benchmark suite, data tiers, how to run benchmarks,
and how to compare performance across branches.

Co-authored-by: Cursor <cursoragent@cursor.com>
Create SQLiteQueries.ts with all SQL query strings as named constants,
shared by both native and web providers. Move the native SQLiteProvider
from a flat file to SQLiteProvider/index.native.ts to prepare for the
web SQLiteProvider in the same directory. Refactor native provider to
import queries from the shared module instead of inlining SQL strings.

Co-authored-by: Cursor <cursoragent@cursor.com>
Introduce a DirtyMap class that coalesces rapid successive writes to the
same key, deferring persistence via requestIdleCallback. This allows
set/multiSet to return near-instantly by staging values in memory while
the actual storage flush happens asynchronously in batches. Reads check
the dirty map first for consistency. Merge operations flush pending
writes before delegating to the provider to ensure correct semantics.

Benchmarks show 94-99% improvement on set/multiSet/merge operations
compared to the baseline IDB implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>
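The coalescing described above can be sketched as a small class. This is a minimal illustration, not the PR's actual implementation: the names DirtyMap, StorageLike, and flush are assumptions, and the requestIdleCallback/setTimeout fallback is illustrative.

```typescript
// Hypothetical sketch of a write-coalescing dirty map. Later writes to the
// same key overwrite the staged value, and one deferred flush persists the
// whole batch, so set() can return near-instantly.
type StorageLike = {multiSet: (pairs: Array<[string, unknown]>) => Promise<void>};

class DirtyMap {
    private pending = new Map<string, unknown>();
    private flushScheduled = false;

    constructor(private storage: StorageLike) {}

    // Returns immediately; persistence is deferred and batched.
    set(key: string, value: unknown): void {
        this.pending.set(key, value); // rapid successive writes coalesce here
        this.scheduleFlush();
    }

    // Reads check the dirty map first so staged values stay visible.
    get(key: string): unknown {
        return this.pending.get(key);
    }

    private scheduleFlush(): void {
        if (this.flushScheduled) {
            return;
        }
        this.flushScheduled = true;
        // requestIdleCallback in the browser; setTimeout(0) elsewhere.
        const defer: (cb: () => void) => void =
            (globalThis as {requestIdleCallback?: (cb: () => void) => void}).requestIdleCallback ??
            ((cb) => setTimeout(cb, 0));
        defer(() => void this.flush());
    }

    private async flush(): Promise<void> {
        const batch = [...this.pending.entries()];
        this.pending.clear();
        this.flushScheduled = false;
        await this.storage.multiSet(batch);
    }
}
```

Two writes to the same key before the idle callback fires produce a single storage write carrying only the latest value.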
Implement the web SQLiteProvider backed by @sqlite.org/sqlite-wasm using
the opfs-sahpool VFS for OPFS persistence without COOP/COEP headers. All
database operations run in a dedicated Web Worker to keep the main thread
free. The provider uses prepared statements for all queries and shares
SQL constants with the native provider via SQLiteQueries.ts.

Replace localStorage-based InstanceSync with BroadcastChannel for more
reliable cross-tab communication. After the worker persists a batch, it
broadcasts changed keys so other tabs can update their caches.

Update web platform selection to prefer SQLiteProvider when OPFS and
Workers are available, with automatic fallback to IDBKeyValProvider for
older browsers.

Co-authored-by: Cursor <cursoragent@cursor.com>
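The platform-selection logic described in the last paragraph might look roughly like the following. This is a hedged sketch under stated assumptions: chooseBackend is a hypothetical name, and the feature probes (a Worker constructor plus navigator.storage.getDirectory for OPFS) are illustrative stand-ins for whatever checks the PR actually performs.

```typescript
// Prefer the SQLite (OPFS) backend when both OPFS and Web Workers are
// available; otherwise fall back to the IndexedDB-based provider.
type Backend = 'sqlite' | 'idb';

function chooseBackend(g: {
    Worker?: unknown;
    navigator?: {storage?: {getDirectory?: unknown}};
}): Backend {
    const hasWorkers = typeof g.Worker === 'function';
    const hasOPFS = typeof g.navigator?.storage?.getDirectory === 'function';
    return hasWorkers && hasOPFS ? 'sqlite' : 'idb';
}
```

In a browser this would be called with globalThis; an older browser missing either capability silently gets the IDB fallback.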
Replace the single-type DirtyMap with a two-type staging layer that
distinguishes between SET entries (full values, flushed via multiSet)
and MERGE entries (accumulated patches, flushed via multiMerge). This
preserves JSON_PATCH efficiency on SQLite while eliminating the
flushNow() call that caused mergeCollection() and update() to regress.

All write operations (set, merge, multiSet, multiMerge) now return
immediately after staging in the DirtyMap. Merges on keys with a
pending SET apply the patch in-memory; merges on keys with a pending
MERGE accumulate patches. On flush, SET entries use reference identity
to handle concurrent mutations, while MERGE entries are removed at
flush start to prevent double-application.

Benchmarks show mergeCollection() improved from +228% regression to
-90% improvement at modest scale (56ms -> 5.6ms), and from +2% to
-99% at extreme scale (1.91s -> 27.8ms).

Co-authored-by: Cursor <cursoragent@cursor.com>
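The SET/MERGE staging rules above can be sketched as follows. Assumptions: StagingBuffer and drain are hypothetical names, and the shallow object spread stands in for Onyx's real merge semantics; the point is only the routing of SET entries to multiSet-style flushing and MERGE entries to multiMerge-style flushing.

```typescript
// Two-type staging: a pending SET absorbs later patches in memory, while
// pending MERGEs accumulate patches for an efficient multiMerge flush.
type Entry =
    | {kind: 'SET'; value: Record<string, unknown>}
    | {kind: 'MERGE'; patch: Record<string, unknown>};

class StagingBuffer {
    private entries = new Map<string, Entry>();

    set(key: string, value: Record<string, unknown>): void {
        this.entries.set(key, {kind: 'SET', value});
    }

    merge(key: string, patch: Record<string, unknown>): void {
        const existing = this.entries.get(key);
        if (existing?.kind === 'SET') {
            // Merge onto a pending SET: apply the patch in memory.
            existing.value = {...existing.value, ...patch};
        } else if (existing?.kind === 'MERGE') {
            // Merge onto a pending MERGE: accumulate patches.
            existing.patch = {...existing.patch, ...patch};
        } else {
            this.entries.set(key, {kind: 'MERGE', patch});
        }
    }

    // On flush, SETs feed multiSet and MERGEs feed multiMerge, preserving
    // the provider's patch-based write path.
    drain(): {sets: Array<[string, unknown]>; merges: Array<[string, unknown]>} {
        const sets: Array<[string, unknown]> = [];
        const merges: Array<[string, unknown]> = [];
        for (const [key, entry] of this.entries) {
            if (entry.kind === 'SET') {
                sets.push([key, entry.value]);
            } else {
                merges.push([key, entry.patch]);
            }
        }
        this.entries.clear();
        return {sets, merges};
    }
}
```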
- scripts/generateBenchReport.ts: parse Vitest benchmark JSON, output color-coded HTML
- scripts/benchAndReport.sh: run benchmarks and generate report (single run, branch compare, or multi-config)
- npm run bench:report and bench:report:compare
- Add tsx dev dependency for report generator
- Document in README; add bench-results.html to .gitignore

Co-authored-by: Cursor <cursoragent@cursor.com>
After each benchmark run, detect results with high variance (RME > 50%
on operations > 1ms, or extreme outlier spikes shifting mean > 2ms/20%)
and re-run only the affected benchmark files. Merge improved results and
repeat up to --max-retries times (default 3).

Criteria carefully tuned to avoid false positives:
- Sub-ms operations (clear, connect register) are excluded from outlier
  detection since GC spikes don't affect their cross-run stability
- Low sample count alone doesn't trigger re-runs (slow operations like
  multiSet 5000 inherently get ~10 samples but are consistent)

Integrated into benchAndReport.sh (on by default, --no-stabilize to skip).

Co-authored-by: Cursor <cursoragent@cursor.com>
tinybench already handles statistical rigor by running each benchmark
for a time budget and collecting many samples. The stabilization layer
added complexity without clear value — it produced false positives,
caused the multi-config pipeline to abort (exit code 1 from noisy
results killed the script under set -e), and re-runs consistently
returned the same results.

Also fixes benchAndReport.sh argument parsing (IFS→parameter expansion)
and simplifies run_bench() to call vitest directly.

Co-authored-by: Cursor <cursoragent@cursor.com>
Three-config comparison:
- Baseline: Rory-Benchmarks branch (no DirtyMap, no workers, IDB only)
- DM+IDB: DirtyMap + workers + IndexedDB
- DM+SQLite: DirtyMap + workers + SQLite WASM

Results show 93-99% improvements across write operations.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the SQLiteProvider-specific worker architecture with a single
unified worker that supports both SQLite and IDB backends:

- Create SQLiteProvider/index.web.ts: StorageProvider implementation for
  SQLite WASM, reshaped from the old worker.ts handle* functions
- Create lib/storage/worker.ts: provider-agnostic worker that dynamically
  imports the appropriate StorageProvider based on init message
- Create WorkerStorageProvider.ts: generic main-thread proxy replacing
  both SQLiteProvider/index.ts and IDBKeyValProvider web usage
- Update platforms/index.ts: use WorkerStorageProvider with backend
  selection based on OPFS availability
- Remove old SQLiteProvider/index.ts (web proxy) and worker.ts

Both IDB and SQLite now run off the main thread in the same unified
worker, with BroadcastChannel cross-tab sync handled at the worker level.

Co-authored-by: Cursor <cursoragent@cursor.com>
Update the Architecture Overview diagram, Proposed Solution section, and
Phase 3 implementation description to document the new provider-agnostic
unified worker that runs both SQLite and IDB backends off the main thread.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename DirtyMap to WriteBuffer throughout the codebase
- Upgrade worker.ts to broadcast actual values (not just key names) on
  BroadcastChannel after persistence, using typed message formats:
  set (full values), merge (raw patches), remove, clear
- Rewrite InstanceSync to handle value-bearing messages directly:
  set -> onStorageKeyChanged(key, value) with no storage read,
  merge -> fastMerge(cachedValue, patch) against the in-memory cache,
  remove/clear -> notify with null
- Remove all InstanceSync send-side calls from storage/index.ts since
  the unified worker handles broadcasting for all backends
- Remove handlesBroadcast flag (unnecessary given unified worker)
- Update PROPOSAL_DRAFT.md with Phase 5 documentation and revised
  cross-tab sync description

Co-authored-by: Cursor <cursoragent@cursor.com>
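The value-bearing message handling described above can be illustrated with a small reducer over an in-memory cache. Assumptions: SyncMessage and applyToCache are hypothetical names, and the shallow spread is a stand-in for Onyx's fastMerge; only the set/merge/remove/clear taxonomy comes from the commit text.

```typescript
// Messages carry actual values, so a receiving tab can update its cache
// directly with no storage read.
type SyncMessage =
    | {type: 'set'; key: string; value: unknown}
    | {type: 'merge'; key: string; patch: Record<string, unknown>}
    | {type: 'remove'; key: string}
    | {type: 'clear'};

function applyToCache(cache: Map<string, unknown>, msg: SyncMessage): void {
    switch (msg.type) {
        case 'set':
            // Full value included in the message; no round-trip to storage.
            cache.set(msg.key, msg.value);
            break;
        case 'merge': {
            // Apply the raw patch against the in-memory cached value.
            const current = (cache.get(msg.key) ?? {}) as Record<string, unknown>;
            cache.set(msg.key, {...current, ...msg.patch});
            break;
        }
        case 'remove':
            cache.delete(msg.key);
            break;
        case 'clear':
            cache.clear();
            break;
    }
}
```

In the real architecture these messages would arrive over a BroadcastChannel after the worker persists a batch; here the reducer is exercised directly.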
…cture

Replace the custom C++ SQLite/WASM build with official packages:
- Web: @sqlite.org/sqlite-wasm with opfs-sahpool VFS (IDB fallback)
- Native: react-native-nitro-sqlite with sync APIs

Key changes:
- Restore SQLiteProvider implementations for web and native from git history
- Shared SQLiteQueries.ts as single source of truth for all SQL
- Unified worker.ts with dynamic backend selection and graceful WASM fallback
- NativeFlushWorker using react-native-worklets-core Worker Runtime
- NativeBufferStore simplified to pure shared_mutex-protected AnyMap HybridObject
- Value-bearing BroadcastChannel messages for cross-tab sync
- WriteBuffer with SET/MERGE entry coalescing via BufferStore interface
- Updated PROPOSAL_DRAFT.md with revised architecture and Glaze future note

Co-authored-by: Cursor <cursoragent@cursor.com>
Root cause: the web worker's onmessage handler was async but each incoming
message spawned an independent invocation. When init took time (dynamic
WASM/IDB imports), data operations arrived and called provider!.method()
on a null provider, causing silent hangs.

Three fixes:
1. worker.ts: Replace concurrent onmessage with a serial message queue.
   Messages are processed one at a time, guaranteeing init completes
   before any data operation touches the provider.

2. WorkerStorageProvider.init(): Return the init Promise so callers
   can await worker readiness (previously returned void/undefined).

3. storage/index.ts: Properly await async init() return values before
   resolving the init gate that unblocks all storage operations.

Also:
- Widen StorageProvider.init type to () => void | Promise<void>
- Add idb-keyval to vitest optimizeDeps.include to prevent mid-test
  Vite dependency reloads
- Regenerate benchmark comparison with all 5 files (init + clear now work)

Co-authored-by: Cursor <cursoragent@cursor.com>
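Fix 1 above (replacing concurrent onmessage invocations with a serial queue) can be sketched as a promise chain. SerialQueue is a hypothetical name; the worker wiring in the comment assumes a handleMessage dispatcher that the real worker.ts may structure differently.

```typescript
// Each task starts only after the previous one settles, so a slow init
// (dynamic WASM/IDB imports) is guaranteed to finish before any data
// operation touches the provider.
class SerialQueue {
    private tail: Promise<void> = Promise.resolve();

    push(task: () => Promise<void>): Promise<void> {
        // Chain on both fulfillment and rejection so one failed message
        // cannot stall the queue.
        this.tail = this.tail.then(task, task);
        return this.tail;
    }
}

// Worker-side usage (illustrative):
//   const queue = new SerialQueue();
//   self.onmessage = (e) => queue.push(() => handleMessage(e.data));
```

Without the queue, an init message and a getItem message would run concurrently and the data operation could observe a null provider; with it, ordering is enforced by construction.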