I did some performance benchmarks on the web, with 1M rows with large ids. This forces SQLite to write the temp b-tree to disk. Takeaways:
- Linux, native:
- Web, IDBBatchAtomicVFS:
- Linux, OPFSCoopSyncVFS:
- Web, IDBBatchAtomicVFS,
- Web, IDBBatchAtomicVFS,
The test data:

```sql
BEGIN TRANSACTION;

-- 1M ops
WITH RECURSIVE generate_rows(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM generate_rows WHERE n < 1000000
)
INSERT INTO ps_oplog (bucket, op_id, row_type, row_id, key, data, hash)
SELECT
  (n % 10),               -- Generate 10 different buckets
  n,
  'thisisatable',
  'thisismyrowid' || n,
  'thisisarowkeykey_' || n,
  '{"n": ' || n || '}',
  (n * 17) % 1000000000   -- Some pseudo-random hash
FROM generate_rows;

-- 10 buckets
WITH RECURSIVE generate_rows(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM generate_rows WHERE n < 10
)
INSERT INTO ps_buckets (id, name)
SELECT
  (n % 10),
  'bucket' || n
FROM generate_rows;

COMMIT;
```
After some reports of slow performance for this query on Android devices when syncing large databases, I did some further testing. It is quite tricky to simulate the same conditions on a desktop system. On Linux, I managed to get good results by using cgroups v2 to limit filesystem throughput and, importantly, to also cap maximum memory. This ensures the temporary files are actually flushed to disk - otherwise they are effectively just buffered in memory, and the slowdown from filesystem usage never appears.

With the above applied, I can sometimes see massive performance differences between the current query and the optimized ones - as much as 60s -> 2s for some tests. It depends a lot on the data volumes, the memory available and the filesystem throughput. In some cases, everything performs well until a certain threshold is reached, and then performance degrades significantly due to the filesystem usage.

I'll need to rebase this PR to work with the latest updates, and also do some proper real-life testing on e.g. Android devices, but I think it's worth taking this further.
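For reference, a setup along these lines can be expressed with plain cgroup v2 interface files. This is an illustrative sketch, not the exact configuration I used - the cgroup name, limits, and device numbers are all placeholders, and it requires root:

```shell
# Create a cgroup for the benchmark (hypothetical name).
sudo mkdir /sys/fs/cgroup/sqlite-bench

# Cap memory so the page cache cannot absorb the temp-file writes.
echo 256M | sudo tee /sys/fs/cgroup/sqlite-bench/memory.max

# Throttle read/write throughput on the benchmark disk.
# "8:0" is the device's major:minor pair (see lsblk); 10485760 = 10 MiB/s.
echo "8:0 rbps=10485760 wbps=10485760" | sudo tee /sys/fs/cgroup/sqlite-bench/io.max

# Move the current shell into the cgroup, then run the benchmark in it.
echo $$ | sudo tee /sys/fs/cgroup/sqlite-bench/cgroup.procs
sqlite3 bench.db < test-data.sql
```

The memory cap is the important part: without it, Linux buffers the temporary files in the page cache and the I/O throttle never actually bites.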
Superseded by #78. |
#40 fixed a performance issue in initial/bulk sync when there are many duplicate row_ids, but slightly decreased performance for the general case. This PR attempts to optimize it again, mostly by removing the second temp b-tree used in query execution.
This does not make a massive difference to overall initial sync performance. On my machine, with 1M ops, the query time drops from around 5s to 3s, against a total initial sync time of 60s. So the overall gain is small, but this is the slowest query that locks the database for writes and cannot be split into smaller subqueries, so any optimization here helps with app responsiveness.
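A quick way to verify whether a temp b-tree was actually removed is `EXPLAIN QUERY PLAN`, which reports `USE TEMP B-TREE` steps. A minimal sketch using Python's built-in `sqlite3` module - the `oplog` table and queries here are illustrative stand-ins, not the actual sync query:

```python
import sqlite3

# Illustrative schema - a simplified stand-in for ps_oplog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oplog (bucket INTEGER, op_id INTEGER, row_id TEXT)")
conn.execute("CREATE INDEX oplog_row_id ON oplog (row_id)")

def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN output for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The plan detail text is the last column of each row.
    return "\n".join(row[-1] for row in rows)

# Ordering by an unindexed column forces a temp b-tree for the sort:
print(plan("SELECT * FROM oplog ORDER BY op_id"))

# Ordering by an indexed column needs no temp b-tree:
print(plan("SELECT * FROM oplog ORDER BY row_id"))
```

Diffing the plan output before and after a query rewrite makes it easy to confirm that a `USE TEMP B-TREE` step is gone, independently of timing noise.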
There is another query form, included in the comments, which can take the initial sync query time in the above case down to under 2s (with no temp b-tree at all), but it doesn't cater for incremental updates. It needs some stats tracking / heuristics to know when to use one query or the other, so I'm leaving that for later.
The temp b-trees used by this query could also be related to `RangeError: Maximum call stack size exceeded` errors seen on iOS, as well as the `disk I/O error` occasionally seen on Android, when SQLite is configured to use files for temporary storage. While those issues have other workarounds, any change here that reduces temporary b-trees could help.

TODO: