You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: API.md
+172-1Lines changed: 172 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,11 +1,12 @@
1
1
# API Reference
2
2
3
-
This document provides a reference for the SQLite functions provided by the `sqlite-sync` extension.
3
+
This document provides a reference for the SQL functions provided by the `sqlite-sync` extension. Unless noted otherwise, the APIs are available on both SQLite and PostgreSQL builds.
@@ -40,6 +45,37 @@ This document provides a reference for the SQLite functions provided by the `sql
40
45
41
46
## Configuration Functions
42
47
48
+
### `cloudsync_set(key, value)`
49
+
50
+
**Description:** Stores a global CloudSync setting in the current database. Settings persist across database reopens and are loaded automatically by the extension.
51
+
52
+
The following payload setting is supported:
53
+
54
+
| Key | Description | Default | Minimum |
55
+
|---|---|---:|---:|
56
+
|`payload_max_chunk_size`| Maximum transport payload size generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version). Values below the minimum are clamped. |`5242880` (5 MB) |`262144` (256 KB) |
57
+
58
+
`payload_max_chunk_size` affects only chunk generation. [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) continues to accept legacy payloads, monolithic payloads, and v3 chunk-fragment payloads even when they are larger than the local setting. This preserves compatibility between peers using different settings.
59
+
60
+
**Parameters:**
61
+
62
+
-`key` (TEXT): The setting key.
63
+
-`value` (TEXT): The setting value. For `payload_max_chunk_size`, pass the value in bytes.
64
+
65
+
**Returns:** SQLite returns no value. PostgreSQL returns `true` on success.
**Description:** Initializes a table for `sqlite-sync` synchronization. This function is idempotent and needs to be called only once per table on each site; configurations are stored in the database and automatically loaded with the extension.
**Description:** Encodes rows from `cloudsync_changes` into a single monolithic payload. This is the legacy payload API and remains fully supported for backward compatibility.
453
+
454
+
Use this API when the expected payload size is modest or when you need to interoperate with callers that expect a single BLOB. For large rowsets or large individual BLOB/TEXT values, prefer [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version), which splits transport payloads according to `payload_max_chunk_size`.
455
+
456
+
**Parameters:** The function is an aggregate over the columns returned by `cloudsync_changes`:
**Description:** Generates sync payloads as a stream of transport-sized chunks. It is the chunk-aware evolution of [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq), designed for large rowsets and for single BLOB/TEXT values that are larger than the configured chunk size.
484
+
485
+
The maximum generated chunk size is controlled by the global `payload_max_chunk_size` setting. The default is 5 MB and the technical minimum is 256 KB:
When a single encoded column value does not fit in one chunk, CloudSync transparently emits v3 payload fragments for that value. The receiver stages fragments internally and applies the value when all parts arrive. Fragments can arrive out of order; incomplete stale fragment groups are cleaned up automatically.
492
+
493
+
`cloudsync_payload_chunks()` does not change the apply contract: [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) accepts legacy payloads, monolithic payloads, and v3 chunk-fragment payloads. The local `payload_max_chunk_size` setting is not used to reject incoming payloads.
494
+
495
+
**Important memory note:** chunking limits the size of each transport payload that CloudSync generates. It does not remove the database engine's need to materialize a single final cell value when applying a very large BLOB/TEXT column. In other words, a 500 MB BLOB can be transported in smaller chunks, but the receiving database must still be able to store and bind the completed 500 MB value when that row is applied.
496
+
497
+
**Parameters:**
498
+
499
+
-`since_db_version` (INTEGER/BIGINT, optional): Start after this source database version. If omitted, CloudSync uses the stored send checkpoint.
500
+
-`filter_site_id` (BLOB, optional): Site ID whose changes should be encoded. If omitted, CloudSync uses the local site ID.
501
+
-`until_db_version` (INTEGER/BIGINT, optional): Upper watermark to include. If omitted or `0`, CloudSync captures the current maximum source database version before streaming chunks.
502
+
503
+
**Returns:** A rowset with one row per chunk:
504
+
505
+
| Column | Description |
506
+
|---|---|
507
+
|`payload`| Payload BLOB to pass to `cloudsync_payload_apply()`. |
508
+
|`chunk_index`| Zero-based chunk index for this stream. |
509
+
|`payload_size`| Payload size in bytes. |
510
+
|`rows`| Number of encoded payload rows in this chunk. Fragment chunks usually contain one fragment row. |
511
+
|`db_version_min`| Minimum source `db_version` represented by this chunk. |
512
+
|`db_version_max`| Maximum source `db_version` represented by this chunk. |
513
+
|`watermark_db_version`| Stable upper watermark captured for this chunk stream. Store this after all chunks are durably transferred/applied. |
514
+
515
+
**SQLite usage:**`cloudsync_payload_chunks` is exposed as a virtual table with hidden constraint columns:
516
+
517
+
```sql
518
+
-- Default: uses the stored send checkpoint and local site id
**PostgreSQL usage:**`cloudsync_payload_chunks` is exposed as a set-returning function with three optional arguments:
533
+
534
+
```sql
535
+
-- Default: uses the stored send checkpoint and local site id
536
+
SELECT*
537
+
FROM cloudsync_payload_chunks();
538
+
539
+
-- Explicit arguments
540
+
SELECT*
541
+
FROM cloudsync_payload_chunks(100, cloudsync_siteid(), 200);
542
+
```
543
+
544
+
**Apply example:**
545
+
546
+
```sql
547
+
-- Apply chunks on a receiving peer. Chunks may be applied one at a time.
548
+
SELECT cloudsync_payload_apply(?);
549
+
```
550
+
551
+
On PostgreSQL, apply chunks as individual statements from the transport/client layer. Do not use a set-based statement such as `SELECT cloudsync_payload_apply(payload) FROM chunks_table;` while reading payloads from a table in the same database session. `cloudsync_payload_apply()` performs writes through SPI, and applying while the same statement is still scanning a payload table can conflict with PostgreSQL executor resource ownership. Fetch each payload into the client (or into a local procedural variable after the read completes) and then call `cloudsync_payload_apply()` for that single payload.
552
+
553
+
---
554
+
555
+
### `cloudsync_payload_apply(payload)`
556
+
557
+
**Description:** Applies a sync payload to the current database. The function accepts all supported payload formats:
558
+
559
+
- Legacy payloads generated by older SQLite Sync versions.
560
+
- Monolithic payloads generated by [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq).
561
+
- Chunk-fragment payloads generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version).
562
+
563
+
When a v3 fragment payload is received, CloudSync stores the fragment in an internal table and returns after applying zero or more completed values. Once the final fragment for a value is received, the completed value is validated and applied. Duplicate fragment delivery is idempotent.
564
+
565
+
**Parameters:**
566
+
567
+
-`payload` (BLOB/BYTEA): Payload BLOB to apply.
568
+
569
+
**Returns:** Number of payload rows applied. Fragment payloads that are staged but not yet complete can return `0`.
570
+
571
+
**Example:**
572
+
573
+
```sql
574
+
SELECT cloudsync_payload_apply(:payload);
575
+
```
576
+
577
+
---
578
+
412
579
## Network Functions
413
580
414
581
### `cloudsync_network_init(managedDatabaseId)`
@@ -500,6 +667,10 @@ This means: if you get JSON back, the server was reachable and the network proto
500
667
501
668
**Description:** Sends all unsent local changes to the remote server.
502
669
670
+
The send path streams payloads through [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version), so `payload_max_chunk_size` also limits the payloads generated for network transport. Each generated chunk is uploaded/applied independently; the local send checkpoint is advanced only after the chunk stream completes successfully.
671
+
672
+
Chunk transport is transparent to the CloudSync backend. Each chunk is sent as a normal `/apply` payload, either inline as a base64 `blob` or through the upload `url` path. There is no separate chunk flag: old payloads, monolithic payloads, and v3 fragment payloads are distinguished by the payload format itself.
Copy file name to clipboardExpand all lines: CHANGELOG.md
+13Lines changed: 13 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
4
4
5
5
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
6
6
7
+
## [Unreleased]
8
+
9
+
### Added
10
+
11
+
-**Chunked payload generation** via `cloudsync_payload_chunks()`, available as a SQLite virtual table and as a PostgreSQL set-returning function. The API emits transport-sized payload chunks and transparently fragments oversized BLOB/TEXT values into v3 fragment payloads.
12
+
-**`payload_max_chunk_size` global setting** for controlling generated chunk size. The default is 5 MB and values below the 256 KB technical minimum are clamped.
13
+
-**Payload chunking documentation** in `API.md` and `PERFORMANCE.md`, including the explicit memory note that chunking bounds transport payloads but the database must still materialize a completed single BLOB/TEXT value when it is applied.
14
+
15
+
### Changed
16
+
17
+
-`cloudsync_payload_apply()` now accepts legacy payloads, monolithic payloads, and v3 fragment payloads without enforcing the local `payload_max_chunk_size`, preserving compatibility between peers with different settings.
18
+
-`cloudsync_network_send_changes()` now streams outgoing changes through `cloudsync_payload_chunks()` instead of first building one monolithic payload. This bounds transport payload size for the built-in network path and lets large rowsets or oversized BLOB/TEXT values flow through the same `/apply` endpoint as regular payloads.
Copy file name to clipboardExpand all lines: PERFORMANCE.md
+11-6Lines changed: 11 additions & 6 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -41,7 +41,7 @@ SELECT ... FROM cloudsync_changes WHERE db_version > <last_synced_version>
41
41
42
42
Each metadata table has an **index on `db_version`**, so payload generation scales primarily with the number of new changes, plus a small per-synced-table overhead to construct the `cloudsync_changes` query. It does not diff the full dataset. In SQLite, each changed column also performs a primary-key lookup in the base table to retrieve the current value.
43
43
44
-
The resulting payload is LZ4-compressed before transmission.
44
+
The legacy `cloudsync_payload_encode()` API builds one monolithic LZ4-compressed payload before transmission. For large deltas, `cloudsync_payload_chunks()` can be used instead: it streams a sequence of payload chunks bounded by the `payload_max_chunk_size` setting (default 5 MB, minimum 256 KB). If a single encoded BLOB/TEXT value is larger than the chunk budget, the value is split into transparent v3 fragments and reassembled by `cloudsync_payload_apply()` on the receiver.
45
45
46
46
#### Pull: Payload Application
47
47
@@ -69,7 +69,7 @@ When the application runs sync off the main thread, perceived latency depends on
69
69
70
70
-**Sync interval**: How often the app triggers a push/pull cycle. More frequent syncs mean smaller deltas (smaller D) and faster individual sync operations, at the cost of more network round-trips.
71
71
-**Network latency**: The round-trip time to the sync server. LZ4 compression reduces payload size, but latency is dominated by the network hop itself for small deltas.
72
-
-**Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly.
72
+
-**Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly. Use `cloudsync_payload_chunks()` when transport payloads may be large; it limits each generated transport payload but does not change the size of the final database value.
73
73
74
74
The extension does not impose a sync schedule -- the application controls when and how often to sync. A typical pattern is to sync on a timer (e.g., every 5-30 seconds) or on specific events (app foreground, user action).
75
75
@@ -118,7 +118,11 @@ Normal application reads are not directly instrumented by the extension. No trig
118
118
119
119
When a new device syncs for the first time (`db_version = 0`), the push payload contains the **entire dataset**: every column of every row across all synced tables. The payload size is proportional to `N * C` (total rows times columns).
120
120
121
-
The payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
121
+
With the legacy `cloudsync_payload_encode()` API, the payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
122
+
123
+
For large initial syncs, prefer `cloudsync_payload_chunks()`. It keeps each generated transport payload bounded by `payload_max_chunk_size` and can fragment a single oversized BLOB/TEXT column across multiple v3 fragment payloads. This prevents the transport payload itself from growing without bound and avoids constructing a monolithic v2 payload during v3 apply.
124
+
125
+
Important limitation: chunking does **not** make a single database cell streamable all the way into the storage engine. When the last fragment of a very large BLOB/TEXT value arrives, the receiver must still materialize the completed value once in order to bind/store it in the destination database. Size `payload_max_chunk_size` for transport safety, but size application memory limits for the largest individual value you allow.
122
126
123
127
Subsequent syncs are incremental (proportional to D, changes since the last sync), so the first sync is the expensive one. Applications with large datasets should plan for this -- for example, by seeding new devices from a database snapshot rather than syncing from scratch.
124
128
@@ -185,6 +189,7 @@ CloudSync: sync_time ~ O(D) -- grows with changes since last sy
185
189
2.**`db_version` index**: Enables efficient range scans for delta extraction.
186
190
3.**Deferred batch merge**: Column changes for the same primary key are accumulated and flushed as a single SQL statement.
187
191
4.**Prepared statement caching**: Merge statements are compiled once and reused across rows.
188
-
5.**LZ4 compression**: Reduces payload size for network transfer.
189
-
6.**Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
190
-
7.**Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
192
+
5.**Chunked payload generation**: `cloudsync_payload_chunks()` bounds transport payload size and handles oversized single values with transparent v3 fragments.
193
+
6.**LZ4 compression**: Reduces payload size for network transfer.
194
+
7.**Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
195
+
8.**Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
0 commit comments