Skip to content

Commit f1ace81

Browse files
committed
Add chunked payload transport
1 parent 7be7d3b commit f1ace81

23 files changed

Lines changed: 2498 additions & 126 deletions

API.md

Lines changed: 172 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,12 @@
11
# API Reference
22

3-
This document provides a reference for the SQLite functions provided by the `sqlite-sync` extension.
3+
This document provides a reference for the SQL functions provided by the `sqlite-sync` extension. Unless noted otherwise, the APIs are available on both SQLite and PostgreSQL builds.
44

55
## Index
66

77
- [Configuration Functions](#configuration-functions)
88
- [`cloudsync_init()`](#cloudsync_inittable_name-crdt_algo-init_flags)
9+
- [`cloudsync_set()`](#cloudsync_setkey-value)
910
- [`cloudsync_enable()`](#cloudsync_enabletable_name)
1011
- [`cloudsync_disable()`](#cloudsync_disabletable_name)
1112
- [`cloudsync_is_enabled()`](#cloudsync_is_enabledtable_name)
@@ -24,6 +25,10 @@ This document provides a reference for the SQLite functions provided by the `sql
2425
- [Schema Alteration Functions](#schema-alteration-functions)
2526
- [`cloudsync_begin_alter()`](#cloudsync_begin_altertable_name)
2627
- [`cloudsync_commit_alter()`](#cloudsync_commit_altertable_name)
28+
- [Payload Functions](#payload-functions)
29+
- [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq)
30+
- [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version)
31+
- [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload)
2732
- [Network Functions](#network-functions)
2833
- [`cloudsync_network_init()`](#cloudsync_network_initmanageddatabaseid)
2934
- [`cloudsync_network_cleanup()`](#cloudsync_network_cleanup)
@@ -40,6 +45,37 @@ This document provides a reference for the SQLite functions provided by the `sql
4045

4146
## Configuration Functions
4247

48+
### `cloudsync_set(key, value)`
49+
50+
**Description:** Stores a global CloudSync setting in the current database. Settings persist across database reopens and are loaded automatically by the extension.
51+
52+
The following payload setting is supported:
53+
54+
| Key | Description | Default | Minimum |
55+
|---|---|---:|---:|
56+
| `payload_max_chunk_size` | Maximum transport payload size generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version). Values below the minimum are clamped. | `5242880` (5 MB) | `262144` (256 KB) |
57+
58+
`payload_max_chunk_size` affects only chunk generation. [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) continues to accept legacy payloads, monolithic payloads, and v3 chunk-fragment payloads even when they are larger than the local setting. This preserves compatibility between peers using different settings.
59+
60+
**Parameters:**
61+
62+
- `key` (TEXT): The setting key.
63+
- `value` (TEXT): The setting value. For `payload_max_chunk_size`, pass the value in bytes.
64+
65+
**Returns:** SQLite returns no value. PostgreSQL returns `true` on success.
66+
67+
**Example:**
68+
69+
```sql
70+
-- Use 1 MB transport chunks
71+
SELECT cloudsync_set('payload_max_chunk_size', '1048576');
72+
73+
-- Restore the default 5 MB transport chunks
74+
SELECT cloudsync_set('payload_max_chunk_size', '5242880');
75+
```
76+
77+
---
78+
4379
### `cloudsync_init(table_name, [crdt_algo], [init_flags])`
4480

4581
**Description:** Initializes a table for `sqlite-sync` synchronization. This function is idempotent and needs to be called only once per table on each site; configurations are stored in the database and automatically loaded with the extension.
@@ -409,6 +445,137 @@ SELECT cloudsync_commit_alter('my_table');
409445

410446
---
411447

448+
## Payload Functions
449+
450+
### `cloudsync_payload_encode(tbl, pk, col_name, col_value, col_version, db_version, site_id, cl, seq)`
451+
452+
**Description:** Encodes rows from `cloudsync_changes` into a single monolithic payload. This is the legacy payload API and remains fully supported for backward compatibility.
453+
454+
Use this API when the expected payload size is modest or when you need to interoperate with callers that expect a single BLOB. For large rowsets or large individual BLOB/TEXT values, prefer [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version), which splits transport payloads according to `payload_max_chunk_size`.
455+
456+
**Parameters:** The function is an aggregate over the columns returned by `cloudsync_changes`:
457+
458+
- `tbl` (TEXT): Source table name.
459+
- `pk` (BLOB): Encoded primary key.
460+
- `col_name` (TEXT): Changed column name.
461+
- `col_value` (BLOB): Encoded column value.
462+
- `col_version` (INTEGER/BIGINT): Column version.
463+
- `db_version` (INTEGER/BIGINT): Source database version.
464+
- `site_id` (BLOB): Source site identifier.
465+
- `cl` (INTEGER/BIGINT): Causal length.
466+
- `seq` (INTEGER/BIGINT): Sequence number within the source database version.
467+
468+
**Returns:** A single payload BLOB.
469+
470+
**Example:**
471+
472+
```sql
473+
SELECT cloudsync_payload_encode(
474+
tbl, pk, col_name, col_value, col_version, db_version, site_id, cl, seq
475+
) AS payload
476+
FROM cloudsync_changes;
477+
```
478+
479+
---
480+
481+
### `cloudsync_payload_chunks([since_db_version], [filter_site_id], [until_db_version])`
482+
483+
**Description:** Generates sync payloads as a stream of transport-sized chunks. It is the chunk-aware evolution of [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq), designed for large rowsets and for single BLOB/TEXT values that are larger than the configured chunk size.
484+
485+
The maximum generated chunk size is controlled by the global `payload_max_chunk_size` setting. The default is 5 MB and the technical minimum is 256 KB:
486+
487+
```sql
488+
SELECT cloudsync_set('payload_max_chunk_size', '5242880');
489+
```
490+
491+
When a single encoded column value does not fit in one chunk, CloudSync transparently emits v3 payload fragments for that value. The receiver stages fragments internally and applies the value when all parts arrive. Fragments can arrive out of order; incomplete stale fragment groups are cleaned up automatically.
492+
493+
`cloudsync_payload_chunks()` does not change the apply contract: [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) accepts legacy payloads, monolithic payloads, and v3 chunk-fragment payloads. The local `payload_max_chunk_size` setting is not used to reject incoming payloads.
494+
495+
**Important memory note:** chunking limits the size of each transport payload that CloudSync generates. It does not remove the database engine's need to materialize a single final cell value when applying a very large BLOB/TEXT column. In other words, a 500 MB BLOB can be transported in smaller chunks, but the receiving database must still be able to store and bind the completed 500 MB value when that row is applied.
496+
497+
**Parameters:**
498+
499+
- `since_db_version` (INTEGER/BIGINT, optional): Start after this source database version. If omitted, CloudSync uses the stored send checkpoint.
500+
- `filter_site_id` (BLOB, optional): Site ID whose changes should be encoded. If omitted, CloudSync uses the local site ID.
501+
- `until_db_version` (INTEGER/BIGINT, optional): Upper watermark to include. If omitted or `0`, CloudSync captures the current maximum source database version before streaming chunks.
502+
503+
**Returns:** A rowset with one row per chunk:
504+
505+
| Column | Description |
506+
|---|---|
507+
| `payload` | Payload BLOB to pass to `cloudsync_payload_apply()`. |
508+
| `chunk_index` | Zero-based chunk index for this stream. |
509+
| `payload_size` | Payload size in bytes. |
510+
| `rows` | Number of encoded payload rows in this chunk. Fragment chunks usually contain one fragment row. |
511+
| `db_version_min` | Minimum source `db_version` represented by this chunk. |
512+
| `db_version_max` | Maximum source `db_version` represented by this chunk. |
513+
| `watermark_db_version` | Stable upper watermark captured for this chunk stream. Store this after all chunks are durably transferred/applied. |
514+
515+
**SQLite usage:** `cloudsync_payload_chunks` is exposed as a virtual table with hidden constraint columns:
516+
517+
```sql
518+
-- Default: uses the stored send checkpoint and local site id
519+
SELECT payload, chunk_index, payload_size, watermark_db_version
520+
FROM cloudsync_payload_chunks
521+
ORDER BY chunk_index;
522+
523+
-- Explicit arguments through hidden columns
524+
SELECT payload, chunk_index, payload_size, watermark_db_version
525+
FROM cloudsync_payload_chunks
526+
WHERE since_db_version = 100
527+
AND site_id = cloudsync_siteid()
528+
AND until_db_version = 200
529+
ORDER BY chunk_index;
530+
```
531+
532+
**PostgreSQL usage:** `cloudsync_payload_chunks` is exposed as a set-returning function with three optional arguments:
533+
534+
```sql
535+
-- Default: uses the stored send checkpoint and local site id
536+
SELECT *
537+
FROM cloudsync_payload_chunks();
538+
539+
-- Explicit arguments
540+
SELECT *
541+
FROM cloudsync_payload_chunks(100, cloudsync_siteid(), 200);
542+
```
543+
544+
**Apply example:**
545+
546+
```sql
547+
-- Apply chunks on a receiving peer. Chunks may be applied one at a time.
548+
SELECT cloudsync_payload_apply(?);
549+
```
550+
551+
On PostgreSQL, apply chunks as individual statements from the transport/client layer. Do not use a set-based statement such as `SELECT cloudsync_payload_apply(payload) FROM chunks_table;` while reading payloads from a table in the same database session. `cloudsync_payload_apply()` performs writes through SPI, and applying while the same statement is still scanning a payload table can conflict with PostgreSQL executor resource ownership. Fetch each payload into the client (or into a local procedural variable after the read completes) and then call `cloudsync_payload_apply()` for that single payload.
552+
553+
---
554+
555+
### `cloudsync_payload_apply(payload)`
556+
557+
**Description:** Applies a sync payload to the current database. The function accepts all supported payload formats:
558+
559+
- Legacy payloads generated by older SQLite Sync versions.
560+
- Monolithic payloads generated by [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq).
561+
- Chunk-fragment payloads generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version).
562+
563+
When a v3 fragment payload is received, CloudSync stores the fragment in an internal table and returns after applying zero or more completed values. Once the final fragment for a value is received, the completed value is validated and applied. Duplicate fragment delivery is idempotent.
564+
565+
**Parameters:**
566+
567+
- `payload` (BLOB/BYTEA): Payload BLOB to apply.
568+
569+
**Returns:** Number of payload rows applied. Fragment payloads that are staged but not yet complete can return `0`.
570+
571+
**Example:**
572+
573+
```sql
574+
SELECT cloudsync_payload_apply(:payload);
575+
```
576+
577+
---
578+
412579
## Network Functions
413580

414581
### `cloudsync_network_init(managedDatabaseId)`
@@ -500,6 +667,10 @@ This means: if you get JSON back, the server was reachable and the network proto
500667

501668
**Description:** Sends all unsent local changes to the remote server.
502669

670+
The send path streams payloads through [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version), so `payload_max_chunk_size` also limits the payloads generated for network transport. Each generated chunk is uploaded/applied independently; the local send checkpoint is advanced only after the chunk stream completes successfully.
671+
672+
Chunk transport is transparent to the CloudSync backend. Each chunk is sent as a normal `/apply` payload, either inline as a base64 `blob` or through the upload `url` path. There is no separate chunk flag: old payloads, monolithic payloads, and v3 fragment payloads are distinguished by the payload format itself.
673+
503674
**Parameters:** None.
504675

505676
**Returns:** A JSON string with the send result:

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
44

55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
66

7+
## [Unreleased]
8+
9+
### Added
10+
11+
- **Chunked payload generation** via `cloudsync_payload_chunks()`, available as a SQLite virtual table and as a PostgreSQL set-returning function. The API emits transport-sized payload chunks and transparently fragments oversized BLOB/TEXT values into v3 fragment payloads.
12+
- **`payload_max_chunk_size` global setting** for controlling generated chunk size. The default is 5 MB and values below the 256 KB technical minimum are clamped.
13+
- **Payload chunking documentation** in `API.md` and `PERFORMANCE.md`, including the explicit memory note that chunking bounds transport payloads but the database must still materialize a completed single BLOB/TEXT value when it is applied.
14+
15+
### Changed
16+
17+
- `cloudsync_payload_apply()` now accepts legacy payloads, monolithic payloads, and v3 fragment payloads without enforcing the local `payload_max_chunk_size`, preserving compatibility between peers with different settings.
18+
- `cloudsync_network_send_changes()` now streams outgoing changes through `cloudsync_payload_chunks()` instead of first building one monolithic payload. This bounds transport payload size for the built-in network path and lets large rowsets or oversized BLOB/TEXT values flow through the same `/apply` endpoint as regular payloads.
19+
720
## [1.0.20] - 2026-05-26
821

922
### Changed

PERFORMANCE.md

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -41,7 +41,7 @@ SELECT ... FROM cloudsync_changes WHERE db_version > <last_synced_version>
4141

4242
Each metadata table has an **index on `db_version`**, so payload generation scales primarily with the number of new changes, plus a small per-synced-table overhead to construct the `cloudsync_changes` query. It does not diff the full dataset. In SQLite, each changed column also performs a primary-key lookup in the base table to retrieve the current value.
4343

44-
The resulting payload is LZ4-compressed before transmission.
44+
The legacy `cloudsync_payload_encode()` API builds one monolithic LZ4-compressed payload before transmission. For large deltas, `cloudsync_payload_chunks()` can be used instead: it streams a sequence of payload chunks bounded by the `payload_max_chunk_size` setting (default 5 MB, minimum 256 KB). If a single encoded BLOB/TEXT value is larger than the chunk budget, the value is split into transparent v3 fragments and reassembled by `cloudsync_payload_apply()` on the receiver.
4545

4646
#### Pull: Payload Application
4747

@@ -69,7 +69,7 @@ When the application runs sync off the main thread, perceived latency depends on
6969

7070
- **Sync interval**: How often the app triggers a push/pull cycle. More frequent syncs mean smaller deltas (smaller D) and faster individual sync operations, at the cost of more network round-trips.
7171
- **Network latency**: The round-trip time to the sync server. LZ4 compression reduces payload size, but latency is dominated by the network hop itself for small deltas.
72-
- **Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly.
72+
- **Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly. Use `cloudsync_payload_chunks()` when transport payloads may be large; it limits each generated transport payload but does not change the size of the final database value.
7373

7474
The extension does not impose a sync schedule -- the application controls when and how often to sync. A typical pattern is to sync on a timer (e.g., every 5-30 seconds) or on specific events (app foreground, user action).
7575

@@ -118,7 +118,11 @@ Normal application reads are not directly instrumented by the extension. No trig
118118

119119
When a new device syncs for the first time (`db_version = 0`), the push payload contains the **entire dataset**: every column of every row across all synced tables. The payload size is proportional to `N * C` (total rows times columns).
120120

121-
The payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
121+
With the legacy `cloudsync_payload_encode()` API, the payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
122+
123+
For large initial syncs, prefer `cloudsync_payload_chunks()`. It keeps each generated transport payload bounded by `payload_max_chunk_size` and can fragment a single oversized BLOB/TEXT column across multiple v3 fragment payloads. This prevents the transport payload itself from growing without bound and avoids constructing a monolithic v2 payload during v3 apply.
124+
125+
Important limitation: chunking does **not** make a single database cell streamable all the way into the storage engine. When the last fragment of a very large BLOB/TEXT value arrives, the receiver must still materialize the completed value once in order to bind/store it in the destination database. Size `payload_max_chunk_size` for transport safety, but size application memory limits for the largest individual value you allow.
122126

123127
Subsequent syncs are incremental (proportional to D, changes since the last sync), so the first sync is the expensive one. Applications with large datasets should plan for this -- for example, by seeding new devices from a database snapshot rather than syncing from scratch.
124128

@@ -185,6 +189,7 @@ CloudSync: sync_time ~ O(D) -- grows with changes since last sy
185189
2. **`db_version` index**: Enables efficient range scans for delta extraction.
186190
3. **Deferred batch merge**: Column changes for the same primary key are accumulated and flushed as a single SQL statement.
187191
4. **Prepared statement caching**: Merge statements are compiled once and reused across rows.
188-
5. **LZ4 compression**: Reduces payload size for network transfer.
189-
6. **Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
190-
7. **Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
192+
5. **Chunked payload generation**: `cloudsync_payload_chunks()` bounds transport payload size and handles oversized single values with transparent v3 fragments.
193+
6. **LZ4 compression**: Reduces payload size for network transfer.
194+
7. **Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
195+
8. **Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -219,6 +219,7 @@ See the full guide: **[Row-Level Security Documentation](./docs/row-level-securi
219219
## Documentation
220220

221221
- **[API Reference](./API.md)**: all functions, parameters, and examples
222+
- **[Performance & Overhead](./PERFORMANCE.md)**: sync cost model, payload chunking, and large-value memory notes
222223
- **[Installation Guide](./docs/installation.md)**: platform-specific setup (Swift, Android, Expo, React Native, Flutter, WASM)
223224
- **[Block-Level LWW Guide](./docs/block-lww.md)**: line-level text merge for markdown and documents
224225
- **[Row-Level Security Guide](./docs/row-level-security.md)**: multi-tenant access control with server-enforced policies

docker/postgresql/Dockerfile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ FROM postgres:${POSTGRES_TAG}
66
# and install the matching server-dev package
77
RUN apt-get update && apt-get install -y \
88
build-essential \
9+
postgresql-contrib-${PG_MAJOR} \
910
postgresql-server-dev-${PG_MAJOR} \
1011
git \
1112
make \

docker/postgresql/Dockerfile.debug

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,9 @@ RUN set -eux; \
4444
cd /usr/src/postgresql-17; \
4545
./configure --enable-debug --enable-cassert --without-icu CFLAGS="-O0 -g3 -fno-omit-frame-pointer"; \
4646
make -j"$(nproc)"; \
47-
make install
47+
make install; \
48+
make -C contrib/dblink -j"$(nproc)"; \
49+
make -C contrib/dblink install
4850

4951
ENV PATH="/usr/local/pgsql/bin:${PATH}"
5052
ENV LD_LIBRARY_PATH="/usr/local/pgsql/lib:${LD_LIBRARY_PATH}"

0 commit comments

Comments
 (0)