diff --git a/API.md b/API.md
Lines changed: 172 additions & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 13 additions & 0 deletions
diff --git a/PERFORMANCE.md b/PERFORMANCE.md
Lines changed: 11 additions & 6 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/docker/postgresql/Dockerfile b/docker/postgresql/Dockerfile
Lines changed: 1 addition & 0 deletions
diff --git a/docker/postgresql/Dockerfile.debug b/docker/postgresql/Dockerfile.debug
Lines changed: 3 additions & 1 deletion
@@ -1,11 +1,12 @@
 # API Reference
 
-This document provides a reference for the SQLite functions provided by the `sqlite-sync` extension.
+This document provides a reference for the SQL functions provided by the `sqlite-sync` extension. Unless noted otherwise, the APIs are available on both SQLite and PostgreSQL builds.
 
 ## Index
 
 - [Configuration Functions](#configuration-functions)
   - [`cloudsync_init()`](#cloudsync_inittable_name-crdt_algo-init_flags)
+  - [`cloudsync_set()`](#cloudsync_setkey-value)
   - [`cloudsync_enable()`](#cloudsync_enabletable_name)
   - [`cloudsync_disable()`](#cloudsync_disabletable_name)
   - [`cloudsync_is_enabled()`](#cloudsync_is_enabledtable_name)
@@ -24,6 +25,10 @@ This document provides a reference for the SQLite functions provided by the `sql
 - [Schema Alteration Functions](#schema-alteration-functions)
   - [`cloudsync_begin_alter()`](#cloudsync_begin_altertable_name)
   - [`cloudsync_commit_alter()`](#cloudsync_commit_altertable_name)
+- [Payload Functions](#payload-functions)
+  - [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq)
+  - [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version)
+  - [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload)
 - [Network Functions](#network-functions)
   - [`cloudsync_network_init()`](#cloudsync_network_initmanageddatabaseid)
   - [`cloudsync_network_cleanup()`](#cloudsync_network_cleanup)
@@ -40,6 +45,37 @@ This document provides a reference for the SQLite functions provided by the `sql
 
 ## Configuration Functions
 
+### `cloudsync_set(key, value)`
+
+**Description:** Stores a global CloudSync setting in the current database. Settings persist across database reopens and are loaded automatically by the extension.
+
+The following payload setting is supported:
+
+| Key | Description | Default | Minimum |
+|---|---|---:|---:|
+| `payload_max_chunk_size` | Maximum transport payload size generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version). Values below the minimum are clamped. | `5242880` (5 MB) | `262144` (256 KB) |
+
+`payload_max_chunk_size` affects only chunk generation. [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) continues to accept legacy payloads, monolithic payloads, and v3 chunk-fragment payloads even when they are larger than the local setting. This preserves compatibility between peers using different settings.
+
+**Parameters:**
+
+- `key` (TEXT): The setting key.
+- `value` (TEXT): The setting value. For `payload_max_chunk_size`, pass the value in bytes.
+
+**Returns:** SQLite returns no value. PostgreSQL returns `true` on success.
+
+**Example:**
+
+```sql
+-- Use 1 MB transport chunks
+SELECT cloudsync_set('payload_max_chunk_size', '1048576');
+
+-- Restore the default 5 MB transport chunks
+SELECT cloudsync_set('payload_max_chunk_size', '5242880');
+```
+
+---
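
The clamping rule documented for `payload_max_chunk_size` can be sketched in a few lines of Python. This illustrates the documented behavior only and is not the extension's implementation: `effective_chunk_size` is a hypothetical helper, and treating non-numeric text as an error is an assumption, since the reference does not say how invalid values are handled.

```python
DEFAULT_CHUNK_SIZE = 5_242_880  # documented default, in bytes
MIN_CHUNK_SIZE = 262_144        # documented minimum, in bytes


def effective_chunk_size(value=None):
    """Return the chunk size that would be in effect for a setting value.

    `value` is the TEXT passed to cloudsync_set(); None means "never set".
    """
    if value is None:
        return DEFAULT_CHUNK_SIZE
    requested = int(value)  # assumption: non-numeric text raises ValueError
    # Values below the documented minimum are clamped up to it.
    return max(requested, MIN_CHUNK_SIZE)


print(effective_chunk_size())           # 5242880
print(effective_chunk_size("1048576"))  # 1048576
print(effective_chunk_size("1000"))     # 262144
```

The documentation states no upper bound, so the sketch does not apply one.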
+
 ### `cloudsync_init(table_name, [crdt_algo], [init_flags])`
 
 **Description:** Initializes a table for `sqlite-sync` synchronization. This function is idempotent and needs to be called only once per table on each site; configurations are stored in the database and automatically loaded with the extension.
@@ -409,6 +445,137 @@ SELECT cloudsync_commit_alter('my_table');
 
 ---
 
+## Payload Functions
+
+### `cloudsync_payload_encode(tbl, pk, col_name, col_value, col_version, db_version, site_id, cl, seq)`
+
+**Description:** Encodes rows from `cloudsync_changes` into a single monolithic payload. This is the legacy payload API and remains fully supported for backward compatibility.
+
+Use this API when the expected payload size is modest or when you need to interoperate with callers that expect a single BLOB. For large rowsets or large individual BLOB/TEXT values, prefer [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version), which splits transport payloads according to `payload_max_chunk_size`.
+
+**Parameters:** The function is an aggregate over the columns returned by `cloudsync_changes`:
+
+- `tbl` (TEXT): Source table name.
+- `pk` (BLOB): Encoded primary key.
+- `col_name` (TEXT): Changed column name.
+- `col_value` (BLOB): Encoded column value.
+- `col_version` (INTEGER/BIGINT): Column version.
+- `db_version` (INTEGER/BIGINT): Source database version.
+- `site_id` (BLOB): Source site identifier.
+- `cl` (INTEGER/BIGINT): Causal length.
+- `seq` (INTEGER/BIGINT): Sequence number within the source database version.
+
+**Returns:** A single payload BLOB.
+
+**Example:**
+
+```sql
+SELECT cloudsync_payload_encode(
+  tbl, pk, col_name, col_value, col_version, db_version, site_id, cl, seq
+) AS payload
+FROM cloudsync_changes;
+```
+
+---
+
+### `cloudsync_payload_chunks([since_db_version], [filter_site_id], [until_db_version])`
+
+**Description:** Generates sync payloads as a stream of transport-sized chunks. It is the chunk-aware evolution of [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq), designed for large rowsets and for single BLOB/TEXT values that are larger than the configured chunk size.
+
+The maximum generated chunk size is controlled by the global `payload_max_chunk_size` setting. The default is 5 MB and the technical minimum is 256 KB:
+
+```sql
+SELECT cloudsync_set('payload_max_chunk_size', '5242880');
+```
+
+When a single encoded column value does not fit in one chunk, CloudSync transparently emits v3 payload fragments for that value. The receiver stages fragments internally and applies the value when all parts arrive. Fragments can arrive out of order; incomplete stale fragment groups are cleaned up automatically.
+
+`cloudsync_payload_chunks()` does not change the apply contract: [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) accepts legacy payloads, monolithic payloads, and v3 chunk-fragment payloads. The local `payload_max_chunk_size` setting is not used to reject incoming payloads.
+
+**Important memory note:** chunking limits the size of each transport payload that CloudSync generates. It does not remove the database engine's need to materialize a single final cell value when applying a very large BLOB/TEXT column. In other words, a 500 MB BLOB can be transported in smaller chunks, but the receiving database must still be able to store and bind the completed 500 MB value when that row is applied.
+
+**Parameters:**
+
+- `since_db_version` (INTEGER/BIGINT, optional): Start after this source database version. If omitted, CloudSync uses the stored send checkpoint.
+- `filter_site_id` (BLOB, optional): Site ID whose changes should be encoded. If omitted, CloudSync uses the local site ID.
+- `until_db_version` (INTEGER/BIGINT, optional): Upper watermark to include. If omitted or `0`, CloudSync captures the current maximum source database version before streaming chunks.
+
+**Returns:** A rowset with one row per chunk:
+
+| Column | Description |
+|---|---|
+| `payload` | Payload BLOB to pass to `cloudsync_payload_apply()`. |
+| `chunk_index` | Zero-based chunk index for this stream. |
+| `payload_size` | Payload size in bytes. |
+| `rows` | Number of encoded payload rows in this chunk. Fragment chunks usually contain one fragment row. |
+| `db_version_min` | Minimum source `db_version` represented by this chunk. |
+| `db_version_max` | Maximum source `db_version` represented by this chunk. |
+| `watermark_db_version` | Stable upper watermark captured for this chunk stream. Store this after all chunks are durably transferred/applied. |
+
+**SQLite usage:** `cloudsync_payload_chunks` is exposed as a virtual table with hidden constraint columns:
+
+```sql
+-- Default: uses the stored send checkpoint and local site id
+SELECT payload, chunk_index, payload_size, watermark_db_version
+FROM cloudsync_payload_chunks
+ORDER BY chunk_index;
+
+-- Explicit arguments through hidden columns
+SELECT payload, chunk_index, payload_size, watermark_db_version
+FROM cloudsync_payload_chunks
+WHERE since_db_version = 100
+  AND site_id = cloudsync_siteid()
+  AND until_db_version = 200
+ORDER BY chunk_index;
+```
+
+**PostgreSQL usage:** `cloudsync_payload_chunks` is exposed as a set-returning function with three optional arguments:
+
+```sql
+-- Default: uses the stored send checkpoint and local site id
+SELECT *
+FROM cloudsync_payload_chunks();
+
+-- Explicit arguments
+SELECT *
+FROM cloudsync_payload_chunks(100, cloudsync_siteid(), 200);
+```
+
+**Apply example:**
+
+```sql
+-- Apply chunks on a receiving peer. Chunks may be applied one at a time.
+SELECT cloudsync_payload_apply(?);
+```
+
+On PostgreSQL, apply chunks as individual statements from the transport/client layer. Do not use a set-based statement such as `SELECT cloudsync_payload_apply(payload) FROM chunks_table;` while reading payloads from a table in the same database session. `cloudsync_payload_apply()` performs writes through SPI, and applying while the same statement is still scanning a payload table can conflict with PostgreSQL executor resource ownership. Fetch each payload into the client (or into a local procedural variable after the read completes) and then call `cloudsync_payload_apply()` for that single payload.
+
+---
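
A client-side transfer loop following the guidance above might look like the sketch below. It assumes two SQLite connections that already have the extension loaded and the tables initialized. `transfer_chunks` is a hypothetical helper, and where the returned watermark gets persisted is left to the caller because the reference does not prescribe it.

```python
import sqlite3


def transfer_chunks(source: sqlite3.Connection, target: sqlite3.Connection):
    """Apply every pending chunk from `source` to `target`, one call per chunk.

    Returns the stream watermark, or None when there was nothing to send.
    """
    watermark = None
    chunks = source.execute(
        "SELECT payload, watermark_db_version "
        "FROM cloudsync_payload_chunks ORDER BY chunk_index"
    )
    for payload, chunk_watermark in chunks:
        # Each payload is fetched into the client first, then applied on the
        # receiving connection as its own statement.
        target.execute("SELECT cloudsync_payload_apply(?)", (payload,)).fetchone()
        watermark = chunk_watermark
    # The caller should store the watermark only after this function returns,
    # i.e. once the whole chunk stream has been applied.
    return watermark
```

Iterating the cursor instead of calling `fetchall()` keeps at most one chunk in client memory at a time, which is the point of bounding the chunk size.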
+
+### `cloudsync_payload_apply(payload)`
+
+**Description:** Applies a sync payload to the current database. The function accepts all supported payload formats:
+
+- Legacy payloads generated by older SQLite Sync versions.
+- Monolithic payloads generated by [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq).
+- Chunk-fragment payloads generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version).
+
+When a v3 fragment payload is received, CloudSync stores the fragment in an internal table and returns after applying zero or more completed values. Once the final fragment for a value is received, the completed value is validated and applied. Duplicate fragment delivery is idempotent.
+
+**Parameters:**
+
+- `payload` (BLOB/BYTEA): Payload BLOB to apply.
+
+**Returns:** Number of payload rows applied. Fragment payloads that are staged but not yet complete can return `0`.
+
+**Example:**
+
+```sql
+SELECT cloudsync_payload_apply(:payload);
+```
+
+---
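
The staging behavior described for fragment payloads (out-of-order arrival, idempotent duplicates, nothing applied until the last part arrives) can be modeled with a toy in-memory stager. This is a conceptual sketch, not the v3 wire format or the extension's internal table: `group_id`, `index`, and `total` are hypothetical fields, and cleanup of stale groups is omitted.

```python
class FragmentStager:
    """Toy receiver-side staging area for fragmented values."""

    def __init__(self):
        self._groups = {}  # group_id -> {fragment index: bytes}

    def receive(self, group_id, index, total, data):
        """Stage one fragment.

        Returns the reassembled value once every part has arrived,
        otherwise None.
        """
        if not 0 <= index < total:
            raise ValueError("fragment index out of range")
        parts = self._groups.setdefault(group_id, {})
        parts.setdefault(index, data)  # a redelivered fragment changes nothing
        if len(parts) < total:
            return None
        del self._groups[group_id]
        return b"".join(parts[i] for i in range(total))


stager = FragmentStager()
print(stager.receive("v1", 2, 3, b"c"))  # None
print(stager.receive("v1", 0, 3, b"a"))  # None
print(stager.receive("v1", 0, 3, b"a"))  # None (duplicate delivery)
print(stager.receive("v1", 1, 3, b"b"))  # b'abc'
```

The final `join` is where the complete value is materialized in memory, which is why chunking bounds transport size but not the size of the value being applied.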
+
 ## Network Functions
 
 ### `cloudsync_network_init(managedDatabaseId)`
@@ -500,6 +667,10 @@ This means: if you get JSON back, the server was reachable and the network proto
 
 **Description:** Sends all unsent local changes to the remote server.
 
+The send path streams payloads through [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version), so `payload_max_chunk_size` also limits the payloads generated for network transport. Each generated chunk is uploaded/applied independently; the local send checkpoint is advanced only after the chunk stream completes successfully.
+
+Chunk transport is transparent to the CloudSync backend. Each chunk is sent as a normal `/apply` payload, either inline as a base64 `blob` or through the upload `url` path. There is no separate chunk flag: old payloads, monolithic payloads, and v3 fragment payloads are distinguished by the payload format itself.
+
 **Parameters:** None.
 
 **Returns:** A JSON string with the send result:
 
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [Unreleased]
+
+### Added
+
+- **Chunked payload generation** via `cloudsync_payload_chunks()`, available as a SQLite virtual table and as a PostgreSQL set-returning function. The API emits transport-sized payload chunks and transparently fragments oversized BLOB/TEXT values into v3 fragment payloads.
+- **`payload_max_chunk_size` global setting** for controlling generated chunk size. The default is 5 MB and values below the 256 KB technical minimum are clamped.
+- **Payload chunking documentation** in `API.md` and `PERFORMANCE.md`, including the explicit memory note that chunking bounds transport payloads but the database must still materialize a completed single BLOB/TEXT value when it is applied.
+
+### Changed
+
+- `cloudsync_payload_apply()` now accepts legacy payloads, monolithic payloads, and v3 fragment payloads without enforcing the local `payload_max_chunk_size`, preserving compatibility between peers with different settings.
+- `cloudsync_network_send_changes()` now streams outgoing changes through `cloudsync_payload_chunks()` instead of first building one monolithic payload. This bounds transport payload size for the built-in network path and lets large rowsets or oversized BLOB/TEXT values flow through the same `/apply` endpoint as regular payloads.
+
 ## [1.0.20] - 2026-05-26
 
 ### Changed
 
@@ -41,7 +41,7 @@ SELECT ... FROM cloudsync_changes WHERE db_version > <last_synced_version>
 
 Each metadata table has an **index on `db_version`**, so payload generation scales primarily with the number of new changes, plus a small per-synced-table overhead to construct the `cloudsync_changes` query. It does not diff the full dataset. In SQLite, each changed column also performs a primary-key lookup in the base table to retrieve the current value.
 
-The resulting payload is LZ4-compressed before transmission.
+The legacy `cloudsync_payload_encode()` API builds one monolithic LZ4-compressed payload before transmission. For large deltas, `cloudsync_payload_chunks()` can be used instead: it streams a sequence of payload chunks bounded by the `payload_max_chunk_size` setting (default 5 MB, minimum 256 KB). If a single encoded BLOB/TEXT value is larger than the chunk budget, the value is split into transparent v3 fragments and reassembled by `cloudsync_payload_apply()` on the receiver.
 
 #### Pull: Payload Application
 
@@ -69,7 +69,7 @@ When the application runs sync off the main thread, perceived latency depends on
 
 - **Sync interval**: How often the app triggers a push/pull cycle. More frequent syncs mean smaller deltas (smaller D) and faster individual sync operations, at the cost of more network round-trips.
 - **Network latency**: The round-trip time to the sync server. LZ4 compression reduces payload size, but latency is dominated by the network hop itself for small deltas.
-- **Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly.
+- **Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly. Use `cloudsync_payload_chunks()` when transport payloads may be large; it limits each generated transport payload but does not change the size of the final database value.
 
 The extension does not impose a sync schedule  -- the application controls when and how often to sync. A typical pattern is to sync on a timer (e.g., every 5-30 seconds) or on specific events (app foreground, user action).
 
@@ -118,7 +118,11 @@ Normal application reads are not directly instrumented by the extension. No trig
 
 When a new device syncs for the first time (`db_version = 0`), the push payload contains the **entire dataset**: every column of every row across all synced tables. The payload size is proportional to `N * C` (total rows times columns).
 
-The payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
+With the legacy `cloudsync_payload_encode()` API, the payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
+
+For large initial syncs, prefer `cloudsync_payload_chunks()`. It keeps each generated transport payload bounded by `payload_max_chunk_size` and can fragment a single oversized BLOB/TEXT column across multiple v3 fragment payloads. This prevents the transport payload itself from growing without bound and avoids constructing a monolithic v2 payload during v3 apply.
+
+Important limitation: chunking does **not** make a single database cell streamable all the way into the storage engine. When the last fragment of a very large BLOB/TEXT value arrives, the receiver must still materialize the completed value once in order to bind/store it in the destination database. Size `payload_max_chunk_size` for transport safety, but size application memory limits for the largest individual value you allow.
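
The sizing example above can be worked through as follows. The chunk count is only a rough lower bound: it assumes the limit is measured against uncompressed bytes and ignores per-row metadata, neither of which is stated here.

```python
import math

rows = 1_000_000
columns = 10
avg_value_bytes = 50
chunk_size = 5_242_880  # default payload_max_chunk_size, in bytes

# Uncompressed size of a first sync that carries every column of every row.
uncompressed = rows * columns * avg_value_bytes

# Fewest chunks that could carry that much data at the default limit.
min_chunks = math.ceil(uncompressed / chunk_size)

print(uncompressed)  # 500000000
print(min_chunks)    # 96
```

With the monolithic API the sender holds the full ~500 MB buffer at once; with chunking, each generated transport payload stays within the configured limit.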
 
 Subsequent syncs are incremental (proportional to D, changes since the last sync), so the first sync is the expensive one. Applications with large datasets should plan for this -- for example, by seeding new devices from a database snapshot rather than syncing from scratch.
 
@@ -185,6 +189,7 @@ CloudSync:        sync_time ~ O(D)           -- grows with changes since last sy
 2. **`db_version` index**: Enables efficient range scans for delta extraction.
 3. **Deferred batch merge**: Column changes for the same primary key are accumulated and flushed as a single SQL statement.
 4. **Prepared statement caching**: Merge statements are compiled once and reused across rows.
-5. **LZ4 compression**: Reduces payload size for network transfer.
-6. **Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
-7. **Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
+5. **Chunked payload generation**: `cloudsync_payload_chunks()` bounds transport payload size and handles oversized single values with transparent v3 fragments.
+6. **LZ4 compression**: Reduces payload size for network transfer.
+7. **Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
+8. **Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
@@ -219,6 +219,7 @@ See the full guide: **[Row-Level Security Documentation](./docs/row-level-securi
 ## Documentation
 
 - **[API Reference](./API.md)**: all functions, parameters, and examples
+- **[Performance & Overhead](./PERFORMANCE.md)**: sync cost model, payload chunking, and large-value memory notes
 - **[Installation Guide](./docs/installation.md)**: platform-specific setup (Swift, Android, Expo, React Native, Flutter, WASM)
 - **[Block-Level LWW Guide](./docs/block-lww.md)**: line-level text merge for markdown and documents
 - **[Row-Level Security Guide](./docs/row-level-security.md)**: multi-tenant access control with server-enforced policies
 
@@ -6,6 +6,7 @@ FROM postgres:${POSTGRES_TAG}
 # and install the matching server-dev package
 RUN apt-get update && apt-get install -y \
     build-essential \
+    postgresql-contrib-${PG_MAJOR} \
     postgresql-server-dev-${PG_MAJOR} \
     git \
     make \
 
@@ -44,7 +44,9 @@ RUN set -eux; \
     cd /usr/src/postgresql-17; \
     ./configure --enable-debug --enable-cassert --without-icu CFLAGS="-O0 -g3 -fno-omit-frame-pointer"; \
     make -j"$(nproc)"; \
-    make install
+    make install; \
+    make -C contrib/dblink -j"$(nproc)"; \
+    make -C contrib/dblink install
 
 ENV PATH="/usr/local/pgsql/bin:${PATH}"
 ENV LD_LIBRARY_PATH="/usr/local/pgsql/lib:${LD_LIBRARY_PATH}"