Merge pull request #515 from tidesdb/tdb935

guycipher · web-flow · commit fbe34fee87cf · 2026-06-06T02:48:12.000-04:00
update design doc, building, doc and c reference for tidesdb v935
diff --git a/src/content/docs/getting-started/how-does-tidesdb-work.md b/src/content/docs/getting-started/how-does-tidesdb-work.md
@@ -130,7 +130,7 @@ Repeatable Read remembers every key it read, along with the version it saw. At c
 
 Snapshot Isolation detects write-write conflicts only, with first-committer-wins. It keeps no read set; its commit aborts if another transaction wrote one of its keys after its snapshot began. It deliberately allows write skew — two transactions reading overlapping data and writing disjoint keys — because that matches the textbook definition, under which snapshot isolation requires only write-write conflict detection.
 
-Serializable adds read-write conflict tracking on top of snapshot isolation, implementing serializable snapshot isolation (SSI). Only Repeatable Read and Serializable allocate a read set; once that set passes 64 entries it is backed by an xxHash table for O(1) conflict checks. At commit the engine examines all concurrent transactions: if transaction T read a key that another transaction T′ wrote, it marks an outgoing conflict on T and an incoming conflict on T′. A transaction carrying both an incoming and an outgoing conflict is a pivot in a "dangerous structure," and its commit aborts. This is a deliberately simplified SSI: it detects pivots but builds no precedence graph and does no cycle detection, so it can occasionally abort a transaction that was in fact serializable.
+Serializable adds read-write conflict tracking on top of snapshot isolation, implementing serializable snapshot isolation (SSI). Only Repeatable Read and Serializable allocate a read set; once that set passes 64 entries it is backed by an xxHash table for O(1) conflict checks. At commit the engine examines other concurrent serializable transactions: if transaction T read a key that another transaction T′ wrote, it marks an outgoing conflict on T and an incoming conflict on T′. A transaction carrying both an incoming and an outgoing conflict is a pivot in a "dangerous structure," and its commit aborts. This is a deliberately simplified SSI: it detects pivots but builds no precedence graph and does no cycle detection, so it can occasionally abort a transaction that was in fact serializable.
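
The pivot rule in the paragraph above can be illustrated with a minimal C sketch. This is not TidesDB's implementation: `sketch_txn_t`, `mark_rw_conflict`, `detect_conflicts`, and `demo_find_pivot` are hypothetical names, keys are plain integers, and every transaction is assumed to be serializable and concurrent with the others. The sketch also checks all pairs at once rather than at each commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 8 /* hypothetical fixed capacity, for illustration only */

/* Hypothetical, simplified transaction record; keys are small integers. */
typedef struct {
    int reads[MAX_KEYS];
    size_t num_reads;
    int writes[MAX_KEYS];
    size_t num_writes;
    bool has_incoming; /* a concurrent txn read a key this txn wrote */
    bool has_outgoing; /* this txn read a key a concurrent txn wrote */
} sketch_txn_t;

static bool wrote_key(const sketch_txn_t *t, int key)
{
    assert(t->num_writes <= MAX_KEYS);
    for (size_t i = 0; i < t->num_writes; i++)
        if (t->writes[i] == key) return true;
    return false;
}

/* If `reader` read any key that `writer` wrote, flag both sides. */
static void mark_rw_conflict(sketch_txn_t *reader, sketch_txn_t *writer)
{
    for (size_t i = 0; i < reader->num_reads; i++) {
        if (wrote_key(writer, reader->reads[i])) {
            reader->has_outgoing = true;
            writer->has_incoming = true;
            return;
        }
    }
}

/* A pivot carries both an incoming and an outgoing conflict. */
static bool is_pivot(const sketch_txn_t *t)
{
    return t->has_incoming && t->has_outgoing;
}

/* Examine every ordered (reader, writer) pair; no graph, no cycle search. */
static void detect_conflicts(sketch_txn_t *txns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i != j) mark_rw_conflict(&txns[i], &txns[j]);
}

/* T0 reads key 1 (written by T1); T1 reads key 2 (written by T2). */
static int demo_find_pivot(void)
{
    sketch_txn_t txns[3] = {
        { .reads = {1}, .num_reads = 1, .writes = {10}, .num_writes = 1 },
        { .reads = {2}, .num_reads = 1, .writes = {1},  .num_writes = 1 },
        { .reads = {3}, .num_reads = 1, .writes = {2},  .num_writes = 1 },
    };
    detect_conflicts(txns, 3);
    for (int i = 0; i < 3; i++)
        if (is_pivot(&txns[i])) return i;
    return -1;
}
```

In this scenario only the middle transaction (index 1) ends up with both flags set, so it is the one whose commit would abort. Because the check stops at "both flags set" and never confirms that a full cycle exists, it can reject a schedule that was in fact serializable, which is the trade-off the paragraph describes.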
 
 ### Transactions Across Column Families
 
@@ -197,7 +197,7 @@ The L0 stall bounds the queue of frozen memtables, but not the active memtable t
 
 Level 1 is watched alongside L0 because a high L1 count means compaction is falling behind, and a compaction backlog eventually starves flushing too (flushers wait on compaction to free space). Throttling on L1 therefore acts as a leading indicator, applying pressure before L0 becomes critical and heading off a cascade.
 
-The per-column-family signals above cannot, by themselves, prevent an out-of-memory condition when many column families fill up at once. So a separate global guard runs in the reaper thread every 100ms. It sums all the memory the database is using — active and immutable memtables, in-flight transaction buffers, compaction scratch space, bloom filters, block indexes, and caches — and divides by a resolved limit (`max_memory_usage`, default half of system RAM, never less than 5%). The resulting pressure level is graduated: normal below 60%, elevated to 75%, high to 95%, critical above. The write path reads this level with one atomic load per commit, so it costs nothing at normal pressure. As pressure climbs, the response escalates: at elevated, the flush threshold tightens and the current family is flushed proactively; at high, the current family is force-flushed and the reaper force-flushes the largest non-flushing family; at critical, writes block entirely until the reaper brings pressure down (timing out after 10 seconds with `TDB_ERR_BUSY`), while the reaper force-flushes every non-flushing family and aggressively compacts the one with the most SSTables. In unified mode, where one memtable is shared, the reaper rotates that single memtable instead of iterating empty per-CF ones. As a last line of defense, an OS-level check polls real free memory every few seconds and forces the level to critical if free RAM drops below 5%, catching consumption that TidesDB's own accounting cannot see.
+The per-column-family signals above cannot, by themselves, prevent an out-of-memory condition when many column families fill up at once. So a separate global guard runs in the reaper thread every 100ms. It sums all the memory the database is using — active and immutable memtables, in-flight transaction buffers, compaction scratch space, bloom filters, block indexes, and caches — and divides by a resolved limit (`max_memory_usage`, default 75% of system RAM, never less than 5%). The resulting pressure level is graduated: normal below 60%, elevated to 75%, high to 95%, critical above. The write path reads this level with one atomic load per commit, so it costs nothing at normal pressure. As pressure climbs, the response escalates: at elevated, the flush threshold tightens and the current family is flushed proactively; at high, the current family is force-flushed, the reaper force-flushes the largest non-flushing family, and it aggressively compacts the family with the most SSTables; at critical, writes block entirely until the reaper brings pressure down (timing out after 10 seconds with `TDB_ERR_BUSY`), while the reaper force-flushes every non-flushing family. In unified mode, where one memtable is shared, the reaper rotates that single memtable instead of iterating empty per-CF ones. As a last line of defense, an OS-level check polls real free memory every few seconds and forces the level to critical if free RAM drops below 5%, catching consumption that TidesDB's own accounting cannot see.
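
The graduated levels described above can be sketched as a small classifier. This is an illustration under assumptions, not the engine's code: the type and function names are hypothetical, and the text does not say which side of each boundary (exactly 60%, 75%, 95%) is inclusive, so the sketch treats each threshold as the start of the next level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the real enum in TidesDB may differ. */
typedef enum {
    SKETCH_PRESSURE_NORMAL,
    SKETCH_PRESSURE_ELEVATED,
    SKETCH_PRESSURE_HIGH,
    SKETCH_PRESSURE_CRITICAL
} sketch_pressure_t;

/* What the committing thread does at each level, per the description. */
typedef enum {
    SKETCH_WRITE_PROCEED,         /* normal: no extra work */
    SKETCH_WRITE_FLUSH_PROACTIVE, /* elevated: tighter threshold, proactive flush */
    SKETCH_WRITE_FORCE_FLUSH,     /* high: force-flush the current family */
    SKETCH_WRITE_BLOCK            /* critical: wait for pressure to drop */
} sketch_write_action_t;

/* Divide tracked usage by the resolved limit and bucket the ratio. */
static sketch_pressure_t classify_pressure(size_t used_bytes, size_t limit_bytes)
{
    assert(limit_bytes > 0);
    double ratio = (double)used_bytes / (double)limit_bytes;
    if (ratio < 0.60) return SKETCH_PRESSURE_NORMAL;
    if (ratio < 0.75) return SKETCH_PRESSURE_ELEVATED;
    if (ratio < 0.95) return SKETCH_PRESSURE_HIGH;
    return SKETCH_PRESSURE_CRITICAL; /* boundary handling is an assumption */
}

static sketch_write_action_t write_path_action(sketch_pressure_t level)
{
    switch (level) {
    case SKETCH_PRESSURE_ELEVATED: return SKETCH_WRITE_FLUSH_PROACTIVE;
    case SKETCH_PRESSURE_HIGH:     return SKETCH_WRITE_FORCE_FLUSH;
    case SKETCH_PRESSURE_CRITICAL: return SKETCH_WRITE_BLOCK;
    default:                       return SKETCH_WRITE_PROCEED;
    }
}
```

In the design described above, the classification runs in the reaper every 100ms and the write path only reads the published level, which is why a commit at normal pressure pays for a single atomic load and nothing more.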
 
 The point of the whole scheme is smooth degradation. Increasing the write-buffer size trades flush frequency against memory used during stalls; raising the stall threshold trades memory for burst tolerance; adding flush workers drains the queue faster; and `max_memory_usage` caps the whole envelope. The right settings depend on the write pattern, the available memory, and the disk — but in every case the system slows down gradually as it approaches its limits, rather than swinging between full speed and a dead stop.
 ## The Read Path
@@ -348,7 +348,7 @@ The work that does not happen on the caller's thread happens here (Figure 7). Fl
 <img src="/design-diags/07_background_workers.png" alt="Figure 7. Background worker pools.">
 </div>
 
-Flush workers (default 2) take frozen memtables off the queue and write them to SSTables, in parallel across column families. Compaction workers (default 2) merge SSTables across levels, in parallel across families, and fan out within a single round through sub-compaction. The sync worker (1 thread, started only if any WAL uses interval sync) periodically fsyncs the WALs configured for it; it finds the smallest configured interval, sleeps that long, and syncs each due WAL. Column families on interval sync also force an explicit fsync at structural boundaries — when a memtable rotates, and during every sorted-run creation and merge — which preserves durability while still batching ordinary writes.
+Flush workers (default auto: the smaller of the CPU count and 4) take frozen memtables off the queue and write them to SSTables, in parallel across column families. Compaction workers (default 2) merge SSTables across levels, in parallel across families, and fan out within a single round through sub-compaction. The sync worker (1 thread, started only if any WAL uses interval sync) periodically fsyncs the WALs configured for it; it finds the smallest configured interval, sleeps that long, and syncs each due WAL. Column families on interval sync also force an explicit fsync at structural boundaries (when a memtable rotates, and during every sorted-run creation and merge), which preserves durability while still batching ordinary writes.
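
The new flush-worker default can be read as the following resolution rule. This is a sketch only: `resolve_flush_threads` is a hypothetical helper, and both the pass-through of explicit values and the fallback to one worker when the CPU count is unavailable are assumptions, not documented behavior.

```c
#include <assert.h>

/*
 * Hypothetical helper: resolve the flush pool size from the configured
 * value. 0 means "auto", i.e. min(cpu_count, 4).
 */
static int resolve_flush_threads(int configured, int cpu_count)
{
    assert(configured >= 0);
    if (configured > 0) return configured; /* assumed: explicit value is used as given */
    if (cpu_count < 1) cpu_count = 1;      /* assumed fallback if CPU detection fails */
    return cpu_count < 4 ? cpu_count : 4;  /* auto: the CPU count, capped at 4 */
}
```

Under this rule an 8-core host resolves to 4 flush workers and a 2-core host to 2, while a configuration that sets the field explicitly keeps the value it asked for.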
 
 The reaper (1 thread) runs a maintenance loop every 100ms and is the system's general groundskeeper. Each cycle it sweeps the deferred-free list, retries flushes that were deferred under the concurrency cap, services any compaction triggers that arrived while a compaction was already running, recomputes global memory pressure and acts on it, and evicts idle SSTable file handles when too many are open. The memory-pressure response was described with [Backpressure](#backpressure-and-flow-control); the two pieces of bookkeeping unique to the reaper are worth a word each.
 
@@ -407,13 +407,13 @@ The bloom false-positive rate, 1% by default, balances memory against effectiven
 
 Memtable size trades flush frequency against recovery time and memory. Larger memtables flush less often but lengthen recovery and use more memory; smaller ones flush more (more SSTables, more compaction) but recover faster. The 64MB default holds about a million small pairs and flushes every few seconds under moderate load. Doubling it halves flush frequency but raises level-1-to-level-2 amplification, since each flush produces a larger table that takes longer to merge.
 
-Worker counts default to two flush and two compaction threads, which give cross-family parallelism at modest cost. More threads help with many active families but cost memory (each buffers 64KB blocks) and descriptors (two per table in flight). The device dominates the choice: on a spinning disk, several concurrent compactors cause head seeks that destroy throughput; on NVMe, more workers help. So 1–2 workers for HDD, 4–8 for NVMe.
+Worker counts default to an automatic number of flush threads (the CPU count, capped at 4) and two compaction threads, which give cross-family parallelism at modest cost. More threads help with many active families but cost memory (each buffers 64KB blocks) and descriptors (two per table in flight). The device dominates the choice: on a spinning disk, several concurrent compactors cause head seeks that destroy throughput; on NVMe, more workers help. A reasonable starting point is therefore 1–2 workers for HDD and 4–8 for NVMe.
 
 ## Operational Considerations
 
 A TidesDB instance is safe for many threads in one process but exclusive to a single process: only one process may open a database directory at a time. Exclusivity is a non-blocking file lock taken during open — if another process holds it, open returns `TDB_ERR_LOCKED` at once rather than waiting. The locking primitive is chosen per platform for correct semantics: `fcntl` locks on macOS and BSD (which, unlike `flock`, are not inherited across `fork`, with the owning PID written to the lock file so a same-process double-open is caught), OFD locks on modern Linux, and `LockFileEx` on Windows, with retries on signal interruption so a stray signal cannot spuriously fail the lock.
 
-Memory use per family comes from a few structures: the active memtable is configurable (default 64MB) and the immutable queue is that size times its depth (usually 1–2); the block cache is shared across families (default 64MB total); bloom filters cost about 10 bits per key and block indexes about 32 bytes per block. A family with 10M keys across 100 SSTables therefore runs around 150MB plus its share of the cache. The `max_memory_usage` cap (default auto, resolving to half of system RAM, never clamped below 5%) bounds the aggregate across all families, which is what prevents an out-of-memory condition in many-family deployments where per-family limits cannot.
+Memory use per family comes from a few structures: the active memtable is configurable (default 64MB) and the immutable queue is that size times its depth (usually 1–2); the block cache is shared across families (default 64MB total); bloom filters cost about 10 bits per key and block indexes about 32 bytes per block. A family with 10M keys across 100 SSTables therefore runs around 150MB plus its share of the cache. The `max_memory_usage` cap (default auto, resolving to 75% of system RAM, never clamped below 5%) bounds the aggregate across all families, which is what prevents an out-of-memory condition in many-family deployments where per-family limits cannot.
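
The per-family estimate above can be reproduced with simple arithmetic. The function below is a hypothetical helper, not part of the TidesDB API, and it counts only the per-family structures (the shared block cache is excluded). The block count used in the usage note (200,000) and the immutable queue depth of 1 are assumptions chosen for illustration; the text gives neither figure for the 10M-key example.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of one column family's resident memory, using the
 * per-structure costs quoted in the text:
 *   memtables: active memtable plus `immutable_depth` frozen ones
 *   bloom:     about 10 bits per key
 *   index:     about 32 bytes per block
 */
static uint64_t estimate_family_bytes(uint64_t memtable_bytes,
                                      uint64_t immutable_depth,
                                      uint64_t num_keys,
                                      uint64_t num_blocks)
{
    uint64_t memtables = memtable_bytes * (1 + immutable_depth);
    uint64_t bloom = num_keys * 10 / 8; /* bits converted to bytes */
    uint64_t index = num_blocks * 32;
    return memtables + bloom + index;
}
```

With a 64MB memtable, one immutable memtable queued, 10M keys, and the assumed 200,000 blocks, the function returns 153,117,728 bytes: 128MB of memtables, 12.5MB of bloom filters, and 6.4MB of block indexes. That is consistent with the "around 150MB" figure, and it shows that the memtables, not the filters or indexes, dominate the total.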
 
 Three operational limits interact at the margins. When writes outpace compaction, backpressure stalls them once the flush queue passes its threshold, trading occasional latency spikes for bounded memory. Because SSTables are immutable, space is reclaimed only after a compaction finishes and deletes its inputs, so a compaction can briefly need double the space of the level it rewrites; the engine checks free space before starting one. And because each SSTable holds two descriptors open, a working set larger than the open-file budget makes the reaper thrash; an operator who wants a bigger resident set can raise the process's descriptor ceiling before opening the database, after which the engine sizes its budget to fit. The raise is opt-in and a partial failure is non-fatal.
 ## On-Disk Format
diff --git a/src/content/docs/reference/building.md b/src/content/docs/reference/building.md
@@ -469,6 +469,7 @@ TidesDB provides several CMake options to customize the build:
 | `TIDESDB_BUILD_TESTS` | Build test suite | `ON` |
 | `BUILD_SHARED_LIBS` | Build shared libraries instead of static | `ON` (Unix), `OFF` (Windows) |
 | `ENABLE_READ_PROFILING` | Enable read profiling instrumentation | `OFF` |
+| `TIDESDB_WARN_MAYBE_UNINIT` | Enable `-Wmaybe-uninitialized` (GCC only; requires an optimized build) | `OFF` |
 | `TIDESDB_WITH_SNAPPY` | Build with Snappy compression support | `ON` (`OFF` on SunOS) |
 | `TIDESDB_WITH_LZ4` | Build with LZ4 compression support | `ON` |
 | `TIDESDB_WITH_ZSTD` | Build with Zstandard compression support | `ON` |
diff --git a/src/content/docs/reference/c.md b/src/content/docs/reference/c.md
@@ -195,7 +195,7 @@ tidesdb_finalize();
 ```c
 tidesdb_config_t config = {
     .db_path = "./mydb",
-    .num_flush_threads = 2,                       /* Flush thread pool size (default: 2) */
+    .num_flush_threads = 2,                       /* Flush thread pool size (default: 0 = auto, min(cpu_count, 4)) */
     .num_compaction_threads = 2,                  /* Compaction thread pool size (default: 2) */
     .log_level = TDB_LOG_INFO,                    /* Log level: TDB_LOG_DEBUG, TDB_LOG_INFO, TDB_LOG_WARN, TDB_LOG_ERROR, TDB_LOG_FATAL, TDB_LOG_NONE */
     .block_cache_size = 64 * 1024 * 1024,         /* 64MB global block cache (default: 64MB) */
@@ -525,6 +525,7 @@ if (tidesdb_rename_column_family(db, "old_name", "new_name") != 0)
 
 **Return values**
 - `TDB_SUCCESS` · Rename completed successfully
+- `TDB_ERR_INVALID_ARGS` · `db`, `old_name`, or `new_name` is NULL
 - `TDB_ERR_NOT_FOUND` · Column family with `old_name` doesn't exist
 - `TDB_ERR_EXISTS` · Column family with `new_name` already exists
 - `TDB_ERR_IO` · Failed to rename directory on disk
@@ -774,7 +775,7 @@ if (tidesdb_get_db_stats(db, &db_stats) == 0)
 | `unified_next_cf_index` | `uint32_t` | Next CF prefix index to assign in unified mode |
 | `unified_wal_generation` | `uint64_t` | Current generation number of the unified WAL |
 | `object_store_enabled` | `int` | 1 if an object store connector is attached |
-| `object_store_connector` | `const char*` | Name of the object store connector ("fs", "s3", or NULL) |
+| `object_store_connector` | `const char*` | Name of the object store connector ("fs", "s3", "unknown", or NULL when no store is attached) |
 | `local_cache_bytes_used` | `size_t` | Bytes currently used by the local SSTable cache (object store mode) |
 | `local_cache_bytes_max` | `size_t` | Local cache capacity in bytes (object store mode) |
 | `local_cache_num_files` | `int` | Number of SSTable files resident in the local cache |
@@ -813,8 +814,8 @@ if (tidesdb_get_cache_stats(db, &cache_stats) == 0)
         printf("Cache enabled: yes\n");
         printf("Total entries: %zu\n", cache_stats.total_entries);
         printf("Total bytes: %.2f MB\n", cache_stats.total_bytes / (1024.0 * 1024.0));
-        printf("Hits: %lu\n", cache_stats.hits);
-        printf("Misses: %lu\n", cache_stats.misses);
+        printf("Hits: %" PRIu64 "\n", cache_stats.hits);   /* PRIu64 requires <inttypes.h> */
+        printf("Misses: %" PRIu64 "\n", cache_stats.misses);
         printf("Hit rate: %.1f%%\n", cache_stats.hit_rate * 100.0);
         printf("Partitions: %zu\n", cache_stats.num_partitions);
     }
@@ -2152,7 +2153,8 @@ if (tidesdb_compact_range(cf, start, sizeof(start) - 1, end, sizeof(end) - 1) !=
 **Return values**
 
 - `TDB_SUCCESS` on success
-- `TDB_ERR_INVALID_ARGS` if `cf` or either key pointer is NULL, or sizes are zero
+- `TDB_ERR_INVALID_ARGS` if `cf` is NULL, if both `start_key` and `end_key` are NULL, or if a non-NULL key has size zero (a single NULL key is allowed and leaves the range unbounded on that side)
+- `TDB_ERR_LOCKED` if another compaction is already running on the column family
 - Standard I/O and memory error codes if the merge cannot complete
 
 ### Purge Column Family
@@ -2272,7 +2274,7 @@ TidesDB uses separate thread pools for flush and compaction operations. Understa
 ```c
 tidesdb_config_t config = {
     .db_path = "./mydb",
-    .num_flush_threads = 4,                /* Flush thread pool size (default: 2) */
+    .num_flush_threads = 4,                /* Flush thread pool size (default: 0 = auto, min(cpu_count, 4)) */
     .num_compaction_threads = 4,           /* Compaction thread pool size (default: 2) */
     .max_concurrent_flushes = 0,           /* 0 = auto-match num_flush_threads (recommended) */
     .log_level = TDB_LOG_INFO,