
Commit 9ce4b88

Merge pull request #483 from tidesdb/920-440
update design doc, c ref, tidesql reference based on 2 latest minors
2 parents 64819bf + 77f7fa5 commit 9ce4b88

13 files changed

Lines changed: 207 additions & 38 deletions

src/content/docs/articles/benchmark-analysis-tidesdb-v7-4-4-rocksdb-v10-9-1.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ I ran the benchmarks using `tidesdb_rocksdb.sh` within the <a href="https://gith
**Gains from v7.4.3**
![Gains from v7.4.3](/tidesdb-v7-4-4-rocksdb-v10-9-1/fig1.png)

-Overall, v7.4.4 delivers consistent and often substantial performance gains across the majority of workloads. Many read-, seek-, and range-oriented workloads show improvements in the 1.2x1.6x range, with the largest gains exceeding 2.3x.
+Overall, v7.4.4 delivers consistent and often substantial performance gains across the majority of workloads. Many read-, seek-, and range-oriented workloads show improvements in the 1.2x-1.6x range, with the largest gains exceeding 2.3x.


## TidesDB v7.4.4 & RocksDB v10.9.1 Comparisons

src/content/docs/articles/benchmark-analysis-tidesdb-v8-2-1-rocksdb-v10-10-1.md

Lines changed: 4 additions & 4 deletions
@@ -41,16 +41,16 @@ The tool used for this analysis is the TidesDB benchtool project, which can be f

**PUT throughput across workloads**
![PUT throughput across workloads](/benchmark-analysis-tidesdb-v8-2-1-rocksdb-v10-10-1/plot1.png)
-With a single RocksDB baseline per workload, both TidesDB formats are consistently faster on writes. The largest gain is sequential ingest - block-klog ~5.04x and btree-klog ~4.88x vs RocksDB (≈7.80M / 7.56M vs ≈1.55M ops/s baseline). Random and mixed writes remain strong ~1.571.59x on random write and ~1.401.46x on mixed random. Zipfian workloads also show robust advantages ~1.821.84x on Zipf write and ~1.791.86x on Zipf mixed.
+With a single RocksDB baseline per workload, both TidesDB formats are consistently faster on writes. The largest gain is sequential ingest - block-klog ~5.04x and btree-klog ~4.88x vs RocksDB (≈7.80M / 7.56M vs ≈1.55M ops/s baseline). Random and mixed writes remain strong ~1.57-1.59x on random write and ~1.40-1.46x on mixed random. Zipfian workloads also show robust advantages ~1.82-1.84x on Zipf write and ~1.79-1.86x on Zipf mixed.

**GET throughput across workloads**
![GET throughput across workloads](/benchmark-analysis-tidesdb-v8-2-1-rocksdb-v10-10-1/plot2.png)

-Reads are format - and workload-sensitive against the collapsed RocksDB baseline. On pure random read, block-klog is ~1.98x faster (≈3.07M vs ≈1.55M ops/s), while btree-klog is ~1.16x (≈1.80M vs ≈1.55M). On mixed random, block-klog is below baseline (~0.83x), but btree-klog becomes above baseline (~1.1x). On Zipf mixed, both formats are clearly ahead block-klog ~1.69x and btree-klog ~1.73x vs RocksDB baseline (≈3.153.23M vs ≈1.86M ops/s).
+Reads are format - and workload-sensitive against the collapsed RocksDB baseline. On pure random read, block-klog is ~1.98x faster (≈3.07M vs ≈1.55M ops/s), while btree-klog is ~1.16x (≈1.80M vs ≈1.55M). On mixed random, block-klog is below baseline (~0.83x), but btree-klog becomes above baseline (~1.1x). On Zipf mixed, both formats are clearly ahead block-klog ~1.69x and btree-klog ~1.73x vs RocksDB baseline (≈3.15-3.23M vs ≈1.86M ops/s).

**PUT p99 tail latency across workloads**
![PUT p99 tail latency across workloads](/benchmark-analysis-tidesdb-v8-2-1-rocksdb-v10-10-1/plot3.png)
-Using the collapsed RocksDB baseline (and noting the two-run range), TidesDB generally improves tail latency on write-heavy workloads, especially sequential and Zipfian cases. For sequential write, TidesDB p99 is ≈1.551.62 ms versus RocksDB’s ≈56 ms range across the two baseline runs, indicating materially better tail behavior alongside the throughput advantage. Random write tail latency is closer block-klog is lower than RocksDB, while btree-klog is competitive but can be higher depending on workload shape - so the key point is that the major tail-latency win is most pronounced in seq/Zipf patterns, not uniformly in every random-heavy case.
+Using the collapsed RocksDB baseline (and noting the two-run range), TidesDB generally improves tail latency on write-heavy workloads, especially sequential and Zipfian cases. For sequential write, TidesDB p99 is ≈1.55-1.62 ms versus RocksDB’s ≈5-6 ms range across the two baseline runs, indicating materially better tail behavior alongside the throughput advantage. Random write tail latency is closer block-klog is lower than RocksDB, while btree-klog is competitive but can be higher depending on workload shape - so the key point is that the major tail-latency win is most pronounced in seq/Zipf patterns, not uniformly in every random-heavy case.

**On-disk database size after workload**
![On-disk database size after workload](/benchmark-analysis-tidesdb-v8-2-1-rocksdb-v10-10-1/plot4.png)

@@ -60,7 +60,7 @@ For example, on seq write, RocksDB baseline is ≈200.9 MB, block-klog ≈132.8

**Space amplification factor (lower is better)**
![Space amplification factor (lower is better)](/benchmark-analysis-tidesdb-v8-2-1-rocksdb-v10-10-1/plot5.png)
-Against the single RocksDB baseline, block-klog is consistently the most space-efficient, while btree-klog incurs high space amplification on uniform workloads. On seq/random/mixed-rand writes, RocksDB baseline is roughly ~0.1050.18x, block-klog improves further to ~0.080.12x, but btree-klog is about ~1.05× (roughly an order of magnitude higher than RocksDB in these cases). On Zipf write, all engines improve, but the ordering remains - RocksDB baseline ≈0.11x, block-klog ≈0.02x, btree-klog ≈0.16x.
+Against the single RocksDB baseline, block-klog is consistently the most space-efficient, while btree-klog incurs high space amplification on uniform workloads. On seq/random/mixed-rand writes, RocksDB baseline is roughly ~0.105-0.18x, block-klog improves further to ~0.08-0.12x, but btree-klog is about ~1.05× (roughly an order of magnitude higher than RocksDB in these cases). On Zipf write, all engines improve, but the ordering remains - RocksDB baseline ≈0.11x, block-klog ≈0.02x, btree-klog ≈0.16x.

So the results are rather interesting as you can see, if your priority is space efficiency and strong read performance on pure random reads with some more memory usage, block-klog is the clear winner (tiny DB sizes + very low space amp + top GET throughput on random read). Though if your priority is mixed-workload GET throughput (and you can tolerate much larger footprint on uniform workloads) the btree klog format can be attractive, especially where "mixed random" GET matters.
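As a reading aid for the space-amplification figures above: the excerpt doesn't spell out the formula, but under the conventional definition (an assumption here) the factor is simply

$$\text{space amplification} \approx \frac{\text{on-disk database size after the workload}}{\text{logical size of the keys and values written}}$$

Values well below 1.0, such as the ≈0.02x-0.18x results above, mean the engine ends up storing less than the raw payload (typically through compression and compaction of overwritten versions), while btree-klog's ~1.05x on uniform writes means its footprint roughly matches the logical data.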

src/content/docs/articles/benchmark-analysis-tidesdb-v8-6-0-rocksdb-v10-10-1.md

Lines changed: 6 additions & 6 deletions
@@ -124,19 +124,19 @@ Sequential write p99/p50 ratio is 1.9x for TidesDB vs 4.4x for RocksDB, where th

**Write Amplification**

-TidesDB stays between 1.031.21 across all workloads; RocksDB ranges 1.221.51. The 1524% gap translates directly to less SSD wear, less background I/O contention, and tighter tail latencies. Tightest amplification is on large values (1.03 vs 1.22) and the widest gap is on 50M small-value writes (1.21 vs 1.51).
+TidesDB stays between 1.03-1.21 across all workloads; RocksDB ranges 1.22-1.51. The 15-24% gap translates directly to less SSD wear, less background I/O contention, and tighter tail latencies. Tightest amplification is on large values (1.03 vs 1.22) and the widest gap is on 50M small-value writes (1.21 vs 1.51).

![Write Amplification](/tidesdb-v8-6-0-rocksdb-v10-10-1/plots1/10_write_amplification.png)

**Space Efficiency**

-Sequential 10M keys land at 111 MB vs 208 MB (47% smaller), random 10M at 87 MB vs 142 MB (38%). Space amplification ratios are TidesDB 0.070.14 vs RocksDB 0.080.19. Sequential writes produce the tightest compaction on our sorted runs.
+Sequential 10M keys land at 111 MB vs 208 MB (47% smaller), random 10M at 87 MB vs 142 MB (38%). Space amplification ratios are TidesDB 0.07-0.14 vs RocksDB 0.08-0.19. Sequential writes produce the tightest compaction on our sorted runs.

![Space Efficiency](/tidesdb-v8-6-0-rocksdb-v10-10-1/plots1/11_space_efficiency.png)

**Resource Usage**

-TidesDB uses ~4x more memory (2,035 MB vs 485 MB peak RSS on sequential writes), an intentional trade-off for speed. We write 1824% less data to disk. CPU is higher on writes (582% vs 258%) due to more aggressive parallelism across 8 threads. The new `max_memory_usage` cap in v8.6.0 keeps this bounded.
+TidesDB uses ~4x more memory (2,035 MB vs 485 MB peak RSS on sequential writes), an intentional trade-off for speed. We write 18-24% less data to disk. CPU is higher on writes (582% vs 258%) due to more aggressive parallelism across 8 threads. The new `max_memory_usage` cap in v8.6.0 keeps this bounded.

![Resource Usage](/tidesdb-v8-6-0-rocksdb-v10-10-1/plots1/12_resource_usage.png)

@@ -154,7 +154,7 @@ On 4KB values, TidesDB p99/avg ratio is 1.92x (36,427 µs / 18,959 µs) while Ro

**Latency Variability**

-Write CV is TidesDB 2535% vs RocksDB 200497%, making us 919x more consistent on writes as RocksDB's compaction stalls create huge latency spikes. Read and seek CV reverses with RocksDB's random read CV at 48% vs our 163%. The higher relative variability is spread around much smaller absolute numbers (2 µs vs 4.6 µs), meaning faster reads with a bit more jitter.
+Write CV is TidesDB 25-35% vs RocksDB 200-497%, making us 9-19x more consistent on writes as RocksDB's compaction stalls create huge latency spikes. Read and seek CV reverses with RocksDB's random read CV at 48% vs our 163%. The higher relative variability is spread around much smaller absolute numbers (2 µs vs 4.6 µs), meaning faster reads with a bit more jitter.

![Latency Variability](/tidesdb-v8-6-0-rocksdb-v10-10-1/plots1/15_latency_variability.png)

@@ -242,7 +242,7 @@ Sequential writes at 8 threads show TidesDB p50 1,194 µs, p99 2,035 µs (ratio

**Write Amplification**

-TidesDB ranges 1.041.23 across all workloads while RocksDB ranges 1.231.75. The gap is wider here than in Environment 1, with RocksDB's sequential write amplification at 16 threads hitting 1.75 (versus 1.07 for TidesDB). Zipfian remains the tightest at 1.04 for TidesDB.
+TidesDB ranges 1.04-1.23 across all workloads while RocksDB ranges 1.23-1.75. The gap is wider here than in Environment 1, with RocksDB's sequential write amplification at 16 threads hitting 1.75 (versus 1.07 for TidesDB). Zipfian remains the tightest at 1.04 for TidesDB.

![Write Amplification](/tidesdb-v8-6-0-rocksdb-v10-10-1/plots2/10_write_amplification.png)

@@ -282,7 +282,7 @@ Write CV for TidesDB sequential is 16.4% vs RocksDB at 8.2%, and interestingly R

TidesDB v8.6.0 outperforms RocksDB v10.10.1 across the vast majority of workloads on both environments. On Environment 1 (i7-11700K, 48 GB, SATA SSD, 8 threads), speedups range from 1.27x to 4.97x with a geometric mean around 2.2x. On Environment 2 (Threadripper 2950X, 128 GB, NVMe, 8 + 16 threads), the same workloads show wider margins, with sequential write speedups reaching 8.32x at 8 threads and 9.90x at 16 threads, and synchronous writes scaling to 14.7x at 16 threads.

-The consistent strengths across both environments include sequential and batched writes (510x), range scans (23x), seeks with skewed access patterns (35x), large-value writes (34x with dramatically better tail latency), and low write amplification (1.031.23 vs 1.221.75). The consistent weaknesses include single-key writes and deletes without batching (RocksDB wins by 1.52.5x), read/seek latency variability (RocksDB delivers more uniform timing despite higher absolute latencies), and higher memory usage (~46x RSS at 8 threads, narrowing to ~1.7x at 16 threads).
+The consistent strengths across both environments include sequential and batched writes (5-10x), range scans (2-3x), seeks with skewed access patterns (3-5x), large-value writes (3-4x with dramatically better tail latency), and low write amplification (1.03-1.23 vs 1.22-1.75). The consistent weaknesses include single-key writes and deletes without batching (RocksDB wins by 1.5-2.5x), read/seek latency variability (RocksDB delivers more uniform timing despite higher absolute latencies), and higher memory usage (~4-6x RSS at 8 threads, narrowing to ~1.7x at 16 threads).

Environment 2 also exposed a new weak spot not visible at 8 threads. Under extreme concurrent write pressure at 16 threads, the mixed random GET path drops as the back-pressure systems over-throttle operations.
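For context on the write-amplification ranges quoted in this diff (1.03-1.21 vs 1.22-1.51 in Environment 1, 1.04-1.23 vs 1.23-1.75 in Environment 2): the metric isn't defined in the excerpt, but under its usual storage-engine definition (an assumption here)

$$\text{write amplification} = \frac{\text{bytes physically written to the device (log, flushes, compaction rewrites)}}{\text{bytes logically written by the application}}$$

A factor of 1.0 would mean each logical byte hits the device exactly once, so the lower TidesDB ranges are a direct statement about how much less background rewriting the device sees, which is where the SSD-wear and background-I/O-contention claims come from.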

src/content/docs/articles/benchmark-analysis-tidesdb-v8-7-1-rocksdb-11-0-3.md

Lines changed: 4 additions & 4 deletions
@@ -102,13 +102,13 @@ Sequential write p99/p50 ratio is 1.82x for TidesDB (611/1111 µs) vs 1.48x for

**Write Amplification**

-TidesDB stays between 1.041.25 across all workloads; RocksDB ranges 1.051.52. The tightest is Zipfian at 1.04 vs 1.05 where hot-key overwrites keep both engines lean. The widest gap is on 50M small-value random writes at 1.25 vs 1.52. Lower write amplification means less SSD wear and less background I/O contention.
+TidesDB stays between 1.04-1.25 across all workloads; RocksDB ranges 1.05-1.52. The tightest is Zipfian at 1.04 vs 1.05 where hot-key overwrites keep both engines lean. The widest gap is on 50M small-value random writes at 1.25 vs 1.52. Lower write amplification means less SSD wear and less background I/O contention.

![Write Amplification](/analysis-tidesdb-v8-7-1-rocksdb-11-0-3/10_write_amplification.png)

**Space Efficiency**

-Sequential 10M keys land at 111 MB vs 205 MB (46% smaller), random 10M at 90 MB vs 140 MB (36% smaller). Small-value 50M sits at 522 MB vs 503 MB (4% larger for TidesDB). Large-value 1M is 302 MB vs 348 MB (13% smaller). Space amplification ratios are TidesDB 0.070.14 vs RocksDB 0.080.19.
+Sequential 10M keys land at 111 MB vs 205 MB (46% smaller), random 10M at 90 MB vs 140 MB (36% smaller). Small-value 50M sits at 522 MB vs 503 MB (4% larger for TidesDB). Large-value 1M is 302 MB vs 348 MB (13% smaller). Space amplification ratios are TidesDB 0.07-0.14 vs RocksDB 0.08-0.19.

![Space Efficiency](/analysis-tidesdb-v8-7-1-rocksdb-11-0-3/11_space_efficiency.png)

@@ -132,7 +132,7 @@ On 4KB values TidesDB p99/avg is 1.79x (35,326 µs / 19,723 µs) while RocksDB i

**Latency Variability**

-Write CV is TidesDB 1138% vs RocksDB 11457% across write workloads. Zipfian writes are the tightest at 11% for both engines. Random write CV is 37% for TidesDB vs 253% for RocksDB, a 6.8x consistency advantage. Read CV shows TidesDB random reads at 187% vs RocksDB at 48%, the same pattern as before where higher relative variability sits around much smaller absolute latencies (2.13 µs vs 5.06 µs). Random seek CV is very high for TidesDB at 22,750% but that's a quirk of sub-microsecond median latency where even tiny absolute jitter produces a large coefficient.
+Write CV is TidesDB 11-38% vs RocksDB 11-457% across write workloads. Zipfian writes are the tightest at 11% for both engines. Random write CV is 37% for TidesDB vs 253% for RocksDB, a 6.8x consistency advantage. Read CV shows TidesDB random reads at 187% vs RocksDB at 48%, the same pattern as before where higher relative variability sits around much smaller absolute latencies (2.13 µs vs 5.06 µs). Random seek CV is very high for TidesDB at 22,750% but that's a quirk of sub-microsecond median latency where even tiny absolute jitter produces a large coefficient.

![Latency Variability](/analysis-tidesdb-v8-7-1-rocksdb-11-0-3/15_latency_variability.png)

@@ -142,7 +142,7 @@ TidesDB v8.7.1 delivers rather great improvements across the board versus v8.6.x

The backpressure consolidation from per-op to per-column-family-per-commit fixed TidesDB's historical weakness on single-key operations. Batch-1 writes and single-key deletes now favor TidesDB where previously RocksDB won, and the double-sleep elimination means mixed workloads no longer over-throttle under combined L0 and memory pressure.

-The raw byte cache replacing the old block cache improved cache utilization and shows up in the random read improvement from 2.23x to 2.43x. Write amplification remains consistently lower than RocksDB at 1.041.25 vs 1.051.52, and space efficiency holds with 3646% smaller on-disk sizes for standard workloads.
+The raw byte cache replacing the old block cache improved cache utilization and shows up in the random read improvement from 2.23x to 2.43x. Write amplification remains consistently lower than RocksDB at 1.04-1.25 vs 1.05-1.52, and space efficiency holds with 36-46% smaller on-disk sizes for standard workloads.

The reaper's stack-allocated eviction buffer and the pwritev block manager writes are less visible in the headline numbers but contribute to the overall consistency, removing per-cycle mallocs and reducing syscalls on the write path.
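The latency-variability comparisons in this article are coefficients of variation. As a brief aside on how a figure like the 22,750% random-seek CV can coexist with good typical-case latency, the usual definition (assumed here) is

$$\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%$$

with σ the standard deviation and μ the mean latency. When typical latency is sub-microsecond, as the text notes, even a small number of operations stalling for tens of microseconds drives σ far above μ; as a purely hypothetical illustration, μ = 1 µs with σ = 227.5 µs already yields a CV of 22,750%.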

src/content/docs/articles/benchmark-analysis-tidesql-v1-0-0-innodb-in-mariadb-v12-1-2.md

Lines changed: 6 additions & 6 deletions
@@ -97,22 +97,22 @@ This plot shows a large and meaningful gap. InnoDB uses roughly 12 MB to store a

That's not noise and it's not tuning; it's a consequence of structure. InnoDB pays for B-trees, pages, free space, and metadata. If storage footprint matters, this result alone is hard to ignore.

-## P95 latency INSERT
+## P95 latency - INSERT

-![P95 latency INSERT](/benchmark-analysis-tidesdb-innodb-in-mariadb-v12-1-2-feb1-2026/fig10.png)
+![P95 latency - INSERT](/benchmark-analysis-tidesdb-innodb-in-mariadb-v12-1-2-feb1-2026/fig10.png)

At the 95th percentile, TidesDB inserts are dramatically more predictable. InnoDB shows a long tail, with occasional stalls that push p95 close to 0.4 ms, while TidesDB stays well under 0.1 ms


-## P95 latency SELECT
+## P95 latency - SELECT

-![P95 latency SELECT](/benchmark-analysis-tidesdb-innodb-in-mariadb-v12-1-2-feb1-2026/fig11.png)
+![P95 latency - SELECT](/benchmark-analysis-tidesdb-innodb-in-mariadb-v12-1-2-feb1-2026/fig11.png)

For reads, the situation reverses. InnoDB's p95 SELECT latency is significantly lower, while TidesDB shows both higher average and worse tail latency obviously.

-## P95 latency UPDATE
+## P95 latency - UPDATE

-![P95 latency UPDATE](/benchmark-analysis-tidesdb-innodb-in-mariadb-v12-1-2-feb1-2026/fig12.png)
+![P95 latency - UPDATE](/benchmark-analysis-tidesdb-innodb-in-mariadb-v12-1-2-feb1-2026/fig12.png)

Updates land somewhere in between. TidesDB again has tighter tail latency, while InnoDB shows more variance and higher p95
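Since every chart in this comparison is framed around p95, here is a minimal nearest-rank sketch in C of how such a percentile can be computed from raw per-query latencies. This is illustrative only; `latency_p95_us` is a hypothetical helper, not part of the benchmark harness used in the article.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort over double-precision latency samples. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank p95: sort a copy of the samples and return the value at
 * rank ceil(0.95 * n). Returns 0.0 on empty input or allocation failure.
 * Hypothetical helper for illustration, not from the article's tooling. */
double latency_p95_us(const double *samples, size_t n)
{
    if (n == 0)
        return 0.0;

    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return 0.0;

    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    size_t rank = (size_t)ceil(0.95 * (double)n); /* 1-based rank */
    if (rank < 1)
        rank = 1;

    double p95 = sorted[rank - 1];
    free(sorted);
    return p95;
}
```

Nearest-rank is only one of several common percentile definitions (some tools interpolate between neighbouring samples), so exact p95 values can differ slightly between harnesses.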

src/content/docs/articles/tidesdb-8-optional-lsmb+.md

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ This figure captures the core architectural win. For point lookups and seeks, th
## PUT tail latency (p95 with p99 markers)
![PUT tail latency (p95 with p99 markers)](/tidesdb-8-optional-lsmb+/plotC_put_latency_tail.png)

-In random, mixed, and populate phases, p95 latency drops by roughly 3040%, and p99 follows the same trend. The exception is range-populate, where B+tree p95 is worse, consistent with its lower PUT throughput there, but not too concerning.
+In random, mixed, and populate phases, p95 latency drops by roughly 30-40%, and p99 follows the same trend. The exception is range-populate, where B+tree p95 is worse, consistent with its lower PUT throughput there, but not too concerning.

## Read / seek / range tail latency (log scale)
![Read / seek / range tail latency (log scale)](/tidesdb-8-optional-lsmb+/plotD2_read_seek_range_latency_log.png)

@@ -58,7 +58,7 @@ On a log scale, this plot highlights how the B+tree collapses read-side tail lat
## On-disk database size
![ On-disk database size](/tidesdb-8-optional-lsmb+/plotE2_db_size.png)

-The B+tree variant consumes an order of magnitude more disk space than the block layout in several workloads (~1.11.2GB vs ~100MB). This makes sense, as the block layout is highly optimized for space efficiency.
+The B+tree variant consumes an order of magnitude more disk space than the block layout in several workloads (~1.1-1.2GB vs ~100MB). This makes sense, as the block layout is highly optimized for space efficiency.

## Peak RSS
![ Peak RSS](/tidesdb-8-optional-lsmb+/plotG2_peak_rss_mb.png)
