Commit 0f43437

Merge pull request #379 from tidesdb/updates751
data correction links in new article
2 parents c4d5a79 + 819349b commit 0f43437

2 files changed

Lines changed: 69 additions & 2 deletions

src/content/docs/articles/benchmark-analysis-tidesql-v3-1-0-innodb-mariadb-12-2-2.md

Lines changed: 2 additions & 2 deletions
@@ -108,6 +108,6 @@ That's all for now.
Data:
| File Name | Format | SHA256 Checksum |
|-----------|--------|----------------|
- | <a href="tidesql-v3-1-0-innodb-mariadb-12-2-2/detail_20260216_215126.csv">detail_20260216_215126.csv</a> | CSV | ca9bc0adf999be76cc1acaad702565f3cd0aa3cc17400e291737e5265b750c07 |
- | <a href="tidesql-v3-1-0-innodb-mariadb-12-2-2/summary_20260216_215126.csv">summary_20260216_215126.csv</a> | CSV | 74e7373c8a5c7a6b7538b2b3fbdcbb2479d741b475e90d520365ffb948c789d0 |
+ | <a href="/tidesql-v3-1-0-innodb-mariadb-12-2-2/detail_20260216_215126.csv">detail_20260216_215126.csv</a> | CSV | ca9bc0adf999be76cc1acaad702565f3cd0aa3cc17400e291737e5265b750c07 |
+ | <a href="/tidesql-v3-1-0-innodb-mariadb-12-2-2/summary_20260216_215126.csv">summary_20260216_215126.csv</a> | CSV | 74e7373c8a5c7a6b7538b2b3fbdcbb2479d741b475e90d520365ffb948c789d0 |

src/content/docs/reference/c.md

Lines changed: 67 additions & 0 deletions
@@ -625,6 +625,73 @@ if (tidesdb_get_cache_stats(db, &cache_stats) == 0)
The block cache is a database-level resource shared across all column families. It caches deserialized klog blocks to avoid repeated disk I/O and deserialization. Configure cache size via `config.block_cache_size` when opening the database. Set to 0 to disable caching.
:::

### Range Cost Estimation

`tidesdb_range_cost` estimates the computational cost of iterating between two keys in a column family. The estimate, written to `*cost`, is an opaque double that is meaningful only when compared with other values produced by the same function. The function uses only in-memory metadata and performs no disk I/O.

```c
int tidesdb_range_cost(tidesdb_column_family_t *cf,
                       const uint8_t *key_a, size_t key_a_size,
                       const uint8_t *key_b, size_t key_b_size,
                       double *cost);
```

**Parameters**
| Name | Type | Description |
|------|------|-------------|
| `cf` | `tidesdb_column_family_t*` | Column family to estimate cost for |
| `key_a` | `const uint8_t*` | First key (bound of range) |
| `key_a_size` | `size_t` | Size of first key |
| `key_b` | `const uint8_t*` | Second key (bound of range) |
| `key_b_size` | `size_t` | Size of second key |
| `cost` | `double*` | Output: estimated traversal cost (higher = more expensive) |

**Returns**
- `TDB_SUCCESS` on success
- `TDB_ERR_INVALID_ARGS` on bad input (NULL pointers, zero-length keys)

**Example**
```c
tidesdb_column_family_t *cf = tidesdb_get_column_family(db, "my_cf");
if (!cf) return -1;

double cost_a = 0.0, cost_b = 0.0;

/* Estimate each candidate range; key sizes exclude the NUL terminator. */
int rc_a = tidesdb_range_cost(cf, (const uint8_t *)"user:0000", 9,
                              (const uint8_t *)"user:0999", 9, &cost_a);

int rc_b = tidesdb_range_cost(cf, (const uint8_t *)"user:1000", 9,
                              (const uint8_t *)"user:1099", 9, &cost_b);

/* Compare the costs only if both estimates succeeded. */
if (rc_a == TDB_SUCCESS && rc_b == TDB_SUCCESS && cost_a < cost_b)
{
    printf("Range A is cheaper to iterate\n");
}
```

**How it works**

The function walks all SSTable levels and uses in-memory metadata to estimate how many blocks and entries fall within the given key range:

- With block indexes enabled · Uses O(log B) binary search per overlapping SSTable to find the block slots containing each key bound. The block span between slots, scaled by `index_sample_ratio`, gives the estimated block count.
- Without block indexes · Falls back to byte-level key interpolation. The leading 8 bytes of each key are converted to a numeric position within the SSTable's min/max key range to estimate the fraction of blocks covered.
- B+tree SSTables (`use_btree=1`) · Uses the same key interpolation against tree node counts, plus tree height as a seek cost. Only applies to column families configured with B+tree klog format.
- Compression · Compressed SSTables receive a 1.5× weight multiplier to account for decompression overhead.
- Merge overhead · Each overlapping SSTable adds a small fixed cost for merge-heap operations.
- Memtable · The active memtable's entry count contributes a small in-memory cost.
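
The interpolation fallback and the compression weight listed above can be sketched roughly as follows. This is a minimal illustration, not TidesDB's actual implementation: the function names (`key_prefix_u64`, `key_position`, `estimate_block_cost`) are hypothetical, and the big-endian packing and zero-padding of short keys are assumptions made so that numeric order matches bytewise key order. Only the 8-byte prefix and the 1.5× multiplier come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the leading 8 bytes of a key into a big-endian integer so that
 * numeric order matches bytewise order. Keys shorter than 8 bytes are
 * zero-padded (an assumption made for this sketch). */
uint64_t key_prefix_u64(const uint8_t *key, size_t size)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
    {
        v = (v << 8) | (uint64_t)(i < size ? key[i] : 0);
    }
    return v;
}

/* Position of `key` inside [min_key, max_key] as a fraction in [0, 1].
 * A degenerate range (max_key <= min_key) yields 0 for every key. */
double key_position(uint64_t key, uint64_t min_key, uint64_t max_key)
{
    if (max_key <= min_key || key <= min_key) return 0.0;
    if (key >= max_key) return 1.0;
    return (double)(key - min_key) / (double)(max_key - min_key);
}

/* Hypothetical per-SSTable estimate: the fraction of the SSTable's key
 * range covered by [lo, hi], scaled by its block count and weighted by
 * 1.5 when the SSTable is compressed. */
double estimate_block_cost(uint64_t lo, uint64_t hi,
                           uint64_t sst_min, uint64_t sst_max,
                           uint64_t num_blocks, int compressed)
{
    if (lo > hi)
    {
        /* Accept the bounds in either order. */
        uint64_t tmp = lo;
        lo = hi;
        hi = tmp;
    }
    double fraction = key_position(hi, sst_min, sst_max) -
                      key_position(lo, sst_min, sst_max);
    double cost = fraction * (double)num_blocks;
    return compressed ? cost * 1.5 : cost;
}
```

Because only the first 8 bytes contribute, keys that share a long common prefix map to the same position under a scheme like this, so the estimate is coarse for such ranges.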

Key order does not matter: the function normalizes the range, so passing the bounds as (`key_a`, `key_b`) or as (`key_b`, `key_a`) produces the same result.
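
One way such normalization could work is to compare the two bounds and swap them when they arrive in descending order. The sketch below assumes plain bytewise (`memcmp`) ordering with shorter keys sorting first on a tie; the `key_ref_t` type and both function names are hypothetical, and a column family configured with a different comparator would need that comparator instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical non-owning view of a key. */
typedef struct
{
    const uint8_t *data;
    size_t size;
} key_ref_t;

/* Bytewise comparison; when one key is a prefix of the other,
 * the shorter key sorts first. Returns <0, 0, or >0. */
int key_compare(key_ref_t a, key_ref_t b)
{
    size_t n = a.size < b.size ? a.size : b.size;
    int c = memcmp(a.data, b.data, n);
    if (c != 0) return c;
    return (a.size > b.size) - (a.size < b.size);
}

/* Ensure *lo <= *hi by swapping the two views when needed. */
void normalize_range(key_ref_t *lo, key_ref_t *hi)
{
    if (key_compare(*lo, *hi) > 0)
    {
        key_ref_t tmp = *lo;
        *lo = *hi;
        *hi = tmp;
    }
}
```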

**Use cases**
- Query planning · Compare candidate key ranges to find the cheapest one to scan
- Load balancing · Distribute range scan work across threads by estimating per-range cost
- Adaptive prefetching · Decide how aggressively to prefetch based on range size
- Monitoring · Track how data distribution changes across key ranges over time
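
As an illustration of the load-balancing use case, the sketch below assigns each range to whichever worker currently has the smallest accumulated cost. It is one possible greedy approach, not something TidesDB provides: `assign_ranges` and its parameters are hypothetical, and the `costs` array is assumed to have been filled beforehand with one `tidesdb_range_cost` result per range.

```c
#include <stddef.h>

/* Greedy assignment of ranges to workers by relative cost.
 *   costs[i]  relative cost of range i (input)
 *   owner[i]  worker index chosen for range i (output)
 *   load[w]   total cost assigned to worker w (output, num_workers entries)
 * Does nothing when num_workers is 0. */
void assign_ranges(const double *costs, size_t num_ranges,
                   size_t num_workers, size_t *owner, double *load)
{
    if (num_workers == 0) return;

    for (size_t w = 0; w < num_workers; w++)
    {
        load[w] = 0.0;
    }

    for (size_t i = 0; i < num_ranges; i++)
    {
        /* Pick the least-loaded worker; ties go to the lowest index. */
        size_t best = 0;
        for (size_t w = 1; w < num_workers; w++)
        {
            if (load[w] < load[best]) best = w;
        }
        owner[i] = best;
        load[best] += costs[i];
    }
}
```

Sorting the ranges by descending cost before calling a routine like this usually gives a more even split. Since the costs are relative, only ranges estimated with the same function should be mixed in one call.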

:::note[Cost Values]
The returned cost is not an absolute measure (it does not represent milliseconds, bytes, or entry counts). It is a relative scalar, meaningful only when compared with other `tidesdb_range_cost` results. A cost of 0.0 means no overlapping SSTables or memtable entries were found for the range.
:::

### Compression Algorithms

TidesDB supports multiple compression algorithms to reduce storage footprint and I/O bandwidth. Compression is applied to both klog (key-log) and vlog (value-log) blocks before writing to disk.
