On startup, the server applies this option to all existing column families. The option is read-only at runtime.
1333
+
1334
+
#### Commonly configured parameters
1335
+
1336
+
* `write_buffer_size`: Size of a single memtable. When the limit is reached, the memtable is frozen and scheduled for flush to an SST (Sorted String Table)
1337
+
file.
1338
+
1339
+
* `max_write_buffer_number`: Maximum number of memtables that can accumulate in memory (one active, others waiting to flush). Raising `max_write_buffer_number` helps absorb
1340
+
bursts of writes.
1341
+
1342
+
* `max_bytes_for_level_base`: Total size limit for level 1 of the LSM (Log-Structured Merge) tree; the level-1 limit influences how large subsequent levels become.
1343
+
1344
+
* `target_file_size_base`: Target size for a single SST file at level 1. Combined with level size limits, `target_file_size_base` affects how many files exist per level.
1345
+
1346
+
* `compression_per_level`: Compression algorithm per level (for example, LZ4 or ZSTD) to balance CPU and disk space.
1347
+
1348
+
* `block_based_table_factory`: Nested settings for blocks: Bloom filters, index types, block cache behavior.
1349
+
1350
+
* `level0_file_num_compaction_trigger`: How many L0 (level 0) files trigger a compaction.
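
As a rough illustration of how the first two parameters interact, the following sketch estimates an upper bound on memtable memory for one column family. The 64 MiB buffer size and the count of 4 buffers are hypothetical values chosen for the example, not recommended settings, and real memtables can slightly exceed the configured size.

```python
# Hypothetical values for illustration only; check the server's actual settings.
MiB = 1024 * 1024


def max_memtable_bytes(write_buffer_size: int, max_write_buffer_number: int) -> int:
    """Rough upper bound on memtable memory for a single column family."""
    return write_buffer_size * max_write_buffer_number


budget = max_memtable_bytes(write_buffer_size=64 * MiB, max_write_buffer_number=4)
print(f"{budget // MiB} MiB")  # prints "256 MiB"
```

Because the bound applies per column family, a server with several column families should multiply it accordingly.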
1351
+
1352
+
#### Benefits of tuning
1353
+
1354
+
Tuning gives centralized control over compaction style, memory, and
1355
+
I/O (input/output) parallelism; adjusting the `rocksdb_default_cf_options`
1356
+
string for the hardware (SSD versus HDD) is the
1357
+
primary way to optimize MyRocks throughput.
1358
+
1359
+
The default varies by MyRocks version but generally balances LZ4 compression
1360
+
with moderate buffer sizes (for example, 64 MB memtables). The default value
Specifies the default column family options for MyRocks. On startup, the server applies this option to all existing column families. This option is
1324
-
read-only at runtime.
1367
+
#### Breakdown of the main components
1368
+
1369
+
1. **Block-based table options**: How data is laid out and cached inside SST
1370
+
(Sorted String Table) files:
1371
+
1372
+
* `cache_index_and_filter_blocks=1`: Stores index and Bloom filter blocks in the RocksDB block cache instead of holding them in memory outside the cache, so they count toward the cache's memory limit.
1373
+
1374
+
* `filter_policy=bloomfilter:10:false`: Bloom filter with 10 bits per key. The `false` is the value of `use_block_based_builder`; with it disabled, RocksDB uses the newer, more efficient full-filter format.
1375
+
1376
+
* `whole_key_filtering=1`: Adds a hash of the entire key to the Bloom filter, which speeds up point lookups.
1377
+
1378
+
2. **Compaction and layout**: `level_compaction_dynamic_level_bytes=true`
1379
+
adjusts per-level byte limits from the bottom level, reducing space
1380
+
amplification and making sizing more self-tuning.
1381
+
`compaction_pri=kMinOverlappingRatio` prefers compactions that free the most
checks on the bottommost level where hits are statistically more likely,
1386
+
saving CPU (central processing unit) time.
1387
+
1388
+
4. **Compression**: `compression=kLZ4Compression` and
1389
+
`bottommost_compression=kLZ4Compression` use LZ4 for low CPU overhead and
1390
+
solid general-purpose compression.
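
To make the structure of such an option string easier to see, here is a minimal Python sketch that splits it into a nested dictionary. The `OPTIONS` string is assembled from the settings named in the breakdown above and is not a verbatim copy of any server default; `parse_cf_options` is a hypothetical helper, and the sketch assumes the semicolon-separated `key=value` layout with braces around nested groups.

```python
def parse_cf_options(text: str) -> dict:
    """Parse 'key=value;key={nested=1;...}' into a (possibly nested) dict."""
    parts = []
    depth = 0
    start = 0
    # Split on ';' only at brace depth 0 so nested groups stay intact.
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ";" and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])

    result = {}
    for part in parts:
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        key, _, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if value.startswith("{") and value.endswith("}"):
            result[key] = parse_cf_options(value[1:-1])  # recurse into the group
        else:
            result[key] = value
    return result


# Illustrative string built from the options discussed above (an assumption,
# not the exact default of any particular release).
OPTIONS = (
    "block_based_table_factory={cache_index_and_filter_blocks=1;"
    "filter_policy=bloomfilter:10:false;whole_key_filtering=1};"
    "level_compaction_dynamic_level_bytes=true;"
    "compaction_pri=kMinOverlappingRatio;"
    "compression=kLZ4Compression;"
    "bottommost_compression=kLZ4Compression"
)

parsed = parse_cf_options(OPTIONS)
for name, value in parsed.items():
    print(name, "=>", value)
```

A parser like this can help when comparing the configured string against an intended baseline before a restart, since the variable is read-only at runtime.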
1325
1391
1326
1392
### `rocksdb_delayed_write_rate`
1327
1393
@@ -1759,22 +1825,48 @@ This variable controls whether to write and check RocksDB file-level checksums.
1759
1825
| Data type | Numeric |
1760
1826
| Default | 1 |
1761
1827
1762
-
Specifies whether to sync on every transaction commit,
1763
-
similar to [innodb_flush_log_at_trx_commit :octicons-link-external-16:](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit).
1764
-
Enabled by default, which ensures ACID compliance.
1828
+
Specifies whether the RocksDB Write-Ahead Log (WAL) is synchronized to disk
By default, the setting is enabled (`1`), which ensures ACID compliance by
1833
+
guaranteeing that committed transactions are durable even in the event of a
1834
+
crash. Choosing less strict values can improve performance at the cost of
1835
+
durability.
1836
+
1837
+
#### Possible values
1838
+
1839
+
The variable accepts `0`, `1`, or `2`; the following describes each value:
1840
+
1841
+
* **`0` (Do not sync on commit)**
1842
+
1843
+
Compared with `1`, which waits for a durable WAL sync on every commit, and with `2`, which still writes the WAL on each commit but defers durable sync to a background thread, `0` does not flush or sync the WAL on commit. That removes the most commit-time I/O of the three settings, so you usually get the highest throughput and lowest commit latency, but you also accept the weakest durability: after a crash, recently committed work may be missing or the database may be inconsistent, often by a wider margin than the roughly once-per-second window commonly associated with `2`, and far beyond what `1` allows. The outcomes are as follows.
1844
+
1845
+
* Leaving the WAL unflushed and unsynced on transaction commit.
1846
+
1847
+
* Minimizing commit-time I/O relative to `1` and `2`.
1765
1848
1766
-
Possible values:
1849
+
* Risking extensive data loss or inconsistency after a crash compared with stricter settings.
1767
1850
1768
-
* `0`: Do not sync on transaction commit.
1769
-
This provides better performance, but may lead to data inconsistency
1770
-
in case of a crash.
1851
+
* **`1` (Sync on every commit) [Default]**
1771
1852
1772
-
* `1`: Sync on every transaction commit.
1773
-
This is set by default and recommended
1774
-
as it ensures data consistency,
1775
-
but reduces performance.
1853
+
Compared with `0`, which does not flush or sync the WAL on commit, and with `2`, which writes the WAL on each commit but batches durable sync, `1` makes every commit wait until the WAL is durably on disk (typically a full sync such as `fsync`) before the commit returns. That is the usual choice when a successful commit must survive a crash: you get the strongest durability and ACID guarantees of the three settings. The tradeoff is the most synchronous disk work per commit, so commit latency and sustained write throughput are often lower than with `0` or `2` when commits are frequent or when disk sync is slow. The outcomes are as follows.
1776
1854
1777
-
* `2`: Sync every second.
1855
+
* Writing and syncing the WAL to disk at each transaction commit.
1856
+
1857
+
* Ensuring full durability and ACID compliance for committed work.
1858
+
1859
+
* Incurring the highest per-commit I/O and typically the slowest commits of the three settings.
1860
+
1861
+
* **`2` (Sync in background, typically once per second)**
1862
+
1863
+
With `1`, each commit waits until the WAL is durably on disk (typically a full sync such as `fsync`) before the commit returns. With `2`, each commit still writes the WAL, but the session usually does not wait for that durable sync; a background thread performs syncs on a schedule (for example, about once per second). So individual commits can return faster than with `1`, because they skip the per-commit sync wait, at the cost of possibly losing the last second of commits after a crash. The outcomes are as follows.
1864
+
1865
+
* Recording each commit in the WAL without blocking the commit on a full durable sync every time.
1866
+
1867
+
* Balancing performance and durability.
1868
+
1869
+
* Risking the loss of up to about one second of committed transactions after a crash.
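
The difference between `1` and `2` can be sketched with a small toy model. It assumes, purely for illustration, that the background sync under `2` fires exactly at whole multiples of `sync_interval` seconds; `commits_at_risk` is a hypothetical helper, not a server API. Value `0` is left out because the text above gives no bound on its loss window.

```python
import math


def commits_at_risk(mode, commit_times, crash_time, sync_interval=1.0):
    """Return the commit timestamps that this toy model treats as not yet durable."""
    if mode == 1:
        # Every commit waited for a durable WAL sync before returning.
        return []
    if mode == 2:
        # Assumption: the last background sync ran at the most recent
        # multiple of sync_interval at or before the crash.
        last_sync = math.floor(crash_time / sync_interval) * sync_interval
        return [t for t in commit_times if last_sync < t <= crash_time]
    raise ValueError("this sketch models only values 1 and 2")


commits = [9.2, 9.9, 10.1, 10.4, 10.7]  # seconds, hypothetical workload
print(commits_at_risk(1, commits, crash_time=10.75))  # []
print(commits_at_risk(2, commits, crash_time=10.75))  # [10.1, 10.4, 10.7]
```

In this model, the exposure under `2` never exceeds one sync interval of commits, which matches the "up to about one second" description.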
1778
1870
1779
1871
### `rocksdb_flush_memtable_on_analyze`
1780
1872
@@ -1815,10 +1907,34 @@ This provides better accuracy, but may reduce performance.
1815
1907
| Dynamic | Yes |
1816
1908
| Scope | Global |
1817
1909
| Data type | Numeric |
1818
-
| Default | 60000000 |
1910
+
| Default | 60000000 (60 seconds) |
1911
+
1912
+
This variable determines how long (in microseconds) MyRocks caches statistics
1913
+
gathered from the memtables for the query optimizer. When the optimizer
1914
+
evaluates a query, it needs row-count estimates; data not yet flushed to disk
1915
+
requires scanning memtables for accurate statistics.
1916
+
1917
+
#### How it works
1918
+
1919
+
**The cache:** To avoid the CPU cost of re-scanning memtables for every query,
1920
+
MyRocks stores the results in a cache.
1819
1921
1820
-
Specifies for how long the cached value of memtable statistics should
1821
-
be used instead of computing it every time during the query plan analysis.
1922
+
**The timer:** This variable defines the expiration of that cache.
1923
+
1924
+
The default is `60000000` microseconds (60 seconds).
1925
+
1926
+
Specifies for how long the cached value of memtable statistics should be used
1927
+
instead of computing it on every query plan analysis.
1928
+
1929
+
#### Key trade-offs
1930
+
1931
+
**Higher value (for example, several minutes):** Improves performance in
1932
+
high-query-rate environments by reducing how often statistics collection runs.
1933
+
The optimizer may use stale data if the table is being updated rapidly.
1934
+
1935
+
**Lower value (for example, 1 second):** Gives the optimizer a near-real-time
1936
+
view of the data and can yield better plans on volatile workloads, at the cost
1937
+
of more CPU use during query optimization.
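
The cache-and-timer behavior can be sketched as a small time-to-live cache. This is an illustration of the idea only, not the MyRocks implementation; `MemtableStatsCache` and `scan_memtables` are hypothetical names, and the clock is injected so the example runs without waiting.

```python
import time


class MemtableStatsCache:
    """Caches one computed value and recomputes it after ttl_us microseconds."""

    def __init__(self, ttl_us, compute, clock=time.monotonic):
        self.ttl_s = ttl_us / 1_000_000  # the variable is expressed in microseconds
        self.compute = compute
        self.clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self.clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self.compute()          # expensive scan
            self._expires_at = now + self.ttl_s   # restart the timer
        return self._value


scans = []


def scan_memtables():
    """Stand-in for the expensive statistics scan (hypothetical)."""
    scans.append(1)
    return 1000 + len(scans)


fake_now = [0.0]  # simulated clock, in seconds
cache = MemtableStatsCache(60_000_000, scan_memtables, clock=lambda: fake_now[0])

first = cache.get()    # triggers a scan
fake_now[0] = 30.0
second = cache.get()   # still within 60 seconds: served from cache
fake_now[0] = 61.0
third = cache.get()    # expired: triggers a second scan
print(first, second, third, len(scans))  # 1001 1001 1002 2
```

A shorter time-to-live makes the second call rescan as well, which is the CPU cost described for lower values.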
1822
1938
1823
1939
### `rocksdb_force_flush_memtable_and_lzero_now`
1824
1940
@@ -2387,10 +2503,32 @@ Allowed range is up to `64`.
2387
2503
| Data type | Numeric |
2388
2504
| Default | 2 GB |
2389
2505
2390
-
Specifies the maximum total size of WAL (write-ahead log) files,
2391
-
after which memtables are flushed.
2392
-
Default value is `2 GB`
2393
-
The allowed range is up to `9223372036854775807`.
2506
+
This setting limits the total disk space consumed by Write-Ahead Log (WAL)
2507
+
files across all column families. The limit helps prevent log files from
2508
+
exhausting disk capacity.
2509
+
2510
+
Specifies the maximum total size of WAL files, after which memtables are
2511
+
flushed. The default value is `2 GB`. The allowed range is up to
2512
+
`9223372036854775807`.
2513
+
2514
+
#### How it works
2515
+
2516
+
**The trigger:** When the combined size of all WAL files exceeds this
2517
+
threshold, RocksDB identifies the oldest logs and forces a flush of their
2518
+
associated memtables to SST files.
2519
+
2520
+
**The result:** Once the data is safely in an SST file, the corresponding
2521
+
WAL files are deleted or archived, bringing total usage back under the
2522
+
limit.
2523
+
2524
+
#### Key trade-offs
2525
+
2526
+
**Higher limit:** Improves write performance by allowing larger, infrequent
2527
+
flushes. Disk usage increases and recovery time after a crash
2528
+
lengthens (more log data to replay).
2529
+
2530
+
**Lower limit:** Keeps disk footprint small and recovery fast, but may
2531
+
cause frequent forced flushes, which can throttle write throughput.
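
The trigger-and-result cycle can be sketched as a loop that releases the oldest WAL files until the total drops back under the limit. The file names and sizes are hypothetical, and `enforce_wal_limit` only models the bookkeeping; in the real engine the release happens after the associated memtables are flushed to SST files.

```python
from collections import deque

MiB = 1024 * 1024


def enforce_wal_limit(wal_files, max_total_wal_size):
    """Release oldest WAL files until the total size is within the limit.

    wal_files is a deque of (name, size_bytes) pairs, oldest first.
    Returns the names of the files whose memtables would be force-flushed.
    """
    released = []
    total = sum(size for _, size in wal_files)
    while wal_files and total > max_total_wal_size:
        name, size = wal_files.popleft()  # oldest log goes first
        released.append(name)
        total -= size
    return released


# Hypothetical WAL files totaling 2400 MiB against a 2 GiB (2048 MiB) limit.
wals = deque([
    ("000010.log", 900 * MiB),
    ("000011.log", 800 * MiB),
    ("000012.log", 700 * MiB),
])
released = enforce_wal_limit(wals, max_total_wal_size=2048 * MiB)
print(released)   # ['000010.log']
print(len(wals))  # 2
```

With a lower limit the loop runs more often and releases more files per pass, which corresponds to the frequent forced flushes described above.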
2394
2532
2395
2533
### `rocksdb_merge_buf_size`
2396
2534
@@ -2547,7 +2685,37 @@ The default value is `ON` which means this variable is enabled.
2547
2685
| Data type | Unsigned Integer |
2548
2686
| Default | 0 |
2549
2687
2550
-
The variable was implemented in [Percona Server for MySQL 8.0.27-18](release-notes/Percona-Server-8.0.27-18.md). Maximum memory to use when sorting an unmaterialized group for partial indexes. The 0(zero) value is defined as no limit.
2688
+
The variable was implemented in [Percona Server for MySQL 8.0.27-18](release-notes/Percona-Server-8.0.27-18.md).
2689
+
2690
+
This variable sets the memory threshold (in bytes) for MyRocks to perform an
2691
+
in-memory sort when a query is only partially satisfied by an index.
2692
+
2693
+
**The default: `0` (uncapped)**
2694
+
2695
+
When set to `0`, the memory limit is effectively removed.
2696
+
2697
+
**The result:** MyRocks may use as much RAM as needed to perform the sort
2698
+
in-memory.
2699
+
2700
+
**The benefit:** Maximum performance for partial index scans by avoiding slow
2701
+
disk-based filesorts.
2702
+
2703
+
**The risk:** Without a cap, a large query, or many concurrent queries, could
2704
+
consume all available system memory, potentially leading to an out-of-memory
2705
+
(OOM) crash.
2706
+
2707
+
#### Why change it
2708
+
2709
+
Setting this to a non-zero value (for example, `16777216` for 16 MB) introduces
2710
+
a safety governor.
2711
+
2712
+
**Control:** MyRocks uses the optimized in-memory sort path only if the
2713
+
result set fits within the defined memory budget.
2714
+
2715
+
**Stability:** If a sort requires more than the cap, MyRocks falls back to a
2716
+
standard filesort. That path avoids unbounded memory use and protects overall
2717
+
server stability, but affected queries often take longer to complete because
2718
+
sorting uses disk (or temp files) instead of staying entirely in memory.
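
The decision described above reduces to a simple threshold check. The sketch below is a simplified model of that rule; `choose_sort_path` and the size estimates are hypothetical and do not reflect how MyRocks measures a sort internally.

```python
MiB = 1024 * 1024


def choose_sort_path(estimated_bytes: int, max_mem: int) -> str:
    """Pick the sort path for a given memory cap; 0 means no cap."""
    if max_mem == 0 or estimated_bytes <= max_mem:
        return "in-memory"
    return "filesort"


print(choose_sort_path(8 * MiB, 16 * MiB))   # in-memory
print(choose_sort_path(32 * MiB, 16 * MiB))  # filesort
print(choose_sort_path(32 * MiB, 0))         # in-memory (uncapped)
```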
2551
2719
2552
2720
### `rocksdb_pause_background_work`
2553
2721
@@ -3291,9 +3459,34 @@ Disabled by default.
3291
3459
3292
3460
The variable was implemented in [Percona Server for MySQL 8.0.33-25](release-notes/8.0.33-25.md).
3293
3461
3294
-
If enabled, this variable uses HyperClockCache instead of default LRUCache for RocksDB.
3462
+
This setting replaces the standard LRU (Least Recently Used) block cache with
3463
+
a lock-free HyperClockCache implementation.
3295
3464
3296
-
This variable is disabled (OFF) by default.
3465
+
If enabled, MyRocks uses HyperClockCache instead of the default LRUCache for
3466
+
RocksDB. The variable is disabled (`OFF`) by default.
3467
+
3468
+
#### Key benefits
3469
+
3470
+
**High concurrency:** Intended for many-core systems (16+ cores). Reduces the
3471
+
global lock bottleneck found in traditional LRU caches.
3472
+
3473
+
**CPU efficiency:** Uses a clock algorithm instead of a linked list, avoiding
3474
+
expensive memory writes and synchronization on every cache hit.
3475
+
3476
+
#### Trade-offs
3477
+
3478
+
**Performance:** Can offer significantly higher throughput under heavy read or
3479
+
scan workloads.
3480
+
3481
+
**Memory:** Uses a fixed-size hash table, which can have slightly higher
3482
+
per-entry memory overhead than a standard LRU cache.
3483
+
3484
+
**Precision:** Approximate LRU ordering is less precise but faster to maintain.
3485
+
3486
+
#### When to use
3487
+
3488
+
Enable if CPU profiling shows high mutex contention within the
3489
+
RocksDB block cache, or when running on high core-count servers.
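
To show why a clock algorithm makes cache hits cheap, here is a minimal single-threaded sketch of the classic CLOCK (second-chance) policy. It is not RocksDB's HyperClockCache, which is lock-free and considerably more elaborate; `ClockCache` is a hypothetical teaching example.

```python
class ClockCache:
    """Classic CLOCK (second-chance) cache with a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # each used slot: [key, value, referenced]
        self.index = {}                 # key -> slot position
        self.hand = 0                   # the clock hand

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        # A hit only sets a flag; no list is reordered, unlike a linked-list LRU.
        self.slots[pos][2] = True
        return self.slots[pos][1]

    def put(self, key, value):
        pos = self.index.get(key)
        if pos is not None:
            self.slots[pos][1] = value
            self.slots[pos][2] = True
            return
        # Sweep: clear flags of referenced entries until a free or
        # unreferenced slot is found. Ends within two passes at most.
        while True:
            slot = self.slots[self.hand]
            if slot is None or not slot[2]:
                break
            slot[2] = False  # give the entry a second chance
            self.hand = (self.hand + 1) % self.capacity
        if slot is not None:
            del self.index[slot[0]]  # evict the victim
        self.slots[self.hand] = [key, value, False]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity


cache = ClockCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")      # marks "a" as recently used
cache.put("d", 4)   # hand skips "a" and evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("d"))  # None 1 4
```

The eviction order is only an approximation of least-recently-used, which is the precision trade-off noted above.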
3297
3490
3298
3491
### `rocksdb_use_io_uring`
3299
3492
@@ -3456,10 +3649,31 @@ Allowed range is up to `9223372036854775807`.
3456
3649
| Data type | Boolean |
3457
3650
| Default | ON |
3458
3651
3459
-
Specifies whether the bloomfilter should use the whole key for filtering
3460
-
instead of just the prefix.
3461
-
Enabled by default.
3462
-
Make sure that lookups use the whole key for matching.
3652
+
The `rocksdb_whole_key_filtering` variable determines whether the Bloom filter
3653
+
stores a hash of the entire key or just the prefix. The option is part of
3654
+
RocksDB `BlockBasedTableOptions` and is enabled (`ON`) by default in MyRocks.
3655
+
3656
+
Specifies whether the Bloom filter should use the whole key for filtering
3657
+
instead of just the prefix. Make sure that lookups use the whole key for
3658
+
matching when whole-key filtering is enabled.
3659
+
3660
+
#### How it works
3661
+
3662
+
* **Enabled (default):** Both the whole key and the prefix are added to the Bloom
3663
+
filter. Storing both yields the most accurate filtering for point lookups (for
3664
+
example, `WHERE pk = 10`), so the engine can skip SST files that definitely do
3665
+
not contain the key.
3666
+
3667
+
* **Disabled:** Only the prefix is stored in the Bloom filter. Because there are
3668
+
typically fewer unique prefixes than unique keys, Bloom filters are much