Skip to content

Expose BlockBasedTableOptions::uniform_cv_threshold and BlockSearchType::kAuto in the C API#14775

Open
zaidoon1 wants to merge 1 commit into
facebook:mainfrom
zaidoon1:expose_uniform_cv_threshold_and_kauto_c_api
Open

Expose BlockBasedTableOptions::uniform_cv_threshold and BlockSearchType::kAuto in the C API#14775
zaidoon1 wants to merge 1 commit into
facebook:mainfrom
zaidoon1:expose_uniform_cv_threshold_and_kauto_c_api

Conversation

@zaidoon1
Copy link
Copy Markdown
Contributor

Summary

The block-based table format gained an "auto" index-block search mode (table.h) that selects binary vs interpolation search per index block based on key uniformity. The C++ surface exposes this as two coupled knobs:

  • BlockSearchType::kAuto = 0x02 selects the per-block adaptive search at read time.
  • BlockBasedTableOptions::uniform_cv_threshold (default -1, i.e. disabled) is the coefficient-of-variation threshold checked on the write path to set the per-block is_uniform footer bit that kAuto reads.

This PR adds the missing C API coverage for both:

  1. rocksdb_block_based_table_index_block_search_type_auto = 2 enum constant. The existing setter rocksdb_block_based_options_set_index_block_search_type already does static_cast<BlockSearchType>(v), so kAuto = 2 was reachable today by passing the raw int — only the named constant was missing.

  2. rocksdb_block_based_options_set_uniform_cv_threshold(...) setter. The field had no C wrapper, so C/binding users could select kAuto but the is_uniform bit was never set on the write path, making kAuto degenerate to kBinary.

No getter is added: the surrounding rocksdb_block_based_options_set_* functions in c.h do not expose getters either, so adding one only here would be inconsistent with the local style.

Motivation

Without both pieces, kAuto is effectively unreachable from C. This matters for binding consumers (Rust, Go, Java-via-JNI shim, etc.) who want to opt index-block search into the per-block adaptive mode.

The setter mirrors the existing rocksdb_block_based_options_set_data_block_hash_ratio shape exactly (same struct, same double payload, same naming pattern), so review surface is minimal.

No change to the C++ API.

…pe::kAuto in the C API

The block-based table format gained an "auto" index-block search mode that
selects binary vs interpolation search per index block based on key
uniformity (see include/rocksdb/table.h). The C++ surface exposes this as
two coupled knobs that the C API only partially covers:

1. BlockSearchType::kAuto = 0x02 selects the per-block adaptive search at
   read time. The existing C API enum
   (rocksdb_block_based_table_index_block_search_type_*) only declares
   values 0 (binary) and 1 (interpolation). The setter
   rocksdb_block_based_options_set_index_block_search_type already accepts
   any int via static_cast<BlockSearchType>(v), so kAuto = 2 is reachable
   today by passing the raw int. Only the named constant was missing.

2. BlockBasedTableOptions::uniform_cv_threshold (default -1, i.e. disabled)
   is the coefficient-of-variation threshold checked on the write path to
   set the per-block is_uniform footer bit that kAuto reads. Without a way
   to set this from C, kAuto degenerates to kBinary because the is_uniform
   bit is never written. The C API did not expose this field at all.

Add:
- rocksdb_block_based_table_index_block_search_type_auto = 2 enum constant.
- rocksdb_block_based_options_set_uniform_cv_threshold setter, mirroring
  rocksdb_block_based_options_set_data_block_hash_ratio (the closest
  sibling — both are doubles tuning the block-based table format).

No getter is added: the surrounding rocksdb_block_based_options_set_*
functions in c.h do not expose getters either, so adding one only here
would be inconsistent with the local style.

No change to the C++ API.
@meta-cla meta-cla Bot added the CLA Signed label May 23, 2026
@github-actions
Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 56.4s.

@github-actions
Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 0db9f5d


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase

@github-actions
Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 0db9f5d


Summary

Clean, minimal C API addition that correctly follows all established RocksDB C API patterns. The enum value mapping is verified correct (kAuto = 0x02 = 2), the setter mirrors set_data_block_hash_ratio exactly, and the test exercises the correct combination of both settings.

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

No high-severity findings.

🟡 MEDIUM

No medium-severity findings.

🟢 LOW / NIT

L1. No input validation for NaN/infinity on uniform_cv_thresholddb/c.cc:3774
  • Issue: The setter accepts any double value including NaN and infinity without validation.
  • Root cause: The C++ layer (BlockBasedTableFactory::ValidateOptions) also does not validate uniform_cv_threshold for NaN/infinity, so this is not a regression. The analogous data_block_hash_table_util_ratio setter also lacks NaN checks. The downstream consumer (BlockBuilder::ScanForUniformity at block_builder.cc:407) treats negative values as disabled, and NaN comparisons return false, so NaN effectively disables the feature (safe behavior).
  • Suggested fix: Not required for this PR. If NaN validation is desired, it should be added to BlockBasedTableFactory::ValidateOptions in the C++ layer, not in the C wrapper. This would be a separate improvement applicable to multiple double options.
L2. Test only verifies no crash, not functional correctness — db/c_test.c:891
  • Issue: The test calls both setters but only verifies the DB opens and operates without crashing. It does not verify that the is_uniform bit is set or that interpolation search is used.
  • Root cause: This is consistent with how all block-based table option setters are tested in c_test.c. Functional correctness of kAuto is tested in C++ tests (block_test.cc).
  • Suggested fix: No change needed. The C API test appropriately validates that the C bindings compile, link, and don't crash. Functional testing belongs in the C++ test suite.

Cross-Component Analysis

Context Does code execute? Assumptions hold? Action needed?
WritePreparedTxnDB N/A (config setter) YES None
ReadOnly DB N/A (config setter) YES None
User-defined timestamps N/A (config setter) YES None
All compaction styles N/A (config setter) YES None

This PR only adds C API wrappers around existing C++ configuration fields. No new runtime code paths are introduced. The kAuto feature and uniform_cv_threshold are fully implemented and tested in C++ — this PR merely makes them reachable from C.

Assumption stress test:

  • Claim: "Enum value 2 in C maps to kAuto=0x02 in C++." Verified by reading table.h:289kAuto = 0x02. The static_cast<BlockSearchType>(2) in the existing set_index_block_search_type produces kAuto. Safe.
  • Claim: "No validation bypass." Verified by reading BlockBasedTableFactory::ValidateOptions() — no validation exists for uniform_cv_threshold in C++ either. The C wrapper is consistent.

Positive Observations

  • The PR correctly adds both pieces needed for kAuto to work from C (enum constant AND threshold setter). Without both, the feature would be incomplete.
  • Placement of the new setter between set_index_block_search_type and set_data_block_hash_ratio is logical — it groups related options together.
  • The release note is well-written and explains the relationship between the two additions clearly.
  • The test correctly sets both options together, demonstrating proper usage.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant