Commit aa33a86

Add high-level docs to vortex-btrblocks (#6984)
## Summary

Decided to add some docs to `vortex-btrblocks` as I went through it to try and understand it better.

## Testing

N/A

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
1 parent c58d993 commit aa33a86

3 files changed

Lines changed: 47 additions & 2 deletions

vortex-btrblocks/src/compressor/mod.rs

Lines changed: 13 additions & 1 deletion
@@ -1,7 +1,19 @@
 // SPDX-License-Identifier: Apache-2.0
 // SPDX-FileCopyrightText: Copyright the Vortex contributors

-//! Compressor traits for type-specific compression.
+//! Type-specific compressor traits that drive scheme selection and compression.
+//!
+//! [`Compressor`] defines the interface: generate statistics for an array via
+//! [`Compressor::gen_stats`], and provide available [`Scheme`]s via [`Compressor::schemes`].
+//!
+//! [`CompressorExt`] is blanket-implemented for all `Compressor`s and adds the core logic:
+//!
+//! - [`CompressorExt::choose_scheme`] iterates all schemes, skips excluded ones, and calls
+//! [`Scheme::expected_compression_ratio`] on each. It returns the scheme with the highest ratio
+//! above 1.0, or falls back to the default. See the [`scheme`](crate::scheme) module for how
+//! ratio estimation works.
+//! - [`CompressorExt::compress`] generates stats, calls `choose_scheme()`, and applies the
+//! result. If compression did not shrink the array, the original is returned.

 use vortex_array::ArrayRef;
 use vortex_array::IntoArray;
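
The selection rule described in the doc comment above (skip excluded schemes, estimate a ratio for each, keep the best one above 1.0, otherwise fall back to the default) can be sketched as a toy model. This is not the crate's code: `ToyScheme`, its ratio formulas, and the free function `choose_scheme` are hypothetical stand-ins for the real `Scheme` and `CompressorExt` traits, and the "default" is assumed here to be an uncompressed variant.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a compression scheme (not the real `Scheme` trait).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToyScheme {
    Uncompressed,
    Constant,
    RunEnd,
}

impl ToyScheme {
    /// Estimated ratio of uncompressed size to compressed size.
    /// The formulas are illustrative assumptions, counted in 32-bit words.
    pub fn expected_compression_ratio(self, values: &[u32]) -> f64 {
        if values.is_empty() {
            return 1.0;
        }
        match self {
            ToyScheme::Uncompressed => 1.0,
            ToyScheme::Constant => {
                // Only useful when every value is identical: one word total.
                if values.iter().all(|v| *v == values[0]) {
                    values.len() as f64
                } else {
                    0.0
                }
            }
            ToyScheme::RunEnd => {
                // Each run stores a value and an end offset: two words per run.
                let runs = 1 + values.windows(2).filter(|w| w[0] != w[1]).count();
                values.len() as f64 / (2 * runs) as f64
            }
        }
    }
}

/// Picks the non-excluded scheme with the highest estimated ratio above 1.0,
/// falling back to `Uncompressed` when nothing beats that threshold.
pub fn choose_scheme(
    schemes: &[ToyScheme],
    excludes: &HashSet<ToyScheme>,
    values: &[u32],
) -> ToyScheme {
    let mut best = ToyScheme::Uncompressed;
    let mut best_ratio = 1.0;
    for &scheme in schemes {
        if excludes.contains(&scheme) {
            continue;
        }
        let ratio = scheme.expected_compression_ratio(values);
        if ratio > best_ratio {
            best = scheme;
            best_ratio = ratio;
        }
    }
    best
}
```

In this sketch, eight identical values select `Constant`; excluding `Constant` makes the same input select `RunEnd`; and four distinct values fall back to `Uncompressed` because no estimate exceeds 1.0.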

vortex-btrblocks/src/lib.rs

Lines changed: 16 additions & 0 deletions
@@ -18,6 +18,22 @@
 //! - **Statistical Analysis**: Uses data sampling and statistics to predict compression ratios
 //! - **Recursive Structure Handling**: Compresses nested structures like structs and lists
 //!
+//! # How It Works
+//!
+//! [`BtrBlocksCompressor::compress()`] takes an `&ArrayRef` and returns an `ArrayRef` that may
+//! use a different encoding. It first canonicalizes the input, then dispatches by type.
+//! Primitives go to a type-specific `Compressor` (integer, float, or string). Compound types
+//! like structs and lists recurse into their fields and elements.
+//!
+//! Each type-specific compressor holds a static list of `Scheme` implementations (e.g.
+//! BitPacking, ALP, Dict). There is no dynamic registry. The compressor evaluates each scheme by
+//! compressing a ~1% sample and measuring the ratio, then picks the best. See `SchemeExt` for
+//! details on how sampling works.
+//!
+//! Schemes can produce arrays that are themselves further compressed (e.g. FoR then BitPacking),
+//! up to `MAX_CASCADE` (3) layers deep. An `Excludes` set prevents the same scheme from being
+//! applied twice in a chain.
+//!
 //! # Example
 //!
 //! ```rust
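
The cascading behaviour added to the `lib.rs` docs (a scheme's output may be compressed again, bounded by `MAX_CASCADE`, with an excludes set stopping a scheme from repeating in one chain) can be illustrated with a small planner. Everything below is a hypothetical sketch: `Step`, `bit_width`, `apply`, and `cascade` are invented for illustration, the "helps if it lowers the bit width" rule is an assumption, and only the constant `MAX_CASCADE = 3` is taken from the text.

```rust
use std::collections::HashSet;

/// Depth limit quoted in the docs above.
pub const MAX_CASCADE: usize = 3;

/// Hypothetical encoding steps; `BitPack` is treated as terminal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Step {
    Delta,
    FoR,
    BitPack,
}

/// Transforms whose output may be compressed further.
const TRANSFORMS: [Step; 2] = [Step::Delta, Step::FoR];

/// Number of bits needed to store the largest value.
pub fn bit_width(values: &[u64]) -> u32 {
    values.iter().map(|v| 64 - v.leading_zeros()).max().unwrap_or(0)
}

/// Applies a transform, or returns `None` if it does not apply.
fn apply(step: Step, values: &[u64]) -> Option<Vec<u64>> {
    match step {
        Step::Delta => {
            // Keep the sketch unsigned: only delta-encode non-decreasing input.
            if values.windows(2).any(|w| w[1] < w[0]) {
                return None;
            }
            let mut out = Vec::with_capacity(values.len());
            out.extend(values.first().copied());
            out.extend(values.windows(2).map(|w| w[1] - w[0]));
            Some(out)
        }
        Step::FoR => {
            // Frame of reference: subtract the minimum from every value.
            let min = values.iter().copied().min()?;
            Some(values.iter().map(|v| v - min).collect())
        }
        Step::BitPack => None, // terminal step, handled in `cascade`
    }
}

/// Plans a chain of at most `depth` steps, never repeating an excluded step.
pub fn cascade(values: &[u64], depth: usize, excludes: &HashSet<Step>) -> Vec<Step> {
    if depth == 0 {
        return Vec::new();
    }
    let current = bit_width(values);
    let best = TRANSFORMS
        .iter()
        .copied()
        .filter(|s| !excludes.contains(s))
        .filter_map(|s| apply(s, values).map(|out| (s, out)))
        .filter(|(_, out)| bit_width(out) < current)
        .min_by_key(|(_, out)| bit_width(out));
    match best {
        Some((step, out)) => {
            // Exclude the step just used before compressing its output.
            let mut next = excludes.clone();
            next.insert(step);
            let mut chain = vec![step];
            chain.extend(cascade(&out, depth - 1, &next));
            chain
        }
        None if current < 64 && !excludes.contains(&Step::BitPack) => vec![Step::BitPack],
        None => Vec::new(),
    }
}
```

With `MAX_CASCADE` as the starting depth, the input `[1000, 1010, 1020, 1030]` plans `FoR`, then `Delta`, then `BitPack`. For the squares `[1, 4, 9, 16, 25, 36]`, a second `Delta` would shrink the values further, but the excludes set rules it out, so the chain is `Delta` followed by `BitPack`.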

vortex-btrblocks/src/scheme.rs

Lines changed: 18 additions & 1 deletion
@@ -1,7 +1,24 @@
 // SPDX-License-Identifier: Apache-2.0
 // SPDX-FileCopyrightText: Copyright the Vortex contributors

-//! Compression scheme traits.
+//! Compression scheme traits. This is the interface each encoding implements to participate in
+//! compression.
+//!
+//! [`Scheme`] is the core trait. Each encoding (e.g. BitPacking, ALP, Dict) implements it with
+//! two key methods: [`Scheme::expected_compression_ratio`] to estimate how well it compresses
+//! the data, and [`Scheme::compress`] to apply the encoding. Type-specific sub-traits
+//! ([`IntegerScheme`], [`FloatScheme`], [`StringScheme`]) bind schemes to the appropriate stats
+//! and code types.
+//!
+//! [`SchemeExt`] provides the default ratio estimation strategy. It samples ~1% of the array
+//! (minimum [`SAMPLE_SIZE`] values), compresses the sample, and returns the before/after byte
+//! ratio. Schemes can override [`Scheme::expected_compression_ratio`] if they have a cheaper
+//! heuristic.
+//!
+//! [`IntegerScheme`]: crate::compressor::integer::IntegerScheme
+//! [`FloatScheme`]: crate::compressor::float::FloatScheme
+//! [`StringScheme`]: crate::compressor::string::StringScheme
+//! [`SAMPLE_SIZE`]: crate::stats::SAMPLE_SIZE

 use std::fmt::Debug;
 use std::hash::Hash;
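
The default estimation strategy described for `SchemeExt` (take roughly 1% of the values with a minimum sample size, compress the sample, report the before/after byte ratio) can be sketched as below. This is an assumption-laden illustration, not the crate's implementation: `TOY_SAMPLE_SIZE` and `CHUNK` are made-up values (the real `SAMPLE_SIZE` is not stated in the diff), the evenly spaced chunk layout is a guess at a reasonable sampling pattern, and a simple run-length byte count stands in for a real scheme.

```rust
/// Hypothetical minimum sample length; the real `SAMPLE_SIZE` value is not shown above.
pub const TOY_SAMPLE_SIZE: usize = 64;

/// Hypothetical length of each contiguous chunk taken from the input.
const CHUNK: usize = 16;

/// About 1% of the input, at least `TOY_SAMPLE_SIZE`, never more than the input.
pub fn sample_len(len: usize) -> usize {
    (len / 100).max(TOY_SAMPLE_SIZE).min(len)
}

/// Gathers evenly spaced contiguous chunks so that runs stay visible in the sample.
pub fn sample(values: &[u32]) -> Vec<u32> {
    let target = sample_len(values.len());
    if target == values.len() {
        return values.to_vec();
    }
    let n_chunks = (target + CHUNK - 1) / CHUNK;
    let stride = values.len() / n_chunks;
    let mut out = Vec::with_capacity(target);
    for i in 0..n_chunks {
        let start = i * stride;
        // The last chunk may be shorter so the total equals `target`.
        let take = CHUNK.min(target - out.len());
        out.extend_from_slice(&values[start..start + take]);
    }
    out
}

/// Bytes used by a toy run-length encoding: a u32 value plus a u32 length per run.
pub fn rle_bytes(values: &[u32]) -> usize {
    if values.is_empty() {
        return 0;
    }
    let runs = 1 + values.windows(2).filter(|w| w[0] != w[1]).count();
    runs * 8
}

/// Compresses only the sample and returns uncompressed bytes over compressed bytes.
pub fn estimate_ratio(values: &[u32]) -> f64 {
    let s = sample(values);
    let before = s.len() * 4;
    let after = rle_bytes(&s);
    if after == 0 {
        return 1.0; // empty input: nothing to gain
    }
    before as f64 / after as f64
}
```

Under these assumptions, 10,000 identical values produce a 100-value sample and an estimate of 50.0, while 10,000 distinct values produce 0.5, which a chooser that requires a ratio above 1.0 would reject.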
