  // SPDX-License-Identifier: Apache-2.0
  // SPDX-FileCopyrightText: Copyright the Vortex contributors

- //! Compressor traits for type-specific compression.
+ //! Type-specific compressor traits that drive scheme selection and compression.
+ //!
+ //! [`Compressor`] defines the interface: generate statistics for an array via
+ //! [`Compressor::gen_stats`], and provide available [`Scheme`]s via [`Compressor::schemes`].
+ //!
+ //! [`CompressorExt`] is blanket-implemented for all `Compressor`s and adds the core logic:
+ //!
+ //! - [`CompressorExt::choose_scheme`] iterates all schemes, skips excluded ones, and calls
+ //! [`Scheme::expected_compression_ratio`] on each. It returns the scheme with the highest ratio
+ //! above 1.0, or falls back to the default. See the [`scheme`](crate::scheme) module for how
+ //! ratio estimation works.
+ //! - [`CompressorExt::compress`] generates stats, calls `choose_scheme()`, and applies the
+ //! result. If compression did not shrink the array, the original is returned.

  use vortex_array::ArrayRef;
  use vortex_array::IntoArray;
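The selection rule described in the `choose_scheme` bullet above can be sketched in isolation. This is a toy model, not the Vortex implementation: `ToyScheme`, `Constant`, `RunLength`, and the string-based exclude list are hypothetical names introduced for illustration, and `None` stands in for "fall back to the default".

```rust
/// Hypothetical stand-in for a compression scheme (not the Vortex `Scheme` trait).
trait ToyScheme {
    fn name(&self) -> &'static str;
    /// Estimated ratio of uncompressed size to compressed size for `data`.
    fn expected_ratio(&self, data: &[u32]) -> f64;
}

struct Constant;
struct RunLength;

impl ToyScheme for Constant {
    fn name(&self) -> &'static str {
        "constant"
    }
    fn expected_ratio(&self, data: &[u32]) -> f64 {
        // Only useful when every value is identical: one value replaces all of them.
        match data.first() {
            Some(first) if data.iter().all(|v| v == first) => data.len() as f64,
            _ => 0.0,
        }
    }
}

impl ToyScheme for RunLength {
    fn name(&self) -> &'static str {
        "run_length"
    }
    fn expected_ratio(&self, data: &[u32]) -> f64 {
        if data.is_empty() {
            return 0.0;
        }
        // Count runs of equal adjacent values; each run costs two slots (value, length).
        let runs = 1 + data.windows(2).filter(|w| w[0] != w[1]).count();
        data.len() as f64 / (2 * runs) as f64
    }
}

/// Skips excluded schemes and returns the one with the highest estimated ratio
/// above 1.0. `None` means no scheme is expected to help (the "default" case).
fn choose_scheme<'a>(
    schemes: &[&'a dyn ToyScheme],
    excludes: &[&str],
    data: &[u32],
) -> Option<&'a dyn ToyScheme> {
    let mut best: Option<(&'a dyn ToyScheme, f64)> = None;
    for &scheme in schemes {
        if excludes.contains(&scheme.name()) {
            continue;
        }
        let ratio = scheme.expected_ratio(data);
        let beats_best = best.map_or(true, |(_, best_ratio)| ratio > best_ratio);
        if ratio > 1.0 && beats_best {
            best = Some((scheme, ratio));
        }
    }
    best.map(|(scheme, _)| scheme)
}
```

In this sketch ties go to the scheme listed first, because a later candidate must be strictly better to replace the current best. Whether the real code breaks ties the same way is not stated in the diff.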
  //! - **Statistical Analysis**: Uses data sampling and statistics to predict compression ratios
  //! - **Recursive Structure Handling**: Compresses nested structures like structs and lists
  //!
+ //! # How It Works
+ //!
+ //! [`BtrBlocksCompressor::compress()`] takes an `&ArrayRef` and returns an `ArrayRef` that may
+ //! use a different encoding. It first canonicalizes the input, then dispatches by type.
+ //! Primitives go to a type-specific `Compressor` (integer, float, or string). Compound types
+ //! like structs and lists recurse into their fields and elements.
+ //!
+ //! Each type-specific compressor holds a static list of `Scheme` implementations (e.g.
+ //! BitPacking, ALP, Dict). There is no dynamic registry. The compressor evaluates each scheme by
+ //! compressing a ~1% sample and measuring the ratio, then picks the best. See `SchemeExt` for
+ //! details on how sampling works.
+ //!
+ //! Schemes can produce arrays that are themselves further compressed (e.g. FoR then BitPacking),
+ //! up to `MAX_CASCADE` (3) layers deep. An `Excludes` set prevents the same scheme from being
+ //! applied twice in a chain.
+ //!
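The cascading behaviour described in the last paragraph above (a depth limit plus an exclude set) can be illustrated with a toy loop. Everything here is an assumption made for the sketch: `Layer`, `LAYERS`, `frame_of_reference`, `halve`, and `compress_cascading` are hypothetical, the two transforms are simple stand-ins rather than real encodings, and only the value 3 for `MAX_CASCADE` comes from the doc comment.

```rust
use std::collections::HashSet;

/// Depth limit quoted in the doc comment above.
const MAX_CASCADE: usize = 3;

/// Hypothetical layer: a name plus a transform that returns `None` when it does not apply.
struct Layer {
    name: &'static str,
    apply: fn(&[u64]) -> Option<Vec<u64>>,
}

/// Toy frame-of-reference: subtract the minimum when it is non-zero.
fn frame_of_reference(values: &[u64]) -> Option<Vec<u64>> {
    let min = *values.iter().min()?;
    if min == 0 {
        return None;
    }
    Some(values.iter().map(|v| v - min).collect())
}

/// Toy transform that could apply repeatedly: halve when all values are even.
fn halve(values: &[u64]) -> Option<Vec<u64>> {
    let applicable = values.iter().any(|v| *v != 0) && values.iter().all(|v| v % 2 == 0);
    applicable.then(|| values.iter().map(|v| v / 2).collect())
}

static LAYERS: [Layer; 2] = [
    Layer { name: "frame_of_reference", apply: frame_of_reference },
    Layer { name: "halve", apply: halve },
];

/// Applies at most `MAX_CASCADE` layers and never applies the same layer twice.
/// Returns the names of the applied layers and the final values.
fn compress_cascading(input: &[u64]) -> (Vec<&'static str>, Vec<u64>) {
    let mut excludes: HashSet<&'static str> = HashSet::new();
    let mut chain = Vec::new();
    let mut current = input.to_vec();

    for _ in 0..MAX_CASCADE {
        // First layer that is not excluded and actually applies to the current values.
        let next = LAYERS
            .iter()
            .filter(|layer| !excludes.contains(layer.name))
            .find_map(|layer| (layer.apply)(&current).map(|out| (layer.name, out)));

        match next {
            Some((name, out)) => {
                excludes.insert(name);
                chain.push(name);
                current = out;
            }
            None => break,
        }
    }
    (chain, current)
}
```

With the input `[108, 116, 124]`, the toy `halve` layer would still apply after its first use, so the exclude set is what stops the chain there. The real compressor picks each layer by estimated ratio rather than taking the first applicable one.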
  //! # Example
  //!
  //! ```rust

  // SPDX-License-Identifier: Apache-2.0
  // SPDX-FileCopyrightText: Copyright the Vortex contributors

- //! Compression scheme traits.
+ //! Compression scheme traits. This is the interface each encoding implements to participate in
+ //! compression.
+ //!
+ //! [`Scheme`] is the core trait. Each encoding (e.g. BitPacking, ALP, Dict) implements it with
+ //! two key methods: [`Scheme::expected_compression_ratio`] to estimate how well it compresses
+ //! the data, and [`Scheme::compress`] to apply the encoding. Type-specific sub-traits
+ //! ([`IntegerScheme`], [`FloatScheme`], [`StringScheme`]) bind schemes to the appropriate stats
+ //! and code types.
+ //!
+ //! [`SchemeExt`] provides the default ratio estimation strategy. It samples ~1% of the array
+ //! (minimum [`SAMPLE_SIZE`] values), compresses the sample, and returns the before/after byte
+ //! ratio. Schemes can override [`Scheme::expected_compression_ratio`] if they have a cheaper
+ //! heuristic.
+ //!
+ //! [`IntegerScheme`]: crate::compressor::integer::IntegerScheme
+ //! [`FloatScheme`]: crate::compressor::float::FloatScheme
+ //! [`StringScheme`]: crate::compressor::string::StringScheme
+ //! [`SAMPLE_SIZE`]: crate::stats::SAMPLE_SIZE

  use std::fmt::Debug;
  use std::hash::Hash;
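The sample-based estimate described for `SchemeExt` above can be sketched as follows. The diff does not show the actual value of `SAMPLE_SIZE` or how the sample is drawn, so both are assumptions here: the constant is set to 64 purely for illustration, and the sample is taken with a fixed stride. `sample_len`, `take_sample`, `estimate_ratio`, and `bitpacked_bytes` are hypothetical helpers, not Vortex APIs.

```rust
/// Assumed minimum sample size; the real `SAMPLE_SIZE` value is not shown in the diff.
const SAMPLE_SIZE: usize = 64;

/// Roughly 1% of the array, but at least `SAMPLE_SIZE` values and never more than the array.
fn sample_len(array_len: usize) -> usize {
    (array_len / 100).max(SAMPLE_SIZE).min(array_len)
}

/// Assumed sampling strategy: evenly strided picks across the whole array.
fn take_sample(values: &[u32]) -> Vec<u32> {
    let n = sample_len(values.len());
    if n == 0 {
        return Vec::new();
    }
    // `n <= values.len()`, so the stride is at least 1.
    let stride = values.len() / n;
    values.iter().step_by(stride).take(n).copied().collect()
}

/// Compresses only the sample and returns the before/after byte ratio.
/// `compressed_bytes` reports how many bytes a scheme would need for the sample.
fn estimate_ratio(values: &[u32], compressed_bytes: impl Fn(&[u32]) -> usize) -> f64 {
    let sample = take_sample(values);
    if sample.is_empty() {
        return 0.0;
    }
    let before = sample.len() * std::mem::size_of::<u32>();
    // Guard against division by zero when the toy scheme needs no bytes at all.
    let after = compressed_bytes(&sample).max(1);
    before as f64 / after as f64
}

/// Toy bit-packing cost model: every value is stored with the bit width of the maximum.
fn bitpacked_bytes(values: &[u32]) -> usize {
    let max = values.iter().copied().max().unwrap_or(0);
    let bits = (u32::BITS - max.leading_zeros()) as usize;
    (values.len() * bits + 7) / 8
}
```

For 100,000 values below 16, this sketch samples 1,000 values, packs them at 4 bits each, and reports a ratio of 8.0. A strided sample suits a bit-width estimate but would break up runs, so a run-length scheme would need a different sampling strategy or a cheaper override of the kind the doc comment mentions.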