
Commit b994a06

author
Claude Agent
committed
FSST contains: extract scan-routing into a cost-model planner
Replaces the hardcoded `if let Some(...) { ... } else if ...` cascade inside `FoldedContainsDfa::scan_to_bitbuf` (and the smaller cascades in `FlatContainsDfa` / `MultiContainsDfa`) with a single `ScanPlanner` that picks a `ScanPlan` up front and dispatches through one match.

New `dfa/planner.rs` (~430 lines) exposes:

- `ScanPlan`: one variant per legacy cascade branch, plus a reserved `ShiftOr` slot for Task A. The slot is `cfg_attr`-gated `dead_code` outside the test harness.
- `ScanContext`: borrowed inputs (n, all_bytes, ssa codes, bucket summaries, escape-only flag) that the planner reads in O(1).
- `ScanPlanner::plan_folded` / `plan_flat_or_multi`: rules-based routing that replicates the legacy cascade exactly (locked in by `test_planner_matches_legacy_cascade` against every fsst_contains bench needle on every bench corpus).
- `ssa_saturated` and `escape_pair_targets` moved here as the single source of truth.
- `ArchProfile::detect()` runs CPUID once at `ScanPlanner::new()`; the arch is cached for the lifetime of the DFA.
- `ScanPlanner::estimated_cost_ns` returns the approximate per-call cost. Calibrated from `DESIGN.md` numbers and benches/fsst_like.rs:
  * triple Teddy (GB/s): AVX-512 4.28, AVX2 2.74, NEON 2.5, scalar 0.8
  * pair Teddy (GB/s): AVX-512 5.50, AVX2 3.30, NEON 3.0, scalar 1.0
  * 1-byte (GB/s): AVX-512 12.0, AVX2 8.0, NEON 7.0, scalar 2.0
  * memmem ~25 GB/s, row-loop ~150 ns/row

Today the cost is diagnostic only (the routing is rules-based); the constants exist for VORTEX_FSST_PLAN_TRACE and to make later comparison-based selection mechanical.

`FoldedContainsDfa::scan_to_bitbuf` now extracts each path into a `run_*` helper (`run_escape_only`, `run_one_byte_saturated`, `run_triple_teddy`, `run_escape_pair`, `run_pair_teddy`, `run_one_byte_bitset`, `run_row_loop`) and dispatches via `match plan { ... }`. The Teddy-trace `VORTEX_FSST_TEDDY_TRACE` output is preserved verbatim, and a new `VORTEX_FSST_PLAN_TRACE=1` prints the planner's chosen plan plus its inputs and the estimated cost.

`FlatContainsDfa` and `MultiContainsDfa` route through the same planner (only `EscapeOnly` vs `RowLoop`) so the dispatch surface is uniform across the three contains DFAs.

Regression guards added:

- `test_planner_matches_legacy_cascade` runs every fsst_contains bench's underlying call (12 corpus × needle pairs) and asserts `planner.plan() == legacy_path_for(...)`. Future changes can't silently re-route traffic.
- 11 unit tests in `planner::tests` cover each routing decision row, cost-model monotonicity, and `ScanPlan::name` uniqueness.

No algorithmic changes: every existing scan path is invoked under the same conditions as before, so benches are at parity.

Checks:

- cargo test -p vortex-fsst --lib --features _test-harness: 184 passed
- cargo test -p vortex-fsst --lib: 182 passed
- cargo +nightly fmt --all: clean
- cargo clippy -p vortex-fsst --all-targets --all-features: no new lints in changed files (pre-existing lints in dfa_compressed/, anchor_scan.rs:3100+, mod.rs:498, multi_contains.rs:405 untouched).
- cargo bench -p vortex-fsst --bench fsst_like --features _test-harness: benches compile and `fsst_contains_htt_{cb,urls}` / `fsst_contains_https_urls` run within expected timings.

Signed-off-by: Claude Agent <claude-agent@anthropic.com>
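For readers who want to see how the calibration constants could turn into a per-call estimate, here is a minimal sketch. It is not the code in `dfa/planner.rs`: the `Arch` and `ScanPath` enums, the function signatures, and the choice to model cost as bytes divided by throughput are assumptions made for illustration. Only the numeric constants are taken from the commit message, and the arithmetic relies on 1 GB/s being 1 byte per nanosecond.

```rust
/// Hypothetical stand-in for the commit's `ArchProfile` (names assumed).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Arch {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Hypothetical subset of scan paths that the commit message gives numbers for.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ScanPath {
    TripleTeddy,
    PairTeddy,
    OneByte,
    Memmem,
    RowLoop,
}

/// Row-loop cost from the commit message: roughly 150 ns per row.
const ROW_LOOP_NS_PER_ROW: f64 = 150.0;

/// Throughput in GB/s, copied from the commit message.
/// Returns `None` for the row loop, which is costed per row instead.
fn throughput_gbps(path: ScanPath, arch: Arch) -> Option<f64> {
    use Arch::*;
    use ScanPath::*;
    Some(match (path, arch) {
        (TripleTeddy, Avx512) => 4.28,
        (TripleTeddy, Avx2) => 2.74,
        (TripleTeddy, Neon) => 2.5,
        (TripleTeddy, Scalar) => 0.8,
        (PairTeddy, Avx512) => 5.50,
        (PairTeddy, Avx2) => 3.30,
        (PairTeddy, Neon) => 3.0,
        (PairTeddy, Scalar) => 1.0,
        (OneByte, Avx512) => 12.0,
        (OneByte, Avx2) => 8.0,
        (OneByte, Neon) => 7.0,
        (OneByte, Scalar) => 2.0,
        (Memmem, _) => 25.0,
        (RowLoop, _) => return None,
    })
}

/// Assumed cost model: bytes / (bytes per ns) for streaming paths,
/// rows * ns-per-row for the row loop.
fn estimated_cost_ns(path: ScanPath, arch: Arch, n_rows: usize, n_bytes: usize) -> f64 {
    match throughput_gbps(path, arch) {
        Some(gbps) => n_bytes as f64 / gbps, // 1 GB/s == 1 byte/ns
        None => n_rows as f64 * ROW_LOOP_NS_PER_ROW,
    }
}

fn main() {
    // Illustrative input sizes, not taken from any bench.
    let (rows, bytes) = (1_000, 64_000);
    for path in [
        ScanPath::TripleTeddy,
        ScanPath::PairTeddy,
        ScanPath::OneByte,
        ScanPath::Memmem,
        ScanPath::RowLoop,
    ] {
        let ns = estimated_cost_ns(path, Arch::Avx2, rows, bytes);
        println!("{path:?}: ~{ns:.0} ns");
    }
}
```

Because the commit describes the cost as diagnostic only, a table like this would feed trace output rather than routing; a later comparison-based selector could pick the minimum over the eligible paths.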
1 parent f0c8b51 commit b994a06

6 files changed

Lines changed: 1248 additions & 272 deletions

File tree

encodings/fsst/src/dfa/flat_contains.rs

Lines changed: 19 additions & 3 deletions
@@ -36,6 +36,9 @@ use super::build_fused_table;
 use super::build_symbol_transitions;
 use super::kmp_byte_transitions;
 use super::needle_bytes_absent_from_all_symbols;
+use super::planner::ScanContext;
+use super::planner::ScanPlan;
+use super::planner::ScanPlanner;
 use super::scan_to_bitbuf_with;
 use super::skip::SkipStrategy;

@@ -58,6 +61,10 @@ pub(crate) struct FlatContainsDfa {
     /// with a single `memmem` over `all_bytes` rather than running the
     /// sentinel-branching per-code DFA on every row.
     escape_only_pattern: Option<Vec<u8>>,
+    /// Routing engine. The flat DFA only routes between `EscapeOnly`
+    /// and `RowLoop`, but going through the planner keeps the
+    /// dispatch surface uniform across the three contains DFAs.
+    planner: ScanPlanner,
 }

 impl FlatContainsDfa {
@@ -118,6 +125,7 @@ impl FlatContainsDfa {
             skip,
             anchor,
             escape_only_pattern,
+            planner: ScanPlanner::new(),
         })
     }

@@ -176,10 +184,18 @@ impl FlatContainsDfa {
     where
         T: vortex_array::dtype::IntegerPType,
     {
-        if let Some(pattern) = self.escape_only_pattern.as_deref() {
-            return self.scan_via_escape_only_memmem(n, offsets, all_bytes, pattern, negated);
+        let ctx = ScanContext::for_flat_or_multi(n, all_bytes, self.escape_only_pattern.is_some());
+        match self.planner.plan_flat_or_multi(&ctx) {
+            ScanPlan::EscapeOnly => {
+                let pattern = self
+                    .escape_only_pattern
+                    .as_deref()
+                    .vortex_expect("EscapeOnly plan requires escape_only_pattern");
+                self.scan_via_escape_only_memmem(n, offsets, all_bytes, pattern, negated)
+            }
+            // The planner only emits these two for the flat DFA today.
+            _ => scan_to_bitbuf_with(n, offsets, all_bytes, negated, |codes| self.matches(codes)),
         }
-        scan_to_bitbuf_with(n, offsets, all_bytes, negated, |codes| self.matches(codes))
     }

     /// Single-`memmem` prefilter for the escape-only regime. Each hit is
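
The routing change in this file can be read in isolation as "plan first, then dispatch through one match". The sketch below is a simplified, self-contained model of that idea and of the parity guard the commit message describes. The type and method names mirror the diff, but the field layout of `ScanContext`, the unit-struct `ScanPlanner`, and the `legacy_path` helper are assumptions made for illustration, not the actual contents of `dfa/planner.rs`.

```rust
/// The two plans the flat and multi DFAs route between, per the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScanPlan {
    EscapeOnly,
    RowLoop,
}

/// Assumed shape: borrowed inputs the planner can read in O(1).
#[allow(dead_code)]
struct ScanContext<'a> {
    n: usize,
    all_bytes: &'a [u8],
    escape_only: bool,
}

/// Simplified planner; the real one also caches an arch profile.
struct ScanPlanner;

impl ScanPlanner {
    /// Rules-based routing: escape-only needles take the memmem path,
    /// everything else falls back to the per-row loop.
    fn plan_flat_or_multi(&self, ctx: &ScanContext<'_>) -> ScanPlan {
        if ctx.escape_only {
            ScanPlan::EscapeOnly
        } else {
            ScanPlan::RowLoop
        }
    }
}

/// Hypothetical model of the old cascade (`if let Some(pattern) = ...`),
/// kept only so the planner can be checked against it.
fn legacy_path(escape_only_pattern: Option<&[u8]>) -> ScanPlan {
    if escape_only_pattern.is_some() {
        ScanPlan::EscapeOnly
    } else {
        ScanPlan::RowLoop
    }
}

fn main() {
    let planner = ScanPlanner;
    let all_bytes = b"example compressed bytes";
    let pattern: &[u8] = b"%";
    let cases = [Some(pattern), None];

    // Parity check in the spirit of `test_planner_matches_legacy_cascade`.
    for pattern in cases {
        let ctx = ScanContext {
            n: 3,
            all_bytes,
            escape_only: pattern.is_some(),
        };
        let plan = planner.plan_flat_or_multi(&ctx);
        assert_eq!(plan, legacy_path(pattern));
        println!("escape_only={} -> {:?}", ctx.escape_only, plan);
    }
}
```

Keeping a model of the old cascade next to the planner is what lets a test fail loudly if a later change re-routes a needle to a different scan path.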
