Commit f460fae

feat: add ParquetLookupProvider and SqliteLookupProvider behind feature gates
Ports the two storage backends proven in the df-vector-search benchmark POC into the library as optional, feature-gated providers.

- `parquet-provider` feature: ParquetLookupProvider, which performs concurrent row-group reads from any ObjectStore (S3, local FS) with pre-cached parquet footers and optional RowSelection for page-skip optimisation
- `sqlite-provider` feature: SqliteLookupProvider, which performs B-tree point lookups via a WAL-mode connection pool; builds from parquet on first run
- `keys` module (always compiled): pack_key / unpack_key / DatasetLayout, shared key encoding utilities extracted from indexing.rs
- Integration tests for both providers (9 tests total)

Breaking change in Cargo.toml: tokio gains the "sync" feature unconditionally (needed by SqliteLookupProvider's Semaphore; tokio was already a hard dep).
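The commit message names `pack_key` / `unpack_key` as shared key encoding utilities but does not show their encoding. As a rough illustration only, the sketch below packs two 32-bit values into a single `u64` (the key width USearch uses). The function names mirror the commit message, but the signatures, the `(file_idx, row_idx)` pair, and the bit layout are all assumptions rather than the crate's actual definitions.

```rust
/// Hypothetical layout (an assumption, not taken from the commit):
/// high 32 bits = file index, low 32 bits = row index within that file.
pub fn pack_key(file_idx: u32, row_idx: u32) -> u64 {
    // Widen both halves before shifting so no bits are lost.
    ((file_idx as u64) << 32) | (row_idx as u64)
}

/// Inverse of `pack_key` under the same assumed layout.
pub fn unpack_key(key: u64) -> (u32, u32) {
    // `key as u32` deliberately truncates to the low 32 bits.
    ((key >> 32) as u32, key as u32)
}
```

Under this assumed layout, `unpack_key(pack_key(3, 42))` round-trips to `(3, 42)`, so a provider could map a search-result key back to a file and a row without a separate lookup table.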
1 parent 64a7b1c commit f460fae

8 files changed

Lines changed: 1522 additions & 49 deletions

File tree

Cargo.lock

Lines changed: 107 additions & 42 deletions

Cargo.toml

Lines changed: 23 additions & 7 deletions
@@ -5,14 +5,30 @@ edition = "2024"
 description = "DataFusion extension for USearch HNSW vector similarity search with adaptive WHERE clause filtering"
 license = "MIT OR Apache-2.0"
 
+[features]
+parquet-provider = ["dep:parquet", "dep:object_store", "dep:bytes"]
+sqlite-provider = ["dep:rusqlite", "dep:serde_json", "dep:parquet"]
+
 [dependencies]
-tracing = "0.1"
-datafusion = "51.0.0"
-usearch = "2.24.0"
-arrow-array = "57.2.0"
+tracing = "0.1"
+datafusion = "51.0.0"
+usearch = "2.24.0"
+arrow-array = "57.2.0"
 arrow-schema = "57.2.0"
-async-trait = "0.1"
-tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
+async-trait = "0.1"
+futures = "0.3"
+# "sync" adds tokio::sync::Semaphore, used by SqliteLookupProvider's connection pool
+tokio = { version = "1", features = ["rt-multi-thread", "macros", "sync"] }
+
+# parquet-provider
+parquet = { version = "57.2.0", optional = true, features = ["async", "object_store"] }
+object_store = { version = "0.12", optional = true }
+bytes = { version = "1", optional = true }
+
+# sqlite-provider
+rusqlite = { version = "0.32", optional = true, features = ["bundled"] }
+serde_json = { version = "1", optional = true }
 
 [dev-dependencies]
-tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
+tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
+tempfile = "3"
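The `parquet-provider` and `sqlite-provider` entries in the `[features]` table are off by default, so code that depends on the optional crates has to be compiled conditionally. A minimal sketch of how such gating typically looks in a crate root follows. The module names, the empty structs, and the `enabled_providers` helper are hypothetical stand-ins, not the layout used in this commit.

```rust
// Hypothetical crate-root layout; only the feature names come from Cargo.toml.

/// Compiled only when the crate is built with `--features parquet-provider`.
#[cfg(feature = "parquet-provider")]
pub mod parquet_provider {
    /// Stand-in for the real provider type.
    pub struct ParquetLookupProvider;
}

/// Compiled only when the crate is built with `--features sqlite-provider`.
#[cfg(feature = "sqlite-provider")]
pub mod sqlite_provider {
    /// Stand-in for the real provider type.
    pub struct SqliteLookupProvider;
}

/// Illustrative helper: reports which optional providers were compiled in.
/// `cfg!` expands to a compile-time `true` or `false` literal.
pub fn enabled_providers() -> Vec<&'static str> {
    let mut names = Vec::new();
    if cfg!(feature = "parquet-provider") {
        names.push("parquet");
    }
    if cfg!(feature = "sqlite-provider") {
        names.push("sqlite");
    }
    names
}
```

With no features enabled, `enabled_providers()` returns an empty list and neither gated module exists in the build, which keeps `parquet`, `object_store`, `bytes`, `rusqlite`, and `serde_json` out of the dependency graph for users who need only the core search functionality.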
