This directory contains the C++ implementation of the Dynamic Exploration Graph (DEG) optimized and tailored for the SISAP 2026 Challenge (Tasks 1 & 2). It includes the core deglib header-only library and the task runners inside the sisap subdirectory.
Download and extract the dataset files as described in the main readme file.
- C++ Compiler: A modern C++20 compiler (GCC 10.0+, Clang 11.0+, MSVC 2022+, or AppleClang).
- CMake: Version 3.19+
IMPORTANT NOTE: this code is highly optimized using AVX2 instructions for fast distance computation.
Select your operating system and preferred setup method:
# Install MSVC C++ Compiler (Visual Studio Build Tools)
$ winget install --id Microsoft.VisualStudio.2022.BuildTools --override "--add Microsoft.VisualStudio.Workload.VCTools --passive"
# Install CMake
$ winget install --id Kitware.CMake

- Go to the Visual Studio Download Page.
- Scroll down to the bottom of the page and expand the section "Tools for Visual Studio".
- Download the installer for "Build Tools for Visual Studio 2022".
- Run the installer, select the "Desktop development with C++" workload (which installs the required C++ Build Tools), and complete the installation.
- Download and install CMake for Windows.
# Install GCC C++ Compiler and CMake (Debian/Ubuntu)
$ sudo apt-get update && sudo apt-get install build-essential cmake

# Install AppleClang C++ Compiler (Xcode Command Line Tools)
$ xcode-select --install
# Install CMake via Homebrew
$ brew install cmake

- Open the Mac App Store, search for "Xcode", and install it (this will install the AppleClang compiler). Launch Xcode once after installation to accept the license agreement.
- Download and install CMake for macOS directly.
Rename CMakePresets.json.sample to CMakePresets.json and change the DATA_PATH cache variable inside of the file to point to the directory where your datasets are located.
Then, compile the project using standard CMake Presets from the root directory:
# Configure using your environment's preset (e.g. "windows-msvc", "linux-gcc", or "macos-clang")
cmake --preset <Preset-Name>
# Compile the Release target (e.g. "windows-msvc-release", "linux-gcc-release", or "macos-clang-release")
cmake --build --preset <Build-Preset-Name>

This C++ repository is organized as follows:
- deglib/include/: The core C++ library of the Dynamic Exploration Graph (DEG). It contains the distance metrics, builders, repository, graph structures, and search algorithms.
- sisap/: Challenge benchmark runner files:
  - sisap.cpp: The main combined entry point program (`deglib_sisap`). It acts as a router, dispatching execution to either `task1` or `task2` depending on the first CLI argument passed.
  - task1.cpp: Implementation for EVP Benchmark Task 1. Dispatches HDF5-based approximate nearest neighbor (ANN) search benchmarks across 7 different modes (e.g., FP16, EVP bits, asymmetric search, and candidate list rerankings).
  - task2.cpp: Implementation for EVP Benchmark Task 2. Evaluates the `llama-dev` datasets and incorporates advanced features like FLAS pre-sorting, entry-search expansions, and graph optimization sweeps.
  - flas/: Fast Linear Assignment Sorter library used to pre-sort database vectors to optimize graph construction.
  - hdf5/: Custom header-only parser to scan, read, and interpret dataset HDF5 files natively.
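
The routing role described for sisap.cpp can be pictured with a minimal sketch. This is not the repository's code: `run_task1`, `run_task2`, and `dispatch` are hypothetical names, and the real runners take whatever arguments task1.cpp and task2.cpp actually define. The sketch only shows the idea of switching on the first CLI argument and forwarding the rest.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real task runners (assumed names).
int run_task1(const std::vector<std::string>& args) {
    std::cout << "task1 called with " << args.size() << " argument(s)\n";
    return 0;
}

int run_task2(const std::vector<std::string>& args) {
    std::cout << "task2 called with " << args.size() << " argument(s)\n";
    return 0;
}

// Routes on the first CLI argument and forwards the remaining ones.
// Returns a process exit code (2 signals a usage error in this sketch).
int dispatch(int argc, const char* const* argv) {
    if (argc < 2) {
        std::cerr << "usage: deglib_sisap <task> <hdf5_file_path> <mode_name> [options...]\n";
        return 2;
    }
    const std::string task = argv[1];
    const std::vector<std::string> rest(argv + 2, argv + argc);
    if (task == "task1") return run_task1(rest);
    if (task == "task2") return run_task2(rest);
    std::cerr << "unknown task: " << task << "\n";
    return 2;
}
```

A real `main` would simply return `dispatch(argc, argv)`.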
Once compiled, the executable deglib_sisap can be run from the build output directory.
# General syntax
./deglib_sisap <task> <hdf5_file_path> <mode_name> [options...]

- `<task>`: Set to `task1` or `task2`.
- `<hdf5_file_path>`: Path to a valid SISAP HDF5 dataset file (e.g. `benchmark-dev-wikipedia-bge-m3-small.h5` or `llama-dev.h5`).
- `<mode_name>`: The target benchmark mode (e.g. `mode4`, `evp-rerank`).

# On Windows (from the build folder):
.\build\windows-msvc\bin\Release\deglib_sisap.exe task1 data/wikipedia-small.h5 mode4 --threads 8 --max-dist 200
# On Linux:
./build/linux-gcc/bin/Release/deglib_sisap task1 data/wikipedia-small.h5 mode4 --threads 8 --max-dist 200

# Run with FLAS pre-sorting and a search limit sweep:
./build/linux-gcc/bin/Release/deglib_sisap task2 data/llama-dev.h5 mode7 --threads 8 --flas --max-dist 5000,6000 --eps-search 0.007

- `--threads <n>`: Number of CPU worker threads used for parallel construction and search (default: `8` / host allocation).
- `--max-dist <list>`: Exploration search budget (e.g., `200` or a comma-separated list like `100,200,300` for sweeps).
- `--k-top <n>`: Number of nearest neighbors to retrieve per query (default: `15` for task1, `30` for task2).
- `--k-graph <n>`: Degree/edges per vertex in the graph (default: `32`).
- `--no-recall`: Skips loading ground-truth data (requires `--output`).
- `--output <path>`: Path to write retrieved neighbor indices to a binary `.ivecs` file.
- `--flas` (Task 2 only): Enables FLAS pre-sorting of training vectors before graph building.

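
To illustrate how a sweep value such as `100,200,300` for `--max-dist` could be turned into individual budgets, here is a small sketch. It is not taken from the runner sources: `parse_int_list` is a hypothetical helper, and it assumes the budgets are plain integers.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: splits "100,200,300" into {100, 200, 300}.
// A single value such as "200" yields a one-element list.
std::vector<int> parse_int_list(const std::string& text) {
    std::vector<int> values;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (token.empty()) {
            throw std::invalid_argument("empty entry in list: " + text);
        }
        std::size_t used = 0;
        const int value = std::stoi(token, &used);  // throws if no digits are found
        if (used != token.size()) {
            throw std::invalid_argument("not an integer: " + token);
        }
        values.push_back(value);
    }
    if (values.empty()) {
        throw std::invalid_argument("empty list");
    }
    return values;
}
```

Each parsed value would then correspond to one operating point of the sweep.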
The deglib_sisap binary implements seven graph modes per task (mode1…mode7). All modes share the same save-mode contract (writing one result file per operating point holding neighbor IDs and distances), so they are drop-in alternatives.

Task 1 modes:

| Mode | Name | Description |
|---|---|---|
| mode1 | fp16 | FP16 build + FP16 explore |
| mode2 | evp-linear | EVP quantization + brute-force linear search |
| mode3 | evp | EVP build + EVP explore (no rerank) |
| mode4 ⭐ | evp-rerank | EVP build + EVP explore + FP16 rerank |
| mode5 | evp-build-fp16-external-search | EVP build + FP16 external graph search |
| mode6 | evp-asymmetric | EVP build + asymmetric FP16-vs-EVP search |
| mode7 ⭐ | evp-asymmetric-rerank | EVP build + asymmetric search + FP16 rerank |

Task 2 modes:

| Mode | Name | Description |
|---|---|---|
| mode1 | baseline | FP32 build + FP32 inner-product explore |
| mode2 | fp16-build-fp16-explore | FP16 build + FP16 IP explore |
| mode3 | baseline-fp16 | FP32 build + FP16 IP explore |
| mode4 | l2-converted | FP32 L2(d+1) build + FP32 L2 explore |
| mode5 ⭐ | l2-fp16-ip | FP32 L2(d+1) build + FP16 IP explore |
| mode6 | l2-fp16-l2 | FP32 L2(d+1) build + FP16 L2 explore |
| mode7 ⭐ | l2-fp16-d2 | FP32 L2(d+2) build + FP16 L2 explore |
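
The tables do not spell out what "L2(d+1)" means. One common reading, assumed here and not confirmed by this README, is the standard reduction from maximum inner-product search to L2 search: each database vector x gets one extra coordinate sqrt(M² − ‖x‖²), where M is the largest norm in the database, and each query gets an extra 0. Then ‖q' − x'‖² = ‖q‖² + M² − 2·(q·x), so ranking by ascending L2 distance matches ranking by descending inner product. The sketch below uses hypothetical function names and plain `std::vector<float>` storage rather than deglib's own types, and it does not cover the d+2 variant.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Plain inner product of two equally sized vectors.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Squared Euclidean distance of two equally sized vectors.
float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Appends one coordinate to every database vector so that all augmented
// vectors end up with the same norm (the largest norm in the input).
std::vector<std::vector<float>> augment_database(const std::vector<std::vector<float>>& data) {
    float max_norm_sq = 0.0f;
    for (const auto& v : data) max_norm_sq = std::max(max_norm_sq, dot(v, v));

    std::vector<std::vector<float>> out;
    out.reserve(data.size());
    for (const auto& v : data) {
        std::vector<float> augmented = v;
        // Clamp at zero to guard against tiny negative values from rounding.
        augmented.push_back(std::sqrt(std::max(0.0f, max_norm_sq - dot(v, v))));
        out.push_back(std::move(augmented));
    }
    return out;
}

// Queries only get a zero appended, so their inner products are unchanged.
std::vector<float> augment_query(const std::vector<float>& query) {
    std::vector<float> augmented = query;
    augmented.push_back(0.0f);
    return augmented;
}
```

Under this reading, a graph built with L2 over the augmented vectors can still be explored with inner product on the original d dimensions, which is one way to interpret a row such as "FP32 L2(d+1) build + FP16 IP explore".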