SISAP 2026 — deglib

Submission for the SISAP 2026 Indexing Challenge. The index is a Dynamic Exploration Graph (DEG) combined with EVP (Equi-Voronoi Polytope) quantization and FLAS (Fast Linear Assignment Sorter) for optimized insertion order, implemented in C++ (cpp/) and driven by the official baseline Python harness (submission/). Everything ships as a single Docker container built by TIRA from this repo.

Approach

We configure the even-regular Dynamic Exploration Graph (deglib) library to target the specific constraints and scoring metrics of the two tasks:

Task 1 (k-NN self-join, scored on total build + search time): We build the graph using EVP-quantized representations (EvpBits metric). Since this is a self-join, every database element has a corresponding vertex in the graph. We optimize the search by starting the traversal directly at the target vertex's position, bypassing the entry-point routing phase. We evaluate two configurations on this graph:
- mode4 (evp-rerank): The traversal walks the local neighborhood starting from the target vertex using quantized EvpBits distances. The retrieved candidates are then reranked using exact FP16 inner products.
- mode7 (evp-asymmetric-rerank): The traversal walks the local neighborhood starting from the target vertex using an asymmetric distance function (the vertex's original FP16 vector vs. the EVP-quantized vertices in the graph), followed by exact FP16 inner-product reranking of the retrieved candidates.
Task 2 (MIPS search, scored on query time): To perform maximum inner product search (MIPS) on the deglib graph, we transform the inner product into an L2-similarity search by extending the vectors' dimensionality. We build the graph once and sweep both eps_search and max_dist on the built graph to produce multiple operating points. We evaluate two configurations:
- mode5 (l2-fp16-ip): Vectors are extended to $d+1$ dimensions to transform inner product to L2 distance during the build (speeded up by pre-sorting vectors using FLAS). The query search is performed using fast FP16 inner-product exploration.
- mode7 (l2-fp16-d2): Vectors are extended to $d+2$ dimensions for the graph build (also utilizing FLAS). Query search is performed using fast FP16 L2 distance exploration.

The C++ binary computes neighbors and distances during search; the thin Python entrypoint adapts the output to the official result format.

Challenge tasks & constraints

Both tasks run under the same hard limits: 8 vCPUs, 24 GB RAM, ≤ 8 h, read-only dataset, no internet in the container (the eval node is an AMD EPYC 7F72, no AVX-512). The goal is ≥ 0.8 average recall; among the operating points reaching it, the fastest on the scored metric wins.

	Task 1	Task 2
Dataset family	Wikipedia BGE-M3 (FP16, 1024-dim, normalized)	Llama-Dev (FP32, 128-dim)
Problem	k-NN graph self-join, k = 15	k-NN search, k = 30
Distance	inner product	inner product (via L2 lift)
Scored metric	build + search wall-clock (`buildtime`)	query time (`querytime`)
Build threads	all 8	1 (configured build thread)

Datasets

Task	Variant	File	Vectors
1	spot-check	`benchmark-dev-gooaq-small.h5`	10,000 (384-dim — off-family smoke test)
1	small (dev)	`benchmark-dev-wikipedia-bge-m3-small.h5`	200,000
1	large (eval)	`benchmark-dev-wikipedia-bge-m3.h5`	6,350,000
2	spot-check	`benchmark-dev-llama-small.h5`	14,000
2	dev/eval	`llama-dev.h5`	256,921

Repository layout

Path	Contents
`cpp/`	deglib (DEG) C++ library and the SISAP binary under `cpp/sisap/` (`task1.cpp`, `task2.cpp`, `sisap.cpp`, per-mode headers in `task1/`, `task2/`).
`submission/`	TIRA entrypoint `search.py` and evaluation tools (see submission/README.md).
`Dockerfile`	Two-stage image: build the C++ binary (AVX2), then a thin Python runtime running `search.py`.
`.github/workflows/ci.yml`	Builds the image and runs all three spot-checks through the exact TIRA command schema, then evaluates + plots.
`python/`	Legacy reference implementation (not used by the submission).

Submission via TIRA

Submissions are handled through TIRA (tira.io/task-overview/sisap-2026), which provides a reproducible, containerized evaluation framework. Code submissions for SISAP 2026 are handled only through TIRA.

Step 1 — Register your team

Sign up or log in at tira.io (GitHub login supported).
Navigate to tira.io/task-overview/sisap-2026 and click Register.
Optionally add team members via tira.io/g?type=my.

Step 2 — Verify locally

To test the containerized submission pipeline locally on your machine, sync the workspace dependencies:

# Install/update workspace dependencies
uv sync

# Run a dry run against the task-1 spot-check datasets:
uv run tira-cli code-submission \
    --path . \
    --command 'python3 /app/search.py --input $inputDataset/*.h5 --task-description $inputDataset/config.json --output $outputDir' \
    --task sisap-2026 \
    --dataset task-1-spot-check-20260602-training \
    --dry-run

# Run a dry run against the task-2 spot-check datasets:
uv run tira-cli code-submission \
    --path . \
    --command 'python3 /app/search.py --input $inputDataset/*.h5 --task-description $inputDataset/config.json --output $outputDir' \
    --task sisap-2026 \
    --dataset task-2-spot-check-20260602-training \
    --dry-run

Step 3 — Authenticate and submit

Retrieve your authentication token from the TIRA task page (Submit → Code Submissions → New Submission → I want to submit from my local machine), then:

uv run tira-cli login --token AUTH-TOKEN
uv run tira-cli verify-installation --task sisap-2026 --team deglib

uv run tira-cli code-submission \
    --path . \
    --command 'python3 /app/search.py --input $inputDataset/*.h5 --task-description $inputDataset/config.json --output $outputDir' \
    --task sisap-2026 \
    --dataset task-1-spot-check-20260602-training

Step 4 — Trigger evaluation in the TIRA UI

Navigate to the task page, click Submit → Code Submissions.
Select your submission, choose a dataset and hardware configuration and trigger a run.
The organizers will handle execution on all datasets once your submission looks correct.

Build & run locally

# Build the submission image
docker build -t sisap-deglib .

# Run one task the way TIRA does e.g. task-2-spot-check (<your-dataset-dir> holds the .h5)
mkdir -p results
docker run --rm --cpus=8 --memory=24g \
    -v "$PWD/<your-dataset-dir>/:/app/dataset:ro" \
    -v "$PWD/results:/app/results:rw" \
    sisap-deglib \
    python3 /app/search.py --input '/app/dataset/benchmark-dev-llama-small.h5' \
        --task-description /app/data/task-2-spot-check/config.json --output /app/results

# Install/update workspace dependencies
uv sync

# Score the results against the dataset ground truth (saves res_task2.csv under results/)
uv --directory submission run eval.py --results ../results ../results/res_task2.csv

# Plot the recall-vs-QPS curve (saves result_*.png under results/)
uv --directory submission run plot.py --task task2 ../results/res_task2.csv
mv submission/result_*.png results/

Building just the C++ binary

cmake -S cpp -B cpp/build -DCMAKE_BUILD_TYPE=Release
cmake --build cpp/build --target deglib_sisap -j"$(nproc)"

# Usage: deglib_sisap <task1|task2> <input.h5> <mode> [options]
# Save mode writes one .bin per operating point into the --output directory:
cpp/build/bin/deglib_sisap task2 dataset.h5 mode5 \
    --no-recall --output results_dir \
    --k-top 30 --max-dist 5000,7000 --eps-search 0.18,0.2 --flas

Continuous integration

On every push the CI builds the image and runs all three spot-checks through the same command schema TIRA uses, under the eval node's resource limits (--cpus=8 --memory=24g), then runs eval.py / plot.py / show_operating_points.py. There is no hard recall gate — it builds, runs and reports, which is what the challenge requires for a valid public submission.

Name		Name	Last commit message	Last commit date
Latest commit History 124 Commits
.github/workflows		.github/workflows
cpp		cpp
python		python
submission		submission
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SISAP 2026 — deglib

Approach

Challenge tasks & constraints

Datasets

Repository layout

Submission via TIRA

Step 1 — Register your team

Step 2 — Verify locally

Step 3 — Authenticate and submit

Step 4 — Trigger evaluation in the TIRA UI

Build & run locally

Building just the C++ binary

Continuous integration

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

SISAP 2026 — deglib

Approach

Challenge tasks & constraints

Datasets

Repository layout

Submission via TIRA

Step 1 — Register your team

Step 2 — Verify locally

Step 3 — Authenticate and submit

Step 4 — Trigger evaluation in the TIRA UI

Build & run locally

Building just the C++ binary

Continuous integration

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages