Commit 3a7d005

Rewrite README for the current single-container deglib submission
1 parent 7899915 commit 3a7d005

1 file changed

Lines changed: 168 additions & 4 deletions

README.md

@@ -1,8 +1,172 @@
# SISAP 2026 — deglib

3-
The original C++/Python implementation has been moved to the **[python/](file:///c:/Lang/Python/sisap26-deglib/python)** subdirectory.
Submission for the [SISAP 2026 Indexing Challenge](https://sisap-challenges.github.io/2026/).
The index is a [**Dynamic Exploration Graph (DEG)**](https://github.com/Visual-Computing/DynamicExplorationGraph/tree/evp)
combined with [**EVP (Equi-Voronoi Polytope) quantization**](https://github.com/MetricSearch/metric_space_rust),
implemented in C++ (`cpp/`) and driven by the official baseline Python harness
(`submission/`). Everything ships as a **single Docker container** that TIRA builds
from this repo.

5-
You can find the documentation for this code in the legacy README:
6-
* **[python/README.md](file:///c:/Lang/Python/sisap26-deglib/python/README.md)**
## Approach

8-
The root directory is reserved for the upcoming Rust implementation.
- **Index:** deglib's even-regular exploration graph, built once per run; a
  parameter sweep then produces several operating points (build/recall trade-offs) from
  that single build.
- **Task 1** (k-NN self-join, scored on **build + search time**): graph mode `mode4`,
  i.e. EVP build + EVP explore + exact FP16 inner-product rerank. Here **search = explore +
  rerank**. Task 1 ranks on the *total* (build + search), which `search.py` packs into
  the `buildtime` attribute with `querytime` = 0, so `buildtime` is the sum, not a
  claim that search is part of building. Search is a real component, often ≈ half the
  total.
- **Task 2** (MIPS search, scored on **query time**): graph mode `mode5`, i.e.
  L2-converted FP32 build (with FLAS pre-sort) + FP16 inner-product search.
- **Task 3** (sparse SPLADE) is out of scope and skipped cleanly (exit 0) so the
  mandatory spot-check CI stays green.

The C++ binary computes neighbors **and** distances during search; the thin Python
entrypoint adapts the output to the official result format. Per-dataset parameters
live in `TASK1_PROFILES` / `TASK2_PROFILES` in [`submission/search.py`](submission/search.py);
unknown datasets fail fast rather than silently using bad parameters.
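
The two conventions described above (Task 1 timings packed into `buildtime`, and fail-fast profile lookup) can be sketched as follows. This is a minimal illustration rather than the actual `search.py` code: the function names, the profile fields, the dataset key and the timing values are hypothetical; only `TASK1_PROFILES`, `buildtime` and `querytime` are names taken from this README.

```python
# Hypothetical profile table; the real TASK1_PROFILES in submission/search.py
# may use different keys and fields.
TASK1_PROFILES = {
    "wikipedia-bge-m3-small": {"mode": "mode4", "k": 15},
}


def lookup_profile(profiles, dataset):
    """Return the profile for `dataset`, failing fast if none is defined."""
    try:
        return profiles[dataset]
    except KeyError:
        known = ", ".join(sorted(profiles))
        raise SystemExit(f"no profile for dataset {dataset!r} (known: {known})") from None


def task1_time_attrs(build_s, explore_s, rerank_s):
    """Pack Task 1 timings as described above.

    search = explore + rerank; the scored total (build + search) goes into
    `buildtime`, and `querytime` is fixed at 0.
    """
    search_s = explore_s + rerank_s
    return {"buildtime": build_s + search_s, "querytime": 0.0}


# Made-up timings in seconds, for illustration only.
attrs = task1_time_attrs(build_s=120.0, explore_s=90.0, rerank_s=30.0)
print(attrs)  # {'buildtime': 240.0, 'querytime': 0.0}
```

Exiting on an unknown dataset, instead of falling back to defaults, keeps a mistyped or new dataset name from producing a valid-looking but badly tuned result file.
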

## Challenge tasks & constraints

Both tasks run under the same hard limits: **8 vCPUs, 24 GB RAM, ≤ 8 h, read-only
dataset, no internet** in the container (the eval node is an AMD EPYC 7F72, no
AVX-512). The goal is **≥ 0.8 average recall**; among the operating points that reach
it, the fastest on the scored metric wins.
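
The selection rule above (reach the recall target, then minimise the scored time) can be written down in a few lines. This is a sketch under assumptions: the dictionary fields `params`, `recall` and `time_s` and all numbers are invented for illustration and do not reflect the harness's actual data structures or measured results.

```python
def best_operating_point(points, min_recall=0.8):
    """Return the fastest point whose recall reaches `min_recall`, or None."""
    eligible = [p for p in points if p["recall"] >= min_recall]
    return min(eligible, key=lambda p: p["time_s"], default=None)


# Made-up numbers for illustration only.
points = [
    {"params": "eps=0.10", "recall": 0.76, "time_s": 41.0},
    {"params": "eps=0.18", "recall": 0.81, "time_s": 55.0},
    {"params": "eps=0.20", "recall": 0.84, "time_s": 63.0},
]
print(best_operating_point(points)["params"])  # eps=0.18
```
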

|                | Task 1                                        | Task 2                                             |
|----------------|-----------------------------------------------|----------------------------------------------------|
| Dataset family | Wikipedia BGE-M3 (FP16, 1024-dim, normalized) | Llama-Dev (FP32, 128-dim)                          |
| Problem        | k-NN **graph** self-join, k = 15              | k-NN **search**, k = 30                            |
| Distance       | inner product                                 | inner product (via L2 lift)                        |
| Scored metric  | build + search wall-clock (`buildtime`)       | query time (`querytime`)                           |
| Build threads  | all 8                                         | **1** (graph built single-threaded, per the rules) |

### Datasets

| Task | Variant      | File                                      | Vectors                                  |
|------|--------------|-------------------------------------------|------------------------------------------|
| 1    | spot-check   | `benchmark-dev-gooaq-small.h5`            | 10,000 (384-dim, off-family smoke test)  |
| 1    | small (dev)  | `benchmark-dev-wikipedia-bge-m3-small.h5` | 200,000                                  |
| 1    | large (eval) | `benchmark-dev-wikipedia-bge-m3.h5`       | 6,350,000                                |
| 2    | spot-check   | `benchmark-dev-llama-small.h5`            | 14,000                                   |
| 2    | dev/eval     | `llama-dev.h5`                            | 256,921                                  |

## Graph modes

The `deglib_sisap` binary implements seven graph modes per task (`mode1` through `mode7`).
The profiles in `search.py` currently use **`mode4` (Task 1)** and **`mode5` (Task 2)**;
⭐ marks the strongest submission candidates (the other ⭐, `mode7`, is a close
alternative that is implemented but not wired into a profile). All modes share the
same save-mode contract (one result file per operating point holding neighbor ids
**and** distances), so they are drop-in benchmark alternatives.

**Task 1**: EVP variants

| Mode         | Name                           | Description                                  |
|--------------|--------------------------------|----------------------------------------------|
| mode1        | fp16                           | FP16 build + FP16 explore                    |
| mode2        | evp-linear                     | EVP quantization + brute-force linear search |
| mode3        | evp                            | EVP build + EVP explore (no rerank)          |
| **mode4** ⭐ | evp-rerank                     | EVP build + EVP explore + FP16 rerank        |
| mode5        | evp-build-fp16-external-search | EVP build + FP16 external graph search       |
| mode6        | evp-asymmetric                 | EVP build + asymmetric FP16-vs-EVP search    |
| mode7 ⭐     | evp-asymmetric-rerank          | EVP build + asymmetric search + FP16 rerank  |

**Task 2**: L2-lift variants

| Mode         | Name                    | Description                             |
|--------------|-------------------------|-----------------------------------------|
| mode1        | baseline                | FP32 build + FP32 inner-product explore |
| mode2        | fp16-build-fp16-explore | FP16 build + FP16 IP explore            |
| mode3        | baseline-fp16           | FP32 build + FP16 IP explore            |
| mode4        | l2-converted            | FP32 L2(d+1) build + FP32 L2 explore    |
| **mode5** ⭐ | l2-fp16-ip              | FP32 L2(d+1) build + FP16 IP explore    |
| mode6        | l2-fp16-l2              | FP32 L2(d+1) build + FP16 L2 explore    |
| mode7 ⭐     | l2-fp16-d2              | FP32 L2(d+2) build + FP16 L2 explore    |
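
The `L2(d+1)` modes turn inner-product search into Euclidean search by adding one dimension. This README does not spell out the exact construction deglib uses, so the following is only a sketch of one common reduction, assumed here for illustration: each database vector x gets the extra coordinate sqrt(M² − ‖x‖²), where M is the largest norm in the database, and each query gets 0. Then ‖q′ − x′‖² = ‖q‖² + M² − 2⟨q, x⟩, so ascending L2 distance is the same order as descending inner product. All function names are hypothetical, and the `d+2` variant is not covered.

```python
import math


def lift_database(vectors):
    """Append sqrt(M^2 - ||x||^2) to each vector, M being the largest norm."""
    sq_norms = [sum(v * v for v in x) for x in vectors]
    max_sq = max(sq_norms)
    return [
        list(x) + [math.sqrt(max(max_sq - n, 0.0))]
        for x, n in zip(vectors, sq_norms)
    ]


def lift_query(q):
    """Queries get a zero in the extra dimension."""
    return list(q) + [0.0]


def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Inner product."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-dimensional data, for illustration only.
data = [[3.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
query = [1.0, 0.5]

by_ip = sorted(range(len(data)), key=lambda i: -dot(query, data[i]))
lifted = lift_database(data)
by_l2 = sorted(range(len(data)), key=lambda i: sq_l2(lift_query(query), lifted[i]))
print(by_ip, by_l2)  # [0, 1, 2] [0, 1, 2]
```

The practical appeal of such a lift is that a graph built for a true metric can be reused for maximum inner-product search without changing the ranking.
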

## Repository layout

| Path | Contents |
|------|----------|
| [`cpp/`](cpp/) | deglib (DEG) C++ library and the SISAP binary under `cpp/sisap/` (`task1.cpp`, `task2.cpp`, `sisap.cpp`, per-mode headers in `task1/`, `task2/`). |
| [`submission/`](submission/) | TIRA entrypoint `search.py` plus the vendored baseline harness (`eval.py`, `datasets.py`, `plot.py`, `show_operating_points.py`, `data/*/config.json`). |
| [`Dockerfile`](Dockerfile) | Two-stage image: build the binary (AVX2), then a thin Python runtime that runs `search.py`. |
| [`.github/workflows/ci.yml`](.github/workflows/ci.yml) | Builds the image and runs all three spot-checks through the exact TIRA command schema, then evaluates + plots. |
| `python/` | Legacy reference implementation (not used by the submission). |

## How it runs on TIRA

TIRA builds the image from the repo, mounts the dataset (no internet inside the
container), and invokes:

```bash
python3 /app/search.py \
  --input $inputDataset/*.h5 \
  --task-description $inputDataset/config.json \
  --output $outputDir
```

`search.py` reads the task config, decompresses the input on the fly when needed
(the C++ HDF5 reader only handles contiguous datasets, so gzip/chunked inputs are
materialized to an uncompressed temp file via `h5py`), drives the binary once per
profile, and writes one result file per operating point.
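
As an illustration of the "drives the binary once per profile" step, the sketch below assembles a single `deglib_sisap` command line for a whole parameter sweep. The flags are the ones shown in the usage example under "Building just the C++ binary"; the `build_command` helper and its argument names are hypothetical, and how `search.py` actually builds and runs the command is an assumption.

```python
def build_command(binary, task, input_h5, mode, output_dir,
                  k, max_dist, eps_search, flas=False):
    """Assemble one deglib_sisap invocation covering a parameter sweep.

    Sweep values are passed as comma-separated lists, mirroring the README's
    usage example.
    """
    cmd = [
        binary, task, input_h5, mode,
        "--no-recall",
        "--output", output_dir,
        "--k-top", str(k),
        "--max-dist", ",".join(str(v) for v in max_dist),
        "--eps-search", ",".join(str(v) for v in eps_search),
    ]
    if flas:
        cmd.append("--flas")
    return cmd


cmd = build_command(
    "cpp/build/bin/deglib_sisap", "task2", "dataset.h5", "mode5",
    "results_dir", k=30, max_dist=[5000, 7000], eps_search=[0.18, 0.2],
    flas=True,
)
print(" ".join(cmd))
```

Returning an argument list rather than a shell string means it can be handed directly to `subprocess.run(cmd, check=True)` without any quoting concerns.
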

## Output format

One HDF5 file per operating point under `$outputDir`, each with:

- datasets `knns` (1-based neighbor ids) and `dists` (float), both the same shape:
  **`n × (k+1)` for Task 1**, **`n × k` for Task 2**. If a query returns fewer than k
  candidates, the padding slots hold the node's own id for Task 1 and `0` for Task 2;
  both are harmless, since the evaluator scores by set membership.
- root attributes `algo`, `dataset`, `task`, `buildtime`, `querytime`, `params`.

Task 1 prepends the self-reference in column 0 (the extra `+1` column), matching the
ground-truth layout the evaluator uses; Task 2 has no self column. Only `knns` is
scored: `recall = mean_i |knns[i,:k] ∩ gt[i,:k]| / k`.
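
The recall formula above can be checked on toy data with a few lines of plain Python. This is a sketch of the formula as stated here, not the harness's `eval.py`; the function name and the example ids are made up.

```python
def mean_recall(knns, gt, k):
    """recall = mean_i |knns[i,:k] ∩ gt[i,:k]| / k, by set membership only."""
    hits = sum(
        len(set(row[:k]) & set(truth[:k]))
        for row, truth in zip(knns, gt)
    )
    return hits / (len(knns) * k)


# Toy data with k = 3; ids are 1-based, and 0 is the Task 2 padding value.
gt = [[4, 7, 9], [2, 5, 8]]
knns = [[7, 4, 1], [5, 0, 0]]
print(mean_recall(knns, gt, k=3))  # 0.5
```

Because `0` is never a valid 1-based id, padded slots cannot match the ground truth, and the order of ids within a row does not affect the score.
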

## Build & run locally

```bash
# Build the submission image
docker build -t sisap-deglib .

# Run one task the way TIRA does (dataset dir holds the .h5 + config.json)
mkdir -p results
docker run --rm --cpus=8 --memory=24g \
  -v "$PWD/your-dataset-dir:/app/data/ds:ro" \
  -v "$PWD/results:/app/results:rw" \
  sisap-deglib \
  python3 /app/search.py --input '/app/data/ds/*.h5' \
    --task-description /app/data/ds/config.json --output /app/results

# Score the results against the dataset ground truth (run from submission/,
# like CI does, so eval.py can import the harness modules)
cd submission && PYTHONPATH=. python3 eval.py --results ../results res.csv
```

### Building just the C++ binary

```bash
cmake -S cpp -B cpp/build -DCMAKE_BUILD_TYPE=Release -DFORCE_AVX2=ON
cmake --build cpp/build --target deglib_sisap -j"$(nproc)"

# Usage: deglib_sisap <task1|task2> <input.h5> <mode> [options]
# Save mode writes one .bin per operating point into the --output directory:
cpp/build/bin/deglib_sisap task2 dataset.h5 mode5 \
  --no-recall --output results_dir \
  --k-top 30 --max-dist 5000,7000 --eps-search 0.18,0.2 --flas
```

`--march`/AVX note: the build is pinned to **AVX2 (no AVX-512)** because the eval
node is an AMD EPYC 7F72 (Zen 2) with 8 vCPU / 24 GB RAM and no AVX-512.

## Continuous integration

On every push the CI builds the image and runs all three spot-checks through the
same command schema TIRA uses, under the eval node's resource limits
(`--cpus=8 --memory=24g`), then runs `eval.py` / `plot.py` /
`show_operating_points.py`. There is no hard recall gate: the CI builds, runs and
reports, which is what the challenge requires for a valid public submission.
