We build the graph using EVP-quantized representations (`EvpBits` metric). Since this is a self-join, every database element has a corresponding vertex in the graph. We optimize the search by starting the traversal directly at the target vertex's position, bypassing the entry-point routing phase. We evaluate two configurations on this graph:
+- **`mode4` (evp-rerank):** The traversal walks the local neighborhood starting from the target vertex using quantized `EvpBits` distances. The retrieved candidates are then reranked using exact FP16 inner products.
+- **`mode7` (evp-asymmetric-rerank):** The traversal walks the local neighborhood starting from the target vertex using an asymmetric distance function (the vertex's original FP16 vector vs. the EVP-quantized vertices in the graph), followed by exact FP16 inner-product reranking of the retrieved candidates.
+- **Task 2** (MIPS search, scored on **query time**):
+To perform maximum inner product search (MIPS) on the deglib graph, we transform the inner-product search into an L2-distance search by extending the vectors' dimensionality. We build the graph once and sweep both `eps_search` and `max_dist` on the built graph to produce multiple operating points. We evaluate two configurations:
+- **`mode5` (l2-fp16-ip):** Vectors are extended to $d+1$ dimensions so that inner product maps to L2 distance during the build (sped up by pre-sorting the vectors with **FLAS**). The query search uses fast FP16 inner-product exploration.
+- **`mode7` (l2-fp16-d2):** Vectors are extended to $d+2$ dimensions for the graph build (also using FLAS). The query search uses fast FP16 L2-distance exploration.
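The reduction from MIPS to L2 search described above can be illustrated with one common $d+1$ construction. This is a minimal sketch, not the deglib implementation: it assumes each database vector $x$ gets the extra coordinate $\sqrt{M^2 - \lVert x \rVert^2}$ (with $M$ the largest database norm) and the query gets an extra `0`, so that $\lVert q' - x' \rVert^2 = \lVert q \rVert^2 + M^2 - 2\langle q, x \rangle$ and the L2 ordering matches the inner-product ordering. All function names are hypothetical, and the text does not state which exact extension `mode5` or `mode7` use.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def extend_database(vectors):
    """Append sqrt(M^2 - ||x||^2) to every vector (assumed d+1 construction)."""
    max_norm_sq = max(inner_product(v, v) for v in vectors)
    return [
        list(v) + [math.sqrt(max(0.0, max_norm_sq - inner_product(v, v)))]
        for v in vectors
    ]


def extend_query(query):
    """Append a zero so the extra database coordinate only adds a constant."""
    return list(query) + [0.0]


# Toy data, chosen only to make the ordering easy to check by hand.
database = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [-1.0, -1.0]]
query = [0.2, 1.0]

ext_db = extend_database(database)
ext_q = extend_query(query)

indices = range(len(database))
# Largest inner product first.
rank_by_ip = sorted(indices, key=lambda i: -inner_product(query, database[i]))
# Smallest L2 distance in the extended space first.
rank_by_l2 = sorted(indices, key=lambda i: squared_l2(ext_q, ext_db[i]))
```

Both rankings agree on this toy input, which is the property a graph built on the extended vectors relies on.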
The C++ binary computes neighbors **and** distances during search; the thin Python
-entrypoint adapts the output to the official result format. Per-dataset parameters
-live in `TASK1_PROFILES` / `TASK2_PROFILES` in [`submission/search.py`](submission/search.py);
-unknown datasets fail fast rather than silently using bad parameters.
+entrypoint adapts the output to the official result format.
## Challenge tasks & constraints
Both tasks run under the same hard limits: **8 vCPUs, 24 GB RAM, ≤ 8 h, read-only
dataset, no internet** in the container (the eval node is an AMD EPYC 7F72, no
-AVX-512). The goal is **≥ 0.8 average recall**; among the operating points that reach
+AVX-512). The goal is **≥ 0.8 average recall**; among the operating points reaching
it, the fastest on the scored metric wins.
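The selection rule above (reach the recall threshold, then minimize the scored time) can be sketched as follows. This is only an illustration under stated assumptions: `OperatingPoint` and `pick_winner` are hypothetical names, and the recall and timing values are made up for the example, not measured results.

```python
from typing import NamedTuple, Optional


class OperatingPoint(NamedTuple):
    name: str
    recall: float   # average recall of this operating point
    seconds: float  # scored metric (lower is better)


def pick_winner(points, min_recall=0.8) -> Optional[OperatingPoint]:
    """Return the fastest point whose recall reaches the threshold, if any."""
    eligible = [p for p in points if p.recall >= min_recall]
    return min(eligible, key=lambda p: p.seconds) if eligible else None


# Illustrative values only.
points = [
    OperatingPoint("a", 0.75, 10.0),  # fastest, but below the recall goal
    OperatingPoint("b", 0.82, 14.0),
    OperatingPoint("c", 0.91, 22.0),
]
winner = pick_winner(points)
```

Sweeping several operating points matters for this reason: a point just above the threshold usually beats a more accurate but slower one.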
| | Task 1 | Task 2 |
@@ -41,7 +37,7 @@ it, the fastest on the scored metric wins.
| Problem | k-NN **graph** self-join, k = 15 | k-NN **search**, k = 30 |
|[`cpp/`](cpp/)| deglib (DEG) C++ library and the SISAP binary under `cpp/sisap/` (`task1.cpp`, `task2.cpp`, `sisap.cpp`, per-mode headers in `task1/`, `task2/`). |
-|[`Dockerfile`](Dockerfile)| Two-stage image: build the binary (AVX2), then a thin Python runtime that runs `search.py`.|
+|[`submission/`](submission/)| TIRA entrypoint `search.py` and evaluation tools (see [submission/README.md](submission/README.md)). |
+|[`Dockerfile`](Dockerfile)| Two-stage image: build the C++ binary (AVX2), then a thin Python runtime running `search.py`. |
|[`.github/workflows/ci.yml`](.github/workflows/ci.yml)| Builds the image and runs all three spot-checks through the exact TIRA command schema, then evaluates + plots. |
|`python/`| Legacy reference implementation (not used by the submission). |
-## How it runs on TIRA
+## Submission via TIRA
+
+Code submissions for SISAP 2026 are handled only through TIRA ([tira.io/task-overview/sisap-2026](https://www.tira.io/task-overview/sisap-2026)), which provides a reproducible, containerized evaluation framework.
+
+### Step 1: Register your team
+
+1. Sign up or log in at [tira.io](https://www.tira.io) (GitHub login supported).
+2. Navigate to [tira.io/task-overview/sisap-2026](https://www.tira.io/task-overview/sisap-2026) and click **Register**.
+3. Optionally add team members via [tira.io/g?type=my](https://www.tira.io/g?type=my).
+
+### Step 2: Verify locally
+
+To test the containerized submission pipeline locally, activate the virtual environment (or install the `tira` client with uv/pip):
+
+```bash
+# Install/update the tira client
+uv pip install --upgrade tira
+
+# Run a dry run against one of the spot-check datasets:
*(On Windows, use `.\.venv\Scripts\tira-cli` instead of `.venv/bin/tira-cli`)*
+
+Use `task-2-spot-check-20260602-training` if your approach only targets Task 2.
+
+### Step 3: Authenticate and submit
-TIRA builds the image from the repo, mounts the dataset (no internet inside the
-container), and invokes:
+Retrieve your authentication token from the TIRA task page (**Submit** → **Code Submissions** → **New Submission** → **I want to submit from my local machine**), then:
File: `cpp/readme.md` (27 additions, 1 deletion)
@@ -123,4 +123,30 @@ Once compiled, the executable `deglib_sisap` can be run from the build output di
* `--output <path>`: Path to write retrieved neighbor indices to a binary `.ivecs` file.
* `--flas` (Task 2 only): Enables FLAS pre-sorting of training vectors before graph building.
-
+## Graph modes
+
+The `deglib_sisap` binary implements seven graph modes per task (`mode1`…`mode7`). All modes share the same save-mode contract (writing one result file per operating point holding neighbor IDs and distances), so they are drop-in alternatives.
This directory contains the Python runner and evaluation tools for the SISAP 2026 Indexing Challenge submission.
+
+## Internal Execution Details
+
+When running under TIRA, `search.py` reads the task config, decompresses the input on the fly when needed (the C++ HDF5 reader only handles contiguous datasets, so gzip/chunked inputs are materialized to an uncompressed temp file via `h5py`), drives the binary once per profile, and writes one result file per operating point.
+
+## Output Format
+
+One HDF5 file is generated per operating point under `$outputDir`. Each file contains:
+
+- **Datasets:**
+- `knns` (1-based neighbor IDs; if a query returns fewer than $k$ candidates, padding slots hold the vertex's own ID for Task 1 and `0` for Task 2. This padding is harmless because the evaluator scores by set membership).
+- `dists` (float).
+- Both datasets have the same shape: **`n × (k+1)` for Task 1**, **`n × k` for Task 2**.
Task 1 prepends the self-reference in column 0 (the extra `+1` column), matching the ground-truth layout the evaluator uses. Task 2 has no self column. Only `knns` is scored: `recall = mean_i |knns[i,:k] ∩ gt[i,:k]| / k`.
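The recall formula above can be written out as a short sketch. It implements only the stated expression, `mean_i |knns[i,:k] ∩ gt[i,:k]| / k`, on plain Python lists; the function name `recall_at_k` and the toy IDs are assumptions for illustration, and the official evaluator may differ in details such as how it reads the HDF5 files.

```python
def recall_at_k(knns, gt, k):
    """Mean over rows of |knns[i][:k] ∩ gt[i][:k]| / k (set membership only)."""
    hits = 0
    for found, truth in zip(knns, gt):
        # Sets ignore order and duplicates, so repeated padding IDs add nothing.
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(gt))


k = 3
gt = [[1, 3, 5], [4, 6, 7]]    # 1-based ground-truth neighbor IDs
knns = [[3, 1, 2], [4, 0, 0]]  # second row is padded with 0 (Task 2 style)

score = recall_at_k(knns, gt, k)  # (2 + 1) hits out of 2 * 3 slots
```

Because IDs are 1-based, a `0` padding slot can never match a ground-truth entry, which is why the padding neither helps nor hurts the score.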