Commit d033bf5

asg017 and claude committed
Add delete recall benchmark suite
New benchmarks-ann/bench-delete/ directory measures KNN recall degradation after random row deletion across index types (flat, rescore, IVF, DiskANN).

For each config and delete percentage: builds index, measures baseline recall, copies DB, deletes random rows, measures post-delete recall, VACUUMs and records size savings.

Includes Makefile targets, self-contained smoke test with synthetic data, and results DB for analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent b008654 commit d033bf5

5 files changed

Lines changed: 830 additions & 0 deletions

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
runs/
*.db
__pycache__/
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
BENCH = python bench_delete.py
EXT = ../../dist/vec0

# --- Configs to test ---
FLAT = "flat:type=vec0-flat,variant=float"
RESCORE_BIT = "rescore-bit:type=rescore,quantizer=bit,oversample=8"
RESCORE_INT8 = "rescore-int8:type=rescore,quantizer=int8,oversample=8"
DISKANN_R48 = "diskann-R48:type=diskann,R=48,L=128,quantizer=binary"
DISKANN_R72 = "diskann-R72:type=diskann,R=72,L=128,quantizer=binary"

ALL_CONFIGS = $(FLAT) $(RESCORE_BIT) $(RESCORE_INT8) $(DISKANN_R48) $(DISKANN_R72)

DELETE_PCTS = 5,10,25,50,75,90

.PHONY: smoke bench-10k bench-50k bench-all report clean

# Quick smoke test (small dataset, few queries)
smoke:
	$(BENCH) --subset-size 5000 --delete-pct 10,50 -k 10 -n 20 \
		--dataset cohere1m --ext $(EXT) \
		$(FLAT) $(DISKANN_R48)

# Standard benchmarks
bench-10k:
	$(BENCH) --subset-size 10000 --delete-pct $(DELETE_PCTS) -k 10 -n 50 \
		--dataset cohere1m --ext $(EXT) $(ALL_CONFIGS)

bench-50k:
	$(BENCH) --subset-size 50000 --delete-pct $(DELETE_PCTS) -k 10 -n 50 \
		--dataset cohere1m --ext $(EXT) $(ALL_CONFIGS)

bench-all: bench-10k bench-50k

# Query saved results
report:
	@echo "Query results:"
	@echo " sqlite3 runs/cohere1m/10000/delete_results.db \\"
	@echo " \"SELECT config_name, delete_pct, recall, query_mean_ms, vacuum_size_mb FROM delete_runs ORDER BY config_name, delete_pct\""

clean:
	rm -rf runs/
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# bench-delete: Recall degradation after random deletion

Measures how KNN recall changes after deleting a random percentage of rows
from different index types (flat, rescore, DiskANN).

## Quick start

```bash
# Ensure dataset exists
make -C ../datasets/cohere1m

# Ensure extension is built
make -C ../.. loadable

# Quick smoke test
make smoke

# Full benchmark at 10k vectors
make bench-10k
```

## Usage

```bash
python bench_delete.py --subset-size 10000 --delete-pct 10,25,50,75 \
  "flat:type=vec0-flat,variant=float" \
  "diskann-R72:type=diskann,R=72,L=128,quantizer=binary" \
  "rescore-bit:type=rescore,quantizer=bit,oversample=8"
```
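
Each positional argument above follows a `name:key=value,key=value` pattern, and `--delete-pct` takes a comma-separated list. How `bench_delete.py` actually parses these is not visible in this view, so the following is only a minimal sketch of one way to split them. `parse_config` and `parse_delete_pcts` are hypothetical helper names, not functions from the script, and the range check on percentages is an assumption.

```python
def parse_config(spec):
    """Split 'name:key=value,...' into (name, params). Hypothetical helper."""
    name, sep, rest = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:key=value,...', got {spec!r}")
    params = {}
    for item in rest.split(","):
        key, eq, value = item.partition("=")
        if not eq or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        params[key] = value  # values are kept as strings in this sketch
    return name, params


def parse_delete_pcts(text):
    """Turn a '--delete-pct' style string such as '10,25,50' into ints."""
    pcts = [int(part) for part in text.split(",")]
    for pct in pcts:
        # Assumption: a percentage must leave at least some rows behind.
        if not 0 < pct < 100:
            raise ValueError(f"delete percentage out of range: {pct}")
    return pcts


name, params = parse_config("diskann-R72:type=diskann,R=72,L=128,quantizer=binary")
print(name, params)
print(parse_delete_pcts("10,25,50,75"))
```

Keeping the values as strings leaves type conversion (for example `R` and `L` as integers) to whatever code builds the index.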

## What it measures

For each config and delete percentage:

| Metric | Description |
|--------|-------------|
| **recall** | KNN recall@k after deletion (ground truth recomputed over surviving rows) |
| **delta** | Recall change vs 0% baseline |
| **query latency** | Mean/median query time after deletion |
| **db_size_mb** | DB file size before VACUUM |
| **vacuum_size_mb** | DB file size after VACUUM (space reclaimed) |
| **delete_time_s** | Wall time for the DELETE operations |
44+
## How it works
45+
46+
1. Build index with N vectors (one copy per config)
47+
2. Measure recall at k=10 (pre-delete baseline)
48+
3. For each delete %:
49+
- Copy the master DB
50+
- Delete a random selection of rows (deterministic seed)
51+
- Measure recall (ground truth recomputed over surviving rows only)
52+
- VACUUM and measure size savings
53+
4. Print comparison table
54+
55+
## Expected behavior
56+
57+
- **Flat index**: Recall should be 1.0 at all delete percentages (brute-force is always exact)
58+
- **Rescore**: Recall should stay close to baseline (quantized scan + rescore is robust)
59+
- **DiskANN**: Recall may degrade at high delete % due to graph fragmentation (dangling edges, broken connectivity)
60+
61+
## Results DB
62+
63+
Results are stored in `runs/<dataset>/<subset_size>/delete_results.db`:
64+
65+
```sql
66+
SELECT config_name, delete_pct, recall, vacuum_size_mb
67+
FROM delete_runs
68+
ORDER BY config_name, delete_pct;
69+
```
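
For analysis, the recall delta against the baseline can also be derived with a self-join. The sketch below runs against an in-memory table rather than a real results file. It assumes the baseline is stored as a row with `delete_pct = 0` and guesses the column types; only the column names come from the query above. The rows are made-up placeholders, not measurements.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed schema: column names from the README query, types are a guess.
con.execute(
    "CREATE TABLE delete_runs(config_name TEXT, delete_pct INTEGER, recall REAL)"
)
# Placeholder rows for illustration only; these are not benchmark results.
con.executemany(
    "INSERT INTO delete_runs VALUES (?, ?, ?)",
    [("toy-a", 0, 1.0), ("toy-a", 50, 1.0), ("toy-b", 0, 0.95), ("toy-b", 50, 0.90)],
)

QUERY = """
SELECT r.config_name, r.delete_pct, r.recall, r.recall - b.recall AS delta
FROM delete_runs AS r
JOIN delete_runs AS b
  ON b.config_name = r.config_name AND b.delete_pct = 0
ORDER BY r.config_name, r.delete_pct
"""
results = con.execute(QUERY).fetchall()
for config_name, delete_pct, recall, delta in results:
    print(f"{config_name:8s} {delete_pct:3d}% recall={recall:.3f} delta={delta:+.3f}")
con.close()
```

To try it on a real run, replace `":memory:"` and the setup statements with a connection to the `delete_results.db` path shown above, after checking how the baseline row is actually recorded.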
