|
| 1 | +# TODO: `ann` base branch + consolidated benchmarks |
| 2 | + |
| 3 | +## 1. Create `ann` branch with shared code |
| 4 | + |
| 5 | +### 1.1 Branch setup |
| 6 | +- [x] `git checkout -B ann origin/main` |
| 7 | +- [x] Cherry-pick `624f998` (vec0_distance_full shared distance dispatch) |
| 8 | +- [x] Cherry-pick stdint.h fix for test header |
| 9 | +- [ ] Pull NEON cosine optimization from ivf-yolo3 into shared code |
| 10 | + - Currently only in the `ivf-yolo3` branch, but it is general-purpose (benefits all distance calculations) |
| 11 | + - Lives in `distance_cosine_float()` — ~57 lines of ARM NEON vectorized cosine |
| 12 | + |
| 13 | +### 1.2 Benchmark infrastructure (`benchmarks-ann/`) |
| 14 | +- [x] Seed data pipeline (`seed/Makefile`, `seed/build_base_db.py`) |
| 15 | +- [x] Ground truth generator (`ground_truth.py`) |
| 16 | +- [x] Results schema (`schema.sql`) |
| 17 | +- [x] Benchmark runner with `INDEX_REGISTRY` extension point (`bench.py`) |
| 18 | + - Baseline configs (float, int8-rescore, bit-rescore) implemented |
| 19 | + - Index branches register their types via `INDEX_REGISTRY` dict |
| 20 | +- [x] Makefile with baseline targets |
| 21 | +- [x] README |
| 22 | + |
| 23 | +### 1.3 Rebase feature branches onto `ann` |
| 24 | +- [x] Rebase `diskann-yolo2` onto `ann` (1 commit: DiskANN implementation) |
| 25 | +- [x] Rebase `ivf-yolo3` onto `ann` (1 commit: IVF implementation) |
| 26 | +- [x] Rebase `annoy-yolo2` onto `ann` (2 commits: Annoy implementation + schema fix) |
| 27 | +- [x] Verify each branch has only its index-specific commits remaining |
| 28 | +- [ ] Force-push all 4 branches (`ann`, `diskann-yolo2`, `ivf-yolo3`, `annoy-yolo2`) to origin |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## 2. Per-branch: register index type in benchmarks |
| 33 | + |
| 34 | +When rebased onto `ann`, each index branch should add the following to `benchmarks-ann/`: |
| 35 | + |
| 36 | +### 2.1 Register in `bench.py` |
| 37 | + |
| 38 | +Add an `INDEX_REGISTRY` entry. Each entry provides: |
| 39 | +- `defaults` — default param values |
| 40 | +- `create_table_sql(params)` — CREATE VIRTUAL TABLE with INDEXED BY clause |
| 41 | +- `insert_sql(params)` — custom insert SQL, or None for default |
| 42 | +- `post_insert_hook(conn, params)` — training/building step, returns time |
| 43 | +- `run_query(conn, params, query, k)` — custom query, or None for default MATCH |
| 44 | +- `describe(params)` — one-line description for report output |
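The entry shape listed above could look roughly like the sketch below. This is only an illustration: the `example` index type, the `dims` and `width` parameters, the `vec_items` table name, and the SQL text are all hypothetical placeholders, and the sketch assumes that optional hooks are disabled by setting the key to `None` and that `post_insert_hook` returns elapsed seconds. Check `bench.py` for the actual conventions.

```python
import sqlite3
import time

# Stand-in for the dict exposed by bench.py (assumption: a plain dict keyed by index type).
INDEX_REGISTRY = {}


def create_table_sql(params):
    # Placeholder SQL: table name, column, and INDEXED BY options are hypothetical.
    return (
        "CREATE VIRTUAL TABLE vec_items USING vec0("
        f"embedding float[{params['dims']}] "
        f"INDEXED BY example(width={params['width']}))"
    )


def post_insert_hook(conn, params):
    # Time the training/build step and return the elapsed time (seconds assumed).
    start = time.perf_counter()
    conn.execute("SELECT 1")  # placeholder for the real build statement
    return time.perf_counter() - start


def describe(params):
    # One-line description for report output.
    return f"example index (width={params['width']})"


INDEX_REGISTRY["example"] = {
    "defaults": {"dims": 128, "width": 64},
    "create_table_sql": create_table_sql,
    "insert_sql": None,   # assumed to mean: use the default insert
    "post_insert_hook": post_insert_hook,
    "run_query": None,    # assumed to mean: use the default MATCH query
    "describe": describe,
}
```

Keeping each hook a plain function of `params` (plus `conn` where needed) means an index branch only touches its own registry entry and leaves the runner untouched.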
| 45 | + |
| 46 | +### 2.2 Add configs to `Makefile` |
| 47 | + |
| 48 | +Append index-specific config variables and targets. Example pattern: |
| 49 | + |
| 50 | +```makefile |
| 51 | +DISKANN_CONFIGS = \ |
| 52 | + "diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary" \ |
| 53 | + ... |
| 54 | + |
| 55 | +ALL_CONFIGS += $(DISKANN_CONFIGS) |
| 56 | + |
| 57 | +bench-diskann: seed |
| 58 | + $(BENCH) --subset-size 10000 -k 10 -o runs/diskann $(BASELINES) $(DISKANN_CONFIGS) |
| 59 | + ... |
| 60 | +``` |
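The config strings in the Makefile pattern above follow a `name:key=value,key=value` shape. A minimal sketch of how such a string might be split into a name and a parameter dict is shown below; `parse_config` is a hypothetical helper, not necessarily how `bench.py` parses its arguments, and the integer conversion is an assumption.

```python
def parse_config(spec):
    """Split 'name:key=value,key=value' into (name, params)."""
    name, _, body = spec.partition(":")
    params = {}
    for pair in body.split(","):
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        # Convert integer-looking values; leave everything else as a string.
        params[key] = int(value) if value.lstrip("-").isdigit() else value
    return name, params


name, params = parse_config(
    "diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary"
)
```

Here `name` is the label before the first colon and `params["type"]` would select the matching `INDEX_REGISTRY` entry.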
| 61 | + |
| 62 | +### 2.3 Migrate existing benchmark results/docs |
| 63 | + |
| 64 | +- Move useful results docs (RESULTS.md, etc.) into `benchmarks-ann/results/` |
| 65 | +- Delete redundant per-branch benchmark directories once consolidated infra is proven |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +## 3. Future improvements |
| 70 | + |
| 71 | +- [ ] Reporting script (`report.py`) — query results.db, produce markdown comparison tables |
| 72 | +- [ ] Profiling targets in Makefile (lift from ivf-yolo3's Instruments/perf wrappers) |
| 73 | +- [ ] Pre-computed ground truth integration (use GT DB files instead of on-the-fly brute-force) |