This repository was archived by the owner on Feb 18, 2026. It is now read-only.

Commit 76ae087

chore(experiments): commit for posterity
1 parent 5209dd3 commit 76ae087

11 files changed

Lines changed: 1980 additions & 0 deletions
Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
# Experiment 002b: insert_list_size Parameter Sweep

**Date:** 2026-02-11
**Engineer:** [Fill in your name]
**Status:** In Progress
**Git Commit:** `1398e9a`

## Hypothesis

There is an insert_list_size value at which recall plateaus; increasing it beyond that point adds build time without improving recall.

**Reasoning:**

- insert_list_size controls the size of the candidate pool during graph construction
- Higher values mean more exploration, better graph connectivity, and higher recall
- But diminishing returns: recall plateaus when the graph is "well-connected enough"
- libSQL uses 75; we currently use 100 (from Exp 001)
- Hypothesis: Recall plateaus around 100-150 for 50k vectors @ 256D

## Motivation

**Problem:** Build time is assumed to be directly proportional to insert_list_size. We need the minimum value that achieves the target recall (≥95%).

**Why now:** Exp 001 showed only a 2% build time improvement from 200→100, suggesting we may already be near optimal. This needs validating across the full parameter range.

**Success criteria:**

- Identify plateau point where recall stops improving
- Validate insert_list_size=100 is optimal (or find better default)
- Document recall vs build time tradeoff curve

## Test Setup

### Parameters Under Test

| Parameter        | Baseline | Test Values                       | Range Rationale            |
| ---------------- | -------- | --------------------------------- | -------------------------- |
| insert_list_size | 100      | [50, 75, 100, 150, 200]           | libSQL=75, old default=200 |
| dimensions       | 256      | (fixed)                           | Representative             |
| max_neighbors    | 32       | (fixed, may change after Exp 003) | Current default            |
| search_list      | 100      | (fixed)                           | Consistent with insert     |

### Dataset

- **Size:** 50,000 vectors
- **Dimensions:** 256
- **Metric:** Cosine
- **Source:** Synthetic (random, seed=42)

### Hardware

- **CPU:** AMD Ryzen 9 5950X (16 cores, 32 threads)
- **RAM:** 62 GB
- **Disk:** NVMe SSD (912GB capacity, 38% used)
- **OS:** Ubuntu 24.04, Linux 6.17.0-14-generic

### Comparison Baseline

- **Control:** insert_list_size=100 (current default)
- **Baseline:** From 25k @ insert_list=100 (Exp 001):
  - Build time: 432s
  - Recall@10: 99.2%
  - QPS: 82

### Benchmark Profile

`benchmarks/profiles/param-sweep-insert-list.json`

## Expected Results

| insert_list | Build Time (s) | Recall@10 (%) | QPS | Notes                          |
| ----------- | -------------- | ------------- | --- | ------------------------------ |
| 50          | 440 (−50%)     | 95-96%        | 90  | Too low? Graph may fragment    |
| 75          | 660 (−25%)     | 98%           | 87  | libSQL default, should be good |
| 100 (base)  | 880            | 99%           | 85  | Current default                |
| 150         | 1320 (+50%)    | 99.2%         | 82  | Diminishing returns start      |
| 200         | 1760 (+100%)   | 99.5%         | 80  | Plateau - marginal improvement |

**Key prediction:** Recall plateaus between 100-150. Optimal is likely 75-100 range.

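The predicted build times in the table are consistent with a simple linear model anchored at 880 s for insert_list=100. The sketch below reproduces that arithmetic; the function and constant names are illustrative and not part of the benchmark harness. Exp 001 saw only a 2% change between 200 and 100, so the measured curve may turn out much flatter than this model.

```python
# Assumption: build time scales linearly with insert_list_size, anchored at the
# predicted 880 s for the baseline value of 100 (both numbers come from the table).
BASELINE_SIZE = 100
BASELINE_BUILD_S = 880


def expected_build_time(insert_list_size: int) -> float:
    """Predicted build time in seconds under the linear-scaling assumption."""
    return BASELINE_BUILD_S * insert_list_size / BASELINE_SIZE


def percent_delta(insert_list_size: int) -> float:
    """Predicted change versus the baseline build time, in percent."""
    return (expected_build_time(insert_list_size) / BASELINE_BUILD_S - 1) * 100


for size in (50, 75, 100, 150, 200):
    print(f"{size:>3}: {expected_build_time(size):6.0f} s ({percent_delta(size):+.0f}%)")
```

Comparing the measured build times against these values is one way to fill in the "Δ from Expected" column once results are in.
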
**Risk:**

- insert_list=50 may be too aggressive, causing recall <95%
- I/O contention from parallel experiments may affect build times
- Cache (from Exp 001) may mask build time differences

## Execution

### Commands Run

```bash
cd /home/mrm/src/sqlite-diskann-experiments/exp002b-insert-list
cd benchmarks
# Remove existing .db files so the run starts clean
rm -rf datasets/synthetic/*.db
npm install --ignore-scripts  # Done already
# Fix symlink: Already done
npm run prepare
# Print timestamps before and after; tee saves the combined stdout/stderr
date && npm run bench -- --profile=profiles/param-sweep-insert-list.json 2>&1 | \
  tee ../experiments/experiment-002b-output.txt && date
```

### Timeline

- **Start:** [Fill in timestamp]
- **End:** [Fill in timestamp]
- **Duration:** [Expected: 25-35 minutes]

## Actual Results

### Raw Data

See `experiments/experiment-002b-output.txt` for full benchmark output.

```
[Paste results table from benchmark]
```

### Key Metrics

| insert_list | Build Time (s) | Recall@10 (%) | QPS | Δ from Expected |
| ----------- | -------------- | ------------- | --- | --------------- |
| 50          | [X]            | [X]%          | [X] | [±N%]           |
| 75          | [X]            | [X]%          | [X] | [±N%]           |
| 100 (base)  | [X]            | [X]%          | [X] | [±N%]           |
| 150         | [X]            | [X]%          | [X] | [±N%]           |
| 200         | [X]            | [X]%          | [X] | [±N%]           |

### Recall Plateau Analysis

[Plot or describe where recall stops improving significantly]

**Plateau point:** insert_list=[X] (recall stops improving beyond this)

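One way to make "stops improving significantly" concrete is to pick the smallest insert_list whose recall is within a fixed tolerance of the best recall in the sweep. This is a sketch under that assumed definition; `plateau_point` and `tolerance_pp` are hypothetical names, and the inputs are the predictions from Expected Results (with 95.5 standing in for the 95-96% range), not measurements.

```python
def plateau_point(recall_by_size: dict[int, float], tolerance_pp: float = 1.0) -> int:
    """Return the smallest insert_list_size whose recall is within
    `tolerance_pp` percentage points of the best recall in the sweep."""
    best = max(recall_by_size.values())
    for size in sorted(recall_by_size):
        if best - recall_by_size[size] < tolerance_pp:
            return size
    # Unreachable for a non-empty sweep with a positive tolerance:
    # the best-performing size always satisfies the check above.
    raise ValueError("no size within tolerance")


# Placeholder inputs: predicted Recall@10 (%) per insert_list, not measured data.
predicted = {50: 95.5, 75: 98.0, 100: 99.0, 150: 99.2, 200: 99.5}

print(plateau_point(predicted))        # tolerance of 1.0 percentage point
print(plateau_point(predicted, 0.5))   # stricter tolerance
```

With the predicted numbers, a 1.0-point tolerance selects 100 and a 0.5-point tolerance selects 150, so the tolerance should be fixed before looking at the measured results.
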
### Build Time Efficiency

[Calculate recall improvement per second of build time]

| insert_list | Recall/BuildTime Ratio | Efficiency vs 100 |
| ----------- | ---------------------- | ----------------- |
| 50          | [X]                    | [±N%]             |
| 75          | [X]                    | [±N%]             |
| 100         | [X]                    | baseline          |
| 150         | [X]                    | [±N%]             |
| 200         | [X]                    | [±N%]             |

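The two table columns can be computed as sketched below, assuming the ratio is recall (%) divided by build time (s) and "Efficiency vs 100" is the percent change of that ratio relative to insert_list=100. `efficiency_table` is a hypothetical helper, and the inputs are again the predictions from Expected Results rather than measured values.

```python
def efficiency_table(results: dict[int, tuple[float, float]], baseline: int = 100):
    """`results` maps insert_list_size -> (recall_percent, build_time_s).

    Returns rows of (size, recall/build_time ratio, percent change vs baseline).
    """
    base_recall, base_build_s = results[baseline]
    base_ratio = base_recall / base_build_s
    rows = []
    for size in sorted(results):
        recall, build_s = results[size]
        ratio = recall / build_s
        rows.append((size, ratio, (ratio / base_ratio - 1) * 100))
    return rows


# Placeholder inputs: predicted values, not measured data.
predicted = {
    50: (95.5, 440),
    75: (98.0, 660),
    100: (99.0, 880),
    150: (99.2, 1320),
    200: (99.5, 1760),
}

for size, ratio, vs_base in efficiency_table(predicted):
    print(f"{size:>3}  {ratio:.4f}  {vs_base:+.1f}%")
```

Because recall is bounded near 100%, this ratio mostly tracks build time and will favor small values; the marginal recall gained per extra build second between adjacent settings may be a more informative companion metric.
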
### Anomalies

[Note anything unexpected]

## Analysis

### Hypothesis Validation

**Confirmed:** [What matched predictions about plateau point]
**Refuted:** [What didn't match]
**Unclear:** [Ambiguous results]

### Key Insights

1. **Optimal value:** [Best insert_list_size for recall vs build time tradeoff]
2. **Plateau behavior:** [At what point does recall stop improving?]
3. **libSQL comparison:** [Is their 75 value justified by our data?]

### Confounding Factors

- Parallel experiments (exp003, exp004) - I/O contention
- Cache enabled (from Exp 001) - may mask I/O-based build time differences
- Dataset size 50k vs baseline 25k - not direct comparison
- [Any other factors]

## Conclusions

### Summary

[2-3 sentences: What's the optimal insert_list_size? Should we change the default?]

### Impact on Recommendations

- **Update defaults?**
  - If optimal != 100: Change `DEFAULT_INSERT_LIST_SIZE` in `src/diskann_api.c:25`
  - Document reasoning in code comment

- **Update documentation:**
  - Update PARAMETERS.md with recall plateau data
  - Add guidance: "Use insert_list=X for datasets <Nk, Y for >Nk"

- **User guidance:**
  - Fast build (lower recall): insert_list=50-75
  - Balanced (recommended): insert_list=[optimal value]
  - Maximum recall (slow): insert_list=150-200

### Limitations

- Only tested synthetic data at one scale (50k)
- Cache may be masking true I/O cost differences
- Real embeddings may have different connectivity requirements

### Follow-up Questions

1. Does optimal insert_list_size vary with dataset size? (test at 100k, 200k)
2. Does max_neighbors (Exp 003 result) affect optimal insert_list?
3. Can we predict optimal value analytically from dimensions and dataset size?

## Next Steps

- [ ] If optimal != 100: Update `DEFAULT_INSERT_LIST_SIZE` in `src/diskann_api.c`
- [ ] Update `PARAMETERS.md` with plateau curve
- [ ] Document recall vs build time tradeoff in README.md
- [ ] Consider combined sweep: (max_neighbors, insert_list) grid search
- [ ] Update experiments/README.md index

## Artifacts

- **Benchmark profile:** `benchmarks/profiles/param-sweep-insert-list.json`
- **Raw output:** `experiments/experiment-002b-output.txt`
- **Results JSON:** `benchmarks/results/results-*.json` (5 files)

## References

- Prior experiment: experiment-001 (established insert_list=100 default)
- libSQL default: insert_list=75 (source: libSQL codebase)
- TPP: `_todo/20260211-build-speed-optimization.md`

---

**Lessons for Future Experiments:**

[After completion, note insights about parameter sweeps, identifying plateaus, etc.]
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
> sqlite-diskann-benchmarks@0.1.0 bench
> tsx scripts/run-benchmark.ts profiles/param-sweep-insert-list.json


Loading benchmark profile: profiles/param-sweep-insert-list.json

Benchmark: Parameter Sweep: insert_list_size


=== Loading dataset: datasets/synthetic/medium-256d-100k.bin ===

Loaded 100000 vectors (256d)

Computing ground truth for 100 queries (k=100)...
Ground truth ready


=== Benchmarking diskann ===

Building index...
Build time: 703.1s, Index size: 3955.2 MB

Warming up with 10 queries...
Warmup complete

Running searches (k=1)...
QPS: 222, p50: 4.21ms, Recall: 1.0%

Running searches (k=10)...

[DEBUG] Query 0, k=10:
Ground truth IDs: [
58, 57033, 74150,
60514, 23259, 27230,
57523, 6006, 28162,
18414
]
DiskANN IDs: [
58, 23259, 27230,
57523, 6006, 28162,
18414, 10553, 11780,
43221
]
Recall: 0.7
QPS: 217, p50: 4.23ms, Recall: 0.7%

Running searches (k=50)...
QPS: 277, p50: 3.59ms, Recall: 0.5%

Running searches (k=100)...
QPS: 269, p50: 3.46ms, Recall: 0.6%


=== Benchmark Results ===

Parameter Sweep: insert_list_size-k1

┌──────────────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Library          │ Build (s)  │ Index (MB) │ QPS      │ p50 (ms) │ p95 (ms) │ p99 (ms) │ Recall@k   │
├──────────────────┼────────────┼────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ sqlite-diskann   │ 703.1      │ 3955.2     │ 222      │ 4.21     │ 7.98     │ 9.20     │ 1.0%       │
└──────────────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

Parameter Sweep: insert_list_size-k10

┌──────────────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Library          │ Build (s)  │ Index (MB) │ QPS      │ p50 (ms) │ p95 (ms) │ p99 (ms) │ Recall@k   │
├──────────────────┼────────────┼────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ sqlite-diskann   │ 703.1      │ 3955.2     │ 217      │ 4.23     │ 7.05     │ 8.34     │ 0.7%       │
└──────────────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

Parameter Sweep: insert_list_size-k50

┌──────────────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Library          │ Build (s)  │ Index (MB) │ QPS      │ p50 (ms) │ p95 (ms) │ p99 (ms) │ Recall@k   │
├──────────────────┼────────────┼────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ sqlite-diskann   │ 703.1      │ 3955.2     │ 277      │ 3.59     │ 5.16     │ 5.46     │ 0.5%       │
└──────────────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

Parameter Sweep: insert_list_size-k100

┌──────────────────┬────────────┬────────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Library          │ Build (s)  │ Index (MB) │ QPS      │ p50 (ms) │ p95 (ms) │ p99 (ms) │ Recall@k   │
├──────────────────┼────────────┼────────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ sqlite-diskann   │ 703.1      │ 3955.2     │ 269      │ 3.46     │ 5.98     │ 7.36     │ 0.6%       │
└──────────────────┴────────────┴────────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

=== Key Insights ===



Results exported to results-2026-02-12T03-31-10-768Z.json

(node:622113) ExperimentalWarning: SQLite is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
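
The `[DEBUG]` listing for query 0 in the log can be checked by hand. The sketch below assumes recall@k is the fraction of ground-truth IDs that also appear in the returned IDs; `recall_at_k` is an illustrative name, not the harness's function. The IDs are copied from the log.

```python
def recall_at_k(ground_truth: list[int], returned: list[int]) -> float:
    """Fraction of ground-truth IDs that also appear in the returned IDs."""
    return len(set(ground_truth) & set(returned)) / len(ground_truth)


# IDs copied from the [DEBUG] output for query 0, k=10.
ground_truth = [58, 57033, 74150, 60514, 23259, 27230, 57523, 6006, 28162, 18414]
diskann_ids = [58, 23259, 27230, 57523, 6006, 28162, 18414, 10553, 11780, 43221]

print(recall_at_k(ground_truth, diskann_ids))  # 0.7 (7 of 10 ground-truth IDs found)
```

Seven of the ten ground-truth IDs are returned, matching the logged `Recall: 0.7`. That value is a fraction, so the `Recall: 0.7%` in the summary lines and tables may be a fraction printed with a percent sign (about 70%) rather than 0.7 percent; this is an inference from one query, not something the log confirms.
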