| 1 | +# Experiment 002b: insert_list_size Parameter Sweep |
| 2 | + |
| 3 | +**Date:** 2026-02-11 |
| 4 | +**Engineer:** [Fill in your name] |
| 5 | +**Status:** In Progress |
| 6 | +**Git Commit:** `1398e9a` |
| 7 | + |
| 8 | +## Hypothesis |
| 9 | + |
| 10 | +There exists an optimal insert_list_size value at which recall plateaus; increasing it beyond this point costs build time without improving recall. |
| 11 | + |
| 12 | +**Reasoning:** |
| 13 | + |
| 14 | +- insert_list_size controls the size of the candidate pool during graph construction |
| 15 | +- Higher values = more exploration = better graph connectivity = higher recall |
| 16 | +- But diminishing returns: recall plateaus when graph is "well-connected enough" |
| 17 | +- libSQL uses 75; we currently use 100 (from Exp 001) |
| 18 | +- Hypothesis: Recall plateaus around 100-150 for 50k vectors @ 256D |
| 19 | + |
| 20 | +## Motivation |
| 21 | + |
| 22 | +**Problem:** Build time is expected to scale roughly linearly with insert_list_size (the model used in Expected Results), so we need the minimum value that achieves target recall (≥95%). |
| 23 | + |
| 24 | +**Why now:** Exp 001 showed only 2% build time improvement from 200→100, suggesting we may already be near optimal. Need to validate across full parameter range. |
| 25 | + |
| 26 | +**Success criteria:** |
| 27 | + |
| 28 | +- Identify plateau point where recall stops improving |
| 29 | +- Validate insert_list_size=100 is optimal (or find better default) |
| 30 | +- Document recall vs build time tradeoff curve |
| 31 | + |
| 32 | +## Test Setup |
| 33 | + |
| 34 | +### Parameters Under Test |
| 35 | + |
| 36 | +| Parameter | Baseline | Test Values | Range Rationale | |
| 37 | +| ---------------- | -------- | --------------------------------- | -------------------------- | |
| 38 | +| insert_list_size | 100 | [50, 75, 100, 150, 200] | libSQL=75, old default=200 | |
| 39 | +| dimensions | 256 | (fixed) | Representative | |
| 40 | +| max_neighbors | 32 | (fixed, may change after Exp 003) | Current default | |
| 41 | +| search_list | 100 | (fixed) | Consistent with insert | |
| 42 | + |
| 43 | +### Dataset |
| 44 | + |
| 45 | +- **Size:** 50,000 vectors |
| 46 | +- **Dimensions:** 256 |
| 47 | +- **Metric:** Cosine |
| 48 | +- **Source:** Synthetic (random, seed=42) |
| 49 | + |
| 50 | +### Hardware |
| 51 | + |
| 52 | +- **CPU:** AMD Ryzen 9 5950X (16 cores, 32 threads) |
| 53 | +- **RAM:** 62 GB |
| 54 | +- **Disk:** NVMe SSD (912GB capacity, 38% used) |
| 55 | +- **OS:** Ubuntu 24.04, Linux 6.17.0-14-generic |
| 56 | + |
| 57 | +### Comparison Baseline |
| 58 | + |
| 59 | +- **Control:** insert_list_size=100 (current default) |
| 60 | +- **Baseline:** From 25k @ insert_list=100 (Exp 001): |
| 61 | + - Build time: 432s |
| 62 | + - Recall@10: 99.2% |
| 63 | + - QPS: 82 |
| 64 | + |
| 65 | +### Benchmark Profile |
| 66 | + |
| 67 | +`benchmarks/profiles/param-sweep-insert-list.json` |
| 68 | + |
| 69 | +## Expected Results |
| 70 | + |
| 71 | +| insert_list | Build Time (s) | Recall@10 (%) | QPS | Notes | |
| 72 | +| ----------- | -------------- | ------------- | --- | ------------------------------ | |
| 73 | +| 50 | 440 (−50%) | 95-96% | 90 | Too low? Graph may fragment | |
| 74 | +| 75 | 660 (−25%) | 98% | 87 | libSQL default, should be good | |
| 75 | +| 100 (base) | 880 | 99% | 85 | Current default | |
| 76 | +| 150 | 1320 (+50%) | 99.2% | 82 | Diminishing returns start | |
| 77 | +| 200 | 1760 (+100%) | 99.5% | 80 | Plateau - marginal improvement | |
| 78 | + |
| 79 | +**Key prediction:** Recall plateaus between 100 and 150. The optimal value is likely in the 75-100 range. |
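The build-time column in the table above follows from a simple linear model: scale the predicted 880 s at insert_list=100 by insert_list/100. A minimal sketch of that arithmetic, assuming linear scaling holds (the function names are illustrative and not part of the benchmark harness):

```python
# Sketch: reproduce the predicted build times under an assumed linear model.
# The base values come from the Expected Results table; linear scaling is
# the assumption behind the predictions, not a measured fact.
BASE_INSERT_LIST = 100
BASE_BUILD_TIME_S = 880


def expected_build_time(insert_list: int) -> float:
    """Predicted build time in seconds, assuming time scales linearly."""
    return BASE_BUILD_TIME_S * insert_list / BASE_INSERT_LIST


def percent_change(insert_list: int) -> float:
    """Predicted change relative to the insert_list=100 baseline, in percent."""
    return (expected_build_time(insert_list) / BASE_BUILD_TIME_S - 1) * 100


if __name__ == "__main__":
    for value in (50, 75, 100, 150, 200):
        print(f"{value:>4}: {expected_build_time(value):6.0f} s ({percent_change(value):+.0f}%)")
```

If measured build times deviate strongly from these figures, the linear assumption itself is worth revisiting before interpreting the recall tradeoff.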
| 80 | + |
| 81 | +**Risks:** |
| 82 | + |
| 83 | +- insert_list=50 may be too aggressive, causing recall <95% |
| 84 | +- I/O contention from parallel experiments may affect build times |
| 85 | +- Cache (from Exp 001) may mask build time differences |
| 86 | + |
| 87 | +## Execution |
| 88 | + |
| 89 | +### Commands Run |
| 90 | + |
| 91 | +```bash |
| 92 | +cd /home/mrm/src/sqlite-diskann-experiments/exp002b-insert-list |
| 93 | +cd benchmarks |
| 94 | +rm -rf datasets/synthetic/*.db |
| 95 | +npm install --ignore-scripts # Done already |
| 96 | +# Fix symlink: Already done |
| 97 | +npm run prepare |
| 98 | +date && npm run bench -- --profile=profiles/param-sweep-insert-list.json 2>&1 | \ |
| 99 | + tee ../experiments/experiment-002b-output.txt && date |
| 100 | +``` |
| 101 | + |
| 102 | +### Timeline |
| 103 | + |
| 104 | +- **Start:** [Fill in timestamp] |
| 105 | +- **End:** [Fill in timestamp] |
| 106 | +- **Duration:** [Expected: 25-35 minutes; the build times in Expected Results sum to ~84 minutes if the five builds run sequentially] |
| 107 | + |
| 108 | +## Actual Results |
| 109 | + |
| 110 | +### Raw Data |
| 111 | + |
| 112 | +See `experiments/experiment-002b-output.txt` for full benchmark output. |
| 113 | + |
| 114 | +``` |
| 115 | +[Paste results table from benchmark] |
| 116 | +``` |
| 117 | + |
| 118 | +### Key Metrics |
| 119 | + |
| 120 | +| insert_list | Build Time (s) | Recall@10 (%) | QPS | Δ from Expected | |
| 121 | +| ----------- | -------------- | ------------- | --- | --------------- | |
| 122 | +| 50 | [X] | [X]% | [X] | [±N%] | |
| 123 | +| 75 | [X] | [X]% | [X] | [±N%] | |
| 124 | +| 100 (base) | [X] | [X]% | [X] | [±N%] | |
| 125 | +| 150 | [X] | [X]% | [X] | [±N%] | |
| 126 | +| 200 | [X] | [X]% | [X] | [±N%] | |
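One way to fill the "Δ from Expected" column is as a signed relative deviation of a measured value from its prediction. The sketch below assumes the column refers to build time; that reading, the function name, and the sample measurement are all illustrative assumptions, not benchmark output.

```python
# Sketch: signed percent deviation of a measured value from its prediction.
def percent_delta(actual: float, expected: float) -> float:
    """Return (actual - expected) / expected as a percentage."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return (actual - expected) / expected * 100


def format_delta(actual: float, expected: float) -> str:
    """Format the deviation the way the table placeholder suggests, e.g. '+2.3%'."""
    return f"{percent_delta(actual, expected):+.1f}%"


if __name__ == "__main__":
    # 900 s is a made-up placeholder measurement; 880 s is the prediction
    # for insert_list=100 from Expected Results.
    print(format_delta(900, 880))
```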
| 127 | + |
| 128 | +### Recall Plateau Analysis |
| 129 | + |
| 130 | +[Plot or describe where recall stops improving significantly] |
| 131 | + |
| 132 | +**Plateau point:** insert_list=[X] (recall stops improving beyond this) |
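A plateau point can be picked mechanically once a tolerance is chosen. The sketch below takes the plateau to be the smallest insert_list whose recall is within a tolerance (in percentage points) of the best recall at any larger setting. That definition and the 1.0-point default are assumptions for illustration, and the sample data are the predicted values from Expected Results, not measurements.

```python
# Sketch: locate the recall plateau in a parameter sweep.
from typing import Dict


def find_plateau(recall_by_insert_list: Dict[int, float], tolerance_pp: float = 1.0) -> int:
    """Return the smallest insert_list beyond which recall gains stay within tolerance."""
    if not recall_by_insert_list:
        raise ValueError("need at least one (insert_list, recall) pair")
    values = sorted(recall_by_insert_list)
    for i, value in enumerate(values):
        current = recall_by_insert_list[value]
        later = [recall_by_insert_list[v] for v in values[i + 1:]]
        best_later = max(later, default=current)
        if best_later - current <= tolerance_pp:
            return value
    return values[-1]  # not reached: the largest value always qualifies


# Predicted (not measured) recalls; 95.5 is the midpoint of the predicted
# 95-96% range for insert_list=50.
predicted = {50: 95.5, 75: 98.0, 100: 99.0, 150: 99.2, 200: 99.5}
plateau = find_plateau(predicted)
```

The result is sensitive to the tolerance, so the chosen threshold should be reported alongside the plateau point.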
| 133 | + |
| 134 | +### Build Time Efficiency |
| 135 | + |
| 136 | +[Calculate recall improvement per second of build time] |
| 137 | + |
| 138 | +| insert_list | Recall/BuildTime Ratio | Efficiency vs 100 | |
| 139 | +| ----------- | ---------------------- | ----------------- | |
| 140 | +| 50 | [X] | [±N%] | |
| 141 | +| 75 | [X] | [±N%] | |
| 142 | +| 100 | [X] | baseline | |
| 143 | +| 150 | [X] | [±N%] | |
| 144 | +| 200 | [X] | [±N%] | |
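The two columns above can be computed directly from the Key Metrics rows. A minimal sketch, assuming the ratio is recall (in percent) divided by build time (in seconds) and that efficiency is the percent difference from the insert_list=100 ratio; the function name and input layout are hypothetical.

```python
# Sketch: recall-per-build-second ratio and efficiency relative to a baseline.
from typing import Dict, Tuple


def efficiency_table(
    runs: Dict[int, Tuple[float, float]], baseline: int = 100
) -> Dict[int, Tuple[float, float]]:
    """Map insert_list -> (recall/build-time ratio, percent difference vs baseline).

    `runs` maps insert_list -> (build_time_s, recall_pct).
    """
    base_time, base_recall = runs[baseline]
    base_ratio = base_recall / base_time
    table = {}
    for insert_list, (build_time_s, recall_pct) in sorted(runs.items()):
        ratio = recall_pct / build_time_s
        table[insert_list] = (ratio, (ratio / base_ratio - 1) * 100)
    return table


if __name__ == "__main__":
    # Predicted values from Expected Results, used only to exercise the code.
    predicted = {50: (440, 95.5), 75: (660, 98.0), 100: (880, 99.0),
                 150: (1320, 99.2), 200: (1760, 99.5)}
    for insert_list, (ratio, pct) in efficiency_table(predicted).items():
        print(f"{insert_list:>4}: {ratio:.4f} recall%/s ({pct:+.1f}%)")
```

Because recall is bounded near 100% while build time keeps growing, this ratio will almost always favor the smallest setting; it is most useful together with the recall target from the success criteria.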
| 145 | + |
| 146 | +### Anomalies |
| 147 | + |
| 148 | +[Note anything unexpected] |
| 149 | + |
| 150 | +## Analysis |
| 151 | + |
| 152 | +### Hypothesis Validation |
| 153 | + |
| 154 | +- ✅ **Confirmed:** [What matched predictions about plateau point] |
| 155 | +- ❌ **Refuted:** [What didn't match] |
| 156 | +- ❓ **Unclear:** [Ambiguous results] |
| 157 | + |
| 158 | +### Key Insights |
| 159 | + |
| 160 | +1. **Optimal value:** [Best insert_list_size for recall vs build time tradeoff] |
| 161 | +2. **Plateau behavior:** [At what point does recall stop improving?] |
| 162 | +3. **libSQL comparison:** [Is their 75 value justified by our data?] |
| 163 | + |
| 164 | +### Confounding Factors |
| 165 | + |
| 166 | +- Parallel experiments (exp003, exp004) - I/O contention |
| 167 | +- Cache enabled (from Exp 001) - may mask I/O-based build time differences |
| 168 | +- Dataset size 50k vs baseline 25k - not direct comparison |
| 169 | +- [Any other factors] |
| 170 | + |
| 171 | +## Conclusions |
| 172 | + |
| 173 | +### Summary |
| 174 | + |
| 175 | +[2-3 sentences: What's the optimal insert_list_size? Should we change the default?] |
| 176 | + |
| 177 | +### Impact on Recommendations |
| 178 | + |
| 179 | +- **Update defaults?** |
| 180 | + - If optimal != 100: Change `DEFAULT_INSERT_LIST_SIZE` in `src/diskann_api.c:25` |
| 181 | + - Document reasoning in code comment |
| 182 | + |
| 183 | +- **Update documentation:** |
| 184 | + - Update PARAMETERS.md with recall plateau data |
| 185 | +  - Add guidance: "Use insert_list=X for datasets <Nk vectors, Y for ≥Nk" |
| 186 | + |
| 187 | +- **User guidance:** |
| 188 | + - Fast build (lower recall): insert_list=50-75 |
| 189 | + - Balanced (recommended): insert_list=[optimal value] |
| 190 | + - Maximum recall (slow): insert_list=150-200 |
| 191 | + |
| 192 | +### Limitations |
| 193 | + |
| 194 | +- Only tested synthetic data at one scale (50k) |
| 195 | +- Cache may be masking true I/O cost differences |
| 196 | +- Real embeddings may have different connectivity requirements |
| 197 | + |
| 198 | +### Follow-up Questions |
| 199 | + |
| 200 | +1. Does optimal insert_list_size vary with dataset size? (test at 100k, 200k) |
| 201 | +2. Does max_neighbors (Exp 003 result) affect optimal insert_list? |
| 202 | +3. Can we predict optimal value analytically from dimensions and dataset size? |
| 203 | + |
| 204 | +## Next Steps |
| 205 | + |
| 206 | +- [ ] If optimal != 100: Update `DEFAULT_INSERT_LIST_SIZE` in `src/diskann_api.c` |
| 207 | +- [ ] Update `PARAMETERS.md` with plateau curve |
| 208 | +- [ ] Document recall vs build time tradeoff in README.md |
| 209 | +- [ ] Consider combined sweep: (max_neighbors, insert_list) grid search |
| 210 | +- [ ] Update experiments/README.md index |
| 211 | + |
| 212 | +## Artifacts |
| 213 | + |
| 214 | +- **Benchmark profile:** `benchmarks/profiles/param-sweep-insert-list.json` |
| 215 | +- **Raw output:** `experiments/experiment-002b-output.txt` |
| 216 | +- **Results JSON:** `benchmarks/results/results-*.json` (5 files) |
| 217 | + |
| 218 | +## References |
| 219 | + |
| 220 | +- Prior experiment: experiment-001 (established insert_list=100 default) |
| 221 | +- libSQL default: insert_list=75 (source: libSQL codebase) |
| 222 | +- TPP: `_todo/20260211-build-speed-optimization.md` |
| 223 | + |
| 224 | +--- |
| 225 | + |
| 226 | +**Lessons for Future Experiments:** |
| 227 | + |
| 228 | +[After completion, note insights about parameter sweeps, identifying plateaus, etc.] |