photostructure
diff --git a/‎experiments/experiment-002b-insert-list-sweep.md‎
Lines changed: 228 additions & 0 deletions b/‎experiments/experiment-002b-insert-list-sweep.md‎
Lines changed: 228 additions & 0 deletions
diff --git a/‎experiments/experiment-002b-output.txt‎
Lines changed: 97 additions & 0 deletions b/‎experiments/experiment-002b-output.txt‎
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,228 @@
+# Experiment 002b: insert_list_size Parameter Sweep
+
+**Date:** 2026-02-11
+**Engineer:** [Fill in your name]
+**Status:** In Progress
+**Git Commit:** `1398e9a`
+
+## Hypothesis
+
+There exists an optimal insert_list_size value where recall plateaus - increasing beyond this point wastes build time without improving recall.
+
+**Reasoning:**
+
+- insert_list_size controls candidate pool during graph construction
+- Higher values = more exploration = better graph connectivity = higher recall
+- But diminishing returns: recall plateaus when graph is "well-connected enough"
+- libSQL uses 75, we currently use 100 (from Exp 001)
+- Hypothesis: Recall plateaus around 100-150 for 50k vectors @ 256D
+
+## Motivation
+
+**Problem:** Build time directly proportional to insert_list_size. Need to find minimum value that achieves target recall (≥95%).
+
+**Why now:** Exp 001 showed only 2% build time improvement from 200→100, suggesting we may already be near optimal. Need to validate across full parameter range.
+
+**Success criteria:**
+
+- Identify plateau point where recall stops improving
+- Validate insert_list_size=100 is optimal (or find better default)
+- Document recall vs build time tradeoff curve
+
+## Test Setup
+
+### Parameters Under Test
+
+| Parameter        | Baseline | Test Values                       | Range Rationale            |
+| ---------------- | -------- | --------------------------------- | -------------------------- |
+| insert_list_size | 100      | [50, 75, 100, 150, 200]           | libSQL=75, old default=200 |
+| dimensions       | 256      | (fixed)                           | Representative             |
+| max_neighbors    | 32       | (fixed, may change after Exp 003) | Current default            |
+| search_list      | 100      | (fixed)                           | Consistent with insert     |
+
+### Dataset
+
+- **Size:** 50,000 vectors
+- **Dimensions:** 256
+- **Metric:** Cosine
+- **Source:** Synthetic (random, seed=42)
+
+### Hardware
+
+- **CPU:** AMD Ryzen 9 5950X (16 cores, 32 threads)
+- **RAM:** 62 GB
+- **Disk:** NVMe SSD (912GB capacity, 38% used)
+- **OS:** Ubuntu 24.04, Linux 6.17.0-14-generic
+
+### Comparison Baseline
+
+- **Control:** insert_list_size=100 (current default)
+- **Baseline:** From 25k @ insert_list=100 (Exp 001):
+  - Build time: 432s
+  - Recall@10: 99.2%
+  - QPS: 82
+
+### Benchmark Profile
+
+`benchmarks/profiles/param-sweep-insert-list.json`
+
+## Expected Results
+
+| insert_list | Build Time (s) | Recall@10 (%) | QPS | Notes                          |
+| ----------- | -------------- | ------------- | --- | ------------------------------ |
+| 50          | 440 (−50%)     | 95-96%        | 90  | Too low? Graph may fragment    |
+| 75          | 660 (−25%)     | 98%           | 87  | libSQL default, should be good |
+| 100 (base)  | 880            | 99%           | 85  | Current default                |
+| 150         | 1320 (+50%)    | 99.2%         | 82  | Diminishing returns start      |
+| 200         | 1760 (+100%)   | 99.5%         | 80  | Plateau - marginal improvement |
+
+**Key prediction:** Recall plateaus between 100-150. Optimal is likely 75-100 range.
+
+**Risk:**
+
+- insert_list=50 may be too aggressive, causing recall <95%
+- I/O contention from parallel experiments may affect build times
+- Cache (from Exp 001) may mask build time differences
+
+## Execution
+
+### Commands Run
+
+```bash
+cd /home/mrm/src/sqlite-diskann-experiments/exp002b-insert-list
+cd benchmarks
+rm -rf datasets/synthetic/*.db
+npm install --ignore-scripts  # Done already
+# Fix symlink: Already done
+npm run prepare
+date && npm run bench -- --profile=profiles/param-sweep-insert-list.json 2>&1 | \
+  tee ../experiments/experiment-002b-output.txt && date
+```
+
+### Timeline
+
+- **Start:** [Fill in timestamp]
+- **End:** [Fill in timestamp]
+- **Duration:** [Expected: 25-35 minutes]
+
+## Actual Results
+
+### Raw Data
+
+See `experiments/experiment-002b-output.txt` for full benchmark output.
+
+```
+[Paste results table from benchmark]
+```
+
+### Key Metrics
+
+| insert_list | Build Time (s) | Recall@10 (%) | QPS | Δ from Expected |
+| ----------- | -------------- | ------------- | --- | --------------- |
+| 50          | [X]            | [X]%          | [X] | [±N%]           |
+| 75          | [X]            | [X]%          | [X] | [±N%]           |
+| 100 (base)  | [X]            | [X]%          | [X] | [±N%]           |
+| 150         | [X]            | [X]%          | [X] | [±N%]           |
+| 200         | [X]            | [X]%          | [X] | [±N%]           |
+
+### Recall Plateau Analysis
+
+[Plot or describe where recall stops improving significantly]
+
+**Plateau point:** insert_list=[X] (recall stops improving beyond this)
+
+### Build Time Efficiency
+
+[Calculate recall improvement per second of build time]
+
+| insert_list | Recall/BuildTime Ratio | Efficiency vs 100 |
+| ----------- | ---------------------- | ----------------- |
+| 50          | [X]                    | [±N%]             |
+| 75          | [X]                    | [±N%]             |
+| 100         | [X]                    | baseline          |
+| 150         | [X]                    | [±N%]             |
+| 200         | [X]                    | [±N%]             |
+
+### Anomalies
+
+[Note anything unexpected]
+
+## Analysis
+
+### Hypothesis Validation
+
+✅ **Confirmed:** [What matched predictions about plateau point]
+❌ **Refuted:** [What didn't match]
+❓ **Unclear:** [Ambiguous results]
+
+### Key Insights
+
+1. **Optimal value:** [Best insert_list_size for recall vs build time tradeoff]
+2. **Plateau behavior:** [At what point does recall stop improving?]
+3. **libSQL comparison:** [Is their 75 value justified by our data?]
+
+### Confounding Factors
+
+- Parallel experiments (exp003, exp004) - I/O contention
+- Cache enabled (from Exp 001) - may mask I/O-based build time differences
+- Dataset size 50k vs baseline 25k - not direct comparison
+- [Any other factors]
+
+## Conclusions
+
+### Summary
+
+[2-3 sentences: What's the optimal insert_list_size? Should we change the default?]
+
+### Impact on Recommendations
+
+- **Update defaults?**
+  - If optimal != 100: Change `DEFAULT_INSERT_LIST_SIZE` in `src/diskann_api.c:25`
+  - Document reasoning in code comment
+
+- **Update documentation:**
+  - Update PARAMETERS.md with recall plateau data
+  - Add guidance: "Use insert_list=X for datasets <Yk, Y for >Yk"
+
+- **User guidance:**
+  - Fast build (lower recall): insert_list=50-75
+  - Balanced (recommended): insert_list=[optimal value]
+  - Maximum recall (slow): insert_list=150-200
+
+### Limitations
+
+- Only tested synthetic data at one scale (50k)
+- Cache may be masking true I/O cost differences
+- Real embeddings may have different connectivity requirements
+
+### Follow-up Questions
+
+1. Does optimal insert_list_size vary with dataset size? (test at 100k, 200k)
+2. Does max_neighbors (Exp 003 result) affect optimal insert_list?
+3. Can we predict optimal value analytically from dimensions and dataset size?
+
+## Next Steps
+
+- [ ] If optimal != 100: Update `DEFAULT_INSERT_LIST_SIZE` in `src/diskann_api.c`
+- [ ] Update `PARAMETERS.md` with plateau curve
+- [ ] Document recall vs build time tradeoff in README.md
+- [ ] Consider combined sweep: (max_neighbors, insert_list) grid search
+- [ ] Update experiments/README.md index
+
+## Artifacts
+
+- **Benchmark profile:** `benchmarks/profiles/param-sweep-insert-list.json`
+- **Raw output:** `experiments/experiment-002b-output.txt`
+- **Results JSON:** `benchmarks/results/results-*.json` (5 files)
+
+## References
+
+- Prior experiment: experiment-001 (established insert_list=100 default)
+- libSQL default: insert_list=75 (source: libSQL codebase)
+- TPP: `_todo/20260211-build-speed-optimization.md`
+
+---
+
+**Lessons for Future Experiments:**
+
+[After completion, note insights about parameter sweeps, identifying plateaus, etc.]
@@ -0,0 +1,97 @@
+
+> sqlite-diskann-benchmarks@0.1.0 bench
+> tsx scripts/run-benchmark.ts profiles/param-sweep-insert-list.json
+
+
+Loading benchmark profile: profiles/param-sweep-insert-list.json
+
+Benchmark: Parameter Sweep: insert_list_size
+
+
+=== Loading dataset: datasets/synthetic/medium-256d-100k.bin ===
+
+Loaded 100000 vectors (256d)
+
+Computing ground truth for 100 queries (k=100)...
+Ground truth ready
+
+
+=== Benchmarking diskann ===
+
+Building index...
+  Build time: 703.1s, Index size: 3955.2 MB
+
+Warming up with 10 queries...
+Warmup complete
+
+Running searches (k=1)...
+  QPS: 222, p50: 4.21ms, Recall: 1.0%
+
+Running searches (k=10)...
+
+[DEBUG] Query 0, k=10:
+  Ground truth IDs: [
+     58, 57033, 74150,
+  60514, 23259, 27230,
+  57523,  6006, 28162,
+  18414
+]
+  DiskANN IDs: [
+     58, 23259, 27230,
+  57523,  6006, 28162,
+  18414, 10553, 11780,
+  43221
+]
+  Recall: 0.7
+  QPS: 217, p50: 4.23ms, Recall: 0.7%
+
+Running searches (k=50)...
+  QPS: 277, p50: 3.59ms, Recall: 0.5%
+
+Running searches (k=100)...
+  QPS: 269, p50: 3.46ms, Recall: 0.6%
+
+
+
+=== Benchmark Results ===
+
+Parameter Sweep: insert_list_size-k1
+
+[90m┌──────────────────[39m[90m┬────────────[39m[90m┬────────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬────────────┐[39m
+[90m│[39m[36m Library          [39m[90m│[39m[36m Build (s)  [39m[90m│[39m[36m Index (MB) [39m[90m│[39m[36m QPS      [39m[90m│[39m[36m p50 (ms) [39m[90m│[39m[36m p95 (ms) [39m[90m│[39m[36m p99 (ms) [39m[90m│[39m[36m Recall@k   [39m[90m│[39m
+[90m├──────────────────[39m[90m┼────────────[39m[90m┼────────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼────────────┤[39m
+[90m│[39m sqlite-diskann   [90m│[39m 703.1      [90m│[39m 3955.2     [90m│[39m 222      [90m│[39m 4.21     [90m│[39m 7.98     [90m│[39m 9.20     [90m│[39m 1.0%       [90m│[39m
+[90m└──────────────────[39m[90m┴────────────[39m[90m┴────────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴────────────┘[39m
+
+Parameter Sweep: insert_list_size-k10
+
+[90m┌──────────────────[39m[90m┬────────────[39m[90m┬────────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬────────────┐[39m
+[90m│[39m[36m Library          [39m[90m│[39m[36m Build (s)  [39m[90m│[39m[36m Index (MB) [39m[90m│[39m[36m QPS      [39m[90m│[39m[36m p50 (ms) [39m[90m│[39m[36m p95 (ms) [39m[90m│[39m[36m p99 (ms) [39m[90m│[39m[36m Recall@k   [39m[90m│[39m
+[90m├──────────────────[39m[90m┼────────────[39m[90m┼────────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼────────────┤[39m
+[90m│[39m sqlite-diskann   [90m│[39m 703.1      [90m│[39m 3955.2     [90m│[39m 217      [90m│[39m 4.23     [90m│[39m 7.05     [90m│[39m 8.34     [90m│[39m 0.7%       [90m│[39m
+[90m└──────────────────[39m[90m┴────────────[39m[90m┴────────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴────────────┘[39m
+
+Parameter Sweep: insert_list_size-k50
+
+[90m┌──────────────────[39m[90m┬────────────[39m[90m┬────────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬────────────┐[39m
+[90m│[39m[36m Library          [39m[90m│[39m[36m Build (s)  [39m[90m│[39m[36m Index (MB) [39m[90m│[39m[36m QPS      [39m[90m│[39m[36m p50 (ms) [39m[90m│[39m[36m p95 (ms) [39m[90m│[39m[36m p99 (ms) [39m[90m│[39m[36m Recall@k   [39m[90m│[39m
+[90m├──────────────────[39m[90m┼────────────[39m[90m┼────────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼────────────┤[39m
+[90m│[39m sqlite-diskann   [90m│[39m 703.1      [90m│[39m 3955.2     [90m│[39m 277      [90m│[39m 3.59     [90m│[39m 5.16     [90m│[39m 5.46     [90m│[39m 0.5%       [90m│[39m
+[90m└──────────────────[39m[90m┴────────────[39m[90m┴────────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴────────────┘[39m
+
+Parameter Sweep: insert_list_size-k100
+
+[90m┌──────────────────[39m[90m┬────────────[39m[90m┬────────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬──────────[39m[90m┬────────────┐[39m
+[90m│[39m[36m Library          [39m[90m│[39m[36m Build (s)  [39m[90m│[39m[36m Index (MB) [39m[90m│[39m[36m QPS      [39m[90m│[39m[36m p50 (ms) [39m[90m│[39m[36m p95 (ms) [39m[90m│[39m[36m p99 (ms) [39m[90m│[39m[36m Recall@k   [39m[90m│[39m
+[90m├──────────────────[39m[90m┼────────────[39m[90m┼────────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼──────────[39m[90m┼────────────┤[39m
+[90m│[39m sqlite-diskann   [90m│[39m 703.1      [90m│[39m 3955.2     [90m│[39m 269      [90m│[39m 3.46     [90m│[39m 5.98     [90m│[39m 7.36     [90m│[39m 0.6%       [90m│[39m
+[90m└──────────────────[39m[90m┴────────────[39m[90m┴────────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴──────────[39m[90m┴────────────┘[39m
+
+=== Key Insights ===
+
+
+
+Results exported to results-2026-02-12T03-31-10-768Z.json
+
+(node:622113) ExperimentalWarning: SQLite is an experimental feature and might change at any time
+(Use `node --trace-warnings ...` to show where the warning was created)