Date: YYYY-MM-DD
Engineer: [Your Name]
Status: Planned
Related Issues: #NNN
Git Commit: [output of `git rev-parse HEAD`]
[Clear, falsifiable prediction. Example: "Reducing max_neighbors from 32 to 24 will decrease index size by ~30% with <2% recall loss"]
Reasoning: [Why we believe this will happen based on theory/prior work]
Problem: [What issue are we trying to solve?]
Why now: [What triggered this experiment?]
Success criteria: [How will we know if this worked?]
| Parameter | Baseline | Test Value(s) | Range Rationale |
|---|---|---|---|
| max_neighbors | 32 | [24, 48, 64] | Industry standard range |
- Size: 50,000 vectors
- Dimensions: 256
- Metric: Cosine
- Source: Synthetic (random, seed=42)
- CPU: [model, cores]
- RAM: [amount]
- Disk: [SSD/HDD, model]
- OS: [version]
- Control: Current defaults (max_neighbors=32, insert_list=100)
- Baseline metrics: [From previous experiment or current state]
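
A synthetic dataset like the one described above (random vectors, fixed seed, cosine metric) could be generated along these lines. This is a minimal sketch, not the benchmark suite's own generator: the function name `make_unit_vectors` and the choice of a Gaussian distribution are assumptions, and the same seed in a different language or library will not reproduce the same vectors.

```python
import math
import random


def make_unit_vectors(count, dim, seed=42):
    """Generate `count` random vectors of length `dim`, scaled to unit L2 norm.

    Unit-length vectors are convenient for a cosine metric because cosine
    similarity then reduces to a plain dot product.
    """
    rng = random.Random(seed)  # local generator: the seed fully determines the output
    vectors = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


# Small demonstration; the template's dataset would be
# make_unit_vectors(50_000, 256), which takes noticeably longer in pure Python.
sample = make_unit_vectors(count=100, dim=256)
print(len(sample), len(sample[0]))
```

Recording the generator, distribution, and seed alongside the size and dimensions makes the dataset reproducible by someone other than the original engineer.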
benchmarks/profiles/experiment-XXX-title.json
| Metric | Baseline | Expected Change | Confidence |
|---|---|---|---|
| Build Time (s) | 432 | -50 (-12%) | Medium |
| Index Size (MB) | 988 | -296 (-30%) | High |
| Recall@10 (%) | 99.2 | -1.5 pp | Medium |
| QPS | 82 | +5 (+6%) | Low |
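
The absolute and relative figures in the Expected Change column should agree with each other and with the baseline. A minimal Python sketch for checking them, using the example values from the table above (the helper name `expected_from_change` is illustrative and not part of the benchmark tooling):

```python
def expected_from_change(baseline, delta):
    """Return (expected value, percent change) for an absolute change applied to a baseline."""
    expected = baseline + delta
    percent = 100.0 * delta / baseline
    return expected, percent


# Example rows from the template table: (metric, baseline, absolute change).
rows = [
    ("Build Time (s)", 432, -50),
    ("Index Size (MB)", 988, -296),
    ("Recall@10 (%)", 99.2, -1.5),
    ("QPS", 82, 5),
]

for metric, baseline, delta in rows:
    expected, percent = expected_from_change(baseline, delta)
    print(f"{metric}: expected {expected:.1f} ({percent:+.1f}%)")
```

For Recall@10 the change is treated as percentage points subtracted from the baseline, which gives an expected value of 97.7.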
Key prediction: [Most important expected outcome]
Risk: [What could go wrong?]
cd benchmarks
npm run bench -- --profile=profiles/experiment-XXX-title.json > \
  ../experiments/experiment-XXX-output.txt 2>&1

- Start: [timestamp]
- End: [timestamp]
- Duration: [minutes]
[Link to output file or paste table here]
[Benchmark output table]
| Metric | Baseline | Actual | Expected | Δ from Expected |
|---|---|---|---|---|
| Build Time (s) | 432 | [X] | 382 | [±N%] |
| Index Size (MB) | 988 | [X] | 692 | [±N%] |
| Recall@10 (%) | 99.2 | [X] | 97.7 | [±N%] |
| QPS | 82 | [X] | 87 | [±N%] |
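
The "Δ from Expected" column can be filled in mechanically once the actual values are known. The sketch below assumes the column means the relative deviation of the measurement from the prediction, `(actual - expected) / expected`. The helper names are illustrative, and the actual value in the example call is a made-up placeholder, not a measurement.

```python
def delta_from_expected(actual, expected):
    """Percent deviation of the measured value from the predicted value."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return 100.0 * (actual - expected) / expected


def format_row(metric, baseline, actual, expected):
    """Render one Markdown table row in the layout used by the results table."""
    delta = delta_from_expected(actual, expected)
    return f"| {metric} | {baseline} | {actual} | {expected} | {delta:+.1f}% |"


# Placeholder actual value (400), chosen only to show the formatting.
print(format_row("Build Time (s)", 432, 400, 382))
# prints: | Build Time (s) | 432 | 400 | 382 | +4.7% |
```

For metrics that are already percentages, such as Recall@10, reporting the difference in percentage points may be less ambiguous than a relative percentage.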
[Attach graphs, screenshots]
[Anything unexpected during execution]
- ✅ Confirmed: [What matched predictions]
- ❌ Refuted: [What didn't match]
- ❓ Unclear: [Ambiguous results]
- [Most important finding]
- [Secondary finding]
- [Surprising discovery]
- Sample size: N trials
- Standard deviation: [if multiple runs]
- p-value: [if comparison test run]
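
If each configuration is run several times, the three figures above can be computed with the Python standard library alone. The sketch below uses a two-sided permutation test on the difference of means. That choice is an assumption (a t-test or another comparison may suit the data better), and the trial values in the example are hypothetical, not benchmark results.

```python
import random
import statistics


def summarize(samples):
    """Return (sample size, mean, sample standard deviation) for one configuration."""
    n = len(samples)
    stdev = statistics.stdev(samples) if n > 1 else 0.0
    return n, statistics.mean(samples), stdev


def permutation_p_value(baseline, test, n_resamples=10_000, seed=42):
    """Estimate a two-sided p-value for the difference of means by label shuffling."""
    rng = random.Random(seed)
    k = len(baseline)
    m = len(test)
    observed = abs(sum(test) / m - sum(baseline) / k)
    pooled = list(baseline) + list(test)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[k:]) / m - sum(pooled[:k]) / k)
        # Small tolerance so floating-point rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical QPS measurements from five trials per configuration.
baseline_qps = [80, 81, 82, 83, 84]
test_qps = [85, 86, 87, 88, 89]

print(summarize(baseline_qps))
print(summarize(test_qps))
print(permutation_p_value(baseline_qps, test_qps))
```

With only a handful of trials the smallest achievable p-value is limited by the number of distinct label assignments, so the sample size should be reported next to the p-value.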
[What else might have influenced results?]
[2-3 sentence summary of what we learned]
- Update defaults? [Yes/No, which parameters, to what values]
- Update documentation? [Which files need changes]
- User guidance? [New recommendations for when to use these settings]
[What this experiment didn't test or answer]
- [Question raised by results]
- [Next thing to investigate]
- Update `DEFAULT_X` in `src/diskann_api.c` if defaults changed
- Update `PARAMETERS.md` with findings
- Document use-case guidance in `README.md`
- Run follow-up experiment XXX if needed
- Close related issue #NNN
- Benchmark profile: `benchmarks/profiles/experiment-XXX-title.json`
- Raw output: `experiments/experiment-XXX-output.txt`
- Results JSON: `results/experiment-XXX-[timestamp].json`
- Graphs: `experiments/graphs/experiment-XXX-*.png`
- Prior experiment: [Link to related experiment]
- Paper: [Relevant research if applicable]
- Discussion: [GitHub issue, PR, or internal doc]
Lessons for Future Experiments:
[What would you do differently next time? What worked well?]