| 1 | +# Vector Backend Benchmark Summary (stackoverflow-small)
| 2 | +
| 3 | +## Workload at a glance |
| 4 | + |
| 5 | +- Dataset: **stackoverflow-small** |
| 6 | +- Vector count: **around 300k** (300,424 rows) |
| 7 | +- Embedding model: **all-MiniLM-L6-v2** |
| 8 | +- Vector dimension: **384** |
| 9 | +- Query workload: 1,000 queries, `k=50`, overquery factor `4` |
| 10 | +- Resource budget used in these runs: `4g` memory, `4` threads |
| 11 | + |
| 12 | +## Execution setup |
| 13 | + |
| 14 | +- All four experiments were run in Docker-isolated environments. |
| 15 | +- ArcadeDB is the only embedded backend in this comparison. Milvus, Qdrant, and pgvector are client/server backends; for those runs the memory and thread budgets were split between server and client at `0.8/0.2`.
| 16 | + |
| 17 | +## Build / ingest results |
| 18 | + |
| 19 | +| Backend | Total build time (s) | DB disk size | Peak RSS (MB) | |
| 20 | +| -------- | -------------------: | -----------: | ------------: | |
| 21 | +| ArcadeDB | 726.99 | 673M | 3014.37 | |
| 22 | +| pgvector | 311.06 | 2.1G | 2117.36 | |
| 23 | +| Milvus | 61.59 | 1.9G | 1245.32 | |
| 24 | +| Qdrant | 128.16 | 737M | 1286.73 | |
| 25 | + |
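To put the disk sizes in context, the raw vector payload can be estimated from the workload figures above. This is a minimal sketch, not part of the benchmark harness: it assumes vectors are stored as 4-byte floats and that the `M`/`G` suffixes in the table are binary units (MiB/GiB, as `du -h` would print), neither of which the summary states. The helper names are hypothetical.

```python
# Workload figures from the summary.
NUM_VECTORS = 300_424
DIMENSION = 384
BYTES_PER_COMPONENT = 4  # ASSUMPTION: float32 storage, not stated in the summary

MIB = 1024 ** 2


def raw_vector_bytes(num_vectors: int, dimension: int, bytes_per_component: int) -> int:
    """Size of the bare vector data, with no index, metadata, or WAL."""
    return num_vectors * dimension * bytes_per_component


def parse_size_mib(text: str) -> float:
    """Convert a table entry such as '673M' or '2.1G' to MiB.

    ASSUMPTION: suffixes are binary units (MiB / GiB).
    """
    value, unit = float(text[:-1]), text[-1]
    factors = {"M": 1.0, "G": 1024.0}
    return value * factors[unit]


# Disk sizes copied from the build / ingest table.
disk_sizes = {"ArcadeDB": "673M", "pgvector": "2.1G", "Milvus": "1.9G", "Qdrant": "737M"}

raw_bytes = raw_vector_bytes(NUM_VECTORS, DIMENSION, BYTES_PER_COMPONENT)
raw_mib = raw_bytes / MIB

# Ratio of on-disk size to raw vector payload, per backend.
disk_to_raw = {name: parse_size_mib(size) / raw_mib for name, size in disk_sizes.items()}

print(f"raw vectors: {raw_mib:.2f} MiB")
for name, ratio in disk_to_raw.items():
    print(f"{name:<9} disk/raw = {ratio:.2f}x")
```

Under these assumptions the ratio gives a rough sense of how much of each database is index and storage overhead rather than vector data.
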
| 26 | +## Search results (overquery factor = 4) |
| 27 | + |
| 28 | +| Backend | Search time for 1,000 queries (s) | Mean latency (ms) | P95 latency (ms) | Mean recall | |
| 29 | +| -------- | --------------------------------: | ----------------: | ---------------: | ----------: | |
| 30 | +| ArcadeDB | 13.36 | 12.80 | 16.71 | 0.96406 | |
| 31 | +| pgvector | 13.42 | 13.35 | 23.84 | 0.98840 | |
| 32 | +| Milvus | 15.81 | 14.94 | 35.01 | 0.9577 | |
| 33 | +| Qdrant | 52.13 | 51.80 | 83.50 | 0.98548 | |
| 34 | + |
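The search table reports both total time and mean latency, so throughput and per-query overhead can be derived from it. The sketch below assumes the 1,000 queries were issued sequentially from a single client, which the summary does not state explicitly; the function names are hypothetical and the numbers are copied from the table.

```python
NUM_QUERIES = 1_000

# backend: (total search time in s, mean latency in ms), from the search results table
results = {
    "ArcadeDB": (13.36, 12.80),
    "pgvector": (13.42, 13.35),
    "Milvus": (15.81, 14.94),
    "Qdrant": (52.13, 51.80),
}


def throughput_qps(total_seconds: float, num_queries: int = NUM_QUERIES) -> float:
    """Queries per second over the whole run."""
    return num_queries / total_seconds


def overhead_ms(total_seconds: float, mean_latency_ms: float,
                num_queries: int = NUM_QUERIES) -> float:
    """Wall-clock time per query that is not covered by the measured latency.

    ASSUMPTION: queries run one after another, so total time is roughly
    num_queries * (latency + harness overhead).
    """
    return total_seconds * 1000 / num_queries - mean_latency_ms


qps = {name: throughput_qps(total) for name, (total, _) in results.items()}
overhead = {name: overhead_ms(total, mean) for name, (total, mean) in results.items()}

for name in results:
    print(f"{name:<9} {qps[name]:6.2f} QPS, {overhead[name]:.2f} ms/query outside measured latency")
```

If the sequential assumption holds, the small gap between total time per query and mean latency suggests the measured latency accounts for nearly all of the wall-clock time in every backend.
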
| 35 | +## HNSW details |
| 36 | + |
| 37 | +- For a simple apples-to-apples setup, focus on `M`, `efConstruction`, and `efSearch`. In this benchmark the other three backends used an effective `efSearch=200`, while ArcadeDB used overquery factor `4` as the practical counterpart. `M=16` and `efConstruction=100` were used on the HNSW-based backends.
| 38 | +- ArcadeDB vector indexing in this benchmark uses the Java **JVector** library with an **LSM**-based structure. |
| 39 | + |
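One way to read the pairing of `efSearch=200` with overquery factor `4` is as a candidate-pool size. The sketch below assumes the overquery factor multiplies `k` to give the number of candidates fetched before the top `k` are returned; this interpretation is an assumption, not something the summary or ArcadeDB's documentation is quoted on here, and `candidate_pool_size` is a hypothetical helper.

```python
def candidate_pool_size(k: int, overquery_factor: int) -> int:
    """Number of candidates examined before truncating to the top k.

    ASSUMPTION: overquery simply scales k; the actual ArcadeDB semantics
    may differ.
    """
    if k <= 0 or overquery_factor <= 0:
        raise ValueError("k and overquery_factor must be positive")
    return k * overquery_factor


# Values from the workload and HNSW sections of the summary.
K = 50
OVERQUERY_FACTOR = 4
EF_SEARCH = 200

pool = candidate_pool_size(K, OVERQUERY_FACTOR)
print(f"ArcadeDB candidate pool: {pool}, efSearch on the other backends: {EF_SEARCH}")
```

Under that reading, `k=50` with overquery factor `4` yields the same candidate count as `efSearch=200`, which would explain why the two settings are treated as counterparts.
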
| 40 | +## Notes and quick interpretation |
| 41 | + |
| 42 | +- **Fastest build** in this run: **Milvus**. |
| 43 | +- **Smallest on-disk DB** in this run: **ArcadeDB** (673M), then **Qdrant** (737M). |
| 44 | +- **Best recall** in this run: **pgvector** (0.98840), then **Qdrant** (0.98548).
| 45 | +- **Lowest mean query latency** in this run: **ArcadeDB** (12.80 ms), very close to **pgvector** (13.35 ms).