Commit 60897ba

feat: add benchmark summary for stackoverflow-small dataset
1 parent 28e2f52 commit 60897ba

1 file changed

Lines changed: 45 additions & 0 deletions

@@ -0,0 +1,45 @@
# Vector Backend Benchmark Summary (stackoverflow-small)

## Workload at a glance

- Dataset: **stackoverflow-small**
- Vector count: **around 300k** (300,424 rows)
- Embedding model: **all-MiniLM-L6-v2**
- Vector dimension: **384**
- Query workload: 1,000 queries, `k=50`, overquery factor `4`
- Resource budget used in these runs: `4g` memory, `4` threads

## Execution setup

- All four experiments were run in Docker-isolated environments.
- ArcadeDB is the only truly embedded backend in this comparison; Milvus, Qdrant, and pgvector are client/server backends, and for those runs the memory and thread budgets were split server/client at `0.8/0.2`.
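
To make the `0.8/0.2` split concrete, here is a minimal sketch of the arithmetic for the `4g` / `4` thread budget. The function name and the use of fractional thread counts are assumptions for illustration; the summary does not say how the benchmark harness rounds or applies these values.

```python
def split_budget(total_memory_gb: float, total_threads: float,
                 server_share: float = 0.8) -> dict:
    """Split one resource budget between a server and a client process.

    Hypothetical helper: the real harness may round or clamp these values.
    """
    client_share = 1.0 - server_share
    return {
        "server": {
            "memory_gb": total_memory_gb * server_share,
            "threads": total_threads * server_share,
        },
        "client": {
            "memory_gb": total_memory_gb * client_share,
            "threads": total_threads * client_share,
        },
    }


# Budget from the summary: 4g memory, 4 threads, split 0.8/0.2.
budget = split_budget(4.0, 4.0)
for role, limits in budget.items():
    print(f"{role}: {limits['memory_gb']:.1f} GB, {limits['threads']:.1f} threads")
```

Under this reading, the server side of each client/server backend gets about 3.2 GB and the client about 0.8 GB, while the embedded ArcadeDB run has the whole budget in a single process.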

## Build / ingest results

| Backend  | Total build time (s) | DB disk size | Peak RSS (MB) |
| -------- | -------------------: | -----------: | ------------: |
| ArcadeDB |               726.99 |         673M |       3014.37 |
| pgvector |               311.06 |         2.1G |       2117.36 |
| Milvus   |                61.59 |         1.9G |       1245.32 |
| Qdrant   |               128.16 |         737M |       1286.73 |
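
For context on the build numbers, the sketch below derives ingest throughput (rows per second) from the table and estimates the raw size of the vectors alone. The 4-bytes-per-component figure assumes float32 storage, which the summary does not state; under that assumption the raw vectors come to about 440 MiB, so every reported on-disk size includes index and storage overhead on top of the data.

```python
ROWS = 300_424          # vector count from the workload section
DIM = 384               # vector dimension from the workload section
BYTES_PER_COMPONENT = 4  # assumption: float32 components

# Total build time in seconds, copied from the table above.
build_seconds = {
    "ArcadeDB": 726.99,
    "pgvector": 311.06,
    "Milvus": 61.59,
    "Qdrant": 128.16,
}

# Raw payload size of the embeddings, before any index structures.
raw_bytes = ROWS * DIM * BYTES_PER_COMPONENT
raw_mib = raw_bytes / 2**20

# Rows ingested per second of total build time.
rates = {name: ROWS / seconds for name, seconds in build_seconds.items()}

print(f"raw vectors: {raw_mib:.1f} MiB")
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.0f} rows/s")
```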

## Search results (overquery factor = 4)

| Backend  | Search time for 1,000 queries (s) | Mean latency (ms) | P95 latency (ms) | Mean recall |
| -------- | --------------------------------: | ----------------: | ---------------: | ----------: |
| ArcadeDB |                             13.36 |             12.80 |            16.71 |     0.96406 |
| pgvector |                             13.42 |             13.35 |            23.84 |     0.98840 |
| Milvus   |                             15.81 |             14.94 |            35.01 |      0.9577 |
| Qdrant   |                             52.13 |             51.80 |            83.50 |     0.98548 |
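
The columns above follow common definitions, sketched here for readers who want to reproduce them on their own runs. This is an illustration, not the benchmark's code: the function names are hypothetical, and the nearest-rank percentile is an assumption, since the summary does not say which percentile method was used.

```python
import math
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbours present in the returned top-k."""
    returned = set(retrieved[:k])
    expected = set(ground_truth[:k])
    return len(returned & expected) / k


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean, used for both mean latency and mean recall."""
    return sum(values) / len(values)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (assumed method; others interpolate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Toy data: two of the four true neighbours were found.
example_recall = recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], k=4)  # 0.5

# Toy latencies in milliseconds: 1.0, 2.0, ..., 20.0.
latencies_ms = [float(i) for i in range(1, 21)]
example_p95 = percentile(latencies_ms, 95)  # 19.0 with nearest-rank
example_mean = mean(latencies_ms)           # 10.5
```

Mean recall in the table would then be the mean of `recall_at_k` over the 1,000 queries with `k=50`.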

## HNSW details

- For a simple apples-to-apples setup, focus on `M`, `efConstruction`, and `efSearch`. In this benchmark the other three backends used an effective `efSearch=200`, with `M=16` and `efConstruction=100` on the HNSW-based backends. ArcadeDB used overquery factor `4` as the practical counterpart, which at `k=50` works out to 50 × 4 = 200 candidates per query.
- ArcadeDB vector indexing in this benchmark uses the Java **JVector** library with an **LSM**-based structure.
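
The overquery idea can be sketched generically: ask the index for `k × factor` candidates, then keep the best `k`. The code below is a hedged illustration of that pattern only. `search_with_overquery`, `make_toy_index`, and the `(id, distance)` hit format are hypothetical names chosen for this example and are not ArcadeDB or JVector APIs.

```python
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[int, float]  # (vector id, distance to the query); assumed format
SearchFn = Callable[[Sequence[float], int], List[Hit]]


def search_with_overquery(index_search: SearchFn, query: Sequence[float],
                          k: int, overquery_factor: int) -> List[Hit]:
    """Fetch k * overquery_factor candidates, then return the closest k."""
    candidates = index_search(query, k * overquery_factor)
    candidates = sorted(candidates, key=lambda hit: hit[1])
    return candidates[:k]


def make_toy_index(size: int) -> SearchFn:
    """Stand-in for an approximate index: ids 0..size-1 on a number line."""
    def search(query: Sequence[float], limit: int) -> List[Hit]:
        hits = [(i, abs(i - query[0])) for i in range(size)]
        hits.sort(key=lambda hit: hit[1])
        return hits[:limit]
    return search


# With the benchmark's settings, each query asks for 50 * 4 = 200 candidates.
toy_search = make_toy_index(1_000)
top_hits = search_with_overquery(toy_search, [10.0], k=50, overquery_factor=4)
```

The wider candidate pool plays a role similar to a larger `efSearch` in HNSW: more candidates are examined per query, trading some latency for recall.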

## Notes and quick interpretation

- **Fastest build** in this run: **Milvus** (61.59s), then **Qdrant** (128.16s).
- **Smallest on-disk DB** in this run: **ArcadeDB** (673M), then **Qdrant** (737M).
- **Best recall** in this run: **pgvector** (0.98840), then **Qdrant** (0.98548).
- **Lowest mean query latency** in this run: **ArcadeDB** (12.80ms), very close to **pgvector** (13.35ms).
