Commit 7a9268c

docs: update benchmark documentation for ArcadeDB and FAISS, enhancing clarity and detail
1 parent 730a31e commit 7a9268c

1 file changed

Lines changed: 102 additions & 36 deletions

File tree

  • bindings/python/examples/benchmark-vector
Original file line number | Diff line number | Diff line change
@@ -1,85 +1,151 @@
11
# Vector Search Benchmark: ArcadeDB (JVector) vs FAISS
22

3-
This benchmark compares the performance of **ArcadeDB's Vector Index** (based on JVector + LSM Tree) against **FAISS** (Facebook AI Similarity Search) using standard ANN datasets.
3+
This benchmark compares the performance of **ArcadeDB's Vector Index** (based on
4+
JVector + LSM Tree) against **FAISS** (Facebook AI Similarity Search) using standard ANN
5+
datasets.
46

57
## 1. Algorithms Tested
68

79
We evaluated the following vector index implementations:
810

9-
* **ArcadeDB (JVector + LSM)**: ArcadeDB uses JVector for graph-based vector indexing, integrated with an LSM-tree architecture to provide transactional, persistent, and database-like capabilities. JVector combines the best of **HNSW** (Hierarchical Navigable Small World) and **DiskANN** algorithms to offer high performance on disk-based indexes.
10-
* **FAISS**: We tested four popular index types:
11-
* `HNSW` (Hierarchical Navigable Small World) - Graph-based.
12-
* `HNSW_PQ` (HNSW with Product Quantization) - Graph + Compressed.
13-
* `IVF_FLAT` (Inverted File with Flat vectors) - Quantization-based.
14-
* `IVF_PQ` (Inverted File with Product Quantization) - Compressed.
11+
- **ArcadeDB (JVector + LSM)**: ArcadeDB uses JVector for graph-based vector indexing,
12+
integrated with an LSM-tree architecture to provide transactional, persistent, and
13+
database-like capabilities. JVector combines the best of **HNSW** (Hierarchical
14+
Navigable Small World) and **DiskANN** algorithms to offer high performance on
15+
disk-based indexes.
16+
- **FAISS**: We tested four popular index types:
17+
- `HNSW` (Hierarchical Navigable Small World) - Graph-based.
18+
- `HNSW_PQ` (HNSW with Product Quantization) - Graph + Compressed.
19+
- `IVF_FLAT` (Inverted File with Flat vectors) - Quantization-based.
20+
- `IVF_PQ` (Inverted File with Product Quantization) - Compressed.
1521

1622
## 2. Datasets
1723

1824
We used two widely recognized datasets from `ann-benchmarks`:
1925

2026
1. **SIFT-128-Euclidean**
21-
* **Vectors**: 1,000,000
22-
* **Dimensions**: 128
23-
* **Metric**: Euclidean Distance
24-
* **Difficulty**: Moderate.
27+
28+
- **Vectors**: 1,000,000
29+
- **Dimensions**: 128
30+
- **Metric**: Euclidean Distance
31+
- **Difficulty**: Moderate.
2532

2633
2. **GloVe-100-Angular**
27-
* **Vectors**: ~1.2 Million (1,183,514)
28-
* **Dimensions**: 100
29-
* **Metric**: Cosine Similarity
30-
* **Difficulty**: Hard. As seen in the results, all algorithms achieve lower recall values compared to SIFT for the same parameters.
34+
- **Vectors**: ~1.2 Million (1,183,514)
35+
- **Dimensions**: 100
36+
- **Metric**: Cosine Similarity
37+
- **Difficulty**: Hard. As seen in the results, all algorithms achieve lower
38+
recall values compared to SIFT for the same parameters.
3139

3240
## 3. Hardware Environment
3341

3442
All benchmarks were executed on the following hardware:
3543

36-
* **CPU**: AMD Ryzen 9 7950X 16-Core Processor
37-
* **RAM**: 128 GB DDR5 (4×32 GB) at 3600 MT/s (Corsair)
38-
* **Disk**: Samsung SSD 970 EVO Plus 2TB
39-
* **GPU**: None (All benchmarks ran on CPU)
44+
- **CPU**: AMD Ryzen 9 7950X 16-Core Processor
45+
- **RAM**: 128 GB DDR5 (4×32 GB) at 3600 MT/s (Corsair)
46+
- **Disk**: Samsung SSD 970 EVO Plus 2TB
47+
- **GPU**: None (All benchmarks ran on CPU)
4048

4149
## 4. Benchmark Results
4250

43-
The following figures visualize the trade-off between **Recall@10** and **Latency (ms)**.
51+
The following figures visualize the trade-off between **Recall@10** and **Latency
52+
(ms)**.
4453

45-
* **X-Axis (Recall)**: Higher is better (Right).
46-
* **Y-Axis (Latency)**: Lower is better (Down).
47-
* **Goal**: The ideal performance is in the **bottom-right corner** (High Recall, Low Latency).
54+
- **X-Axis (Recall)**: Higher is better (Right).
55+
- **Y-Axis (Latency)**: Lower is better (Down).
56+
- **Goal**: The ideal performance is in the **bottom-right corner** (High Recall, Low
57+
Latency).
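
For readers unfamiliar with the metric on the X-axis, the following is a minimal sketch of how Recall@10 is commonly computed against ground-truth neighbours. The function names are illustrative assumptions and are not taken from the benchmark script.

```python
def recall_at_k(retrieved_ids, true_ids, k=10):
    """Fraction of the true top-k neighbours found in the first k retrieved ids."""
    truth = set(true_ids[:k])
    if not truth:
        return 0.0
    # Set intersection avoids double-counting if the index returns duplicates.
    return len(set(retrieved_ids[:k]) & truth) / len(truth)


def mean_recall_at_k(all_retrieved, all_true, k=10):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_true)]
    return sum(scores) / len(scores) if scores else 0.0
```

A value of 1.0 means every true neighbour was returned; each plotted dot would correspond to one such average over the query set.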
4858

49-
Each dot represents a specific configuration (parameter set) for an algorithm. We use scatter plots because connecting dots with lines implies a continuum that doesn't strictly exist across different discrete parameter combinations (e.g., `max_connections`, `ef_construction`, `nprobe`).
59+
Each dot represents a specific configuration (parameter set) for an algorithm. We use
60+
scatter plots because connecting dots with lines implies a continuum that doesn't
61+
strictly exist across different discrete parameter combinations (e.g.,
62+
`max_connections`, `ef_construction`, `nprobe`).
5063

51-
For detailed parameter values and raw metrics, please refer to the markdown files in the [`./results/`](./results/) directory.
64+
For detailed parameter values and raw metrics, please refer to the markdown files in the
65+
[`./results/`](./results/) directory.
5266

5367
### Note on Legend Metrics
5468

55-
The legend in the figures displays **Peak Memory** and **Avg Build** time. These metrics should be interpreted with the following context:
69+
The legend in the figures displays **Peak Memory** and **Avg Build** time. These metrics
70+
should be interpreted with the following context:
5671

57-
* **Peak Memory**: This represents the **global maximum RSS** (Resident Set Size) observed during the entire benchmark run for that algorithm. Since the script iterates through multiple parameter configurations (some heavier than others) in a single run, this value reflects the high-water mark of the most resource-intensive configuration, not necessarily the specific memory usage for every data point shown.
58-
* **Avg Build**: This is the **arithmetic mean** of the build times across all configurations tested for that algorithm. As build time varies significantly with parameters (e.g., `max_connections`, `ef_construction`), this serves as a general ballpark figure rather than a precise measurement for each specific point.
72+
- **Peak Memory**: This represents the **global maximum RSS** (Resident Set Size)
73+
observed during the entire benchmark run for that algorithm. Since the script
74+
iterates through multiple parameter configurations (some heavier than others) in a
75+
single run, this value reflects the high-water mark of the most resource-intensive
76+
configuration, not necessarily the specific memory usage for every data point shown.
77+
- **Avg Build**: This is the **arithmetic mean** of the build times across all
78+
configurations tested for that algorithm. As build time varies significantly with
79+
parameters (e.g., `max_connections`, `ef_construction`), this serves as a general
80+
ballpark figure rather than a precise measurement for each specific point.
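
The two legend values described above can be sketched as a simple aggregation over per-configuration measurements. The dictionary keys `peak_rss_mb` and `build_time_s` are hypothetical names chosen for illustration; the real script may store its measurements differently.

```python
from statistics import mean


def summarize_runs(runs):
    """Collapse per-configuration measurements into the two legend values.

    `runs` is a list of dicts with the hypothetical keys
    'peak_rss_mb' and 'build_time_s', one dict per parameter configuration.
    """
    return {
        # High-water mark across all configurations, not a per-point value.
        "peak_memory_mb": max(r["peak_rss_mb"] for r in runs),
        # Arithmetic mean of build times across all configurations.
        "avg_build_s": mean(r["build_time_s"] for r in runs),
    }
```

Because the maximum and the mean are taken over every configuration, a single heavy parameter set can dominate the memory figure while barely moving the build average.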
5981

6082
### SIFT-128-Euclidean Results
6183

6284
![SIFT Results](figures/plot_sift-128-euclidean.png)
63-
*(PDF version: [figures/plot_sift-128-euclidean.pdf](figures/plot_sift-128-euclidean.pdf))*
85+
_(PDF version: [figures/plot_sift-128-euclidean.pdf](figures/plot_sift-128-euclidean.pdf))_
6486

6587
### GloVe-100-Angular Results
6688

6789
![GloVe Results](figures/plot_glove-100-angular.png)
68-
*(PDF version: [figures/plot_glove-100-angular.pdf](figures/plot_glove-100-angular.pdf))*
90+
_(PDF version: [figures/plot_glove-100-angular.pdf](figures/plot_glove-100-angular.pdf))_
6991

7092
## 5. ArcadeDB Configuration
7193

72-
For ArcadeDB, we selected the following default configuration which offers a balanced trade-off between build time, memory usage, and search performance:
94+
For ArcadeDB, we selected the following default configuration which offers a balanced
95+
trade-off between build time, memory usage, and search performance:
7396

7497
```python
7598
max_connections = 32
7699
beam_width = 200
77100
overquery_factor = 16
78101
```
79-
**Note on `overquery_factor`**: Unlike FAISS or standard HNSW implementations, JVector does not use an `ef` (or `efSearch`) parameter during search. Instead, we implemented an **"overquery"** mechanism. This retrieves `k * overquery_factor` candidates from the index, sorts them by exact similarity, and returns the top `k`. This allows trading off latency for higher recall.
102+
103+
**Note on Quantization**: No quantization (PQ/SQ) was used for the ArcadeDB JVector
104+
benchmarks. Quantization support is currently a Work In Progress (WIP) in the core Java
105+
engine.
106+
107+
**Note on `overquery_factor`**: Unlike FAISS or standard HNSW implementations, JVector
108+
does not use an `ef` (or `efSearch`) parameter during search. Instead, we implemented an
109+
**"overquery"** mechanism. This retrieves `k * overquery_factor` candidates from the
110+
index, sorts them by exact similarity, and returns the top `k`. This allows trading off
111+
latency for higher recall.
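
The overquery idea can be illustrated with a small, library-independent sketch. Here `ann_search` is a hypothetical stand-in for the approximate index lookup and `vectors` is a plain mapping from id to vector; neither reflects the actual ArcadeDB or JVector API, and cosine similarity is assumed as the exact metric.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overquery_search(ann_search, vectors, query, k, overquery_factor=16):
    """Fetch k * overquery_factor candidates, re-rank exactly, return the top k ids.

    `ann_search(query, n)` is a hypothetical callable returning up to n
    candidate ids from an approximate index.
    """
    candidate_ids = ann_search(query, k * overquery_factor)
    # Re-rank the enlarged candidate set by exact similarity to the query.
    return heapq.nlargest(
        k, candidate_ids, key=lambda i: cosine_similarity(vectors[i], query)
    )
```

A larger `overquery_factor` widens the candidate pool, so a true neighbour that the approximate search ranked poorly still has a chance to surface, at the cost of more exact similarity computations per query.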
112+
113+
**Note on Build Time (Lazy Indexing)**: JVector employs lazy indexing, meaning the
114+
initial index object creation is nearly instantaneous. To capture the true cost of
115+
building the graph, our benchmark includes a "warmup" phase that triggers the actual
116+
indexing process. The reported **Build Time** for ArcadeDB is calculated as: `Index
117+
Creation Time + Warmup Time`.
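
As a rough sketch of that calculation, the two phases can be timed separately and summed. The callables `create_index` and `warmup` are hypothetical placeholders for whatever the benchmark script actually invokes.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def measure_build_time(create_index, warmup):
    """Report creation, warmup, and their sum as the build time.

    `create_index()` returns an index handle (near-instant with lazy indexing);
    `warmup(index)` forces the graph to be built. Both are hypothetical.
    """
    index, creation_s = timed(create_index)
    _, warmup_s = timed(warmup, index)
    return {
        "creation_s": creation_s,
        "warmup_s": warmup_s,
        "build_s": creation_s + warmup_s,
    }
```

With lazy indexing, `creation_s` would be expected to be close to zero, so nearly all of `build_s` comes from the warmup phase.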
118+
119+
**Note on Memory Usage**: The ArcadeDB benchmark was executed with a JVM heap limit of
120+
`ARCADEDB_JVM_MAX_HEAP='16g'`. However, we observed that the actual Resident Set Size
121+
(RSS) memory consumption exceeded this limit significantly, reaching as high as **41GB**
122+
in some test cases. This discrepancy suggests significant off-heap memory usage or other
123+
overheads that require further investigation in the future.
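
One way to quantify this gap is to compare the process's peak RSS against the configured heap ceiling. The sketch below assumes a Unix system, assumes the JVM runs inside the measured process, and treats GB and GiB as interchangeable for a rough estimate; the helper names are illustrative only.

```python
import resource
import sys


def peak_rss_gib():
    """Peak resident set size of the current process in GiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1024**3


def parse_heap_limit_gib(value):
    """Parse a JVM-style size such as '16g' or '512m' into GiB."""
    units = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0}
    value = value.strip().lower()
    return float(value[:-1]) * units[value[-1]]


def off_heap_estimate_gib(rss_gib, heap_limit):
    """Lower bound on non-heap memory: RSS above the configured heap ceiling."""
    return max(0.0, rss_gib - parse_heap_limit_gib(heap_limit))
```

Plugging in the figures reported above, `off_heap_estimate_gib(41.0, "16g")` gives 25.0, meaning at least roughly 25 GB of the observed RSS cannot be accounted for by the Java heap alone.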
124+
80125
On the **GloVe-100-Angular** dataset (~1.2M vectors), this configuration achieved:
81-
* **Recall@10**: 0.8538
82-
* **Average Latency**: 36ms
83-
* **Build Time**: ~530 seconds
84126

85-
We consider this "quite decent" for a persistent, disk-based vector store compared to purely in-memory libraries.
127+
- **Recall@10**: 0.8538
128+
- **Average Latency**: 36ms
129+
- **Build Time**: ~530 seconds
130+
131+
We consider this "quite decent" for a persistent, disk-based vector store compared to
132+
purely in-memory libraries.
133+
134+
## 6. Persistence & Stability Observations
135+
136+
We explicitly tested the persistence of the vector index by closing and reopening the
137+
ArcadeDB database during the benchmark.
138+
139+
1. **Persistence Verified**: The index correctly persists to disk. We observed that
140+
**query latency remained consistent** before and after reopening the database,
141+
confirming that the index structure is preserved and loaded efficiently without
142+
needing a rebuild.
143+
2. **Recall Stability**:
144+
- **GloVe-100-Angular**: Recall values remained identical before and after the
145+
database restart, as expected.
146+
- **SIFT-128-Euclidean**: We observed a discrepancy in recall values before and
147+
after the restart. While usually small, the difference can sometimes be as high
148+
as **0.1**. The cause of this non-determinism for the Euclidean metric is
149+
currently unknown. However, since our primary production use cases rely on
150+
Cosine similarity (Angular), we have decided to deprioritize investigating this
151+
specific issue for now.
