|
1 | 1 | # Vector Search Benchmark: ArcadeDB (JVector) vs FAISS |
2 | 2 |
|
3 | | -This benchmark compares the performance of **ArcadeDB's Vector Index** (based on JVector + LSM Tree) against **FAISS** (Facebook AI Similarity Search) using standard ANN datasets. |
| 3 | +This benchmark compares the performance of **ArcadeDB's Vector Index** (based on |
| 4 | +JVector + LSM Tree) against **FAISS** (Facebook AI Similarity Search) using standard ANN |
| 5 | +datasets. |
4 | 6 |
|
5 | 7 | ## 1. Algorithms Tested |
6 | 8 |
|
7 | 9 | We evaluated the following vector index implementations: |
8 | 10 |
|
9 | | -* **ArcadeDB (JVector + LSM)**: ArcadeDB uses JVector for graph-based vector indexing, integrated with an LSM-tree architecture to provide transactional, persistent, and database-like capabilities. JVector combines the best of **HNSW** (Hierarchical Navigable Small World) and **DiskANN** algorithms to offer high performance on disk-based indexes. |
10 | | -* **FAISS**: We tested four popular index types: |
11 | | - * `HNSW` (Hierarchical Navigable Small World) - Graph-based. |
12 | | - * `HNSW_PQ` (HNSW with Product Quantization) - Graph + Compressed. |
13 | | - * `IVF_FLAT` (Inverted File with Flat vectors) - Quantization-based. |
14 | | - * `IVF_PQ` (Inverted File with Product Quantization) - Compressed. |
| 11 | +- **ArcadeDB (JVector + LSM)**: ArcadeDB uses JVector for graph-based vector indexing, |
| 12 | + integrated with an LSM-tree architecture to provide transactional, persistent, and |
| 13 | + database-like capabilities. JVector combines the best of **HNSW** (Hierarchical |
| 14 | + Navigable Small World) and **DiskANN** algorithms to offer high performance on |
| 15 | + disk-based indexes. |
| 16 | +- **FAISS**: We tested four popular index types: |
| 17 | + - `HNSW` (Hierarchical Navigable Small World) - Graph-based. |
| 18 | + - `HNSW_PQ` (HNSW with Product Quantization) - Graph + Compressed. |
| 19 | + - `IVF_FLAT` (Inverted File with Flat vectors) - Quantization-based. |
| 20 | + - `IVF_PQ` (Inverted File with Product Quantization) - Compressed. |
15 | 21 |
|
16 | 22 | ## 2. Datasets |
17 | 23 |
|
18 | 24 | We used two widely recognized datasets from `ann-benchmarks`: |
19 | 25 |
|
20 | 26 | 1. **SIFT-128-Euclidean** |
21 | | - * **Vectors**: 1,000,000 |
22 | | - * **Dimensions**: 128 |
23 | | - * **Metric**: Euclidean Distance |
24 | | - * **Difficulty**: Moderate. |
| 27 | + |
| 28 | + - **Vectors**: 1,000,000 |
| 29 | + - **Dimensions**: 128 |
| 30 | + - **Metric**: Euclidean Distance |
| 31 | + - **Difficulty**: Moderate. |
25 | 32 |
|
26 | 33 | 2. **GloVe-100-Angular** |
27 | | - * **Vectors**: ~1.2 Million (1,183,514) |
28 | | - * **Dimensions**: 100 |
29 | | - * **Metric**: Cosine Similarity |
30 | | - * **Difficulty**: Hard. As seen in the results, all algorithms achieve lower recall values compared to SIFT for the same parameters. |
| 34 | + - **Vectors**: ~1.2 Million (1,183,514) |
| 35 | + - **Dimensions**: 100 |
| 36 | + - **Metric**: Cosine Similarity |
| 37 | + - **Difficulty**: Hard. As seen in the results, all algorithms achieve lower |
| 38 | + recall values compared to SIFT for the same parameters. |
31 | 39 |
|
32 | 40 | ## 3. Hardware Environment |
33 | 41 |
|
34 | 42 | All benchmarks were executed on the following hardware: |
35 | 43 |
|
36 | | -* **CPU**: AMD Ryzen 9 7950X 16-Core Processor |
37 | | -* **RAM**: 128 GB DDR5 (4×32 GB) at 3600 MT/s (Corsair) |
38 | | -* **Disk**: Samsung SSD 970 EVO Plus 2TB |
39 | | -* **GPU**: None (All benchmarks ran on CPU) |
| 44 | +- **CPU**: AMD Ryzen 9 7950X 16-Core Processor |
| 45 | +- **RAM**: 128 GB DDR5 (4×32 GB) at 3600 MT/s (Corsair) |
| 46 | +- **Disk**: Samsung SSD 970 EVO Plus 2TB |
| 47 | +- **GPU**: None (All benchmarks ran on CPU) |
40 | 48 |
|
41 | 49 | ## 4. Benchmark Results |
42 | 50 |
|
43 | | -The following figures visualize the trade-off between **Recall@10** and **Latency (ms)**. |
| 51 | +The following figures visualize the trade-off between **Recall@10** and **Latency |
| 52 | +(ms)**. |
44 | 53 |
|
45 | | -* **X-Axis (Recall)**: Higher is better (Right). |
46 | | -* **Y-Axis (Latency)**: Lower is better (Down). |
47 | | -* **Goal**: The ideal performance is in the **bottom-right corner** (High Recall, Low Latency). |
| 54 | +- **X-Axis (Recall)**: Higher is better (Right). |
| 55 | +- **Y-Axis (Latency)**: Lower is better (Down). |
| 56 | +- **Goal**: The ideal performance is in the **bottom-right corner** (High Recall, Low |
| 57 | + Latency). |
48 | 58 |
|
49 | | -Each dot represents a specific configuration (parameter set) for an algorithm. We use scatter plots because connecting dots with lines implies a continuum that doesn't strictly exist across different discrete parameter combinations (e.g., `max_connections`, `ef_construction`, `nprobe`). |
| 59 | +Each dot represents a specific configuration (parameter set) for an algorithm. We use |
| 60 | +scatter plots because connecting dots with lines implies a continuum that doesn't |
| 61 | +strictly exist across different discrete parameter combinations (e.g., |
| 62 | +`max_connections`, `ef_construction`, `nprobe`). |
50 | 63 |
|
51 | | -For detailed parameter values and raw metrics, please refer to the markdown files in the [`./results/`](./results/) directory. |
| 64 | +For detailed parameter values and raw metrics, please refer to the markdown files in the |
| 65 | +[`./results/`](./results/) directory. |
52 | 66 |
|
53 | 67 | ### Note on Legend Metrics |
54 | 68 |
|
55 | | -The legend in the figures displays **Peak Memory** and **Avg Build** time. These metrics should be interpreted with the following context: |
| 69 | +The legend in the figures displays **Peak Memory** and **Avg Build** time. These metrics |
| 70 | +should be interpreted with the following context: |
56 | 71 |
|
57 | | -* **Peak Memory**: This represents the **global maximum RSS** (Resident Set Size) observed during the entire benchmark run for that algorithm. Since the script iterates through multiple parameter configurations (some heavier than others) in a single run, this value reflects the high-water mark of the most resource-intensive configuration, not necessarily the specific memory usage for every data point shown. |
58 | | -* **Avg Build**: This is the **arithmetic mean** of the build times across all configurations tested for that algorithm. As build time varies significantly with parameters (e.g., `max_connections`, `ef_construction`), this serves as a general ballpark figure rather than a precise measurement for each specific point. |
| 72 | +- **Peak Memory**: This represents the **global maximum RSS** (Resident Set Size) |
| 73 | + observed during the entire benchmark run for that algorithm. Since the script |
| 74 | + iterates through multiple parameter configurations (some heavier than others) in a |
| 75 | + single run, this value reflects the high-water mark of the most resource-intensive |
| 76 | + configuration, not necessarily the specific memory usage for every data point shown. |
| 77 | +- **Avg Build**: This is the **arithmetic mean** of the build times across all |
| 78 | + configurations tested for that algorithm. As build time varies significantly with |
| 79 | + parameters (e.g., `max_connections`, `ef_construction`), this serves as a general |
| 80 | + ballpark figure rather than a precise measurement for each specific point. |
59 | 81 |
|
60 | 82 | ### SIFT-128-Euclidean Results |
61 | 83 |
|
62 | 84 |  |
63 | | -*(PDF version: [figures/plot_sift-128-euclidean.pdf](figures/plot_sift-128-euclidean.pdf))* |
| 85 | +_(PDF version: [figures/plot_sift-128-euclidean.pdf](figures/plot_sift-128-euclidean.pdf))_ |
64 | 86 |
|
65 | 87 | ### GloVe-100-Angular Results |
66 | 88 |
|
67 | 89 |  |
68 | | -*(PDF version: [figures/plot_glove-100-angular.pdf](figures/plot_glove-100-angular.pdf))* |
| 90 | +_(PDF version: [figures/plot_glove-100-angular.pdf](figures/plot_glove-100-angular.pdf))_ |
69 | 91 |
|
70 | 92 | ## 5. ArcadeDB Configuration |
71 | 93 |
|
72 | | -For ArcadeDB, we selected the following default configuration which offers a balanced trade-off between build time, memory usage, and search performance: |
| 94 | +For ArcadeDB, we selected the following default configuration which offers a balanced |
| 95 | +trade-off between build time, memory usage, and search performance: |
73 | 96 |
|
74 | 97 | ```python |
75 | 98 | max_connections = 32 |
76 | 99 | beam_width = 200 |
77 | 100 | overquery_factor = 16 |
78 | 101 | ``` |
79 | | -**Note on `overquery_factor`**: Unlike FAISS or standard HNSW implementations, JVector does not use an `ef` (or `efSearch`) parameter during search. Instead, we implemented an **"overquery"** mechanism. This retrieves `k * overquery_factor` candidates from the index, sorts them by exact similarity, and returns the top `k`. This allows trading off latency for higher recall. |
| 102 | + |
| 103 | +**Note on Quantization**: No quantization (PQ/SQ) was used for the ArcadeDB JVector |
| 104 | +benchmarks. Quantization support is currently a Work In Progress (WIP) in the core Java |
| 105 | +engine. |
| 106 | + |
| 107 | +**Note on `overquery_factor`**: Unlike FAISS or standard HNSW implementations, JVector |
| 108 | +does not use an `ef` (or `efSearch`) parameter during search. Instead, we implemented an |
| 109 | +**"overquery"** mechanism. This retrieves `k * overquery_factor` candidates from the |
| 110 | +index, sorts them by exact similarity, and returns the top `k`. This allows trading off |
| 111 | +latency for higher recall. |
| 112 | + |
| 113 | +**Note on Build Time (Lazy Indexing)**: JVector employs lazy indexing, meaning the |
| 114 | +initial index object creation is nearly instantaneous. To capture the true cost of |
| 115 | +building the graph, our benchmark includes a "warmup" phase that triggers the actual |
| 116 | +indexing process. The reported **Build Time** for ArcadeDB is calculated as: `Index |
| 117 | +Creation Time + Warmup Time`. |
| 118 | + |
| 119 | +**Note on Memory Usage**: The ArcadeDB benchmark was executed with a JVM heap limit of |
| 120 | +`ARCADEDB_JVM_MAX_HEAP='16g'`. However, the observed Resident Set Size (RSS) exceeded |
| 121 | +this limit significantly, reaching as high as **41 GB** in some test cases. This |
| 122 | +discrepancy suggests significant off-heap memory usage or other overheads that |
| 123 | +require further investigation. |
| 124 | + |
80 | 125 | On the **GloVe-100-Angular** dataset (~1.2M vectors), this configuration achieved: |
81 | | -* **Recall@10**: 0.8538 |
82 | | -* **Average Latency**: 36ms |
83 | | -* **Build Time**: ~530 seconds |
84 | 126 |
|
85 | | -We consider this "quite decent" for a persistent, disk-based vector store compared to purely in-memory libraries. |
| 127 | +- **Recall@10**: 0.8538 |
| 128 | +- **Average Latency**: 36ms |
| 129 | +- **Build Time**: ~530 seconds |
| 130 | + |
| 131 | +We consider this "quite decent" for a persistent, disk-based vector store compared to |
| 132 | +purely in-memory libraries. |
| 133 | + |
| 134 | +## 6. Persistence & Stability Observations |
| 135 | + |
| 136 | +We explicitly tested the persistence of the vector index by closing and reopening the |
| 137 | +ArcadeDB database during the benchmark. |
| 138 | + |
| 139 | +1. **Persistence Verified**: The index correctly persists to disk. We observed that |
| 140 | + **query latency remained consistent** before and after reopening the database, |
| 141 | + confirming that the index structure is preserved and loaded efficiently without |
| 142 | + needing a rebuild. |
| 143 | +2. **Recall Stability**: |
| 144 | + - **GloVe-100-Angular**: Recall values remained identical before and after the |
| 145 | + database restart, as expected. |
| 146 | + - **SIFT-128-Euclidean**: We observed a discrepancy in recall values before and |
| 147 | + after the restart. While usually small, the difference can sometimes be as high |
| 148 | + as **0.1**. The cause of this non-determinism for the Euclidean metric is |
| 149 | + currently unknown. However, since our primary production use cases rely on |
| 150 | + Cosine similarity (Angular), we have decided to deprioritize investigating this |
| 151 | + specific issue for now. |
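
To illustrate the overquery mechanism described in the note on `overquery_factor` in Section 5, here is a minimal Python sketch. It is not ArcadeDB's or JVector's actual code: `approx_search` is a hypothetical stand-in for the index lookup, and `overquery_search`, `cosine_similarity`, and `recall_at_k` are names invented for this example. It assumes cosine similarity as the exact metric and assumes Recall@10 is the fraction of the true top-10 neighbors present in the returned top 10; the benchmark script may compute it differently.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overquery_search(approx_search, vectors, query, k, overquery_factor=16):
    """Fetch k * overquery_factor candidates, re-rank exactly, keep the top k.

    approx_search(query, n) is a hypothetical callable returning up to n
    candidate ids from an approximate index.
    """
    candidate_ids = approx_search(query, k * overquery_factor)
    # Sort candidates by exact similarity (highest first) and return the best k.
    return heapq.nlargest(
        k, candidate_ids, key=lambda i: cosine_similarity(vectors[i], query)
    )


def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k ids that appear in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k


# Toy usage: the "index" simply returns the first n ids (assumption for the demo).
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
toy_index = lambda query, n: list(range(len(vectors)))[:n]
top = overquery_search(toy_index, vectors, [1.0, 0.0], k=2, overquery_factor=2)
```

A larger `overquery_factor` widens the candidate pool, which tends to raise recall at the cost of more exact similarity computations per query. That is the latency/recall trade-off the note describes.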