Commit 629dd75

docs: Expand README with detailed benchmarking results for Product Quantization (PQ) support in JVector index
1 parent 831766a commit 629dd75

1 file changed

Lines changed: 96 additions & 0 deletions

bindings/python/examples/benchmark-vector/README.md

@@ -11,6 +11,102 @@
- Take the duration with a grain of salt, since other processes were running on the machine. RSS and DB size are more stable. 4 threads were allocated per task, but the number of tasks running in parallel was not constant, so effective CPU usage may vary.
- Unless noted otherwise, `MAX_CONNECTIONS` is fixed at 12, `BEAM_WIDTHS` at 64, and `OVERQUERY_FACTORS` at 1.

### Commit/Date: main @ 6ef8858 (Thu Jan 15 16:40:51 2026 -0500)

- This commit adds Product Quantization (PQ) support to the JVector index.
- The four tasks below vary the quantization mode (`NONE`, `INT8`, `PRODUCT`, `BINARY`).
- `store_vectors_in_graph=False`, `add_hierarchy=True`, `max_connections=12`, `beam_width=64`, `overquery_factor=1`, `batch_size=10000` for all four runs.
- This time, each task was allocated one thread.
- PQ has four tunable parameters; we used the ArcadeDB defaults for all of them:
  - `pq_subspaces`
  - `pq_clusters`
  - `pq_center_globally`
  - `pq_training_limit`

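To make the roles of `pq_subspaces` (often written M) and `pq_clusters` (often written K) more concrete, here is a minimal sketch of the textbook PQ size arithmetic. It assumes bit-packed codes and float32 input vectors; the function names and the example values (1024 dimensions, M=64, K=256) are hypothetical and are not the ArcadeDB defaults or the actual on-disk layout of the `.vecpq` file.

```python
import math


def pq_code_bytes(subspaces: int, clusters: int) -> int:
    """Bytes needed for one PQ-encoded vector (textbook estimate).

    Each of the `subspaces` sub-vectors is replaced by the id of its nearest
    centroid, which needs ceil(log2(clusters)) bits. Codes are assumed to be
    bit-packed; a real implementation may round each code up to a whole byte.
    """
    if subspaces < 1 or clusters < 1:
        raise ValueError("subspaces and clusters must be positive")
    bits_per_code = max(1, math.ceil(math.log2(clusters)))
    return math.ceil(subspaces * bits_per_code / 8)


def compression_ratio(dim: int, subspaces: int, clusters: int) -> float:
    """Size of a raw float32 vector divided by the size of its PQ code."""
    return (dim * 4) / pq_code_bytes(subspaces, clusters)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(pq_code_bytes(64, 256))            # bytes per encoded vector
    print(compression_ratio(1024, 64, 256))  # raw bytes / encoded bytes
```

Fewer subspaces or fewer clusters shrink the codes further but usually cost recall, which is the trade-off behind tuning M/K.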
#### MSMARCO-1M (1000 queries, Recall@50)

| quantization | ingest_s | ingest_rss_mb | create_index_s | create_index_rss_mb | build_graph_now_s | build_graph_now_rss_mb | search_s | search_rss_mb | recall@50_before_close | open_db_s | open_db_rss_mb | warmup_after_reopen_s | warmup_after_reopen_rss_mb | search_after_reopen_s | search_after_reopen_rss_mb | recall@50_after_reopen | peak_rss_mb | db_size_mb | total_duration |
| :----------- | -------: | ------------: | -------------: | ------------------: | ----------------: | ---------------------: | -------: | ------------: | ---------------------: | --------: | -------------: | --------------------: | -------------------------: | --------------------: | -------------------------: | ---------------------: | ----------: | ---------: | :------------- |
| NONE | 60.599 | 7518.07 | 13.073 | 552.203 | 4859.9 | 313.852 | 11.132 | 6.773 | 0.9099 | 4.066 | 5.902 | 8.522 | 0.125 | 7.676 | 4.117 | 0.9099 | 8816.13 | 5750.44 | 1h 22m |
| INT8 | 60.978 | 7405.65 | 18.888 | 877.16 | 3099.27 | 124.875 | 18.011 | 16.293 | 0.9051 | 11.855 | 13.812 | 4.302 | 0.488 | 15.587 | 7.176 | 0.9039 | 8855.47 | 6738.94 | 53m |
| PRODUCT | 61.418 | 7978.64 | 13.127 | 211.594 | 5024.72 | 241.484 | 2.571 | 8.305 | 0.8524 | 2.645 | 4.219 | 7.266 | 0.113 | 1.401 | 8.926 | 0.8525 | 8862.87 | 5996.45 | 1h 25m |
| BINARY | 60.754 | 7734.86 | 25.235 | 452.316 | 3841.34 | 197.668 | 11.857 | 20.723 | 0.2861 | 5.973 | 14.68 | 4.298 | 1.539 | 5.865 | 5.824 | 0.2861 | 8837.07 | 5879.94 | 1h 5m |

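For readers unfamiliar with the `recall@50` columns, the following is a minimal sketch of how recall@k is commonly computed: for each query, the fraction of the true top-k neighbours that appear in the retrieved top-k, averaged over all queries. The function name and input shapes are assumptions for illustration; the benchmark script may compute it differently.

```python
def recall_at_k(retrieved, ground_truth, k=50):
    """Mean recall@k over a set of queries.

    retrieved[i]    -- ids returned by the index for query i, best first
    ground_truth[i] -- exact nearest-neighbour ids for query i, best first
    """
    if len(retrieved) != len(ground_truth):
        raise ValueError("retrieved and ground_truth must have the same length")
    if not retrieved:
        raise ValueError("at least one query is required")

    total = 0.0
    for got, truth in zip(retrieved, ground_truth):
        truth_k = set(truth[:k])
        if not truth_k:
            raise ValueError("each query needs at least one ground-truth id")
        # Count how many of the true top-k ids were actually returned.
        hits = len(truth_k.intersection(got[:k]))
        total += hits / len(truth_k)
    return total / len(retrieved)


if __name__ == "__main__":
    # Two toy queries with k=3: the first finds 2 of 3 true neighbours,
    # the second finds all 3.
    print(recall_at_k([[1, 2, 3], [4, 5, 6]], [[1, 2, 9], [4, 5, 6]], k=3))
```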
#### Disk Usage Breakdown
##### `quantization=none`

```bash
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=10000_seed=42/VectorData_0.1.65536.v0.bucket
10M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748779662794320.4.262144.v0.lsmvecidx
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748779662794320_vecgraph.5.262144.v0.vecgraph
```

##### `quantization=int8`

```bash
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=10000_seed=42/VectorData_0.1.65536.v0.bucket
999M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748780028226180.4.262144.v0.lsmvecidx
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748780028226180_vecgraph.5.262144.v0.vecgraph
```

##### `quantization=product`

```bash
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=product_store=off_hier=on_batch=10000_seed=42/VectorData_0.1.65536.v0.bucket
11M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=product_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748780503246723.4.262144.v0.lsmvecidx
247M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=product_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748780503246723.4.262144.v0.lsmvecidx.vecpq
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=product_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748780503246723_vecgraph.5.262144.v0.vecgraph
```

##### `quantization=binary`

```bash
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=binary_store=off_hier=on_batch=10000_seed=42/VectorData_0.1.65536.v0.bucket
140M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=binary_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748779801023527.4.262144.v0.lsmvecidx
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=binary_store=off_hier=on_batch=10000_seed=42/VectorData_0_2748779801023527_vecgraph.5.262144.v0.vecgraph
```

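A per-file breakdown like the listings above can also be collected from Python. The sketch below sums file sizes per suffix (`.bucket`, `.lsmvecidx`, `.vecpq`, `.vecgraph`) under one run directory. The function name is hypothetical, and it reports apparent file sizes via `st_size`, which can differ slightly from block-based tools such as `du`.

```python
from collections import defaultdict
from pathlib import Path


def size_by_suffix(run_dir):
    """Return {suffix: total bytes} for all files under run_dir.

    Only the last suffix is used, so `x.lsmvecidx.vecpq` counts as `.vecpq`.
    """
    totals = defaultdict(int)
    for path in Path(run_dir).rglob("*"):
        if path.is_file():
            totals[path.suffix or "(none)"] += path.stat().st_size
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python size_by_suffix.py <run_dir>
    run_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for suffix, size in sorted(size_by_suffix(run_dir).items()):
        print(f"{size / 1024 / 1024:10.1f} MB  {suffix}")
```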
#### Findings

- **Recall:** `NONE` and `INT8` stay high (~0.91). `PRODUCT` drops to ~0.85 under the default PQ settings (M/K not tuned). `BINARY` is much worse (~0.29).
- **Search speed:** `PRODUCT` is the fastest at query time once built (≈2.6 s before close and ≈1.4 s after reopen), but the recall loss is noticeable. `NONE` and `BINARY` take ≈11–12 s before close, and `INT8` is the slowest at ≈18 s.
- **Index/graph build:** `INT8` builds the graph much faster (~3.1k s) than `NONE`/`PRODUCT` (~4.9k–5.0k s), with `BINARY` in between (~3.8k s). The PQ build includes codebook training and encoding time and was the slowest overall.
- **Memory (RSS):** Peaks are all high (~8.8 GB) and similar across modes; quantization didn’t reduce peak RSS in this run.
- **Disk usage:**
  - The bucket (f32 vectors) stays at ~5.6 GB for all modes.
  - The `NONE` index is tiny (~10 MB).
  - The `INT8` index is large (~999 MB).
  - `PRODUCT` adds a `.vecpq` file (~247 MB) on top of a small index (~11 MB).
  - The `BINARY` index is moderate (~140 MB).
- **Reopen:** Recall after reopen matches the pre-close numbers to within ~0.001. Search is faster after reopen in every mode; `PRODUCT` remains the fastest and `INT8` the slowest.
- **Recommendation:** For quality, prefer `NONE` or `INT8`. Use PQ only if you need the lowest query latency and can accept lower recall, and consider tuning PQ (M/K) to recover recall. Avoid `BINARY` here given the large recall drop.
- **Open question:** All four runs stored the vectors in the database as a float array, e.g. `db.schema.get_or_create_property("VectorData", "vector", "ARRAY_OF_FLOATS")`. Maybe we should do this differently when quantization is enabled?

### Commit/Date: main @ 91a86e3 (Thu Jan 15 10:32:50 2026 -0500)

- `build_graph_now` is now available, which builds the graph index without a warmup. This is cleaner than relying on the first search to trigger the build.
- `store_vectors_in_graph` is set to False and `add_hierarchy` is set to True for all four tasks below.
- We also varied `batch_size`, the number of vectors added to the database per chunk.

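As an illustration of what `batch_size` controls, here is a minimal sketch of chunked ingestion: the corpus is split into fixed-size batches and each batch is written separately. The helper name `iter_batches` and the `insert_batch` call in the comment are hypothetical and are not part of the ArcadeDB bindings.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    vectors = ([float(i)] * 4 for i in range(25))  # toy stand-in for a corpus
    for batch in iter_batches(vectors, batch_size=10):
        # A real run would write the batch here, e.g. insert_batch(batch)
        # (hypothetical function).
        print(len(batch))
```

Because only one batch is materialized at a time, larger batches hold more vectors in memory at once, which is one plausible reason peak RSS moves with `batch_size`.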
#### MSMARCO-1M (1000 queries, Recall@50)

| quantization | batch_size | load_corpus_s | load_corpus_rss_mb | ingest_rss_mb | create_index_s | create_index_rss_mb | build_graph_now_s | build_graph_now_rss_mb | warmup_s | warmup_rss_mb | search_s | search_rss_mb | recall@50_before_close | open_db_s | open_db_rss_mb | warmup_after_reopen_s | warmup_after_reopen_rss_mb | search_after_reopen_s | search_after_reopen_rss_mb | recall@50_after_reopen | peak_rss_mb | db_size_mb | total_duration |
| :----------- | ---------: | ------------: | -----------------: | ------------: | -------------: | ------------------: | ----------------: | ---------------------: | -------: | ------------: | -------: | ------------: | ---------------------: | --------: | -------------: | --------------------: | -------------------------: | --------------------: | -------------------------: | ---------------------: | ----------: | ---------: | :------------- |
| NONE | 100000 | 0 | 0 | 8617.99 | 16.339 | 22 | 4833.32 | 182.383 | 0.045 | 0.605 | 12.037 | 12.238 | 0.9117 | 3.466 | 13.805 | 10.615 | 0.145 | 8.177 | 8.902 | 0.9117 | 9335.16 | 5750.44 | 1h 22m |
| NONE | 10000 | 0 | 0 | 7476.79 | 16.505 | 553.051 | 4861.65 | 293.746 | 0.04 | 0.766 | 9.038 | 8.281 | 0.9076 | 2.305 | 6.645 | 7.153 | 2.512 | 6.775 | 18.473 | 0.9076 | 8839.89 | 5750.44 | 1h 22m |
| INT8 | 100000 | 0 | 0 | 8583.21 | 21.401 | 74.676 | 2665.78 | 85.578 | 0.111 | 3.875 | 18.321 | 19.668 | 0.9015 | 11.946 | -0.152 | 4.002 | 4.363 | 15.725 | 3.137 | 0.9015 | 9252.09 | 6738.94 | 46m |
| INT8 | 10000 | 0 | 0 | 7554.18 | 22.836 | 138.258 | 2756.29 | 528.305 | 0.17 | 1.504 | 16.344 | 16.941 | 0.914 | 11.713 | 21.402 | 4.134 | 2.91 | 7.601 | 11.586 | 0.9125 | 8763.66 | 6738.94 | 48m |

#### Findings

- Recall: `NONE` and `INT8` are within noise of each other (~0.90–0.91), so INT8 quantization didn’t change quality much.
- Memory (RSS): Peak RSS tracks batch size (~9.3 GB with 100k batches vs ~8.8 GB with 10k batches); switching NONE↔INT8 didn’t materially reduce peak RSS.
- Disk: INT8 still inflates DB size (~6.7 GB vs ~5.7 GB for NONE; the INT8 index structures add ~1 GB).
- Performance: INT8 builds the graph much faster (~45 min vs ~81 min), but costs more on open and search (open ~12 s vs ~2–3 s; search ~16–18 s vs ~9–12 s before close). Warmup after reopen was shorter for INT8 (~4 s vs ~7–11 s).

### Commit/Date: main @ da5e70d (Thu Jan 15 09:44:44 2026 -0500)
- This commit fixes the store_vectors_in_graph issue.
