11# Vector API
22
3- Vector search capabilities in ArcadeDB use HNSW (Hierarchical Navigable Small World) indexing for fast approximate nearest neighbor search. Perfect for semantic search, recommendation systems, and similarity-based queries.
3+ Vector search capabilities in ArcadeDB use JVector (a graph-based index combining HNSW
4+ and DiskANN concepts) for fast approximate nearest neighbor search. Perfect for semantic
5+ search, recommendation systems, and similarity-based queries.
46
57## Overview
68
@@ -13,7 +15,7 @@ ArcadeDB's vector support enables:
1315
1416**Key Features:**
1517
16- - HNSW indexing for O(log N) search performance
18+ - Graph-based indexing for O(log N) search performance
1719- Multiple distance metrics (cosine, euclidean, inner product)
1820- Native NumPy integration (optional)
1921- Configurable precision/performance trade-offs
@@ -24,7 +26,8 @@ Utility functions for converting between Python and Java vector representations:
2426
2527### `to_java_float_array(vector)`
2628
27- Convert a Python array-like object to a Java float array compatible with ArcadeDB's vector indexing.
29+ Convert a Python array-like object to a Java float array compatible with ArcadeDB's
30+ vector indexing.
2831
2932**Parameters:**
3033
@@ -98,7 +101,7 @@ print(type(py_list)) # <class 'list'>
98101
99102## VectorIndex Class
100103
101- Wrapper for ArcadeDB's HNSW vector index, providing similarity search capabilities.
104+ Wrapper for ArcadeDB's vector index, providing similarity search capabilities.
102105
103106### Creation via Database
104107
@@ -171,10 +174,13 @@ print(f"Created vector index: {index}")
171174
172175---
173176
174- ### `VectorIndex.find_nearest(query_vector, k=10, overquery_factor=16, use_numpy=True, allowed_rids=None)`
177+ ### `VectorIndex.find_nearest(query_vector, k=10, overquery_factor=16, allowed_rids=None)`
175178
176179Find k-nearest neighbors to the query vector.
177180
181+ **Note:** The first call to `find_nearest` triggers the index construction if it hasn't
182+ been built yet. This "warm up" query may take longer than subsequent queries.
183+
178184**Parameters:**
179185
180186- `query_vector`: Query vector as:
@@ -184,14 +190,13 @@ Find k-nearest neighbors to the query vector.
184190- `k` (int): Number of neighbors to return (default: 10)
185191- `overquery_factor` (int): Multiplier for search-time over-querying (implicit efSearch)
186192 (default: 16)
187- - `use_numpy` (bool): Return vectors as NumPy if available (default: `True`)
188193- `allowed_rids` (List[str]): Optional list of RID strings (e.g. `["#1:0", "#2:5"]`) to
189194 restrict search (default: `None`)
190195
191196**Returns:**
192197
193- - `List[Tuple[vertex, float]]`: List of `(vertex, distance)` tuples
194- - `vertex`: Matched vertex object (MutableVertex)
198+ - `List[Tuple[record, float]]`: List of `(record, distance)` tuples
199+ - `record`: Matched ArcadeDB record object (Vertex, Document, or Edge)
195200 - `distance`: Similarity score (float)
196201 - Lower = more similar
197202 - Range depends on distance function
@@ -210,9 +215,9 @@ neighbors = index.find_nearest(query_vector, k=5)
210215allowed_rids = ["#10:5", "#10:8", "#10:12"]
211216filtered_neighbors = index.find_nearest(query_vector, k=5, allowed_rids=allowed_rids)
212217
213- for vertex, distance in neighbors:
214-     doc_id = vertex.get("id")
215-     text = vertex.get("text")
218+ for record, distance in neighbors:
219+     doc_id = record.get("id")
220+     text = record.get("text")
216221    print(f"Distance: {distance:.4f} | ID: {doc_id}")
217222    print(f"Text: {text[:100]}...")
218223```
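The `distance` values returned by `find_nearest` depend on the distance function configured for the index. As a rough illustration only, the sketch below computes the three metrics with their common textbook definitions in plain Python. It is not ArcadeDB's internal implementation, the function names are hypothetical, and the exact score normalization used by the index may differ.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance: 0 means the vectors are identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Raw dot product: here a HIGHER value means more similar."""
    return sum(x * y for x, y in zip(a, b))


# Compare one query against two illustrative candidates.
query = [1.0, 0.0]
candidates = {"same_direction": [2.0, 0.0], "orthogonal": [0.0, 3.0]}
for name, vec in candidates.items():
    print(
        name,
        cosine_distance(query, vec),
        euclidean_distance(query, vec),
        inner_product(query, vec),
    )
```

Note that cosine distance ignores vector length while Euclidean distance does not, so the same candidate can rank differently under the two metrics.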
@@ -225,76 +230,12 @@ for vertex, distance in neighbors:
225230| euclidean | [ 0, ∞) | ✓ (0 = identical) |
226231| inner_product | (-∞, ∞) | ✗ (higher = more similar) |
227232
228- ---
229-
230- ### `VectorIndex.add_vertex(vertex)`
231-
232- Add a single vertex to the index.
233-
234- **Parameters:**
235-
236- - `vertex`: Vertex object with vector property set
237-
238- **Raises:**
239-
240- - `ArcadeDBError`: If vertex cannot be added
241-
242- **Example:**
243-
244- ```python
245- # Add during vertex creation
246- with db.transaction():
247-     doc = db.new_vertex("Document")
248-     doc.set("id", "doc_001")
249-     doc.set("text", "Introduction to vector search")
250-     doc.set("embedding", to_java_float_array(embedding))
251-     doc.save()
252-
253-     # Add to index
254-     index.add_vertex(doc)
255- ```
256-
257- **Important:**
258-
259233- Vertex must have the vector property populated
260234- Vector dimensionality must match index dimensions
261235- Call within a transaction for consistency
262236
263237---
264238
265- ### `VectorIndex.remove_vertex(vertex_id)`
266-
267- Remove a vertex from the index.
268-
269- **Parameters:**
270-
271- - `vertex_id`: ID of the vertex to remove (typically string or int)
272-
273- **Raises:**
274-
275- - `ArcadeDBError`: If removal fails
276-
277- **Example:**
278-
279- ```python
280- # Remove by ID
281- vertex_id = "doc_001"
282- index.remove_vertex(vertex_id)
283- ```
284-
285- **Note:** This removes from the vector index only, not from the database. To fully delete:
286-
287- ```python
288- with db.transaction():
289-     # Remove from index
290-     index.remove_vertex(doc_id)
291-
292-     # Delete from database
293-     db.command("sql", f"DELETE FROM Document WHERE id = '{doc_id}'")
294- ```
295-
296- ---
297-
298239## Complete Examples
299240
300241### Semantic Search with Sentence Transformers
@@ -355,9 +296,6 @@ with db.transaction():
355296 vertex.set("embedding", to_java_float_array(embedding))
356297 vertex.save()
357298
358- # Add to vector index
359- index.add_vertex(vertex)
360-
361299print(f"Indexed {len(documents)} documents")
362300
363301# Search
@@ -424,7 +362,7 @@ with db.transaction():
424362 v.set("price", prod["price"])
425363 v.set("features", to_java_float_array(prod["features"]))
426364 v.save()
427- index.add_vertex(v)
365+ # Note: LSM vector index automatically indexes new records
428366
429367# Hybrid search: vector similarity + filters
430368query_features = np.random.rand(128)
@@ -502,7 +440,7 @@ with db.transaction():
502440 v.set("embedding", to_java_float_array(embedding))
503441 v.save()
504442
505- index.add_vertex(v)
443+ # Note: LSM vector index automatically indexes new records
506444
507445# Search for similar images
508446query_image = " query.jpg"
@@ -521,25 +459,25 @@ db.close()
521459
522460## Performance Tuning
523461
524- ### HNSW Parameters
462+ ### Vector Index Parameters
525463
526- **M (connections per node):**
464+ **max_connections (connections per node):**
527465
528- - **Lower (8-12)**: Faster build, less memory, lower recall
529- - **Medium (16-24)**: Balanced (recommended)
530- - **Higher (32-48)**: Better recall, more memory, slower build
466+ - **Lower (<32)**: Faster build, less memory, lower recall
467+ - **Medium (32)**: Balanced (recommended)
468+ - **Higher (>32)**: Better recall, more memory, slower build
531469
532- **ef (search size):**
470+ **overquery_factor (search size):**
533471
534- - **Lower (50-100)**: Faster search, lower recall
535- - **Medium (128-200)**: Balanced (recommended)
536- - **Higher (200-400)**: Better recall, slower search
472+ - **Lower (<16)**: Faster search, lower recall
473+ - **Medium (16)**: Balanced (recommended)
474+ - **Higher (>16)**: Better recall, slower search
537475
538- **ef_construction:**
476+ **beam_width:**
539477
540- - **Lower (100-150)**: Faster build, lower quality
541- - **Medium (128-256)**: Balanced
542- - **Higher (300-500)**: Better quality, slower build
478+ - **Lower (<256)**: Faster build, lower quality
479+ - **Medium (256)**: Balanced
480+ - **Higher (>256)**: Better quality, slower build
543481
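One way to choose among these settings is to measure recall against an exact brute-force search on a sample of your data. The sketch below is a minimal, self-contained illustration in plain Python. The helper names `exact_top_k` and `recall_at_k` are hypothetical and not part of the ArcadeDB bindings, Euclidean distance is assumed, and the approximate result is a hard-coded stand-in for the RIDs you would collect from the records returned by `find_nearest`.

```python
import math
import random


def exact_top_k(query, vectors, k):
    """Brute-force k nearest neighbors by Euclidean distance; returns ids."""
    ranked = sorted(vectors.items(), key=lambda item: math.dist(query, item[1]))
    return [vec_id for vec_id, _ in ranked[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search also found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Synthetic sample data keyed by RID-like strings (illustrative only).
random.seed(0)
vectors = {f"#1:{i}": [random.random() for _ in range(8)] for i in range(200)}
query = [random.random() for _ in range(8)]

exact = exact_top_k(query, vectors, k=5)

# Stand-in for an approximate result that missed one true neighbor.
# In practice this list would come from the index search results.
approx = exact[:4] + ["#1:999"]

print(f"recall@5 = {recall_at_k(approx, exact):.2f}")
```

If recall is too low for your use case, raising `overquery_factor` is the cheaper change to try first because it does not require rebuilding the index, whereas `max_connections` and `beam_width` affect index construction.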
544482### Distance Functions
545483
592530 v = db.new_vertex("Doc")
593531 v.set("emb", to_java_float_array(np.random.rand(512)))  # Wrong size!
594532 v.save()
595- index.add_vertex(v)  # Will fail
533+ # Indexing happens automatically and may fail asynchronously or on next access
596534
597535except ArcadeDBError as e:
598536 print(f"Error: {e}")
604542 v.set("id", "doc1")
605543 # Forgot to set embedding!
606544 v.save()
607- index.add_vertex(v)  # Will fail
545+ # Indexing happens automatically
608546
609547except ArcadeDBError as e:
610548 print(f"Error: {e}")