@@ -18,7 +18,7 @@ What the tests cover:
 - ✅ **Overquery factor** tuning (`overquery_factor`)
 - ✅ **Distance functions** (cosine default, euclidean variants)
 - ✅ **Persistence & size checks** (index files survive reopen)
-- ✅ **Distance sanity checks** for orthogonal, parallel, opposite, and 45° vectors
+- ✅ **Batch inserts** through `BatchContext`
 
 ## Test Coverage (high level)
 
@@ -31,66 +31,81 @@ What the tests cover:
 - `test_lsm_index_size` – asserts index file presence/size
 - `test_lsm_persistence` – reopen DB and reuse the index
 - Distance suites – cosine/euclidean correctness for orthogonal/parallel/opposite/high-dim vectors
+- `test_lsm_vector_search_comprehensive` – end-to-end search path
 
 ## SQL Vector Functions Tests
 
 SQL vector operations are tested separately in `test_vector_sql.py`, including vector math functions, distance calculations, aggregations, quantization (with known limitations), and SQL-based index creation and search.
 
-## Common Pattern (mirrors `test_lsm_vector_search`)
+## Common Patterns
 
+### Create JVector (LSM-backed) index
 ```python
-import arcadedb_embedded as arcadedb
-
 with arcadedb.create_database("./test_db") as db:
+    # Schema operations are auto-transactional
     db.schema.create_vertex_type("Doc")
     db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
 
-    index = db.create_vector_index("Doc", "embedding", dimensions=3)
-
-    with db.transaction():
-        v = db.new_vertex("Doc")
-        v.set("embedding", arcadedb.to_java_float_array([1.0, 0.0, 0.0]))
-        v.save()
-
-    results = index.find_nearest([0.9, 0.1, 0.0], k=1)
-    vertex, distance = results[0]
-    emb = arcadedb.to_python_array(vertex.get("embedding"))
-    assert abs(emb[0] - 1.0) < 0.001
+    index = db.create_vector_index(
+        "Doc",
+        "embedding",
+        dimensions=384,
+        distance_function="cosine",  # default
+        max_connections=32,  # graph degree
+        beam_width=256  # search/construction beam
+    )
 ```
 
-### Filtering with `allowed_rids` (from `test_lsm_vector_search_with_filter`)
+### Search with filters and overquery factor
 ```python
-allowed = [str(v.get_identity()) for v in inserted_vertices]
-results = index.find_nearest([1.0, 0.0, 0.0], k=2, allowed_rids=allowed)
-```
+with arcadedb.create_database("./test_db") as db:
+    db.schema.create_vertex_type("Doc")
+    db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
 
-### Overquery factor (from `test_lsm_vector_search_overquery`)
-```python
-results = index.find_nearest(query, k=2, overquery_factor=2)
+    index = db.create_vector_index(
+        "Doc",
+        "embedding",
+        dimensions=3,
+    )
+
+    # Insert test vertices with embeddings
+    with db.transaction():
+        doc1 = db.new_vertex("Doc", docId=1, embedding=[1.0, 0.0, 0.0])
+        doc1.save()
+        doc2 = db.new_vertex("Doc", docId=2, embedding=[0.0, 1.0, 0.0])
+        doc2.save()
+
+    # Search with filters
+    query = [1.0, 0.0, 0.0]
+    results = index.find_nearest(
+        query,
+        k=2,
+        allowed_rids=[doc1.get_rid(), doc2.get_rid()],
+        overquery_factor=16,
+    )
 ```
 
-### Persistence check (from `test_lsm_persistence`)
+### Batch insert vectors
 ```python
-with arcadedb.create_database(path) as db:
+with arcadedb.create_database("./test_db") as db:
+    # Schema operations are auto-transactional
     db.schema.create_vertex_type("Doc")
+    db.schema.create_property("Doc", "docId", "INTEGER")
     db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
-    db.create_vector_index("Doc", "embedding", dimensions=3)
-    with db.transaction():
-        v = db.new_vertex("Doc")
-        v.set("embedding", arcadedb.to_java_float_array([1.0, 0.0, 0.0]))
-        v.save()
 
-with arcadedb.open_database(path) as db:
-    index = db.schema.get_vector_index("Doc", "embedding")
-    assert index.get_size() == 1
+    # Batch insert (auto-transactional)
+    vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
+    with db.batch_context(batch_size=1000, parallel=4) as batch:
+        for i, vec in enumerate(vectors):
+            batch.create_vertex("Doc", docId=i, embedding=vec)
 ```
 
 ## Key Takeaways
 
-1. Tests exercise JVector/LSM indexes end-to-end via the Python bindings (no hnswlib path).
-2. `find_nearest` returns `(vertex, score)`; cosine distance follows JVector's `(1 - cosθ)/2` convention.
-3. `allowed_rids` and `overquery_factor` are covered for filtering and recall tuning.
-4. Persistence and size checks verify indexes survive reopen and report counts correctly.
+1. JVector is fully Java-native and LSM-backed; no legacy hnswlib path remains.
+2. Use `allowed_rids` for pre-filtered searches and `overquery_factor` for recall/speed trade-offs.
+3. `max_connections` and `beam_width` map to JVector graph degree and search beam; tune per workload.
+4. All tests run through the Python bindings to ensure parity with the Java engine.
 
 ## See Also
 
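The takeaway removed in this hunk states that cosine distance follows JVector's `(1 - cosθ)/2` convention, and the bullet removed in the first hunk names four sanity-check cases (orthogonal, parallel, opposite, 45°). A minimal pure-Python sketch of what that formula yields for those cases follows; the helper name is hypothetical and nothing here calls ArcadeDB or JVector.

```python
import math


def half_cosine_distance(a, b):
    """Hypothetical helper: (1 - cos θ) / 2, as quoted in the removed takeaway."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = dot / (norm_a * norm_b)
    return (1.0 - cos_theta) / 2.0


# The four cases named in the removed "Distance sanity checks" bullet.
cases = {
    "parallel": ([1.0, 0.0], [2.0, 0.0]),
    "orthogonal": ([1.0, 0.0], [0.0, 1.0]),
    "opposite": ([1.0, 0.0], [-1.0, 0.0]),
    "45 degrees": ([1.0, 0.0], [1.0, 1.0]),
}

distances = {
    name: half_cosine_distance(a, b) for name, (a, b) in cases.items()
}
for name, value in distances.items():
    print(f"{name}: {value:.4f}")
```

Under this convention the distance stays in the range 0 to 1: parallel vectors give 0, orthogonal vectors 0.5, and opposite vectors 1.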