Commit b127da2

Refactor schema and vector tests for clarity and enhanced coverage
1 parent 5474474 commit b127da2

2 files changed

Lines changed: 88 additions & 49 deletions

bindings/python/docs/development/testing/test-schema.md

Lines changed: 37 additions & 13 deletions
@@ -2,26 +2,50 @@

 [View source code](https://github.com/humemai/arcadedb-embedded-python/blob/main/bindings/python/tests/test_schema.py){ .md-button }

-These notes mirror the Python tests in [test_schema.py](https://github.com/humemai/arcadedb-embedded-python/blob/main/bindings/python/tests/test_schema.py). The file has multiple test classes covering type creation (document, vertex, edge), properties (simple and complex types), indexes (unique, composite, full-text, HNSW), and error handling. See [test_schema.py](https://github.com/humemai/arcadedb-embedded-python/blob/main/bindings/python/tests/test_schema.py) for comprehensive schema API validation.
+## Overview

-## Key Methods Tested
+Schema tests cover:

-- `schema.create_document_type()`, `create_vertex_type()`, `create_edge_type()`
-- `schema.create_property()`, `create_property_with_type()`
-- `schema.create_index()`, `create_unique_index()`, `create_hnsw_index()`, `create_fulltext_index()`
-- `schema.exists_type()`, `get_type()`, `list_types()`
-- `schema.drop_type()`, `drop_property()`, `drop_index()`
+- **Type Creation** - Vertex, edge, and document types
+- **Type Queries** - Getting types and checking existence
+- **Type Deletion** - Removing types from schema
+- **Property Creation** - Adding properties to types
+- **Property Deletion** - Removing properties
+- **Index Creation** - LSM_TREE, HNSW, FULL_TEXT indexes
+- **Index Queries** - Getting and listing indexes
+- **Index Deletion** - Removing indexes
+- **Property Types** - All ArcadeDB property types
+- **Vector Indexes** - HNSW (JVector) configuration and operations

-## Patterns
+## Test Classes

+### TestTypeCreation
+Tests creating vertex, edge, and document types.
+
+**Tests:**
+- `test_create_vertex_type()` - Basic vertex type creation
+- `test_create_edge_type()` - Basic edge type creation
+- `test_create_document_type()` - Basic document type creation
+- `test_create_type_with_buckets()` - Custom bucket count
+
+**Pattern:**
 ```python
-db.schema.create_vertex_type("User")
-db.schema.create_property("User", "name", "STRING")
-db.schema.create_unique_index("User", ["userId"])
+with arcadedb.create_database("./test_db") as db:
+    # Basic type
+    db.schema.create_vertex_type("User")

-db.schema.exists_type("User") # True
-db.schema.get_type("User") # Type object
+    # With buckets
+    db.schema.create_vertex_type("Product", buckets=10)
 ```
+
+---
+
+### TestTypeQueries
+Tests querying schema for types.
+
+**Tests:**
+- `test_get_type()` - Get type by name
+- `test_exists_type()` - Check if type exists
 - `test_get_types()` - List all types
 - `test_get_type_properties()` - List type properties

bindings/python/docs/development/testing/test-vector.md

Lines changed: 51 additions & 36 deletions
@@ -18,7 +18,7 @@ What the tests cover:
 - **Overquery factor** tuning (`overquery_factor`)
 - **Distance functions** (cosine default, euclidean variants)
 - **Persistence & size checks** (index files survive reopen)
-- **Distance sanity checks** for orthogonal, parallel, opposite, and 45° vectors
+- **Batch inserts** through `BatchContext`

 ## Test Coverage (high level)

@@ -31,66 +31,81 @@ What the tests cover:
 - `test_lsm_index_size` – asserts index file presence/size
 - `test_lsm_persistence` – reopen DB and reuse the index
 - Distance suites – cosine/euclidean correctness for orthogonal/parallel/opposite/high-dim vectors
+- `test_lsm_vector_search_comprehensive` – end-to-end search path

 ## SQL Vector Functions Tests

 SQL vector operations are tested separately in `test_vector_sql.py`, including vector math functions, distance calculations, aggregations, quantization (with known limitations), and SQL-based index creation and search.

-## Common Pattern (mirrors `test_lsm_vector_search`)
+## Common Patterns

+### Create JVector (LSM-backed) index
 ```python
-import arcadedb_embedded as arcadedb
-
 with arcadedb.create_database("./test_db") as db:
+    # Schema operations are auto-transactional
     db.schema.create_vertex_type("Doc")
     db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")

-    index = db.create_vector_index("Doc", "embedding", dimensions=3)
-
-    with db.transaction():
-        v = db.new_vertex("Doc")
-        v.set("embedding", arcadedb.to_java_float_array([1.0, 0.0, 0.0]))
-        v.save()
-
-    results = index.find_nearest([0.9, 0.1, 0.0], k=1)
-    vertex, distance = results[0]
-    emb = arcadedb.to_python_array(vertex.get("embedding"))
-    assert abs(emb[0] - 1.0) < 0.001
+    index = db.create_vector_index(
+        "Doc",
+        "embedding",
+        dimensions=384,
+        distance_function="cosine", # default
+        max_connections=32, # graph degree
+        beam_width=256 # search/construction beam
+    )
 ```

-### Filtering with `allowed_rids` (from `test_lsm_vector_search_with_filter`)
+### Search with filters and overquery factor
 ```python
-allowed = [str(v.get_identity()) for v in inserted_vertices]
-results = index.find_nearest([1.0, 0.0, 0.0], k=2, allowed_rids=allowed)
-```
+with arcadedb.create_database("./test_db") as db:
+    db.schema.create_vertex_type("Doc")
+    db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")

-### Overquery factor (from `test_lsm_vector_search_overquery`)
-```python
-results = index.find_nearest(query, k=2, overquery_factor=2)
+    index = db.create_vector_index(
+        "Doc",
+        "embedding",
+        dimensions=3,
+    )
+
+    # Insert test vertices with embeddings
+    with db.transaction():
+        doc1 = db.new_vertex("Doc", docId=1, embedding=[1.0, 0.0, 0.0])
+        doc1.save()
+        doc2 = db.new_vertex("Doc", docId=2, embedding=[0.0, 1.0, 0.0])
+        doc2.save()
+
+    # Search with filters
+    query = [1.0, 0.0, 0.0]
+    results = index.find_nearest(
+        query,
+        k=2,
+        allowed_rids=[doc1.get_rid(), doc2.get_rid()],
+        overquery_factor=16,
+    )
 ```

-### Persistence check (from `test_lsm_persistence`)
+### Batch insert vectors
 ```python
-with arcadedb.create_database(path) as db:
+with arcadedb.create_database("./test_db") as db:
+    # Schema operations are auto-transactional
     db.schema.create_vertex_type("Doc")
+    db.schema.create_property("Doc", "docId", "INTEGER")
     db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
-    db.create_vector_index("Doc", "embedding", dimensions=3)
-    with db.transaction():
-        v = db.new_vertex("Doc")
-        v.set("embedding", arcadedb.to_java_float_array([1.0, 0.0, 0.0]))
-        v.save()

-with arcadedb.open_database(path) as db:
-    index = db.schema.get_vector_index("Doc", "embedding")
-    assert index.get_size() == 1
+    # Batch insert (auto-transactional)
+    vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
+    with db.batch_context(batch_size=1000, parallel=4) as batch:
+        for i, vec in enumerate(vectors):
+            batch.create_vertex("Doc", docId=i, embedding=vec)
 ```

 ## Key Takeaways

-1. Tests exercise JVector/LSM indexes end-to-end via the Python bindings (no hnswlib path).
-2. `find_nearest` returns `(vertex, score)`; cosine distance follows JVector's `(1 - cosθ)/2` convention.
-3. `allowed_rids` and `overquery_factor` are covered for filtering and recall tuning.
-4. Persistence and size checks verify indexes survive reopen and report counts correctly.
+1. JVector is fully Java-native and LSM-backed; no legacy hnswlib path remains.
+2. Use `allowed_rids` for pre-filtered searches and `overquery_factor` for recall/speed trade-offs.
+3. `max_connections` and `beam_width` map to JVector graph degree and search beam; tune per workload.
+4. All tests run through the Python bindings to ensure parity with the Java engine.

 ## See Also

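The new takeaway about `allowed_rids` and `overquery_factor` can be illustrated without the database. The sketch below is a pure-Python toy, not the ArcadeDB or JVector implementation: `find_nearest_sketch`, `coarse_distance`, and the sample data are hypothetical names introduced here, and the assumption that an overquery factor means "shortlist `k * overquery_factor` approximate candidates, then re-rank exactly" is one common reading that the commit itself does not confirm.

```python
import math


def cosine_distance(a, b):
    """Exact cosine distance, 1 - cos(theta), for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def coarse_distance(vec, query):
    """Cheap stand-in for an approximate index: squared Euclidean distance
    computed on a crudely rounded copy of the stored vector."""
    return sum((round(x) - q) ** 2 for x, q in zip(vec, query))


def find_nearest_sketch(vectors, query, k, allowed_ids=None, overquery_factor=1):
    """Hypothetical helper (not the ArcadeDB API).

    1. Pre-filter: keep only ids in allowed_ids, if given.
    2. Overquery: shortlist k * overquery_factor ids by the coarse distance.
    3. Re-rank the shortlist by exact cosine distance and return the top k.
    """
    ids = [i for i in vectors if allowed_ids is None or i in allowed_ids]
    shortlist = sorted(ids, key=lambda i: coarse_distance(vectors[i], query))
    shortlist = shortlist[: k * overquery_factor]
    reranked = sorted(shortlist, key=lambda i: cosine_distance(vectors[i], query))
    return [(i, cosine_distance(vectors[i], query)) for i in reranked[:k]]


# Toy data: "a" and "b" look identical to the coarse step, but "b" is the
# true nearest neighbour of QUERY under exact cosine distance.
VECTORS = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.4, 0.0],
    "c": [0.0, 1.0, 0.0],
}
QUERY = [0.8, 0.45, 0.0]

if __name__ == "__main__":
    print(find_nearest_sketch(VECTORS, QUERY, k=1))
    print(find_nearest_sketch(VECTORS, QUERY, k=1, overquery_factor=2))
    print(find_nearest_sketch(VECTORS, QUERY, k=1, allowed_ids={"c"}))
```

In this toy, `k=1` with the default factor returns `"a"` because the coarse step cannot tell `"a"` and `"b"` apart, while `overquery_factor=2` recovers `"b"` at the cost of re-ranking one extra candidate. Restricting `allowed_ids` to `{"c"}` returns `"c"` regardless of distance, which is the pre-filtering behaviour the takeaway attributes to `allowed_rids`.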