Refactor schema and vector tests for clarity and enhanced coverage

tae898 · tae898 · commit b127da2b00a6 · 2026-01-08T22:19:26.000+01:00
diff --git a/bindings/python/docs/development/testing/test-schema.md b/bindings/python/docs/development/testing/test-schema.md
@@ -2,26 +2,50 @@
 
 [View source code](https://github.com/humemai/arcadedb-embedded-python/blob/main/bindings/python/tests/test_schema.py){ .md-button }
 
-These notes mirror the Python tests in [test_schema.py](https://github.com/humemai/arcadedb-embedded-python/blob/main/bindings/python/tests/test_schema.py). The file has multiple test classes covering type creation (document, vertex, edge), properties (simple and complex types), indexes (unique, composite, full-text, HNSW), and error handling. See [test_schema.py](https://github.com/humemai/arcadedb-embedded-python/blob/main/bindings/python/tests/test_schema.py) for comprehensive schema API validation.
+## Overview
 
-## Key Methods Tested
+Schema tests cover:
 
-- `schema.create_document_type()`, `create_vertex_type()`, `create_edge_type()`
-- `schema.create_property()`, `create_property_with_type()`
-- `schema.create_index()`, `create_unique_index()`, `create_hnsw_index()`, `create_fulltext_index()`
-- `schema.exists_type()`, `get_type()`, `list_types()`
-- `schema.drop_type()`, `drop_property()`, `drop_index()`
+- ✅ **Type Creation** - Vertex, edge, and document types
+- ✅ **Type Queries** - Getting types and checking existence
+- ✅ **Type Deletion** - Removing types from schema
+- ✅ **Property Creation** - Adding properties to types
+- ✅ **Property Deletion** - Removing properties
+- ✅ **Index Creation** - LSM_TREE, HNSW, FULL_TEXT indexes
+- ✅ **Index Queries** - Getting and listing indexes
+- ✅ **Index Deletion** - Removing indexes
+- ✅ **Property Types** - All ArcadeDB property types
+- ✅ **Vector Indexes** - HNSW (JVector) configuration and operations
 
-## Patterns
+## Test Classes
 
+### TestTypeCreation
+Tests creating vertex, edge, and document types.
+
+**Tests:**
+- `test_create_vertex_type()` - Basic vertex type creation
+- `test_create_edge_type()` - Basic edge type creation
+- `test_create_document_type()` - Basic document type creation
+- `test_create_type_with_buckets()` - Custom bucket count
+
+**Pattern:**
 ```python
-db.schema.create_vertex_type("User")
-db.schema.create_property("User", "name", "STRING")
-db.schema.create_unique_index("User", ["userId"])
+with arcadedb.create_database("./test_db") as db:
+    # Basic type
+    db.schema.create_vertex_type("User")
 
-db.schema.exists_type("User")  # True
-db.schema.get_type("User")     # Type object
+    # With buckets
+    db.schema.create_vertex_type("Product", buckets=10)
 ```
+
+---
+
+### TestTypeQueries
+Tests querying schema for types.
+
+**Tests:**
+- `test_get_type()` - Get type by name
+- `test_exists_type()` - Check if type exists
 - `test_get_types()` - List all types
 - `test_get_type_properties()` - List type properties
 
diff --git a/bindings/python/docs/development/testing/test-vector.md b/bindings/python/docs/development/testing/test-vector.md
@@ -18,7 +18,7 @@ What the tests cover:
 - ✅ **Overquery factor** tuning (`overquery_factor`)
 - ✅ **Distance functions** (cosine default, euclidean variants)
 - ✅ **Persistence & size checks** (index files survive reopen)
-- ✅ **Distance sanity checks** for orthogonal, parallel, opposite, and 45° vectors
+- ✅ **Batch inserts** through `BatchContext`
 
 ## Test Coverage (high level)
 
@@ -31,66 +31,81 @@ What the tests cover:
 - `test_lsm_index_size` – asserts index file presence/size
 - `test_lsm_persistence` – reopen DB and reuse the index
 - Distance suites – cosine/euclidean correctness for orthogonal/parallel/opposite/high-dim vectors
+- `test_lsm_vector_search_comprehensive` – end-to-end search path
 
 ## SQL Vector Functions Tests
 
 SQL vector operations are tested separately in `test_vector_sql.py`, including vector math functions, distance calculations, aggregations, quantization (with known limitations), and SQL-based index creation and search.
 
-## Common Pattern (mirrors `test_lsm_vector_search`)
+## Common Patterns
 
+### Create JVector (LSM-backed) index
 ```python
-import arcadedb_embedded as arcadedb
-
 with arcadedb.create_database("./test_db") as db:
+    # Schema operations are auto-transactional
     db.schema.create_vertex_type("Doc")
     db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
 
-    index = db.create_vector_index("Doc", "embedding", dimensions=3)
-
-    with db.transaction():
-        v = db.new_vertex("Doc")
-        v.set("embedding", arcadedb.to_java_float_array([1.0, 0.0, 0.0]))
-        v.save()
-
-    results = index.find_nearest([0.9, 0.1, 0.0], k=1)
-    vertex, distance = results[0]
-    emb = arcadedb.to_python_array(vertex.get("embedding"))
-    assert abs(emb[0] - 1.0) < 0.001
+    index = db.create_vector_index(
+        "Doc",
+        "embedding",
+        dimensions=384,
+        distance_function="cosine",    # default
+        max_connections=32,            # graph degree
+        beam_width=256                 # search/construction beam
+    )
 ```
 
-### Filtering with `allowed_rids` (from `test_lsm_vector_search_with_filter`)
+### Search with filters and overquery factor
 ```python
-allowed = [str(v.get_identity()) for v in inserted_vertices]
-results = index.find_nearest([1.0, 0.0, 0.0], k=2, allowed_rids=allowed)
-```
+with arcadedb.create_database("./test_db") as db:
+    db.schema.create_vertex_type("Doc")
+    db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
 
-### Overquery factor (from `test_lsm_vector_search_overquery`)
-```python
-results = index.find_nearest(query, k=2, overquery_factor=2)
+    index = db.create_vector_index(
+        "Doc",
+        "embedding",
+        dimensions=3,
+    )
+
+    # Insert test vertices with embeddings
+    with db.transaction():
+        doc1 = db.new_vertex("Doc", docId=1, embedding=[1.0, 0.0, 0.0])
+        doc1.save()
+        doc2 = db.new_vertex("Doc", docId=2, embedding=[0.0, 1.0, 0.0])
+        doc2.save()
+
+    # Search with filters
+    query = [1.0, 0.0, 0.0]
+    results = index.find_nearest(
+        query,
+        k=2,
+        allowed_rids=[doc1.get_rid(), doc2.get_rid()],
+        overquery_factor=16,
+    )
 ```
 
-### Persistence check (from `test_lsm_persistence`)
+### Batch insert vectors
 ```python
-with arcadedb.create_database(path) as db:
+with arcadedb.create_database("./test_db") as db:
+    # Schema operations are auto-transactional
     db.schema.create_vertex_type("Doc")
+    db.schema.create_property("Doc", "docId", "INTEGER")
     db.schema.create_property("Doc", "embedding", "ARRAY_OF_FLOATS")
-    db.create_vector_index("Doc", "embedding", dimensions=3)
-    with db.transaction():
-        v = db.new_vertex("Doc")
-        v.set("embedding", arcadedb.to_java_float_array([1.0, 0.0, 0.0]))
-        v.save()
 
-with arcadedb.open_database(path) as db:
-    index = db.schema.get_vector_index("Doc", "embedding")
-    assert index.get_size() == 1
+    # Batch insert (auto-transactional)
+    vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
+    with db.batch_context(batch_size=1000, parallel=4) as batch:
+        for i, vec in enumerate(vectors):
+            batch.create_vertex("Doc", docId=i, embedding=vec)
 ```
 
 ## Key Takeaways
 
-1. Tests exercise JVector/LSM indexes end-to-end via the Python bindings (no hnswlib path).
-2. `find_nearest` returns `(vertex, score)`; cosine distance follows JVector's `(1 - cosθ)/2` convention.
-3. `allowed_rids` and `overquery_factor` are covered for filtering and recall tuning.
-4. Persistence and size checks verify indexes survive reopen and report counts correctly.
+1. JVector is fully Java-native and LSM-backed; no legacy hnswlib path remains.
+2. Use `allowed_rids` for pre-filtered searches and `overquery_factor` for recall/speed trade-offs.
+3. `max_connections` and `beam_width` map to JVector's graph degree and search/construction beam width; tune both per workload.
+4. All tests run through the Python bindings to ensure parity with the Java engine.
 
 ## See Also