Commit af3cac7

Enhance documentation and examples for HNSW (JVector) indexing
- Updated references from JVector to HNSW (JVector) across multiple documentation files, including testing guides, examples, and index descriptions.
- Clarified the use of HNSW in vector search examples and improved explanations of indexing performance.
- Adjusted code snippets to reflect best practices for using HNSW (JVector) in various contexts, including schema creation and data import.
- Removed deprecated JSON import examples and streamlined CSV import documentation.
- Ensured consistency in terminology and improved clarity in descriptions of vector search capabilities.
1 parent 2b90254 commit af3cac7

37 files changed

Lines changed: 420 additions & 718 deletions

bindings/python/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -75,7 +75,7 @@ with arcadedb.create_database("/tmp/mydb") as db:
7575
- 🔍 **Multiple query languages**: SQL, Cypher, Gremlin, MongoDB
7676
-**High performance**: Direct JVM integration via JPype
7777
- 🔒 **ACID transactions**: Full transaction support
78-
- 🎯 **Vector storage**: Store and query vector embeddings with JVector indexing
78+
- 🎯 **Vector storage**: Store and query vector embeddings with HNSW (JVector) indexing
7979
- 📥 **Data import**: Built-in CSV and ArcadeDB JSONL import
8080

8181
---
@@ -208,7 +208,7 @@ arcadedb_embedded/
208208
├── results.py # ResultSet and Result wrappers
209209
├── transactions.py # TransactionContext manager
210210
├── schema.py # Schema management API
211-
├── vector.py # Vector search and JVector indexing
211+
├── vector.py # Vector search and HNSW (JVector) indexing
212212
├── importer.py # Data import (CSV, JSONL)
213213
├── exporter.py # Data export (JSONL, GraphML, GraphSON, CSV)
214214
├── batch.py # Batch operations context

bindings/python/docs/api/async_executor.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,18 @@
22

33
The AsyncExecutor provides low-level async operations for parallel processing, automatic batching, and optimized WAL operations. It offers 3-5x faster bulk inserts compared to sequential operations.
44

5+
!!! tip "Using Context Managers"
6+
For automatic resource cleanup, prefer using context managers:
7+
```python
8+
with arcadedb.create_database("./mydb") as db:
9+
async_exec = db.async_executor()
10+
async_exec.set_parallel_level(8)
11+
# Use for bulk operations...
12+
async_exec.wait_completion()
13+
# Database automatically closed
14+
```
15+
Examples below show explicit `db.close()` for clarity, but context managers are recommended in production.
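The tip above recommends context managers for cleanup. As a rough illustration of why, the sketch below uses a hypothetical `StubDatabase` class (a stand-in written for this example, not part of `arcadedb_embedded`) to show that `close()` still runs when the body of the `with` block raises:

```python
class StubDatabase:
    """Hypothetical stand-in for a database handle; not the ArcadeDB API."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release resources; here we only record that close() was called.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close, then return False so any exception keeps propagating.
        self.close()
        return False


db = StubDatabase()
try:
    with db:
        raise RuntimeError("simulated failure during a bulk operation")
except RuntimeError:
    pass

closed_after_error = db.closed
```

With an explicit `db.close()` at the end of a script, the same failure would skip the call unless it were wrapped in `try`/`finally`.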
16+
517
## Overview
618

719
The `AsyncExecutor` class enables:

bindings/python/docs/api/database.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -431,18 +431,18 @@ db.create_vector_index(
431431
) -> VectorIndex
432432
```
433433

434-
Create a vector index for similarity search (default JVector implementation).
434+
Create a vector index for similarity search (default HNSW (JVector) implementation).
435435

436436
**Note:** The index is built lazily. Construction happens upon the first query, not at creation time.
437437

438-
**Parameters:**
438+
**Parameters:**
439439

440440
- `vertex_type` (str): Vertex type containing vectors
441441
- `vector_property` (str): Property storing vector arrays
442442
- `dimensions` (int): Vector dimensionality
443443
- `distance_function` (str): `"cosine"`, `"euclidean"`, or `"inner_product"`
444-
- `max_connections` (int): Max connections per node (default: 32). Maps to `maxConnections` in JVector.
445-
- `beam_width` (int): Beam width for search/construction (default: 256). Maps to `beamWidth` in JVector.
444+
- `max_connections` (int): Max connections per node (default: 32). Maps to `maxConnections` in HNSW (JVector).
445+
- `beam_width` (int): Beam width for search/construction (default: 256). Maps to `beamWidth` in HNSW (JVector).
446446

447447
**Returns:**
448448

@@ -751,6 +751,6 @@ else:
751751
## See Also
752752

753753
- [Graph Operations](../guide/graphs.md): Working with vertices and edges
754-
- [Vector Search](../guide/vectors.md): Similarity search with JVector indexes
754+
- [Vector Search](../guide/vectors.md): Similarity search with HNSW (JVector) indexes
755755
- [Server Mode](../guide/server.md): HTTP API and Studio UI
756756
- [Quick Start](../getting-started/quickstart.md): Getting started guide
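The `distance_function` parameter of `create_vector_index` above accepts `"cosine"`, `"euclidean"`, or `"inner_product"`. The sketch below gives the textbook definitions of the three measures in plain Python, which may help when choosing between them. The function names are made up for this example, and how the index computes or normalizes scores internally is not stated in the docs, so treat this as an assumption-laden illustration rather than the library's implementation.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product of the vectors after scaling each to unit length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)
```

Cosine similarity ignores vector magnitude, while the inner product and Euclidean distance do not, so the three can rank neighbours differently unless the embeddings are normalized.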

bindings/python/docs/api/exporter.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,15 @@
22

33
The Exporter provides utilities for exporting database content to various formats including JSONL, GraphML, and GraphSON.
44

5+
!!! tip "Using Context Managers"
6+
For automatic resource cleanup, prefer using context managers:
7+
```python
8+
with arcadedb.open_database("./mydb") as db:
9+
export_database(db, "./export.jsonl", format="jsonl")
10+
# Database automatically closed
11+
```
12+
Examples below show explicit `db.close()` for clarity, but context managers are recommended in production.
13+
514
## Overview
615

716
The `exporter` module enables:

bindings/python/docs/api/importer.md

Lines changed: 70 additions & 115 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,24 @@
11
# Importer API
22

3-
The `Importer` class and convenience functions provide high-performance data import capabilities for ArcadeDB. The Java importer supports CSV/TSV and XML. For full-database migrations, use ArcadeDB's native JSONL export/import via the `IMPORT DATABASE file://...` SQL command (see JSONL example below).
3+
The `Importer` class and convenience functions provide high-performance data import
4+
capabilities for ArcadeDB. For full-database migrations, use ArcadeDB's native JSONL
5+
export/import via the `IMPORT DATABASE file://...` SQL command (see JSONL example
6+
below).
47

58
## Overview
69

710
The importer uses streaming parsers for memory efficiency and performs batch transactions (default 1000 records per commit) for optimal performance. It can import data as documents, vertices, or edges depending on your schema needs.
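To make the batching idea concrete, here is a minimal sketch of grouping a stream of records into chunks of `commit_every` items without loading the whole input into memory. The `batched` helper is hypothetical and only mirrors the behaviour described above; it is not how the Java importer is actually implemented.

```python
from itertools import islice


def batched(records, commit_every=1000):
    """Yield lists of up to `commit_every` records from any iterable."""
    iterator = iter(records)
    while True:
        # islice pulls at most `commit_every` items, so input is consumed lazily.
        batch = list(islice(iterator, commit_every))
        if not batch:
            return
        yield batch


# 2500 records with the default batch size give two full batches and a remainder.
batch_sizes = [len(batch) for batch in batched(range(2500))]
```

In a real import each yielded batch would correspond to one transaction commit, which is why a larger `commit_every` means fewer commits at the cost of more records held per transaction.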
811

912
**Supported Formats:**
10-
- **CSV/TSV**: Comma or tab-separated values
11-
- **XML**: Attribute-focused importer
13+
- **CSV/TSV**: Comma or tab-separated values (recommended for bulk imports)
1214
- **ArcadeDB JSONL export/import**: Use `IMPORT DATABASE file://...` via SQL for full database moves (see example)
15+
- **XML**: Limited support via Java importer (not recommended for production use)
1316

1417
## Module Functions
1518

16-
Convenience functions for common import tasks without creating an `Importer` instance. These call the underlying Java importer (CSV/TSV, XML) or native SQL for full-database JSONL imports.
19+
Convenience functions for common import tasks without creating an `Importer` instance.
20+
These call the underlying Java importer (CSV/TSV) or native SQL for full-database JSONL
21+
imports.
1722

1823
### `import_csv(database, file_path, type_name, **options)`
1924

@@ -26,11 +31,13 @@ Import CSV or TSV files as documents, vertices, or edges.
2631
- `**options`: Format-specific options
2732
- `delimiter` (str): Field delimiter (default: ',', use '\t' for TSV)
2833
- `header` (bool): File has header row (default: True)
29-
- `commit_every` (int): Records per transaction (default: 1000)
30-
- `vertex_type` (str): Import as vertices (optional)
31-
- `edge_type` (str): Import as edges (optional)
34+
- `commitEvery` or `commit_every` (int): Records per transaction (default: 1000)
35+
- `import_type` (str): `"documents"` (default), `"vertices"`, or `"edges"`
36+
- `typeIdProperty` (str): ID column when importing vertices (e.g., "id")
37+
- `vertexType` (str): Optional explicit vertex type name (defaults to `type_name`)
3238
- `from_property` (str): Source column for edges (default: 'from')
3339
- `to_property` (str): Target column for edges (default: 'to')
40+
- `edgeType` (str): Optional explicit edge type name (defaults to `type_name`)
3441
- `verbose` (bool): Print errors during import (default: False)
3542

3643
**Examples:**
@@ -43,41 +50,29 @@ stats = arcadedb.import_csv(db, "people.csv", "Person")
4350
**Import as Vertices:**
4451
```python
4552
stats = arcadedb.import_csv(
46-
db, "users.csv", "User",
47-
vertex_type="User",
48-
commit_every=500
53+
db, "users.csv", "User",
54+
import_type="vertices",
55+
typeIdProperty="id",
56+
commitEvery=500
4957
)
5058
```
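For trying out the vertex import above, an input file with an `id` column is needed. The sketch below writes one with Python's standard `csv` module. The column names `name` and `age` and the sample rows are assumptions chosen for illustration; the docs only say that `typeIdProperty` names the ID column.

```python
import csv
import tempfile
from pathlib import Path


def write_users_csv(path, users):
    """Write user dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "name", "age"])
        writer.writeheader()
        writer.writerows(users)


# Illustrative sample data only.
users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

csv_path = Path(tempfile.mkdtemp()) / "users.csv"
write_users_csv(csv_path, users)

lines = csv_path.read_text(encoding="utf-8").splitlines()
header_line = lines[0]
```

The resulting path can then be passed as the file argument of `import_csv` with `typeIdProperty="id"`.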
5159

5260
**Import as Edges:**
5361
```python
5462
stats = arcadedb.import_csv(
55-
db, "follows.csv", "Follows",
56-
edge_type="Follows",
57-
from_property="user_rid",
58-
to_property="follows_rid",
59-
header=True
63+
db, "follows.csv", "Follows",
64+
import_type="edges",
65+
from_property="from_rid",
66+
to_property="to_rid"
6067
)
6168
```
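Because the edge import relies on the columns named by `from_property` and `to_property`, it can be worth checking the CSV header before starting a long import. The helper below is a hypothetical pre-flight check using only the standard library. How the importer itself reacts to a missing column is not described in the docs, and the sample RID values are illustrative.

```python
import csv
import io


def missing_edge_columns(csv_text, from_property="from", to_property="to"):
    """Return the edge endpoint columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name for name in (from_property, to_property) if name not in header]


# Illustrative file content matching the edge example above.
sample = "from_rid,to_rid,since\n#1:0,#1:1,2020\n"

missing_custom = missing_edge_columns(sample, "from_rid", "to_rid")
missing_default = missing_edge_columns(sample)
```

An empty list means both endpoint columns are present. With the default names, the same file would be reported as lacking `from` and `to`.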
6269

63-
**Import TSV:**
64-
```python
65-
stats = arcadedb.import_csv(
66-
db, "data.tsv", "Data",
67-
delimiter='\t'
68-
)
69-
```
70-
71-
### ArcadeDB JSONL import (native SQL)
72-
73-
Use SQL `IMPORT DATABASE file://...` for JSONL exports produced by ArcadeDB. This preserves schema and data and is recommended for full migrations.
74-
70+
**Full-Database Import (JSONL):**
7571
```python
7672
import arcadedb_embedded as arcadedb
7773

78-
db = arcadedb.create_database("./restored_db")
79-
db.command("sql", "IMPORT DATABASE file:///tmp/export.jsonl.tgz WITH commitEvery = 50000")
80-
db.close()
74+
with arcadedb.create_database("./restored_db") as db:
75+
db.command("sql", "IMPORT DATABASE file:///tmp/export.jsonl.tgz WITH commitEvery = 50000")
8176
```
8277

8378
For XML imports, use the `Importer` class directly (see below) with `format_type='xml'`.
@@ -130,7 +125,6 @@ Import data from a file with auto-detection or explicit format specification.
130125
importer = Importer(db)
131126

132127
# Auto-detect format from extension
133-
stats = importer.import_file("data.json")
134128
stats = importer.import_file("users.csv", type_name="User")
135129

136130
# Explicit format
@@ -146,31 +140,6 @@ stats = importer.import_file(
146140

147141
## Format-Specific Details
148142

149-
### JSON Format
150-
151-
**File Structure:**
152-
- Single JSON object: `{...}`
153-
- Array of objects: `[{...}, {...}]`
154-
- Multiple root objects: `{...}\n{...}`
155-
156-
**Options:**
157-
- `mapping` (Dict): Map JSON paths to database types (advanced)
158-
159-
**Type Inference:** The importer uses Java's `JSONImporterFormat` which automatically creates schema based on JSON structure.
160-
161-
**Example:**
162-
```python
163-
# data.json:
164-
# [
165-
# {"name": "Alice", "age": 30, "city": "NYC"},
166-
# {"name": "Bob", "age": 25, "city": "LA"}
167-
# ]
168-
169-
stats = arcadedb.import_json(db, "data.json", commit_every=1000)
170-
```
171-
172-
---
173-
174143
### CSV Format
175144

176145
**File Structure:**
@@ -342,35 +311,27 @@ except arcadedb.ArcadeDBError as e:
342311
```python
343312
import arcadedb_embedded as arcadedb
344313

345-
# Open or create database
346-
db = arcadedb.create_database("./import_demo")
347-
348-
# Create schema
349-
db.command("sql", "CREATE DOCUMENT TYPE User")
350-
db.command("sql", "CREATE VERTEX TYPE Person")
351-
db.command("sql", "CREATE EDGE TYPE Knows")
352-
353-
# Import documents from JSON
354-
stats1 = arcadedb.import_json(db, "users.json")
355-
print(f"Users: {stats1['documents']}")
356-
357-
# Import vertices from CSV
358-
stats2 = arcadedb.import_csv(
359-
db, "people.csv", "Person",
360-
vertex_type="Person"
361-
)
362-
print(f"People: {stats2['vertices']}")
363-
364-
# Import edges from CSV
365-
stats3 = arcadedb.import_csv(
366-
db, "relationships.csv", "Knows",
367-
edge_type="Knows",
368-
from_property="person1_rid",
369-
to_property="person2_rid"
370-
)
371-
print(f"Relationships: {stats3['edges']}")
372-
373-
db.close()
314+
# Open or create database (auto-closes)
315+
with arcadedb.create_database("./import_demo") as db:
316+
# Create schema with embedded API
317+
db.schema.create_vertex_type("Person")
318+
db.schema.create_edge_type("Knows")
319+
320+
# Import vertices from CSV
321+
stats = arcadedb.import_csv(
322+
db, "people.csv", "Person",
323+
vertex_type="Person"
324+
)
325+
print(f"People: {stats['vertices']}")
326+
327+
# Import edges from CSV
328+
stats2 = arcadedb.import_csv(
329+
db, "relationships.csv", "Knows",
330+
edge_type="Knows",
331+
from_property="person1_rid",
332+
to_property="person2_rid"
333+
)
334+
print(f"Relationships: {stats2['edges']}")
374335
```
375336

376337
### Large-Scale Import with Progress Tracking
@@ -379,36 +340,30 @@ db.close()
379340
import arcadedb_embedded as arcadedb
380341
import time
381342

382-
db = arcadedb.create_database("./large_import")
383-
384-
# Create schema
385-
db.command("sql", "CREATE VERTEX TYPE Product")
343+
with arcadedb.create_database("./large_import") as db:
344+
# Create schema with embedded API
345+
db.schema.create_vertex_type("Product")
386346

387-
# Import with progress monitoring
388-
print("Starting import...")
389-
start = time.time()
347+
# Import with progress monitoring
348+
print("Starting import...")
349+
start = time.time()
390350

391-
stats = arcadedb.import_csv(
392-
db, "products.csv", "Product",
393-
vertex_type="Product",
394-
commit_every=10000, # Large batches for performance
395-
verbose=True # Show errors
396-
)
351+
stats = arcadedb.import_csv(
352+
db, "products.csv", "Product",
353+
vertex_type="Product",
354+
commit_every=10000, # Large batches for performance
355+
verbose=True # Show errors
356+
)
397357

398-
elapsed = time.time() - start
358+
elapsed = time.time() - start
399359

400-
print(f"\nImport complete!")
401-
print(f"Records: {stats['vertices']:,}")
402-
print(f"Errors: {stats['errors']}")
403-
print(f"Time: {elapsed:.2f}s")
404-
print(f"Rate: {stats['vertices'] / elapsed:.0f} records/sec")
405-
406-
# Create indexes after import
407-
print("\nCreating indexes...")
408-
db.command("sql", "CREATE INDEX ON Product (sku) UNIQUE")
409-
db.command("sql", "CREATE INDEX ON Product (category) NOTUNIQUE")
360+
print(f"\nImport complete!")
361+
print(f"Records: {stats['vertices']:,}")
362+
print(f"Errors: {stats['errors']}")
363+
print(f"Time: {elapsed:.2f}s")
364+
print(f"Rate: {stats['vertices'] / elapsed:.0f} records/sec")
410365

411-
db.close()
366+
# Create indexes after import
412367
```
413368

414369
### ArcadeDB JSONL Import (native export)
@@ -418,14 +373,14 @@ Use ArcadeDB's built-in SQL command to import JSONL exports created by `EXPORT D
418373
```python
419374
import arcadedb_embedded as arcadedb
420375

421-
db = arcadedb.create_database("./restored_db")
422376
import_path = "/exports/mydb.jsonl.tgz" # JSONL export created by ArcadeDB
423377

424-
# Use SQL IMPORT DATABASE with optional tuning parameters
425-
db.command("sql", f"IMPORT DATABASE file://{import_path} WITH commitEvery = 50000")
378+
# Use context manager for lifecycle
379+
with arcadedb.create_database("./restored_db") as db:
380+
# Use SQL IMPORT DATABASE with optional tuning parameters
381+
db.command("sql", f"IMPORT DATABASE file://{import_path} WITH commitEvery = 50000")
426382

427-
print("Import complete!")
428-
db.close()
383+
print("Import complete!")
429384
```
430385

431386
This approach preserves the full database (schema + data) and is the recommended path for large migrations or round-tripping ArcadeDB exports.
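The command string in the example combines `file://` with an absolute path, which is why the resulting URL has three slashes. A small hypothetical helper can make that explicit and reject inputs that would produce a malformed command. It only builds the string and assumes POSIX-style paths; whether ArcadeDB accepts other path forms is not covered here.

```python
def build_import_command(export_path, commit_every=50000):
    """Build the IMPORT DATABASE statement for an absolute POSIX path."""
    if not export_path.startswith("/"):
        raise ValueError("export_path must be an absolute POSIX path")
    if int(commit_every) <= 0:
        raise ValueError("commit_every must be a positive integer")
    # "file://" plus a path that starts with "/" yields "file:///...".
    return (
        f"IMPORT DATABASE file://{export_path} "
        f"WITH commitEvery = {int(commit_every)}"
    )


command = build_import_command("/exports/mydb.jsonl.tgz")
```

The returned string can be passed to `db.command("sql", command)` in place of the inline f-string.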

bindings/python/docs/api/results.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,17 @@
22

33
The `ResultSet` and `Result` classes provide Python-friendly interfaces for working with query results from ArcadeDB. They handle iteration, property access, and type conversion automatically.
44

5+
!!! tip "Using Context Managers"
6+
For automatic resource cleanup, prefer using context managers:
7+
```python
8+
with arcadedb.open_database("./mydb") as db:
9+
result_set = db.query("sql", "SELECT FROM Person WHERE age > 25")
10+
for result in result_set:
11+
print(result.get("name"))
12+
# Database automatically closed
13+
```
14+
Examples below show explicit `db.close()` for clarity, but context managers are recommended in production.
15+
516
## Overview
617

718
When you execute a query, ArcadeDB returns a `ResultSet` that can be iterated to access individual `Result` objects. Each `Result` represents one row/record from your query.
