Commit d6562cd

Add benchmark results and summaries for stackoverflow-large dataset
- Created README.md files for benchmark results on 21-Mar-2026 and 24-Mar-2026, detailing OLTP and OLAP performance metrics for the stackoverflow-large dataset.
- Added summary markdown files for OLTP and OLAP benchmarks, including detailed per-operation metrics and cross-DB hash checks.
- Updated scripts for running benchmarks to increase thread count and memory limits for improved performance in OLTP and graph OLAP tests.
- Noted improvements in index times and query performance, particularly for graph OLAP queries.
1 parent 44051d7 commit d6562cd

15 files changed

Lines changed: 683 additions & 41 deletions

bindings/python/docs/examples/16_import_database_vs_transactional_graph_ingest.md

Lines changed: 15 additions & 15 deletions
@@ -20,19 +20,21 @@ Example 16 is the graph-ingest comparison harness for embedded Python.
 ## Current Repository Guidance
 
 - This example exists because ingest winners are workload-dependent
-- Current results show that `GraphBatch` is the best bulk graph ingest option in this
-  repository
+- Current results for the 5M/5M benchmark shape show that `IMPORT DATABASE` with
+  `--parallel 4` is the fastest option in this repository
+- `GraphBatch` remains competitive and is the strongest non-import path in the current
+  snapshot
 - Async SQL is still useful as a baseline comparison, but it is not the recommended
   bulk graph ingest path here
-- SQL `IMPORT DATABASE` remains available, but it is not the preferred graph bulk-load
-  path from Python
+- SQL `IMPORT DATABASE` is now a preferred bulk graph load path for this benchmark shape
+  when parallel import is enabled
 
 ## Recent Benchmark Snapshot
 
 For this shape:
 
-- `vertices=2,000,000`
-- `edges=2,000,000`
+- `vertices=5,000,000`
+- `edges=5,000,000`
 - `vertex-int-props=10`
 - `vertex-str-props=10`
 - `edge-int-props=10`
@@ -43,16 +45,14 @@ For this shape:
 
 Measured times:
 
-- `Transactional` (`single-threaded`): `253.615s`
-- `GraphBatch` (`--parallel 1`): `177.150s`
-- `GraphBatch` (`--parallel 4`): `187.681s`
-- `GraphBatch` (`--parallel 8`): `134.836s`
-- `Async SQL` (`--async-parallel 1`): `230.192s`
-- `IMPORT DATABASE` (`--parallel 1`): `444.357s`
-- `IMPORT DATABASE` (`--parallel 4`): `336.206s`
+- `Transactional` (`1 thread`): `575.078s`
+- `Async SQL` (`--async-parallel 1`): `701.080s`
+- `GraphBatch` (`--parallel 1`): `507.983s`
+- `GraphBatch` (`--parallel 4`): `359.672s`
+- `IMPORT DATABASE` (`--parallel 1`): `453.481s`
+- `IMPORT DATABASE` (`--parallel 4`): `275.325s`
 
-This is why the broader graph example set now treats `GraphBatch` as the default bulk
-graph ingest path from Python.
+All four methods produced the same final graph output for this benchmark shape.
 
 ## Run
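To make the ranking in the updated snapshot easier to check, the measured times can be turned into speedups relative to the transactional baseline. The following is a minimal sketch, not part of the commit: the helper names `speedups` and `fastest` are hypothetical, and the only data used is the six 5M/5M timings listed in the diff.

```python
# Ingest times in seconds, copied from the 5M/5M snapshot in the diff.
TIMES = {
    "Transactional (1 thread)": 575.078,
    "Async SQL (--async-parallel 1)": 701.080,
    "GraphBatch (--parallel 1)": 507.983,
    "GraphBatch (--parallel 4)": 359.672,
    "IMPORT DATABASE (--parallel 1)": 453.481,
    "IMPORT DATABASE (--parallel 4)": 275.325,
}


def speedups(times, baseline):
    """Return baseline_time / method_time for every method (hypothetical helper)."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


def fastest(times):
    """Return the name of the method with the lowest ingest time (hypothetical helper)."""
    return min(times, key=times.get)


if __name__ == "__main__":
    relative = speedups(TIMES, "Transactional (1 thread)")
    # Print methods from fastest to slowest with their speedup over the baseline.
    for name in sorted(TIMES, key=TIMES.get):
        print(f"{name}: {TIMES[name]:.3f}s, {relative[name]:.2f}x vs transactional")
    print("fastest:", fastest(TIMES))
```

On these numbers, `IMPORT DATABASE` with `--parallel 4` comes out at roughly 2.09x the transactional baseline, while Async SQL is slower than the baseline (a ratio below 1).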

bindings/python/examples/07_stackoverflow_tables_oltp.py

Lines changed: 5 additions & 4 deletions
@@ -653,10 +653,11 @@ def configure_arcadedb_async_loader(db, batch_size: int, parallelism: int = 1):
     return async_exec
 
 
-def reset_arcadedb_async_loader(db):
-    db.async_executor().wait_completion()
+def reset_arcadedb_async_loader(db, async_exec):
+    async_exec.wait_completion()
+    async_exec.close()
     db.set_read_your_writes(True)
-    db.async_executor().set_transaction_use_wal(True)
+    async_exec.set_transaction_use_wal(True)
 
 
 def insert_batch_sqlite(conn, table: Dict[str, Any], rows: List[Dict[str, Any]]):
@@ -792,7 +793,7 @@ def on_error(exc: Exception):
 
         return id_pools, next_ids, time.time() - start
     finally:
-        reset_arcadedb_async_loader(db)
+        reset_arcadedb_async_loader(db, async_exec)
 
 
 def load_tables(
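The change above passes the executor handle returned by `configure_arcadedb_async_loader` into the reset helper and calls the helper from a `finally` block. The sketch below illustrates that pattern in isolation. It is not the repository's code: `StubAsyncExecutor`, `StubDatabase`, `reset_async_loader`, and `load_rows` are hypothetical stand-ins that only record calls, and nothing here touches the real ArcadeDB bindings. The call order inside the reset helper mirrors the diff.

```python
import time


class StubAsyncExecutor:
    """Hypothetical stand-in for the async executor; it only records calls."""

    def __init__(self):
        self.calls = []

    def wait_completion(self):
        self.calls.append("wait_completion")

    def close(self):
        self.calls.append("close")

    def set_transaction_use_wal(self, enabled):
        self.calls.append(f"set_transaction_use_wal({enabled})")


class StubDatabase:
    """Hypothetical stand-in for the database handle; it only records calls."""

    def __init__(self):
        self.calls = []

    def set_read_your_writes(self, enabled):
        self.calls.append(f"set_read_your_writes({enabled})")


def reset_async_loader(db, async_exec):
    # Same call order as the updated helper in the diff.
    async_exec.wait_completion()
    async_exec.close()
    db.set_read_your_writes(True)
    async_exec.set_transaction_use_wal(True)


def load_rows(db, async_exec, rows):
    """Count rows, then always reset the same executor that was used for the load."""
    start = time.time()
    try:
        loaded = 0
        for row in rows:
            if row is None:
                raise ValueError("bad row")
            loaded += 1
        return loaded, time.time() - start
    finally:
        # Runs on success and on failure, using the handle the caller configured.
        reset_async_loader(db, async_exec)
```

Passing the handle explicitly means the reset acts on the executor that the load actually used, and the `finally` block ensures the reset runs even if a row fails partway through.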

bindings/python/examples/08_stackoverflow_tables_olap.py

Lines changed: 5 additions & 4 deletions
@@ -627,10 +627,11 @@ def configure_arcadedb_async_loader(db, batch_size: int, parallelism: int = 1):
     return async_exec
 
 
-def reset_arcadedb_async_loader(db):
-    db.async_executor().wait_completion()
+def reset_arcadedb_async_loader(db, async_exec):
+    async_exec.wait_completion()
+    async_exec.close()
     db.set_read_your_writes(True)
-    db.async_executor().set_transaction_use_wal(True)
+    async_exec.set_transaction_use_wal(True)
 
 
 def sqlite_type(field_type: str) -> str:
@@ -1587,7 +1588,7 @@ def on_error(exc: Exception):
             print(f" {count:,} rows in {elapsed:.2f}s")
         load_total = time.time() - load_start
     finally:
-        reset_arcadedb_async_loader(db)
+        reset_arcadedb_async_loader(db, async_exec)
 
     load_counts_start = time.time()
     table_counts_after_load = count_table_rows_arcadedb(db)

bindings/python/examples/13_stackoverflow_hybrid_queries.py

Lines changed: 5 additions & 4 deletions
@@ -495,10 +495,11 @@ def configure_arcadedb_async_loader(db, batch_size: int, parallelism: int = 1):
     return async_exec
 
 
-def reset_arcadedb_async_loader(db):
-    db.async_executor().wait_completion()
+def reset_arcadedb_async_loader(db, async_exec):
+    async_exec.wait_completion()
+    async_exec.close()
     db.set_read_your_writes(True)
-    db.async_executor().set_transaction_use_wal(True)
+    async_exec.set_transaction_use_wal(True)
 
 
 def load_table_arcadedb_async(
@@ -1721,7 +1722,7 @@ def on_error(exc: Exception):
         )
         load_time = time.time() - load_start
     finally:
-        reset_arcadedb_async_loader(db)
+        reset_arcadedb_async_loader(db, async_exec)
 
     index_time = create_indexes_with_retry(
         db,

bindings/python/examples/16_import_database_vs_transactional_graph_ingest.py

Lines changed: 12 additions & 10 deletions
@@ -11,10 +11,10 @@
 
 Goal: compare graph ingest speed for equivalent synthetic data shape.
 
-Observed benchmark result (2026-03-19):
+Observed benchmark result (2026-03-24):
 For:
-- vertices=2,000,000
-- edges=2,000,000
+- vertices=5,000,000
+- edges=5,000,000
 - vertex-int-props=10
 - vertex-str-props=10
 - edge-int-props=10
@@ -24,13 +24,15 @@
 - heap-size=8g
 
 Measured ingest times:
-- Transactional (`single-threaded`): 253.615s
-- GraphBatch (`single-threaded`, `--parallel 1`): 177.150s
-- GraphBatch (`4 threads`, `--parallel 4`): 187.681s
-- GraphBatch (`8 threads`, `--parallel 8`): 134.836s
-- Async SQL (`single-threaded`, `--async-parallel 1`): 230.192s
-- IMPORT DATABASE (`single-threaded`, `--parallel 1`): 444.357s
-- IMPORT DATABASE (`4 threads`, `--parallel 4`): 336.206s
+- Transactional (`single-threaded`, `1 thread`): 575.078s
+- Async SQL (`single-threaded`, `--async-parallel 1`): 701.080s
+- GraphBatch (`single-threaded`, `--parallel 1`): 507.983s
+- GraphBatch (`4 threads`, `--parallel 4`): 359.672s
+- IMPORT DATABASE (`single-threaded`, `--parallel 1`): 453.481s
+- IMPORT DATABASE (`4 threads`, `--parallel 4`): 275.325s
+
+Logical parity:
+- All four methods produced the same final graph output
 
 Known limitation:
 Current `IMPORT DATABASE` behavior can vary by import path and data shape. In some
