Commit 51354c1

Refactor benchmark scripts and add internal vector latency probe
- Deleted outdated benchmark results summary for graph OLAP across all datasets.
- Introduced a new internal script for measuring vector query latency over HTTP, allowing for comparison between raw and expanded SQL forms.
- Updated OLTP and OLAP matrix scripts to increase the number of runs from 1 to 3 and expanded the supported databases to include PostgreSQL and DuckDB.
- Modified graph OLTP and OLAP matrix scripts to adjust memory limits and include Neo4j as a supported database.
- Enhanced summary scripts to capture Neo4j versioning and GAV (Graph Analytical View) setup times, improving the clarity of benchmark results.
- Updated documentation within scripts to reflect changes in GAV handling and database support.
1 parent a122168 commit 51354c1

15 files changed

Lines changed: 4516 additions & 584 deletions

bindings/python/docs/examples/09_stackoverflow_graph_oltp.md

Lines changed: 8 additions & 0 deletions
@@ -10,6 +10,7 @@ Example 09 is the graph-oriented OLTP benchmark in the Python examples set.

 - Builds the Stack Overflow graph with the repository's directed schema conventions
 - Runs mixed graph CRUD operations against the selected backend
+- Supports ArcadeDB embedded and Neo4j client/server execution paths alongside the in-process backends
 - Measures throughput, latency, disk usage, and peak RSS
 - Supports deterministic single-thread verification for repeatability checks

@@ -655,6 +656,9 @@ DELETE r
 - ArcadeDB graph preload now uses `GraphBatch` for the initial node and edge load,
 driven by the configured `--threads` value
 - `GraphBatch` is the repository's recommended bulk graph ingest path from Python
+- Neo4j runs through a Dockerized server plus Python driver wrapper, with the benchmark
+splitting the configured global memory/CPU budget between client and server via
+`--server-fraction`
 - Traversal expectations should be read as directed unless the query pattern
 explicitly traverses both directions
 - For cross-database comparability, `--threads 1` is the recommended baseline
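Reviewer note: the `--server-fraction` split described in this hunk can be pictured with a small sketch. It assumes a simple proportional split where the server receives the given fraction of each budget and the client keeps the remainder. `BudgetSplit` and `split_budget` are hypothetical names for illustration and are not taken from the benchmark scripts, whose actual rounding and validation rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetSplit:
    """Hypothetical container for a client/server resource split."""
    server_mem_mib: int
    client_mem_mib: int
    server_cpus: float
    client_cpus: float


def split_budget(total_mem_mib: int, total_cpus: float, server_fraction: float) -> BudgetSplit:
    """Assumed behaviour: the server gets `server_fraction` of each budget, the client the rest."""
    if not 0.0 < server_fraction < 1.0:
        raise ValueError("server_fraction must be strictly between 0 and 1")
    server_mem = int(total_mem_mib * server_fraction)  # truncate to whole MiB
    server_cpus = total_cpus * server_fraction
    return BudgetSplit(
        server_mem_mib=server_mem,
        client_mem_mib=total_mem_mib - server_mem,
        server_cpus=server_cpus,
        client_cpus=total_cpus - server_cpus,
    )
```

For example, under these assumptions `split_budget(8192, 4.0, 0.75)` would reserve 6144 MiB and 3 CPUs for the Neo4j server and leave 2048 MiB and 1 CPU for the Python client.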
@@ -665,6 +669,7 @@ DELETE r

 - `arcadedb_sql`
 - `arcadedb_cypher`
+- `neo4j`
 - `ladybug` / `ladybugdb`
 - `graphqlite`
 - `duckdb`
@@ -694,12 +699,15 @@ python 09_stackoverflow_graph_oltp.py \
 - `--transactions`: number of OLTP operations
 - `--batch-size`: preload XML insert batch size
 - `--mem-limit`: Docker and JVM memory budget
+- `--server-fraction`: for Neo4j, fraction of the total CPU/memory budget reserved for the server process
 - `--sqlite-profile`: SQLite tuning profile when using SQLite-backed paths

 ## Result Notes

 - `du_mib` is real post-run filesystem usage
 - `disk_after_*` fields are benchmark-reported logical size counters
+- Neo4j runs also record `client_rss_peak_*` and `server_rss_peak_*`, while `rss_peak_*`
+represents the combined observed peak
 - Per-operation latency is derived from `latency_summary.ops.{50,95,99}` with values
 converted from seconds to milliseconds
 - Operation totals come from `op_counts`
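Reviewer note: the seconds-to-milliseconds conversion mentioned in the result notes might look like the sketch below. It assumes the percentile values are available as a mapping from percentile label (`50`, `95`, `99`) to seconds. The exact nesting of `latency_summary` is not shown in this diff, so `percentiles_to_ms` is a hypothetical helper that works on that innermost mapping only.

```python
def percentiles_to_ms(percentiles_s):
    """Convert a {percentile: seconds} mapping to {percentile: milliseconds}.

    Keys are normalised to strings so that 50 and "50" are treated alike;
    values are rounded to microsecond resolution (3 decimals in ms).
    """
    return {str(p): round(float(v) * 1000.0, 3) for p, v in percentiles_s.items()}
```

A reader would apply this to whatever per-operation mapping the result file actually exposes, for instance once per operation name if the percentiles are grouped that way.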

bindings/python/docs/examples/10_stackoverflow_graph_olap.md

Lines changed: 5 additions & 0 deletions
@@ -16,6 +16,7 @@ Example 10 is the graph-oriented OLAP benchmark in the Python examples set.
 - Builds the Stack Overflow graph with directed edge types
 - Runs a fixed analytical query suite across the selected backend
 - Optionally builds a GAV for ArcadeDB before running the Cypher workload
+- Supports Neo4j as a client/server backend using the same query suite, without a GAV-specific acceleration toggle
 - Records load/index/query timings, disk usage, and peak RSS
 - Records whether GAV was enabled and the time spent waiting for GAV `READY`
 - Supports repeated query runs and single-query filtering
@@ -188,12 +189,15 @@ The source creates unique `Id` indexes on all six vertex types before the query
 - ArcadeDB query execution is cypher-only in this example path
 - ArcadeDB GAV usage is opt-in through `--use-gav`; when enabled, the benchmark waits
 for the analytical view to reach `READY` before measuring the query suite
+- Neo4j runs execute the same OLAP query suite through a Dockerized server plus Python
+driver wrapper, with client/server resource accounting derived from `--server-fraction`
 - Traversal expectations should be interpreted as directed

 ## Supported Backends

 - `arcadedb_sql`
 - `arcadedb_cypher`
+- `neo4j`
 - `ladybug` / `ladybugdb`
 - `graphqlite`
 - `duckdb`
@@ -224,5 +228,6 @@ python 10_stackoverflow_graph_olap.py \
 - `--query-order`: `fixed` or `shuffled`
 - `--only-query`: run a single named query
 - `--manual-checks`: enable additional validation queries
+- `--server-fraction`: for Neo4j, fraction of the total CPU/memory budget reserved for the server process
 - `--use-gav`: for ArcadeDB runs, create a Graph Analytical View and measure the
 wait until it reports `READY`
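Reviewer note: the "measure the wait until it reports `READY`" behaviour of `--use-gav` can be sketched as a timed polling loop. This is an illustration under assumptions, not the benchmark's code: `get_status` is a hypothetical callable standing in for whatever ArcadeDB call returns the view status, and the timeout and poll interval are invented defaults. Only the `READY` status string comes from the docs above.

```python
import time


def wait_for_ready(get_status, timeout_s=60.0, poll_interval_s=0.5):
    """Poll `get_status()` until it returns "READY"; return the elapsed seconds.

    Raises TimeoutError if READY is not observed within `timeout_s`.
    """
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    while True:
        if get_status() == "READY":
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"view not READY after {timeout_s} s")
        time.sleep(poll_interval_s)
```

The returned duration corresponds to the GAV wait time that the docs say is recorded alongside the query timings, so it stays separate from the measured query suite.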
