What do you want?
A POST /bulk_import endpoint that creates nodes and edges in a single LMDB write transaction, bypassing the query engine.
Use case: Loading large graphs (100K+ nodes, 1M+ edges) from external data sources like code intelligence indexes (SCIP), knowledge graphs, or data migrations. Currently this requires thousands of individual query calls, each with its own transaction commit and fsync overhead.
Proposed request format:
{
  "nodes": [
    { "label": "Person", "temp_id": "n0", "properties": { "name": "Alice" } },
    { "label": "Person", "temp_id": "n1", "properties": { "name": "Bob" } }
  ],
  "edges": [
    { "label": "Knows", "from": "n0", "to": "n1", "properties": { "since": "2024" } }
  ]
}
- temp_id lets edges reference nodes created in the same batch
- from/to also accept existing node UUIDs
- Entire batch is atomic (single LMDB transaction)
- Response returns a temp_id → UUID mapping
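
To make the temp_id semantics concrete, here is a minimal Python sketch of how a batch could be resolved before anything is written. It is not the actual implementation: `resolve_batch` is a hypothetical helper, `uuid.uuid4()` is a stand-in for whatever ID scheme the database really uses, and the rule "check temp_ids first, then fall back to parsing an existing UUID" is an assumption about the lookup order.

```python
import uuid


def resolve_batch(payload):
    """Assign a UUID to each node and rewrite edge endpoints to UUIDs.

    Hypothetical helper for illustration only. Returns (nodes, edges, mapping),
    where mapping is the temp_id -> UUID table the response would carry.
    """
    temp_to_uuid = {}
    nodes = []
    for node in payload.get("nodes", []):
        node_id = str(uuid.uuid4())  # assumption: stand-in for the real ID scheme
        temp_id = node.get("temp_id")
        if temp_id is not None:
            if temp_id in temp_to_uuid:
                raise ValueError(f"duplicate temp_id: {temp_id!r}")
            temp_to_uuid[temp_id] = node_id
        nodes.append({
            "id": node_id,
            "label": node["label"],
            "properties": node.get("properties", {}),
        })

    def resolve(ref):
        # A temp_id from this batch wins; otherwise the reference must parse as a UUID.
        if ref in temp_to_uuid:
            return temp_to_uuid[ref]
        try:
            return str(uuid.UUID(ref))
        except (ValueError, AttributeError, TypeError):
            raise ValueError(f"unknown node reference: {ref!r}") from None

    edges = [
        {
            "label": edge["label"],
            "from": resolve(edge["from"]),
            "to": resolve(edge["to"]),
            "properties": edge.get("properties", {}),
        }
        for edge in payload.get("edges", [])
    ]
    return nodes, edges, temp_to_uuid
```

Because every reference is validated before any write happens, a bad temp_id can reject the whole batch up front, which fits the all-or-nothing behavior described above. The sketch only checks that a non-temp reference is a well-formed UUID; confirming that the node actually exists would have to happen inside the write transaction.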
Why the query engine path is slow for bulk loading:
- One fsync per HTTP request (~5ms each, adds up to minutes at scale)
- Per-item iterator/traversal overhead (G::new_mut, collect_to_obj)
- Per-item bincode buffer allocation
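
As a rough back-of-the-envelope check on the fsync point, the sketch below multiplies commit count by the ~5ms figure quoted above. It assumes the worst case of one item per request, which is an assumption (a loader that packs several items into each query call would land somewhere in between), and `fsync_seconds` is a hypothetical helper, not part of the project.

```python
def fsync_seconds(commits, fsync_ms=5.0):
    """Total time spent in fsync for a given number of transaction commits."""
    return commits * fsync_ms / 1000.0


# Graph size from the use case above: 100K nodes and 1M edges.
items = 100_000 + 1_000_000

per_item = fsync_seconds(items)  # assumption: one commit per node or edge
single_batch = fsync_seconds(1)  # one commit for the whole import

print(f"per-item commits: {per_item:.0f} s ({per_item / 60:.1f} min)")
print(f"single batch:     {single_batch * 1000:.0f} ms")
```

Under those assumptions the per-item path spends about 5,500 seconds in fsync alone, while a single transaction pays that cost once. The other two overheads in the list are not modeled here.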
We have a working implementation with 10 tests — happy to open a PR if this aligns with the project direction.