[Feature]: Bulk import endpoint for batch node/edge creation #896

@financialvice

What do you want?

A POST /bulk_import endpoint that creates nodes and edges in a single LMDB write transaction, bypassing the query engine.

Use case: Loading large graphs (100K+ nodes, 1M+ edges) from external data sources like code intelligence indexes (SCIP), knowledge graphs, or data migrations. Currently this requires thousands of individual query calls, each with its own transaction commit and fsync overhead.

Proposed request format:

{
  "nodes": [
    { "label": "Person", "temp_id": "n0", "properties": { "name": "Alice" } },
    { "label": "Person", "temp_id": "n1", "properties": { "name": "Bob" } }
  ],
  "edges": [
    { "label": "Knows", "from": "n0", "to": "n1", "properties": { "since": "2024" } }
  ]
}
  • temp_id lets edges reference nodes created in the same batch
  • from/to also accept existing node UUIDs
  • Entire batch is atomic (single LMDB transaction)
  • Response returns temp_id → UUID mapping
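A minimal sketch of how the temp_id resolution described above could work, using only the standard library. Everything here is hypothetical and is not taken from the actual implementation: the type names, `resolve_batch`, the `new_id` and `node_exists` callbacks, and the use of `u128` in place of a real UUID type are all assumptions, and properties are left out for brevity. The idea is to resolve and validate every reference before anything is written, so a bad edge rejects the whole batch.

```rust
use std::collections::HashMap;

// Hypothetical request types mirroring the proposed JSON (properties omitted).
#[derive(Debug, Clone)]
pub struct NodeInput {
    pub label: String,
    pub temp_id: String,
}

#[derive(Debug, Clone)]
pub struct EdgeInput {
    pub label: String,
    pub from: String,
    pub to: String,
}

// An edge whose endpoints have been resolved to node IDs (u128 stands in for a UUID).
#[derive(Debug, PartialEq)]
pub struct ResolvedEdge {
    pub label: String,
    pub from: u128,
    pub to: u128,
}

#[derive(Debug, PartialEq)]
pub enum ImportError {
    DuplicateTempId(String),
    UnknownEndpoint(String),
}

// Resolve one edge endpoint: a temp_id from this batch wins; otherwise the
// string must be a 32-hex-digit UUID (hyphens allowed) of an existing node.
fn resolve_endpoint(
    reference: &str,
    temp_ids: &HashMap<String, u128>,
    node_exists: &impl Fn(u128) -> bool,
) -> Result<u128, ImportError> {
    if let Some(&id) = temp_ids.get(reference) {
        return Ok(id);
    }
    let hex = reference.replace('-', "");
    match (hex.len(), u128::from_str_radix(&hex, 16)) {
        (32, Ok(id)) if node_exists(id) => Ok(id),
        _ => Err(ImportError::UnknownEndpoint(reference.to_string())),
    }
}

// Assign an ID to every node, then resolve every edge. Nothing is written
// here; the caller would perform all writes in one transaction afterwards.
pub fn resolve_batch(
    nodes: &[NodeInput],
    edges: &[EdgeInput],
    mut new_id: impl FnMut() -> u128,
    node_exists: impl Fn(u128) -> bool,
) -> Result<(HashMap<String, u128>, Vec<ResolvedEdge>), ImportError> {
    let mut temp_ids = HashMap::with_capacity(nodes.len());
    for node in nodes {
        if temp_ids.insert(node.temp_id.clone(), new_id()).is_some() {
            return Err(ImportError::DuplicateTempId(node.temp_id.clone()));
        }
    }

    let mut resolved = Vec::with_capacity(edges.len());
    for edge in edges {
        resolved.push(ResolvedEdge {
            label: edge.label.clone(),
            from: resolve_endpoint(&edge.from, &temp_ids, &node_exists)?,
            to: resolve_endpoint(&edge.to, &temp_ids, &node_exists)?,
        });
    }
    Ok((temp_ids, resolved))
}
```

The returned map is what the response would carry as the temp_id → UUID mapping. Checking temp_ids before parsing a UUID means a temp_id can never be shadowed by an existing node's ID.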

Why the query engine path is slow for bulk loading:

  • One fsync per HTTP request (~5 ms each, which adds up to minutes at scale)
  • Per-item iterator/traversal overhead (G::new_mut, collect_to_obj)
  • Per-item bincode buffer allocation
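To put a rough number on the fsync point, here is a back-of-the-envelope estimate. It assumes the ~5 ms per-commit figure above and the 100K-node / 1M-edge graph from the use case. The helper `fsync_overhead` and the batch size of 100 items per request are illustrative assumptions, not measurements.

```rust
use std::time::Duration;

// Estimated time spent purely on commit fsyncs when `items` nodes/edges are
// loaded in requests of `items_per_request` each, one fsync per request.
pub fn fsync_overhead(items: u64, items_per_request: u64, fsync: Duration) -> Duration {
    assert!(items_per_request > 0, "items_per_request must be non-zero");
    // Ceiling division: a partially filled last request still costs a commit.
    let requests = (items + items_per_request - 1) / items_per_request;
    fsync * u32::try_from(requests).expect("request count fits in u32")
}

// 100K nodes + 1M edges, as in the use case above.
pub const ITEMS: u64 = 100_000 + 1_000_000;

pub fn estimates() -> [Duration; 3] {
    let fsync = Duration::from_millis(5);
    [
        fsync_overhead(ITEMS, 1, fsync),     // one item per request
        fsync_overhead(ITEMS, 100, fsync),   // assumed 100 items per request
        fsync_overhead(ITEMS, ITEMS, fsync), // whole batch, single commit
    ]
}
```

Under these assumptions the fsync cost alone is 5,500 s (about 92 minutes) at one item per request, 55 s at 100 items per request, and 5 ms for a single transaction. This leaves out the per-item traversal and allocation costs listed above.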

We have a working implementation with 10 tests — happy to open a PR if this aligns with the project direction.
