diff --git a/.wordlist.txt b/.wordlist.txt
index 8c1a1543..d7ea289a 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -828,3 +828,7 @@ ObjectPool
 QuickJS
 ACLs
 filesystem
+erroring
+namespaces
+tracebacks
+eval
diff --git a/index.md b/index.md
index 40182925..3d03b87d 100644
--- a/index.md
+++ b/index.md
@@ -258,7 +258,14 @@ The full list and links can be found on the [Client Libraries](/getting-started/
 
 ## Data import
 
-When loading large graphs from CSV files, we recommend using [falkordb-bulk-loader](https://github.com/falkordb/falkordb-bulk-loader)
+When loading large graphs from CSV files, use the [falkordb-bulk-loader](https://github.com/falkordb/falkordb-bulk-loader):
+
+```sh
+pip install falkordb-bulk-loader
+falkordb-bulk-insert GRAPHNAME -n nodes.csv -r edges.csv
+```
+
+See the [Bulk Loader documentation](/integration/bulk-loader) for the full reference.
 
 ## GitHub Discussions
 
diff --git a/integration/bulk-loader.md b/integration/bulk-loader.md
new file mode 100644
index 00000000..ca36e445
--- /dev/null
+++ b/integration/bulk-loader.md
@@ -0,0 +1,122 @@
+---
+title: "Bulk Loader"
+description: "Import large graphs from CSV files into FalkorDB using the falkordb-bulk-loader tool"
+parent: "Integration"
+nav_order: 8
+---
+
+# Bulk Loader
+
+The [falkordb-bulk-loader](https://github.com/falkordb/falkordb-bulk-loader) is a Python utility for building FalkorDB graphs from CSV files. It uses the `GRAPH.BULK` endpoint to import nodes and relationships efficiently in binary batches — much faster than issuing individual `CREATE` queries.
+
+## Requirements
+
+- Python 3.10 or later
+- A running FalkorDB instance (see [Get Started](/getting-started))
+
+## Installation
+
+```sh
+pip install falkordb-bulk-loader
+```
+
+## Quick Start
+
+To import two CSV files, `Person.csv` (nodes) and `KNOWS.csv` (relationships), into a graph named `SocialGraph`, run:
+
+```sh
+falkordb-bulk-insert SocialGraph \
+  -n Person.csv \
+  -r KNOWS.csv
+```
+
+The node label and relationship type are derived from each CSV filename, so this creates `Person` nodes and `KNOWS` relationships. To load several node or relationship files, repeat the flags:
+
+```sh
+falkordb-bulk-insert SocialGraph \
+  -n Person.csv \
+  -n Country.csv \
+  -r KNOWS.csv \
+  -r VISITED.csv
+```
+
+## Connecting to FalkorDB
+
+By default, the loader connects to `redis://127.0.0.1:6379`. Use `--server-url` (`-u`) to point it at a different instance:
+
+```sh
+falkordb-bulk-insert SocialGraph \
+  --server-url redis://myhost:6379 \
+  -n Person.csv
+```
+
+## Key Options
+
+| Flag | Long flag | Description |
+|:----:|-----------|-------------|
+| `-u` | `--server-url TEXT` | Server URL (default: `redis://127.0.0.1:6379`) |
+| `-n` | `--nodes TEXT` | Node CSV file (filename → label) |
+| `-N` | `--nodes-with-label TEXT` | Explicit label, followed by a node CSV file |
+| `-r` | `--relations TEXT` | Relationship CSV file (filename → type) |
+| `-R` | `--relations-with-type TEXT` | Explicit relationship type, followed by a relationship CSV file |
+| `-o` | `--separator CHAR` | Field delimiter (default: `,`) |
+| `-d` | `--enforce-schema` | Require typed column headers (see below) |
+| `-j` | `--id-type TEXT` | Type of the node ID property: `STRING` or `INTEGER` |
+| `-s` | `--skip-invalid-nodes` | Skip nodes with duplicate IDs instead of erroring |
+| `-e` | `--skip-invalid-edges` | Skip relationships with unknown endpoints instead of erroring |
+| `-i` | `--index Label:Property` | Create a range index after import |
+| `-f` | `--full-text-index Label:Property` | Create a full-text index after import |
+
+## Enforcing a Schema
+
+By default, the loader infers each property's type from its values. Use `--enforce-schema` (`-d`) to declare types explicitly; every column header must then follow the `name:TYPE` format:
+
+**User.csv**
+```csv
+:ID(User),name:STRING,rank:INT
+0,"Alice",5
+1,"Bob",8
+```
+
+**FOLLOWS.csv**
+```csv
+:START_ID(User),:END_ID(User),weight:DOUBLE
+0,1,0.9
+1,0,0.4
+```
+
+```sh
+falkordb-bulk-insert SocialGraph \
+  --enforce-schema \
+  -n User.csv \
+  -r FOLLOWS.csv
+```
+
+Accepted type strings: `ID`, `START_ID`, `END_ID`, `IGNORE`, `STRING`, `INT` / `INTEGER` / `LONG`, `DOUBLE` / `FLOAT`, `BOOL` / `BOOLEAN`, `ARRAY`.
+
+## Bulk Updates
+
+The companion command `falkordb-bulk-update` reads a CSV file in batches and applies a parameterized Cypher query to every row, exposing the row's fields as the list `row`. Use it for incremental updates or whenever you need full control over the Cypher:
+
+```sh
+falkordb-bulk-update SocialGraph \
+  --csv User.csv \
+  --query "MERGE (:User {id: row[0], name: row[1], rank: toInteger(row[2])})"
+```
+
+> **Note:** `falkordb-bulk-update` commits each batch as it goes, so an error partway through leaves the earlier batches applied. Validate your CSV input beforehand to avoid a partially updated graph.
+
+## Diagnostics
+
+Both `falkordb-bulk-insert` and `falkordb-bulk-update` install a `SIGUSR1` handler at startup. Sending `SIGUSR1` to a running loader process writes the tracebacks of all Python threads to `stderr`, which helps diagnose hangs or unexpectedly slow loads without attaching a debugger:
+
+```sh
+kill -SIGUSR1 PID   # replace PID with the loader's process ID
+```
+
+This relies on Python's `faulthandler` module and is only available on platforms that support `SIGUSR1` (i.e., not Windows). On unsupported platforms, registration is silently skipped.
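+
+To see what such a dump looks like before you need it, the sketch below uses a short-lived Python process as a hypothetical stand-in for the loader. It assumes the handler is registered with something equivalent to `faulthandler.register(signal.SIGUSR1, all_threads=True)`; that call and the `dump.txt` filename are illustrative assumptions, not taken from the loader's source.
+
+```sh
+# Hypothetical stand-in for a loader: a Python process that registers
+# faulthandler on SIGUSR1 and then sleeps. Its stderr is captured in dump.txt.
+python3 -c 'import faulthandler, signal, time; faulthandler.register(signal.SIGUSR1, all_threads=True); time.sleep(10)' 2> dump.txt &
+pid=$!
+sleep 1            # give the interpreter time to install the handler
+kill -USR1 "$pid"  # request a traceback dump; the process keeps running
+sleep 1
+kill "$pid"        # stop the stand-in with SIGTERM
+wait "$pid" 2>/dev/null
+cat dump.txt       # print the captured per-thread traceback
+```
+
+With a real import, start `falkordb-bulk-insert` in the background with `2> loader.log &` and signal `$!` the same way.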
+
+## Further Reading
+
+- [GitHub repository](https://github.com/falkordb/falkordb-bulk-loader): full CLI reference, input constraints, and ID namespaces
+- [GRAPH.BULK specification](/design/bulk-spec): technical wire-format specification for the underlying endpoint
diff --git a/integration/index.md b/integration/index.md
index bb991eca..83041f96 100644
--- a/integration/index.md
+++ b/integration/index.md
@@ -21,5 +21,6 @@ Learn how to leverage FalkorDB's flexible APIs and SDKs to build high-performanc
 - [Spring Data FalkorDB](./spring-data-falkordb.md): Learn how to use FalkorDB with Spring Data for JPA-style object-graph mapping.
 - [Snowflake Integration](./snowflake.md): Learn how to run graph database operations directly within your Snowflake environment using the FalkorDB Native App.
 - [PyTorch Geometric](./pyg.md): Train Graph Neural Networks directly on graphs stored in FalkorDB using PyG's remote backend interface.
+- [Bulk Loader](./bulk-loader.md): Import large graphs from CSV files using the falkordb-bulk-loader Python utility.