Graph extension (Proj. Manifold Stage 1/2) #4996

Closed
Hextaku wants to merge 1 commit into clockworklabs:master from Hextaku:pr/manifold-stage-1

@Hextaku Hextaku commented May 11, 2026

Summary

Adds a userland graph extension to SpacetimeDB: a new GraphId SATS type, a shared graph-algo crate with BFS/DFS/shortest-path traversal algorithms, and a graph-module Wasm cdylib with vertex/edge CRUD, traversal reducers, and procedure-based optimized queries. Zero engine changes — everything runs through the existing reducer/procedure API.
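
As a rough illustration of the kind of traversal the graph-algo crate provides, here is a minimal breadth-first search over an in-memory adjacency map. This is a sketch only, not the PR's code: `VertexId`, `build_adjacency`, and `bfs_distances` are hypothetical names, the vertex ID is a plain `u64` rather than the actual GraphId type, and the graph is assumed to be undirected.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical vertex identifier; stands in for the PR's `GraphId`, which is not shown here.
pub type VertexId = u64;

/// Build an undirected adjacency map from an edge list (assumption: edges are undirected).
pub fn build_adjacency(edges: &[(VertexId, VertexId)]) -> HashMap<VertexId, Vec<VertexId>> {
    let mut adj: HashMap<VertexId, Vec<VertexId>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }
    adj
}

/// Breadth-first search from `start`; returns the hop distance to every reachable vertex.
pub fn bfs_distances(
    adj: &HashMap<VertexId, Vec<VertexId>>,
    start: VertexId,
) -> HashMap<VertexId, u32> {
    let mut dist = HashMap::new();
    let mut queue = VecDeque::new();
    dist.insert(start, 0u32);
    queue.push_back(start);
    while let Some(v) = queue.pop_front() {
        let d = dist[&v];
        if let Some(neighbors) = adj.get(&v) {
            for &n in neighbors {
                // First visit in BFS order is the shortest hop distance.
                if !dist.contains_key(&n) {
                    dist.insert(n, d + 1);
                    queue.push_back(n);
                }
            }
        }
    }
    dist
}
```

Calling `bfs_distances(&build_adjacency(&edges), start)` yields a map from each reachable vertex to its hop count; unreachable vertices are simply absent from the result.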

Full context: Project Manifold Wiki

What changes

| Area | Files |
|---|---|
| SATS type system | crates/sats/src/graph_id.rs (new), algebraic_type.rs, product_type.rs, satn.rs, lib.rs |
| Engine wiring | crates/lib/src/lib.rs, crates/expr/src/lib.rs, crates/pg/src/encoder.rs |
| Algorithms | modules/graph-algo/ (new crate, 310 lines + 193 line bench) |
| Module | modules/graph/ (new crate, 440 lines) |
| Workspace | Cargo.toml (+2 members), Cargo.lock |

Why

I'm working on a project that needs native graph capabilities, and SpacetimeDB's real-time subscription architecture would be a perfect fit — it would dramatically reduce scope compared to running a separate graph database and managing server-db sync at scale. The alternative (Apache AGE) introduces significant operational complexity without a clear path to solving the sync problem cleanly.

I reached out on Discord asking whether graph features would be accepted in PRs. While I didn't get a direct response from Clockwork Labs, nobody shut the idea down either. Rather than wait indefinitely, I decided to build it as a proof of concept — a userland module that adds real graph capabilities with zero engine changes, backed by cross-system benchmarks that characterize exactly how far you can get without touching the engine.

The goal is to demonstrate that:

  • Graph support is viable in SpacetimeDB today, with no architectural changes.
  • The implementation is minimal and follows existing patterns (the SATS type mirrors Identity/Timestamp/Uuid exactly).
  • The userland ceiling is well understood, and where it ends (large-graph shortest path), Stage 2's case for engine-level operators is backed by measured data, not speculation.

If this is something Clockwork Labs is interested in merging: great!
If not: the benchmarks are public, the module works, the data speaks for itself, and anybody is able to deploy this for their projects.
I really enjoyed the challenge and will at least look into whether I can also pull off Stage 2.

More detail: Stage 1 — Userland Graph Extension

Benchmark highlights

Benchmarks are NOT on this branch or part of this PR. They live at https://github.com/Hextaku/SpacetimeDB-Manifold/tree/pr/manifold-stage-1-bench

SpacetimeDB's userland module (Wasm, zero engine changes) vs. Neo4j and Apache AGE on three SNAP datasets:

ego-Facebook (~4K vertices, ~88K edges):

| Metric | STDB (opt) | Neo4j (opt) | AGE |
|---|---|---|---|
| Load | 376 ms | 1.74 s | 1.95 s |
| Neighbor count | 1.2 ms | 8 ms | 12 ms |
| BFS full | 20 ms | 54 ms | 77 ms |
| Shortest path | 16 ms | 15 ms | 127 ms |

com-DBLP (~317K vertices, ~1M edges):

| Metric | STDB (opt) | Neo4j (opt) | AGE |
|---|---|---|---|
| Load | 4.8 s | 23.0 s | 30.1 s |
| Neighbor count | 1.9 ms | 112 ms | 147 ms |
| BFS full | 339 ms | 453 ms | 4.42 s |
| Shortest path | 219 ms | 7.4 ms | 2.20 s |

com-LiveJournal (~4M vertices, ~35M edges):

| Metric | STDB (opt) | Neo4j (opt) | AGE |
|---|---|---|---|
| Load | 179 s | 786 s | 972 s |
| Neighbor count | 1.5 ms | 110 ms | 1.26 s |
| BFS full | 14.5 s | 39.7 s | 603 s |
| Shortest path | 7.7 s | 125 ms | 4.47 s |

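SpacetimeDB wins loading (4-6x), neighbor counts (roughly constant at 1.2-1.9 ms via index scan), and full BFS traversal on every dataset. Neo4j leads shortest path on the two larger datasets (about 30x on com-DBLP and 62x on com-LiveJournal) due to its native bidirectional algorithm, a gap Stage 2 is designed to close. On ego-Facebook the two are effectively tied (16 ms vs. 15 ms).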
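
The ratios quoted here follow directly from the tables above. The snippet below redoes the arithmetic for a few rows as a sanity check; the `Row` struct, the `ROWS` constant, and `neo4j_advantage` are illustrative names only, and all times are converted to milliseconds by hand from the figures in the tables.

```rust
/// One benchmark row copied from the tables in this PR, with times in milliseconds.
pub struct Row {
    pub dataset: &'static str,
    pub metric: &'static str,
    pub stdb_ms: f64,
    pub neo4j_ms: f64,
}

/// A few rows from the PR's tables (values transcribed, not re-measured).
pub const ROWS: [Row; 4] = [
    Row { dataset: "com-DBLP", metric: "Shortest path", stdb_ms: 219.0, neo4j_ms: 7.4 },
    Row { dataset: "com-LiveJournal", metric: "Shortest path", stdb_ms: 7_700.0, neo4j_ms: 125.0 },
    Row { dataset: "ego-Facebook", metric: "Shortest path", stdb_ms: 16.0, neo4j_ms: 15.0 },
    Row { dataset: "com-LiveJournal", metric: "Load", stdb_ms: 179_000.0, neo4j_ms: 786_000.0 },
];

/// How many times faster Neo4j is than SpacetimeDB for a row.
/// Values below 1.0 mean SpacetimeDB is the faster system.
pub fn neo4j_advantage(row: &Row) -> f64 {
    row.stdb_ms / row.neo4j_ms
}

/// How many times faster SpacetimeDB is than Neo4j for a row (the inverse ratio).
pub fn stdb_advantage(row: &Row) -> f64 {
    row.neo4j_ms / row.stdb_ms
}
```

With these inputs, the two larger shortest-path rows give Neo4j an advantage of roughly 30x and 62x, the ego-Facebook row is close to 1x, and the LiveJournal load row gives SpacetimeDB an advantage of a little over 4x.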

Full results: Benchmarks

What's next: Stage 2

Stage 2 moves from userland to the SpacetimeDB engine. The benchmark data makes the priorities clear:

| Priority | Feature | LiveJournal impact |
|---|---|---|
| 1 | Bidirectional shortest path operator | 7.7 s → ~80-150 ms (50-100x) |
| 2 | Index-hop BFS (no adjacency rebuild) | 14.5 s → ~5-7 s, eliminates 7.2 s tx hold |
| 3 | SQL-pipeline graph queries (no HTTP) | ~2-5 ms per query |
| 4 | Variable-length path subscriptions | New capability |
| 5 | Index-free adjacency storage | ~5-7x per-hop |
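
To make priority 1 concrete, here is a minimal sketch of bidirectional breadth-first search on an unweighted, undirected graph. It is not the planned engine operator and not Neo4j's implementation; `Adjacency`, `expand_level`, and `bidirectional_hops` are hypothetical names, and vertex IDs are plain `u64` values. The idea is to grow one frontier from each endpoint, always expanding the smaller one by a full level, and stop as soon as an edge connects the two visited sets.

```rust
use std::collections::HashMap;

/// Hypothetical vertex identifier and adjacency representation (undirected graph assumed).
pub type VertexId = u64;
pub type Adjacency = HashMap<VertexId, Vec<VertexId>>;

/// Expand one full BFS level of `frontier`.
/// Returns the shortest source-to-target hop count seen through this level, if the
/// two searches touched; otherwise `None`. `frontier` is replaced by the next level.
fn expand_level(
    adj: &Adjacency,
    frontier: &mut Vec<VertexId>,
    dist_this: &mut HashMap<VertexId, u32>,
    dist_other: &HashMap<VertexId, u32>,
) -> Option<u32> {
    let mut best: Option<u32> = None;
    let mut next = Vec::new();
    for &v in frontier.iter() {
        let d = dist_this[&v];
        for &n in adj.get(&v).map(|ns| ns.as_slice()).unwrap_or(&[]) {
            // The edge (v, n) joins the two searches if the other side already reached n.
            if let Some(&other) = dist_other.get(&n) {
                let total = d + 1 + other;
                best = Some(best.map_or(total, |b| b.min(total)));
            }
            if !dist_this.contains_key(&n) {
                dist_this.insert(n, d + 1);
                next.push(n);
            }
        }
    }
    *frontier = next;
    best
}

/// Hop count of a shortest path between `source` and `target`, or `None` if disconnected.
pub fn bidirectional_hops(adj: &Adjacency, source: VertexId, target: VertexId) -> Option<u32> {
    if source == target {
        return Some(0);
    }
    let mut dist_s = HashMap::from([(source, 0u32)]);
    let mut dist_t = HashMap::from([(target, 0u32)]);
    let mut front_s = vec![source];
    let mut front_t = vec![target];
    while !front_s.is_empty() && !front_t.is_empty() {
        // Expanding the smaller frontier keeps the number of visited vertices low.
        let found = if front_s.len() <= front_t.len() {
            expand_level(adj, &mut front_s, &mut dist_s, &dist_t)
        } else {
            expand_level(adj, &mut front_t, &mut dist_t, &dist_s)
        };
        if found.is_some() {
            return found;
        }
    }
    None
}
```

Because each side only has to explore about half the path length, the number of visited vertices is typically far smaller than in a one-sided BFS on large graphs. A directed graph would additionally need a reverse adjacency map for the target-side search, which this sketch omits.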

More detail: Stage 2 — Native Graph Engine

Test state

cargo build → PASS
cargo test -p spacetimedb-sats --lib → 64/65 PASS (1 pre-existing upstream timestamp proptest)
cargo test -p graph-algo → 17/17 PASS
cargo test -p spacetimedb-update → 4/5 PASS (1 pre-existing upstream uninstall flake)

Known limitations

  • GraphId client codegen (AlgebraicTypeDef / TypespaceForGenerate) not yet implemented — blocked on client-side graph type design (Stage 2).
  • Cypher parser GraphId literal syntax — requires Cypher AST → RelExpr mapping (Stage 2).
  • Module reducer logic is tested only indirectly via the benchmark harness; dedicated integration tests for error handling and traversal result persistence will come in a follow-up.

- Add openCypher parser crate for parsing Cypher queries
- Implement Cypher-to-RelExpr translator with fixed-depth and variable-length path support
- Add graph query execution via PostgreSQL wire protocol
- Implement multiple MATCH pattern / cross-join support and NOT expression support
- Add graph query subscription integration with compile_cypher entry point
- Add userland graph module with Vertex/Edge tables and CRUD reducers
- Implement graph traversal reducers (BFS, DFS, shortest-path) with benchmarks
- Add graph-algo crate with traversal benchmark harness
- Add graph module integration and smoke tests

CLAassistant commented May 11, 2026

CLA assistant check
All committers have signed the CLA.

@cloutiertyler

Contributor

@Hextaku Thank you for your contribution! We don't have the bandwidth to maintain this at present, but we can definitely use it as future reference!
