| 1 | +# lite-unified-surrealql-lance-v1 — one store + one query surface, behind a feature gate |
| 2 | + |
| 3 | +> **Status:** CONJECTURE / design. **Test via feature gate; do NOT commit the |
| 4 | +> stack change.** Needs a convergence + cross-domain + truth-architect probe |
| 5 | +> (mechanism-vs-rhyme + the query-shape measurement) before any promotion. |
| 6 | +> **Date:** 2026-06-18. **Parent threads:** the DO-arm (`ExecTarget::SurrealQl`, |
| 7 | +> `lance-graph-contract::action`), `docs/STACK_SCAFFOLD.md`, the |
| 8 | +> "cold TS + kanban stay Lance-native" ruling. |
| 9 | + |
| 10 | +## Epiphany (less is more) |
| 11 | + |
| 12 | +Today there are **two query engines over the same lance storage** (lance-graph's |
| 13 | +*datafusion* planner + surreal's *SurrealQL*) and **two storage engines** |
| 14 | +(lance vs rocksdb). The "lite unified" bet collapses both: **ONE store (lance-KV) |
| 15 | ++ ONE primary query surface (SurrealQL via the AR-API adapter)**, datafusion |
| 16 | +**feature-gated**, rocksdb **dropped**. Cypher/SQL/neo4j lower to SurrealQL, |
| 17 | +which has *native* graph traversal (`->edge->`) and is therefore a better Cypher target than datafusion SQL. |
| 18 | + |
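To make the `->edge->` claim concrete, here is a minimal sketch of lowering a one-hop Cypher pattern to a SurrealQL traversal. `OneHop` and `lower_to_surrealql` are hypothetical names invented for illustration and are not part of lance-graph or surreal. The sketch passes identifiers through unchanged, with no escaping or case mapping, which a real lowering would need.

```rust
/// Hypothetical stand-in for a parsed one-hop Cypher pattern, roughly:
/// MATCH (a:<src_table> {id: <src_id>})-[:<edge>]->(b:<dst_table>) RETURN b
struct OneHop<'a> {
    src_table: &'a str,
    src_id: &'a str,
    edge: &'a str,
    dst_table: &'a str,
}

/// Lower the pattern to a SurrealQL graph traversal of the form
/// `SELECT -><edge>-><dst_table> FROM <src_table>:<src_id>;`.
/// Assumption: all names are already valid SurrealQL identifiers.
fn lower_to_surrealql(p: &OneHop) -> String {
    format!(
        "SELECT ->{}->{} FROM {}:{};",
        p.edge, p.dst_table, p.src_table, p.src_id
    )
}

fn main() {
    let hop = OneHop {
        src_table: "person",
        src_id: "tobie",
        edge: "knows",
        dst_table: "person",
    };
    // The edge maps directly onto the arrow syntax; no join is synthesized.
    println!("{}", lower_to_surrealql(&hop));
}
```

The point of the sketch is that the edge becomes a path segment directly, whereas a SQL target would have to express the same hop as a join over an edge table.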
| 19 | +## The bet, as a feature gate (default-OFF) |
| 20 | + |
| 21 | +A `lite-unified` feature that, when ON: |
| 22 | +1. **Storage = surreal kv-lance** (one store; drop rocksdb). *Blocked on:* surreal |
| 23 | + kv-lance is implemented as a module but not yet feature-wired |
| 24 | + (`surrealdb/core/src/kvs/lance/`, the `.claude/lance-backend` integration). |
| 25 | +2. **Query/exec = SurrealQL** via the AR-API adapter. The polyglot parser |
| 26 | + (Cypher/GQL/Gremlin/SPARQL/neo4j) lowers to **SurrealQL** (or the DO-arm |
| 27 | + `ActionInvocation`) instead of datafusion SQL. *Missing today:* the |
| 28 | + polyglot→SurrealQL lowering (today it's polyglot→datafusion). |
| 29 | +3. **datafusion = `optional`, OFF** on this path. Kept behind a separate |
| 30 | + `datafusion-analytical` feature for the workloads that genuinely need |
| 31 | + vectorized/analytical SQL (joins, aggregations) — SurrealQL's weak spot. |
| 32 | +4. The DO-arm `ExecTarget::SurrealQl` becomes the **primary** exec path, not one |
| 33 | + of four. |
| 34 | + |
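The routing implied by the four points above can be sketched as a small decision table. This is an illustration under stated assumptions, not the actual crate code: `ExecPath`, `Features`, and `select_path` are hypothetical, and the two flags are modelled as runtime booleans so the sketch runs standalone. In a real crate they would presumably be compile-time checks such as `cfg!(feature = "lite-unified")`.

```rust
/// Hypothetical two-way split; the real `ExecTarget` has more variants.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ExecPath {
    SurrealQl,
    Datafusion,
}

/// Runtime stand-ins for the `lite-unified` and `datafusion-analytical`
/// Cargo features (assumption: modelled as bools for this sketch only).
#[derive(Clone, Copy)]
struct Features {
    lite_unified: bool,
    datafusion_analytical: bool,
}

/// Pick the exec path for a query. `analytical` marks heavy SQL shapes
/// (multi-way joins, aggregations, columnar scans).
fn select_path(f: Features, analytical: bool) -> ExecPath {
    match (f.lite_unified, f.datafusion_analytical, analytical) {
        // Gate OFF: the default build is untouched.
        (false, _, _) => ExecPath::Datafusion,
        // Gate ON and the analytical feature is present: route heavy SQL there.
        (true, true, true) => ExecPath::Datafusion,
        // Gate ON otherwise: SurrealQL is primary. Analytical queries without
        // `datafusion-analytical` land here too, which is the accepted downgrade.
        (true, _, _) => ExecPath::SurrealQl,
    }
}

fn main() {
    let lite = Features { lite_unified: true, datafusion_analytical: false };
    println!("{:?}", select_path(lite, false));
}
```

Keeping the gate-OFF arm first makes the "default build stays as it is" property visible in one line.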
| 35 | +## What stays regardless (NOT datafusion) |
| 36 | + |
| 37 | +lance vector search, CAM-PQ / bgz17 codec stack, the cognitive substrate |
| 38 | +(BindSpace→MailboxSoA, the write contract, the SPO/AriGraph tissue). These are |
| 39 | +orthogonal to the query-engine choice. |
| 40 | + |
| 41 | +## Where it's a win vs a downgrade (the honest split) |
| 42 | + |
| 43 | +- **Win (the bulk):** graph traversal, AR CRUD, cognitive/SPO, vector search — |
| 44 | + SurrealQL-on-lance fits, and Cypher→SurrealQL graph is a *better* lowering. |
| 45 | + Footprint: drop the rocksdb C++ build outright; make datafusion (a large Rust |
| 46 | + dep) optional. |
| 47 | +- **Downgrade:** heavy analytical SQL (multi-way joins, aggregations, columnar |
| 48 | + scan) — datafusion's strength, SurrealQL's weakness. Hence datafusion stays |
| 49 | + feature-gated, not deleted. |
| 50 | + |
| 51 | +## Falsifier (truth-architect — measure before promoting) |
| 52 | + |
| 53 | +Take lance-graph's `datafusion_planner` test queries (the Cypher→SQL cases) and |
| 54 | +check **SurrealQL can express each**. Covered → drop datafusion for that path; |
| 55 | +analytical gaps → keep `datafusion-analytical` for those only. Also measure the |
| 56 | +real footprint delta (`cargo tree --no-default-features` + release `cargo bloat`) |
| 57 | +once kv-lance is feature-wired. As a proxy: lance-graph ≈ 889 crates, surreal |
| 58 | +(all backends) ≈ 1148, so the marginal SurrealQL-engine cost is ~260 crates |
| 59 | +(1148 − 889 = 259); rocksdb is a separate C++ build. |
| 60 | + |
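One way to record the falsifier is as a coverage table plus the crate-count arithmetic. The sketch below is an assumption about how that bookkeeping could look. `Case`, `coverage`, and `marginal_crates` are hypothetical, and the two sample cases are placeholders, not results from the actual `datafusion_planner` suite.

```rust
/// One planner test case and, if someone has written it, the SurrealQL
/// query judged equivalent. `None` means no equivalent was found.
struct Case {
    name: &'static str,
    surrealql: Option<&'static str>,
}

/// Return (number of covered cases, names of the analytical gaps).
fn coverage(cases: &[Case]) -> (usize, Vec<&'static str>) {
    let covered = cases.iter().filter(|c| c.surrealql.is_some()).count();
    let gaps = cases
        .iter()
        .filter(|c| c.surrealql.is_none())
        .map(|c| c.name)
        .collect();
    (covered, gaps)
}

/// Marginal dependency cost: crates in the surreal build minus crates
/// already present in the lance-graph build.
fn marginal_crates(surreal: u32, lance_graph: u32) -> u32 {
    surreal.saturating_sub(lance_graph)
}

fn main() {
    // Placeholder entries only; real verdicts come from the probe.
    let cases = [
        Case { name: "one_hop_traversal", surrealql: Some("SELECT ->knows->person FROM person:a;") },
        Case { name: "multi_way_join_aggregate", surrealql: None },
    ];
    let (covered, gaps) = coverage(&cases);
    println!("covered: {covered}, keep datafusion-analytical for: {gaps:?}");
    // Proxy figures quoted in this note.
    println!("marginal crates: {}", marginal_crates(1148, 889));
}
```

An empty gap list would support dropping datafusion on that path. Any entries left in it are the cases that justify keeping `datafusion-analytical`.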
| 61 | +## Increments (all behind `lite-unified`, none committed to the default path) |
| 62 | + |
| 63 | +1. **Probe (no code):** convergence + cross-domain (mechanism-vs-rhyme) + |
| 64 | + truth-architect (the datafusion_planner query-shape coverage check). Gate. |
| 65 | +2. **Wire surreal kv-lance** as a feature (finish the `.claude/lance-backend` |
| 66 | + integration; add the `kv-lance` feature + lance dep + `mod lance` in `kvs/mod.rs`). |
| 67 | +3. **Polyglot→SurrealQL lowering** — the missing front-end leg (parallel to the |
| 68 | + existing polyglot→datafusion). |
| 69 | +4. **`datafusion` → `optional`** + a `datafusion-analytical` feature; default the |
| 70 | + common path to SurrealQL-on-lance under `lite-unified`. |
| 71 | +5. **Measure** footprint + query-shape coverage; promote CONJECTURE→FINDING or |
| 72 | + correct. |
| 73 | + |
| 74 | +## Blockers / open questions |
| 75 | + |
| 76 | +- **OQ-LU-1:** surreal kv-lance feature-wiring (the integration TODOs). |
| 77 | +- **OQ-LU-2:** does SurrealQL cover the lance-graph datafusion_planner query |
| 78 | + shapes the live workloads actually use? (This is the falsifier.) |
| 79 | +- **OQ-LU-3:** is the polyglot→SurrealQL lowering cleaner than polyglot→datafusion |
| 80 | + for the non-graph dialects (SPARQL/Gremlin)? |
| 81 | +- Do NOT touch the default build until the probe is green. |