122 changes: 122 additions & 0 deletions docs/contributor_guide/cypher-frontend/000-overview.md
@@ -0,0 +1,122 @@
# Cypher frontend for pgGraph — overview

> **Status:** Plan / spec. No code on this branch yet.
> **Branch:** `feat/cypher-frontend`
> **Upstream dependency:** [cyrs](https://github.com/) — Rust openCypher v9 /
> GQL frontend. Frontend-only (parser, HIR, plan IR, sema, diagnostics).
> No executor; pgGraph supplies the executor.
> **Companion document:** `../../../../cyrs/feat-request.md` — the asks we
> are making upstream of cyrs to make this fit.

## What this is

pgGraph today exposes graph queries as PostgreSQL SQL functions:

```sql
SELECT * FROM graph.traverse(
    'public.people'::regclass, 'alice',
    /*max_depth=*/ 2, /*edge_types=*/ ARRAY['KNOWS'],
    /*direction=*/ 'outgoing', ...
);
```

This is the right interface for *the engine*, but it's a poor surface
for *users asking graph questions*. Multi-hop pattern queries with
predicates are exactly what Cypher is good at expressing.

We will add a new SQL function `graph.cypher(text, jsonb)` that accepts
an openCypher v9 query string, parses it through the
[cyrs](https://github.com/) frontend, and dispatches each plan operator
to either pgGraph's existing in-memory engine (reads) or to SPI-issued
DML against the registered source tables (writes).

## What this is NOT

- It is **not** a replacement query language. SQL stays. The
`graph.cypher(...)` function is *additive*.
- It is **not** a "graph database mode" where the extension owns graph
storage. Your tables remain the source of truth (per pgGraph's
founding pitch).
- It is **not** a cost-based optimiser. cyrs produces logical plans;
pgGraph executes them. Anything resembling a planner is in cyrs or
in Postgres, never here.
- It is **not** scope creep for the extension. The extension already
issues writes through SPI (catalog, sync, build). Cypher writes use
the same machinery, directed at user-registered "label tables".

## Why cyrs

- Layered: we consume **HIR + Plan** (per `cyrs/docs/integration-depth.md`
decision table — "graph database → HIR + Plan"). Cheaper than building
our own parser; richer than consuming the agent JSON.
- Has dialect modes; the `OpenCypherV9` mode is exactly the surface
we want to expose.
- Diagnostics are first-class (`cyrs-diag`, codes `E0xxx` through
`E5xxx`), span-accurate, and can be projected through Postgres'
`ereport(ERROR, ...)`.
- The `WriteOp` set is complete for v9 (`CreateNode`, `CreateRel`,
`MergeNode`, `MergeRel`, `SetProperty`, `SetLabels`,
`RemoveProperty`, `RemoveLabels`, `Delete{detach}`). We do not need
to build a write-side IR ourselves.
- Frontend-only by design: no executor to fight with.

## High-level architecture

```
┌──────────────────────────────────────────────────────────────────┐
│ Postgres backend (one transaction) │
│ │
│ SELECT * FROM graph.cypher('MATCH ... RETURN ...', '{}'::jsonb)│
│ │ │
│ ┌──────────────────────────▼─────────────────────────────┐ │
│ │ graph crate — new module: cypher_facade │ │
│ │ │ │
│ │ 1. parse + HIR-lower (cyrs_hir) │ │
│ │ 2. sema (schema-aware) (cyrs_sema + our │ │
│ │ SchemaProvider impl backed by pgGraph catalog) │ │
│ │ 3. plan-lower (cyrs_plan) │ │
│ │ 4. dispatch: │ │
│ │ - ReadOp tree → engine + row evaluator │ │
│ │ - WriteOp list → SPI DML on source tables │ │
│ │ 5. materialise rows as JSONB → TableIterator │ │
│ └─────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼─────────────────────────────┐ │
│ │ Existing pgGraph machinery (unchanged): │ │
│ │ • engine.rs / bfs.rs / path_finder.rs (reads) │ │
│ │ • Spi::run_with_args(INSERT/UPDATE/DELETE) (writes) │ │
│ │ • sync triggers pick up writes for index refresh │ │
│ └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```

## Documents in this directory

- **000-overview.md** — this file.
- **010-architecture.md** — module layout, type contracts, where in
`graph/src/` each piece lands.
- **020-catalog-extensions.md** — new catalog tables, `SchemaProvider`
implementation, label↔table mapping, unique-constraint registration.
- **030-read-mapping.md** — `cyrs_plan::ReadOp` → pgGraph engine
call / SQL emission, operator by operator.
- **040-write-mapping.md** — `cyrs_plan::WriteOp` → SPI DML, operator
by operator, plus MERGE / DETACH DELETE semantics.
- **050-expr-and-types.md** — `cyrs_plan::Expr` evaluation
(push-to-SQL vs Rust-side), Cypher↔Postgres type bridge, null/3VL
alignment.
- **060-diagnostics-and-errors.md** — how cyrs diagnostics surface as
Postgres `ereport`, embedder-host diagnostic range, error UX.
- **070-milestones-and-tests.md** — milestone plan, openCypher TCK
subset wiring, integration test fixtures.
- **080-open-questions.md** — known unknowns to resolve before / during
implementation. Issues blocked on upstream cyrs work cite
`feat-request.md` sections.

## Reading order

If you're new: 000 → 010 → 070 → 080. (Architecture, then milestones
to know what we're cutting, then open questions to know what's not
settled.)

If you're implementing: 020 (you need the catalog before anything
else) → 030 / 040 / 050 in parallel → 060 → 070.
145 changes: 145 additions & 0 deletions docs/contributor_guide/cypher-frontend/010-architecture.md
@@ -0,0 +1,145 @@
# Architecture

## Module layout (additions to `graph/src/`)

```
graph/src/
├── lib.rs # add: pub mod cypher_facade;
│ # add: pg_extern fn cypher(text, jsonb)
├── cypher_facade/
│ ├── mod.rs # entry: compile() → Plan; execute() → rows
│ ├── schema_provider.rs # impl cyrs_schema::SchemaProvider over catalog
│ ├── plan_translator/
│ │ ├── mod.rs
│ │ ├── read.rs # ReadOp → engine calls / SQL
│ │ ├── write.rs # WriteOp → SPI DML
│ │ ├── expr.rs # cyrs_plan::Expr → SQL fragment OR Rust eval
│ │ ├── path.rs # MATCH p = ... path materialisation
│ │ └── shortest.rs # ShortestPath op → path_finder
│ ├── row_eval.rs # in-process row evaluator for Filter /
│ │ # Project / Aggregate / OrderBy / Skip /
│ │ # Limit / Distinct / Unwind when SQL can't
│ ├── param_bind.rs # JSONB params → cyrs param map + pg type
│ ├── diag_to_pg.rs # cyrs_diag::Diagnostic → ereport
│ └── tests/
│ └── ...
├── catalog/
│ ├── mod.rs # existing; add label/rel mapping read+write
│ ├── labels.rs # NEW: label↔table↔column mapping
│ └── unique.rs # NEW: registered uniqueness constraints
└── sql/ # NEW migrations for new catalog tables
└── cypher_catalog.sql
```

No changes to `engine.rs`, `bfs.rs`, `path_finder.rs`, `edge_store.rs`,
`node_store.rs`, `sync.rs`. The facade is strictly additive.

## Cargo.toml additions

```toml
[dependencies]
cyrs-hir = { version = "...", default-features = false }
cyrs-plan = { version = "...", default-features = false }
cyrs-schema = { version = "...", default-features = false }
cyrs-sema = { version = "..." }
cyrs-diag = { version = "..." }
smol_str = "0.3" # cyrs surfaces SmolStr; we'll see it in matches
```

Version pin: a single git tag or crates.io minor version. See
`080-open-questions.md` Q-PKG-1.

## Public surface

Exactly one new pgrx SQL function:

```sql
-- Returns the row stream of a Cypher query.
-- One output row per RETURN row; the `row` column holds that row's
-- columns flattened into a single JSONB object.
CREATE FUNCTION graph.cypher(query text, params jsonb DEFAULT '{}'::jsonb)
    RETURNS TABLE (row jsonb)
    LANGUAGE c STRICT VOLATILE;
```

`VOLATILE` because writes are allowed. A future `graph.cypher_read(...)`
companion declared `STABLE` is a possible optimisation (it would reject
writes and let the Postgres planner optimise repeated calls within a
statement) but is out of scope for v1.

## Pipeline contract

```rust
// cypher_facade/mod.rs (sketch)

pub fn execute(query: &str, params: serde_json::Value)
    -> Result<Vec<JsonB>, FacadeError>
{
    // 1. parse + HIR-lower.
    let hir = cyrs_hir::lower::lower_statement(query)?;

    // 2. schema-aware sema. Schema = pgGraph catalog snapshot.
    let schema = SchemaProvider::from_catalog(snapshot_catalog()?);
    let diags = cyrs_sema::check(&hir, &schema);
    if diags.iter().any(|d| d.severity == Severity::Error) {
        return Err(FacadeError::SemaErrors(diags));
    }

    // 3. plan-lower.
    let plan = cyrs_plan::lower::lower_statement(&hir)?;

    // 4. bind params (params jsonb → cyrs param table, typed via §2.4 of feat-request.md).
    let bound = param_bind::bind(&plan, params)?;

    // 5. execute read tree, applying writes per row.
    let rows = plan_translator::execute(&plan, &bound, &schema)?;

    Ok(rows)
}
```

Steps 1–3 are pure functions of the query text and the schema
snapshot, so their result is cacheable on `(query,
catalog_fingerprint, schema_digest)`. The `catalog_fingerprint`
already exists (`catalog::catalog_fingerprint`). The `schema_digest`
comes from `cyrs_schema::SchemaProvider::schema_digest()`. A later
milestone will use these as the key for a per-backend statement cache.
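
A minimal sketch of what such a per-backend cache could look like, using
only the standard library. Every name here (`PlanCacheKey`, `CompiledPlan`,
`PlanCache`) is hypothetical, and both fingerprints are assumed to be plain
integers, which may not match the real types:

```rust
use std::collections::HashMap;

/// Hypothetical cache key: the three inputs that steps 1-3 depend on.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PlanCacheKey {
    query: String,
    catalog_fingerprint: u64, // assumed integer; the real type may differ
    schema_digest: u64,       // assumed integer; the real type may differ
}

/// Stand-in for whatever steps 1-3 produce (the lowered plan).
#[derive(Clone, Debug, PartialEq)]
struct CompiledPlan {
    description: String,
}

/// Per-backend cache: a plain map with no locking, because a Postgres
/// backend runs the facade on a single thread.
#[derive(Default)]
struct PlanCache {
    entries: HashMap<PlanCacheKey, CompiledPlan>,
    compiles: usize, // number of cache misses, kept for illustration
}

impl PlanCache {
    /// Returns the cached plan, running `compile` only on a miss.
    fn get_or_compile<F>(&mut self, key: PlanCacheKey, compile: F) -> &CompiledPlan
    where
        F: FnOnce(&str) -> CompiledPlan,
    {
        if !self.entries.contains_key(&key) {
            let plan = compile(&key.query);
            self.compiles += 1;
            self.entries.insert(key.clone(), plan);
        }
        &self.entries[&key]
    }
}
```

Because the fingerprints are part of the key, a catalog or schema change
produces a miss instead of serving a stale plan. Eviction is left out of
the sketch.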

## Boundaries with the existing engine

The facade calls into pgGraph's existing read path through a thin
adapter layer it owns. We don't expose engine internals back to cyrs.

| Read op | Engine entry point we'll call |
| ----------------------------- | ---------------------------------------------- |
| `Source { label, bind }` | `sql_search::source_table_search_rows` or `Spi` table scan |
| `Expand { single }` | new helper over `engine::Engine::adjacent` (one hop) |
| `Expand { variable-length }` | `sql_traversal::execute_traverse_rows` + `TraverseRequest` |
| `ShortestPath` (cy-feat §1.1) | `path_finder` (the existing shortest-path module) |
| All other ops | `row_eval` (in-process), composing engine results |
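
The routing in the table above can be sketched as a single `match`. The
`ReadOpSketch` enum is a hypothetical, simplified stand-in; the real
`cyrs_plan::ReadOp` has its own shape, and the hop-count fields are an
assumption made only to tell single-hop from variable-length expansion:

```rust
/// Hypothetical, simplified mirror of the read operators in the table.
#[derive(Debug)]
enum ReadOpSketch {
    Source { label: String },
    Expand { min_hops: u32, max_hops: Option<u32> },
    ShortestPath,
    Other(&'static str), // Filter, Project, Aggregate, ...
}

/// Which part of pgGraph would handle an operator.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    SourceScan,     // sql_search::source_table_search_rows or an SPI table scan
    SingleHop,      // new helper over engine::Engine::adjacent
    VariableLength, // sql_traversal::execute_traverse_rows + TraverseRequest
    PathFinder,     // the existing path_finder module
    RowEval,        // in-process row evaluator
}

/// Picks a route for one operator; exactly-one-hop expansions are
/// matched first so everything else falls through to the traversal path.
fn route(op: &ReadOpSketch) -> Route {
    match op {
        ReadOpSketch::Source { .. } => Route::SourceScan,
        ReadOpSketch::Expand { min_hops: 1, max_hops: Some(1) } => Route::SingleHop,
        ReadOpSketch::Expand { .. } => Route::VariableLength,
        ReadOpSketch::ShortestPath => Route::PathFinder,
        ReadOpSketch::Other(_) => Route::RowEval,
    }
}
```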

Write ops compose existing SPI helpers; the facade owns the SQL it
emits because pgGraph's current SPI users target the catalog/sync
path, not arbitrary user-table DML.
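
As an illustration of the facade owning its SQL, here is one way the
statement text for a node-creating write might be assembled before it is
handed to SPI. `LabelTable`, `quote_ident`, and `insert_sql` are
hypothetical helpers, not existing pgGraph functions; the sketch assumes
the catalog resolves a label to a schema-qualified table and a list of
column names:

```rust
/// Quotes an SQL identifier: wrap in double quotes and double any
/// embedded double quote, so the name cannot break out of the quoting.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Hypothetical label-to-table mapping entry, as the catalog might store it.
struct LabelTable<'a> {
    schema: &'a str,
    table: &'a str,
}

/// Builds a parameterised INSERT for a CreateNode-style write.
/// Property values are never spliced into the SQL text; they travel
/// separately as $1..$n bind parameters.
fn insert_sql(target: &LabelTable, columns: &[&str]) -> String {
    let table = format!("{}.{}", quote_ident(target.schema), quote_ident(target.table));
    if columns.is_empty() {
        // No properties supplied: let every column take its default.
        return format!("INSERT INTO {table} DEFAULT VALUES");
    }
    let cols: Vec<String> = columns.iter().map(|c| quote_ident(c)).collect();
    let params: Vec<String> = (1..=columns.len()).map(|i| format!("${i}")).collect();
    format!(
        "INSERT INTO {table} ({}) VALUES ({})",
        cols.join(", "),
        params.join(", ")
    )
}
```

Keeping values out of the statement text means only identifiers need
quoting, and those come from the catalog rather than from the Cypher
query.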

## Threading and transactions

- `graph.cypher(...)` is called inside a Postgres query, which is
inside a transaction. All SPI calls inherit that transaction.
- A whole Cypher statement therefore commits or rolls back atomically
with the rest of the user's transaction. No special savepoints
needed.
- The facade is single-threaded per invocation; we don't introduce
worker threads. pgGraph's background workers are unchanged.

## Error model

| Origin | Surfaces as |
| ---------------------------------------- | -------------------------------------------- |
| `cyrs_syntax` parse errors | `ereport(ERROR, ..., SQLSTATE 42601)` |
| `cyrs_sema` `Error` diagnostics | `ereport(ERROR, ..., SQLSTATE 42P10)` |
| `cyrs_plan` `PlanLowerError` | `ereport(ERROR, ..., SQLSTATE XX000)` |
| Embedder rejection (e.g. unmapped label) | `ereport(ERROR, ..., SQLSTATE 0A000)` |
| Underlying SPI error | bubble up the original SQLSTATE |
| `cyrs_sema` `Warning` / `Note`           | `ereport(WARNING / NOTICE, ...)` per severity|
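
The error rows of the table can be read as one small mapping function.
`ErrorOrigin` and `sqlstate_for` are hypothetical names used only for
this sketch; the SQLSTATE values are the ones listed above:

```rust
/// Hypothetical classification of where a failure came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorOrigin<'a> {
    Parse,
    SemaError,
    PlanLower,
    EmbedderRejection,
    /// An SPI failure carries the SQLSTATE Postgres already assigned.
    Spi { original_sqlstate: &'a str },
}

/// Returns the SQLSTATE to report for a given origin.
fn sqlstate_for<'a>(origin: ErrorOrigin<'a>) -> &'a str {
    match origin {
        ErrorOrigin::Parse => "42601",             // syntax_error
        ErrorOrigin::SemaError => "42P10",         // invalid_column_reference
        ErrorOrigin::PlanLower => "XX000",         // internal_error
        ErrorOrigin::EmbedderRejection => "0A000", // feature_not_supported
        ErrorOrigin::Spi { original_sqlstate } => original_sqlstate,
    }
}
```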

Where available, cyrs diagnostic spans become Postgres `errposition()`
offsets. This relies on the `HirId → byte span` accessor requested in
§4.2 of `feat-request.md`.
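
One detail to keep in mind when projecting spans: Postgres reports error
cursor positions as 1-based character indexes, while a byte span counts
bytes. A minimal conversion sketch follows; the function name is
hypothetical, and it assumes the span start is a byte offset into the
Cypher text. Since that text is itself an argument inside an outer SQL
statement, how the position is finally attached to the report is a
separate question this sketch does not answer.

```rust
/// Converts a byte offset into `query` to a 1-based character position.
/// Returns None when the offset is past the end of the string or falls
/// inside a multi-byte UTF-8 sequence.
fn byte_offset_to_cursor_pos(query: &str, byte_offset: usize) -> Option<usize> {
    if !query.is_char_boundary(byte_offset) {
        return None;
    }
    // Count the characters before the offset, then shift to 1-based.
    Some(query[..byte_offset].chars().count() + 1)
}
```

For pure-ASCII queries the result is simply `byte_offset + 1`; the two
diverge as soon as a label, property name, or string literal contains a
multi-byte character.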