Commit bf84021

docs: docs(adr): V3 as spine + polyglot transpiler (Rust/Python/C#)
1 parent a1fb170 commit bf84021

1 file changed

Lines changed: 125 additions & 0 deletions

docs/V3-TRANSPILER-ADR.md

@@ -0,0 +1,125 @@
# ADR: V3 as the spine + the polyglot transpiler (Rust / Python / C#)

**Status:** Proposed (RFC). Design contract; not yet implemented or
compile-verified.
**Date:** 2026-06-28
**Context:** completes the "spine vs adapter" question left open by
`SURREAL-AST-AS-ADAPTER.md` + `SURREAL-AST-TRAP-PREFLIGHT.md`, and names the
transpiler superpower (re-emit the OGAR AST to any language via adapter).

---

## Decision

1. **V3 (the content-addressed rail record) is the spine.** SurrealQL /
   ClickHouse / PostgreSQL / TTL DDL are demoted to **peer adapters** that
   lower *from* V3 + `ClassView`. SurrealQL stops being a spine candidate.
2. **The V3 record is dual-mode and tenant-structured** (below).
3. **Codegen is an adapter family**: just as DDL adapters project the schema,
   `LangBackend` adapters re-emit *source code* (Rust / Python / C#) from the
   same IR. The IR is the interlingua; codegen is the transpiler.

---

## 1 · The V3 record (the spine primitive)

### 1.1 Dual-mode facet — `12 B = 96 bits = 6×16 = 4×24`, classid tags which mode
```
FacetCascade { facet_classid: u32, payload: [u8; 12] } // 16 B, content address

classid tag = Cascade → [FacetTier; 6] // 6 × (part_of:8, is_a:8) — POSITION (hierarchy)
classid tag = Triplet → [SpoTriple; 4] // 4 × (subject:8, pred:8, object:8) — LOCAL EDGES (graph)
```
- Cascade = depth-with-implied-predicates (mereology:taxonomy); subsumption is
  a bit-op. Triplet = breadth-with-explicit-predicates; **an SPO triple is a
  triplet-mode facet**, which unifies the SPO corpus with the facet primitive
  (today they are unjoined substrates).
- The tag rides in the classid (zero extra bytes; precedent: `TailVariant`).
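
A minimal Rust sketch of the dual-mode decode may make the two layouts concrete. It is illustrative only: the ADR does not say which classid bits carry the tag, so the top-bit `TRIPLET_TAG` convention, the `FacetMode` variants, and the accessor names `mode`, `tiers`, and `triples` are all assumptions.

```rust
/// Which layout the 12-byte payload uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FacetMode {
    Cascade,
    Triplet,
}

/// One hierarchy tier: 8 bits of part_of, 8 bits of is_a.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FacetTier {
    pub part_of: u8,
    pub is_a: u8,
}

/// One local edge: 8 bits each for subject, predicate, object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpoTriple {
    pub subject: u8,
    pub pred: u8,
    pub object: u8,
}

/// 4-byte classid plus 12-byte payload = 16 bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
pub struct FacetCascade {
    pub facet_classid: u32,
    pub payload: [u8; 12],
}

/// ASSUMPTION: the mode tag is the top classid bit (set = Triplet).
pub const TRIPLET_TAG: u32 = 1 << 31;

impl FacetCascade {
    /// Read the mode from the classid; no extra bytes are consumed.
    pub fn mode(&self) -> FacetMode {
        if self.facet_classid & TRIPLET_TAG != 0 {
            FacetMode::Triplet
        } else {
            FacetMode::Cascade
        }
    }

    /// View the payload as 6 tiers of 2 bytes; None in Triplet mode.
    pub fn tiers(&self) -> Option<[FacetTier; 6]> {
        if self.mode() != FacetMode::Cascade {
            return None;
        }
        let mut out = [FacetTier { part_of: 0, is_a: 0 }; 6];
        for (i, pair) in self.payload.chunks_exact(2).enumerate() {
            out[i] = FacetTier { part_of: pair[0], is_a: pair[1] };
        }
        Some(out)
    }

    /// View the payload as 4 triples of 3 bytes; None in Cascade mode.
    pub fn triples(&self) -> Option<[SpoTriple; 4]> {
        if self.mode() != FacetMode::Triplet {
            return None;
        }
        let mut out = [SpoTriple { subject: 0, pred: 0, object: 0 }; 4];
        for (i, t) in self.payload.chunks_exact(3).enumerate() {
            out[i] = SpoTriple { subject: t[0], pred: t[1], object: t[2] };
        }
        Some(out)
    }
}
```

Both views read the same 12 bytes; only the classid decides which one is valid, which is why the tag costs no extra storage.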

### 1.2 The 512-byte record = 32 tenants
```
NodeRow 512 B ≡ [Facet; 32]               (AoS row)
              ≡ 32 tenants × [GUID; N]    (SoA — "tenant" = a GUID member column)
  tenant 0       Self GUID
  tenant 1       Edges (EdgeBlock 12+4)
  tenants 2..31  30 composition slots → GUID references to other classes
```
`ClassView::tenant_schema(classid) -> [TenantRole; 32]`, **static per classid**
(keeps each tenant a homogeneous, SIMD-scannable GUID column). Roles:
`{ Self, Edges, Structural, Do, Think, Adapter }` (+ `nested`). The
`Do` (ActionDef / do-arm) · `Think` (cognitive plane) · `Adapter` (projection)
tenants are the three arms reached *through* the classid. Nesting = a
content-addressed FK column → a columnar composition DAG.

> Reconciliation with current code: today `NodeRow` = `key(16) | edges(16) |
> value(480)` with `value` **opaque**. The `[Facet; 32]` / `tenant_schema` is
> the typed schema this ADR imposes on those same bytes — `ClassView` is the
> missing brick that turns the 480-byte slab into 30 typed tenant slots.
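
The arithmetic behind the reconciliation (16 + 16 + 480 = 512 bytes, and 480 / 16 = 30 slots) can be sketched as a typed view over the raw row. This is a sketch under stated assumptions, not the `ClassView` implementation: the constant names, the `SelfGuid` spelling (`Self` is a Rust keyword), the default of `Structural` for every composition slot, and the helpers `tenant` and `composition_slots` are all hypothetical.

```rust
use std::convert::TryInto;

pub const ROW_BYTES: usize = 512;
pub const SLOT_BYTES: usize = 16;
pub const TENANTS: usize = ROW_BYTES / SLOT_BYTES; // 32

/// Roles named in the ADR; `SelfGuid` stands in for `Self`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenantRole {
    SelfGuid,
    Edges,
    Structural,
    Do,
    Think,
    Adapter,
}

/// ASSUMPTION: a placeholder schema that ignores the classid and marks all
/// 30 composition slots as Structural. A real schema would vary per classid.
pub fn tenant_schema(_classid: u32) -> [TenantRole; TENANTS] {
    let mut roles = [TenantRole::Structural; TENANTS];
    roles[0] = TenantRole::SelfGuid; // key(16)
    roles[1] = TenantRole::Edges; // edges(16)
    roles
}

/// Borrow tenant `i` of a row as a 16-byte slot, or None if out of range.
pub fn tenant(row: &[u8; ROW_BYTES], i: usize) -> Option<&[u8; SLOT_BYTES]> {
    if i >= TENANTS {
        return None;
    }
    let start = i * SLOT_BYTES;
    row[start..start + SLOT_BYTES].try_into().ok()
}

/// Iterate the 30 composition slots carved out of the 480-byte value slab.
pub fn composition_slots(row: &[u8; ROW_BYTES]) -> impl Iterator<Item = &[u8]> {
    row[2 * SLOT_BYTES..].chunks_exact(SLOT_BYTES)
}
```

Because the schema is a fixed-size array indexed by tenant number, a scan over one tenant across many rows touches the same byte offset in every row, which is the property the "homogeneous column" wording relies on.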

### 1.3 Capacity is the SoC lint, not a limit
`>64 fields` · `>256/tier` · `>6 deep` · `>4 edges` · `>30 slots` → the class
lacks separation of concerns. **The encoding makes good SoC the only
representable shape**: overflow in any dimension is the signal; "reference
another class" (grow a limb) is always the fix. We own OGAR, so minting the
new limb is free and convergence keeps it shared. Detector and refactor are
the same mechanism. (The law is already written as a falsifier in
`ruff_spo_address/examples/medcare_probe.rs` §[G]; promote it to a `ruff`
diagnostic.)
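
The five thresholds above translate directly into a check that reports every dimension a class overflows. The sketch below is hypothetical throughout: `ClassShape`, `SocOverflow`, and `soc_overflows` are illustrative names, and it does not reproduce the falsifier in `medcare_probe.rs`.

```rust
pub const MAX_FIELDS: usize = 64;
pub const MAX_TIER_WIDTH: usize = 256;
pub const MAX_DEPTH: usize = 6;
pub const MAX_EDGES: usize = 4;
pub const MAX_SLOTS: usize = 30;

/// The dimension in which a class exceeds what the encoding can hold.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SocOverflow {
    Fields,
    TierWidth,
    Depth,
    Edges,
    Slots,
}

/// Hypothetical summary of one class as a harvester might measure it.
pub struct ClassShape {
    pub fields: usize,
    pub widest_tier: usize,
    pub depth: usize,
    pub local_edges: usize,
    pub composition_slots: usize,
}

/// Return every overflowing dimension; an empty Vec means the class fits.
pub fn soc_overflows(s: &ClassShape) -> Vec<SocOverflow> {
    let checks = [
        (s.fields > MAX_FIELDS, SocOverflow::Fields),
        (s.widest_tier > MAX_TIER_WIDTH, SocOverflow::TierWidth),
        (s.depth > MAX_DEPTH, SocOverflow::Depth),
        (s.local_edges > MAX_EDGES, SocOverflow::Edges),
        (s.composition_slots > MAX_SLOTS, SocOverflow::Slots),
    ];
    checks
        .iter()
        .filter(|(over, _)| *over)
        .map(|(_, kind)| *kind)
        .collect()
}
```

Each reported variant names the dimension to split on, so the same result could feed both the diagnostic message and the suggested "reference another class" refactor.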

---

## 2 · The transpiler (the superpower)

The IR (`ruff_spo_triplet::ModelGraph`) is already bidirectional
(`expand` ↔ `reassemble`), and `ruff_cpp_codegen` already proves
`ModelGraph → Rust source`. Generalize that one backend into an adapter family:

```
SOURCE (py/cpp/cs) ─ruff_*_spo─▶ ModelGraph ─mint─▶ Facet (content address, dedup across langs)

TARGET (py/rust/cs) ◀─LangBackend─── ModelGraph
```
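
The `mint` arrow in the diagram only deduplicates across languages if every frontend reduces a construct to the same canonical form before hashing. The sketch below illustrates that idea with a hand-rolled 64-bit FNV-1a hash over a sorted textual form. It is not the 16-byte mint in `ruff_spo_address`: `Construct`, `canonical`, and `mint` are hypothetical names, and the hash choice is a stand-in.

```rust
/// Hypothetical language-neutral description of one class-like construct:
/// a name plus (field name, neutral type name) pairs.
pub struct Construct {
    pub name: String,
    pub fields: Vec<(String, String)>,
}

/// Canonical text form: fields are sorted so that declaration order in the
/// source language does not change the address.
pub fn canonical(c: &Construct) -> String {
    let mut fields = c.fields.clone();
    fields.sort();
    let body: Vec<String> = fields
        .iter()
        .map(|(name, ty)| format!("{}:{}", name, ty))
        .collect();
    format!("{}{{{}}}", c.name, body.join(","))
}

/// 64-bit FNV-1a over the canonical form (stand-in for the real mint).
pub fn mint(c: &Construct) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for byte in canonical(c).bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}
```

Two frontends that harvest the same class with fields in a different order produce the same canonical string and therefore the same key, which is the property a convergence test would assert.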
- `LangBackend { fn render(&self, &ModelGraph) -> String }` — one adapter per
  target, peers of the DDL adapters.
- Rust ◀ `ruff_cpp_codegen` (exists) · Python ◀ extend `ruff_python_codegen`
  (the formatter's generator) · C# ◀ new `ruff_csharp_codegen`.
- Content-addressing gives **cross-language dedup**: the same construct in
  Python/C++/C# mints the same `Facet` (CI convergence test).

### Honest boundary — structure transpiles, behaviour does not
`OGAR-AS-IR.md`: "the behavioural arm cannot survive lowering and stays in the
IR." The existing backend renders `MethodSig` *signatures*, not method bodies.
So the deliverable is a **schema / interface / DTO / ORM-model transpiler**
(API contracts, type defs, model shells) — enormous on its own. Full behaviour
transpilation (method bodies → executable logic) is a later arm via
`ActionDef` / `KausalSpec`, explicitly out of scope for this ADR.

---

## 3 · Consequences

- **Positive:** one content-addressed spine; SurrealQL/DDL become honest peer
  adapters (closes the trap); the SPO corpus and the facet primitive unify;
  capacity-as-lint is enforced structurally; codegen-via-adapter gives polyglot
  re-export; cross-app/cross-language dedup for free.
- **Costs / risks:** (1) the rail address is **lossy** — it is a CAM *key*, not
  the content; the lossless shape lives in `ClassView` + the value tenants.
  (2) **Minting governance** — a content address is only stable if the
  rank-minter is frozen; the cross-language convergence test must be
  CI-enforced *before* scaling. (3) **"Everything in OGAR" = OGAR is the fleet
  bottleneck** — the zero-dep contract crate must be the only stable surface;
  the `#[deprecated]` `*Bridge` churn already shows the strain.
- **Scale honesty:** the substrate is ~11 K nodes / ~24 K triples today, not
  the aspirational 2 M; the `ruff_spo_triplet` per-language pipeline is the
  lever that scales it.

## 4 · Status of the pieces (verified against `main`)
Real: `ModelGraph` interlingua (bidirectional), `ruff_cpp_spo` / `ruff_ruby_spo`
frontends, `ruff_csharp_spo` loader, the 16-byte mint (`ruff_spo_address`), one
backend (`ruff_cpp_codegen` → Rust), `bridge_codebook_convergence` (identity).
To build: Python→ModelGraph normalization, C# harvester generalization, the
`LangBackend` trait + Python/C# backends, the dual-mode `FacetMode`, the
`tenant_schema`, the round-trip + convergence CI, the `OGAR-SOC` lint.

## 5 · Companion
Implementation plan: `ruff` PR "OGAR Polyglot AST Integration (RFC)".
