| 1 | +# Development Guide |
| 2 | + |
| 3 | +## Build from Source |
| 4 | + |
| 5 | +```bash |
| 6 | +# Proxy (Rust) |
| 7 | +cargo build -p proxy |
| 8 | +cargo test -p proxy |
| 9 | + |
| 10 | +# Admin UI (React) |
| 11 | +cd admin-ui && npm install && npm run dev |
| 12 | +# → http://localhost:5173 |
| 13 | + |
| 14 | +# Production UI bundle |
| 15 | +cd admin-ui && npm run build |
| 16 | +``` |
| 17 | + |
| 18 | +Hot reload: |
| 19 | +```bash |
| 20 | +cargo watch -x "run -p proxy" |
| 21 | +``` |
| 22 | + |
| 23 | +## Pre-commit Hook |
| 24 | + |
| 25 | +`.githooks/pre-commit` runs `cargo fmt --check`, `cargo clippy`, and `admin-ui` tests. Enable once per clone: |
| 26 | + |
| 27 | +```bash |
| 28 | +git config core.hooksPath .githooks |
| 29 | +``` |
| 30 | + |
| 31 | +## Project Structure |
| 32 | + |
| 33 | +``` |
| 34 | +betweenrows/ |
| 35 | +├── Cargo.toml workspace root (proxy, migration crates) |
| 36 | +├── migration/src/ SeaORM migrations (41 total) |
| 37 | +├── docs/ User-facing documentation |
| 38 | +│ ├── permission-system.md Policy system user guide |
| 39 | +│ ├── security-vectors.md Security attack vectors & test plan |
| 40 | +│ ├── permission-stories.md Detailed permission use cases |
| 41 | +│ └── roadmap.md Project roadmap and backlog |
| 42 | +├── scripts/demo_ecommerce/ Demo schema + seed data |
| 43 | +├── admin-ui/ React admin console |
| 44 | +│ └── src/ |
| 45 | +│ ├── api/ axios + fetch-event-source clients |
| 46 | +│ ├── auth/ AuthContext, ProtectedRoute, LoginPage |
| 47 | +│ ├── components/ Layout, DataSourceForm, CatalogDiscoveryWizard, |
| 48 | +│ │ PolicyForm, PolicyAssignmentPanel, RoleForm, |
| 49 | +│ │ RoleMemberPanel, RoleInheritancePanel, AuditTimeline, … |
| 50 | +│ ├── pages/ Users*, DataSources*, DataSourceCatalogPage, |
| 51 | +│ │ Policies*, Roles*, QueryAuditPage |
| 52 | +│ └── types/ TypeScript interfaces |
| 53 | +└── proxy/src/ |
| 54 | + ├── main.rs entry point: CLI, DB init, EngineCache, servers |
| 55 | + ├── server.rs process_socket_with_idle_timeout (idle + startup timeouts) |
| 56 | + ├── handler.rs pgwire StartupHandler + query handlers |
| 57 | + ├── auth.rs Argon2 auth, user creation |
| 58 | + ├── crypto.rs AES-256-GCM encrypt/decrypt |
| 59 | + ├── admin/ REST API: mod, dto, jwt, handlers, discovery_job, |
| 60 | + │ policy_handlers, role_handlers, audit_handlers, |
| 61 | + │ admin_audit |
| 62 | + ├── discovery/ DiscoveryProvider trait + Postgres impl |
| 63 | + ├── entity/ SeaORM entities (proxy_user, data_source, role, |
| 64 | + │ role_member, role_inheritance, data_source_access, |
| 65 | + │ policy, policy_assignment, policy_version, |
| 66 | + │ admin_audit_log, query_audit_log, …) |
| 67 | + ├── role_resolver.rs BFS role resolution, cycle detection, effective assignments |
| 68 | + ├── engine/mod.rs EngineCache, VirtualCatalogProvider, build_arrow_schema() |
| 69 | + └── hooks/ QueryHook trait, ReadOnlyHook, PolicyHook |
| 70 | +``` |
| 71 | + |
| 72 | +## Architecture |
| 73 | + |
| 74 | +``` |
| 75 | +psql / app |
| 76 | + ↓ PostgreSQL wire protocol (port 5434) |
| 77 | +BetweenRows (Rust) |
| 78 | + ├─ Authenticates user (Argon2id) |
| 79 | + ├─ Checks data source access (data_source_access table — direct, role-based, or all) |
| 80 | + ├─ Runs query hook pipeline: |
| 81 | + │ ReadOnlyHook — blocks writes (SQLSTATE 25006) |
| 82 | + │ PolicyHook — row filters, column masks, column access control |
| 83 | + └─ Executes via DataFusion + tokio-postgres federation |
| 84 | + ↓ |
| 85 | +Upstream PostgreSQL |
| 86 | +``` |
| 87 | + |
| 88 | +## Tech Stack |
| 89 | + |
| 90 | +| Layer | Library | Version | |
| 91 | +|---|---|---| |
| 92 | +| Protocol | pgwire | 0.38 | |
| 93 | +| Query engine | DataFusion | 52 | |
| 94 | +| PG federation | datafusion-table-providers | 0.10 | |
| 95 | +| Async runtime | Tokio | 1 | |
| 96 | +| Admin store | SeaORM + SQLite/PG | 1 | |
| 97 | +| Password hashing | argon2 (Argon2id) | 0.5 | |
| 98 | +| Secret encryption | aes-gcm (AES-256-GCM) | 0.10 | |
| 99 | +| Admin REST API | axum + tower-http | 0.8 / 0.6 | |
| 100 | +| Admin auth | jsonwebtoken (HMAC-SHA256) | 9 | |
| 101 | +| CLI | clap | 4 | |
| 102 | +| Admin UI | React 19 + Vite 6 + Tailwind 4 + TanStack Query 5 | — | |
| 103 | + |
| 104 | +## Security |
| 105 | + |
| 106 | +### Access Control Architecture |
| 107 | + |
| 108 | +Access control is enforced **before** any query reaches the engine: |
| 109 | + |
| 110 | +1. `validate_data_source()` — datasource must exist and be active |
| 111 | +2. `check_access(user_id, datasource_name)` — user must have access via `data_source_access` (direct, role-based, or all-scoped) |
| 112 | +3. If either check fails → `FATAL` PG error, connection rejected before `get_ctx()` is ever called |
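
The ordering of the two gates above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proxy's actual code: `AdminStore`, `ConnectError`, and `authorize` are hypothetical names, user IDs are plain strings rather than UUIDs, and the `access` set stands in for an already-resolved `data_source_access` lookup.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the admin store.
pub struct AdminStore {
    /// Datasource name -> is_active flag.
    pub data_sources: HashMap<String, bool>,
    /// (user id, datasource name) pairs that resolve to access.
    pub access: HashSet<(String, String)>,
}

/// Hypothetical error type; the real proxy reports a FATAL PG error.
#[derive(Debug, PartialEq)]
pub enum ConnectError {
    UnknownOrInactiveDataSource,
    AccessDenied,
}

impl AdminStore {
    /// Gate 1: the datasource must exist and be active.
    fn validate_data_source(&self, name: &str) -> Result<(), ConnectError> {
        match self.data_sources.get(name) {
            Some(true) => Ok(()),
            _ => Err(ConnectError::UnknownOrInactiveDataSource),
        }
    }

    /// Gate 2: the user must have an access entry for the datasource.
    fn check_access(&self, user_id: &str, name: &str) -> Result<(), ConnectError> {
        if self.access.contains(&(user_id.to_string(), name.to_string())) {
            Ok(())
        } else {
            Err(ConnectError::AccessDenied)
        }
    }

    /// Runs both gates in order. Only an Ok result would go on to build
    /// a query context; any Err rejects the connection first.
    pub fn authorize(&self, user_id: &str, name: &str) -> Result<(), ConnectError> {
        self.validate_data_source(name)?;
        self.check_access(user_id, name)
    }
}
```

Because `authorize` short-circuits with `?`, an inactive datasource is reported before access is even consulted, which mirrors the step order in the list.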
| 113 | + |
| 114 | +### Why the Shared Pool Is Safe |
| 115 | + |
| 116 | +The upstream connection pool carries **no user identity** — it is pure TCP connectivity to the upstream Postgres server. All identity and access decisions are made at the pgwire auth layer (steps 1–2 above), not at the pool layer. |
| 117 | + |
| 118 | +Per-user isolation is enforced by: |
| 119 | +- **Data plane** — `data_source_access` allowlist (no matching row → connection rejected). Access can be granted directly to a user, via role membership (including inherited roles), or to all users. |
| 120 | +- **Policy hook** — per-query row filters, column masks, and access controls injected via DataFusion's logical plan tree, based on the authenticated user's policy assignments (direct, role-based, or wildcard) |
| 121 | +- **Virtual catalog** — the stored catalog is an allowlist; tables/columns not explicitly saved are invisible to the engine |
| 122 | + |
| 123 | +The shared pool is therefore safe for all authorized users of a datasource. The pool determines how the proxy talks to upstream; auth and row-level security determine what each user can see. The two concerns are orthogonal. |
| 124 | + |
| 125 | +### Policy Enforcement Resistance |
| 126 | + |
| 127 | +`PolicyHook` injects row filters and column transforms at the DataFusion logical plan level via `transform_up`. The filter is applied directly above each `TableScan` node, so it cannot be bypassed by table aliases, CTEs, or subqueries: DataFusion inlines those into the plan before the transformation runs. |
| 128 | + |
| 129 | +Template variable substitution (`{user.tenant}`, etc.) uses parse-then-substitute: the filter expression is parsed into a DataFusion `Expr` tree first, then placeholder identifiers are replaced with typed `Expr::Literal` values. The substituted values (tenant, username, and so on) never pass through the SQL parser, which prevents injection even if a value contains SQL syntax. |
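
The parse-then-substitute idea can be illustrated with a toy expression tree. The `Expr` enum below is a simplified stand-in and is not DataFusion's `Expr`; the `Placeholder` variant and the `substitute` function are hypothetical names used only for this sketch.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a parsed filter expression (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    /// Placeholder such as `{user.tenant}`, stored by its key ("user.tenant").
    Placeholder(String),
    /// Typed string literal; its content is data and is never re-parsed.
    Literal(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Walks an already-parsed tree and swaps placeholders for literals.
/// Returns Err carrying the key name if a placeholder has no value.
pub fn substitute(expr: Expr, vars: &HashMap<String, String>) -> Result<Expr, String> {
    Ok(match expr {
        Expr::Placeholder(key) => match vars.get(&key) {
            Some(value) => Expr::Literal(value.clone()),
            None => return Err(key),
        },
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(substitute(*l, vars)?),
            Box::new(substitute(*r, vars)?),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(substitute(*l, vars)?),
            Box::new(substitute(*r, vars)?),
        ),
        // Columns and existing literals are left untouched.
        other => other,
    })
}
```

A tenant value such as `acme' OR '1'='1` ends up as a single `Literal` node on one side of the `Eq`, because the tree shape was fixed before the value was introduced.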
| 130 | + |
| 131 | +### Permissions Model |
| 132 | + |
| 133 | +BetweenRows enforces a two-layer access control model: |
| 134 | + |
| 135 | +**Management plane** — controlled by `is_admin` flag. Admins manage users, data sources, policies, and catalogs via the Admin API. Non-admins have no Admin API access. |
| 136 | + |
| 137 | +**Data plane** — controlled by two independent mechanisms: |
| 138 | +1. *Connection access* — `data_source_access` entries. A user can connect to a datasource via direct assignment, role membership (including inherited roles), or all-user scope. Being an admin does **not** automatically grant data plane access. |
| 139 | +2. *Query policy* — `PolicyHook` applies row filters, column masks, and column access controls per-query based on assigned policies (direct, role-based, or all-scoped). If the datasource `access_mode` is `"policy_required"`, tables with no matching permit policy return empty results. Policies can reference built-in identity fields (`{user.tenant}`, `{user.username}`, `{user.id}`) and custom user attributes (`{user.KEY}`) for attribute-based access control (ABAC). Optional decision functions (JavaScript/WASM) provide programmable policy gates. |
| 140 | + |
| 141 | +See `docs/permission-system.md` for the full policy system user guide. |
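
Role membership "including inherited roles" implies a graph walk. The sketch below shows one way a breadth-first resolution could look; it is not the code in `role_resolver.rs`. The function name `effective_roles`, the use of role names instead of UUIDs, and the assumption that a role inherits from the roles listed as its parents are all illustrative.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every role a user effectively holds: the directly assigned roles
/// plus all roles reachable by following `parents` edges (assumed direction:
/// a role inherits from each of its parents).
pub fn effective_roles<'a>(
    direct: &[&'a str],
    parents: &HashMap<&'a str, Vec<&'a str>>,
) -> HashSet<&'a str> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<&'a str> = direct.iter().copied().collect();

    while let Some(role) = queue.pop_front() {
        // `insert` returns false for a role that was already visited, which
        // also stops the walk from looping forever on a cyclic graph.
        if !seen.insert(role) {
            continue;
        }
        if let Some(ps) = parents.get(role) {
            queue.extend(ps.iter().copied());
        }
    }
    seen
}
```

The visited set makes the walk terminate even if the inheritance table contains a cycle; rejecting such cycles when they are created would be a separate check.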
| 142 | + |
| 143 | +## Performance |
| 144 | + |
| 145 | +### Arrow Type Alignment (query time) |
| 146 | + |
| 147 | +During catalog discovery, column types are captured using `datafusion-table-providers`' own `get_schema()` function rather than a manual PG-to-Arrow mapping. This guarantees that the stored schema matches exactly what the library produces at query time. |
| 148 | + |
| 149 | +**Why it matters:** an earlier hand-written `pg_type_to_arrow()` mapped `numeric` → `Decimal128(38,10)` and `timestamp` → `Timestamp(Microsecond)`, but the library internally uses `Decimal128(38,20)` and `Timestamp(Nanosecond)`. The mismatch triggered a full schema-cast on every result batch, adding 12–23 s to queries returning ~2 k rows. With `get_schema()`, stored types and runtime types are identical — no cast overhead. |
| 150 | + |
| 151 | +**Do not** replace this with a manual PG type map. If new PG types need support, add them to `parse_arrow_type()` / `arrow_type_to_string()` in `engine/mod.rs` alongside a round-trip test. |
| 152 | + |
| 153 | +### Lazy Connection Pool |
| 154 | + |
| 155 | +The upstream Postgres connection pool (`LazyPool` in `engine/mod.rs`) is **not** created when a client connects — it is created on the first query that touches a user table. Catalog queries (`pg_catalog`, `information_schema`) work instantly without an upstream connection. |
| 156 | + |
| 157 | +This means: |
| 158 | +- TablePlus / psql sidebar population (all `pg_catalog` queries) is instant. |
| 159 | +- Clients that never issue user-table queries pay zero upstream connection cost. |
| 160 | + |
| 161 | +**Do not** move pool creation back into `create_session_context_from_catalog()` or `EngineCache::get_context()`. |
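
The lazy-initialisation pattern described above can be sketched with the standard library's `OnceLock`. This is a synchronous illustration only: the real `LazyPool` is presumably async and its fields are not shown in this guide, so `Pool`, `dsn`, `builds`, and the method names here are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an upstream connection pool.
pub struct Pool {
    pub dsn: String,
}

/// Holds the connection parameters, but builds the pool only on first use.
pub struct LazyPool {
    dsn: String,
    cell: OnceLock<Pool>,
    /// Counts how often the pool was actually built (illustration only).
    builds: AtomicUsize,
}

impl LazyPool {
    /// Cheap: stores parameters, opens nothing.
    pub fn new(dsn: &str) -> Self {
        LazyPool {
            dsn: dsn.to_string(),
            cell: OnceLock::new(),
            builds: AtomicUsize::new(0),
        }
    }

    pub fn is_initialized(&self) -> bool {
        self.cell.get().is_some()
    }

    /// The first caller pays the construction cost; later callers reuse it.
    pub fn get(&self) -> &Pool {
        self.cell.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            Pool { dsn: self.dsn.clone() }
        })
    }

    pub fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

In this shape, a session that only runs catalog queries never calls `get()`, so no upstream pool is ever constructed for it.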
| 162 | + |
| 163 | +### Shared Pool Across Context Rebuilds |
| 164 | + |
| 165 | +`EngineCache` stores one `Arc<LazyPool>` per datasource in a separate `pools` map. `invalidate(name)` (called after catalog re-discovery) removes only the `SessionContext`, keeping the pool. The next `get_context()` call reuses the existing pool rather than creating a new one. |
| 166 | + |
| 167 | +`invalidate_all(name)` (called after datasource connection params are edited or the datasource is deleted) removes both the `SessionContext` and the pool. |
| 168 | + |
| 169 | +**Do not** call `invalidate_all` after catalog operations. **Do not** call plain `invalidate` after datasource edit/delete — the pool would be stale. |
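
The difference between the two invalidation calls can be sketched with two maps. The types below are placeholders that borrow the names used in this section; their fields and the return type of `get_context` are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Placeholder for the per-datasource query context.
pub struct SessionContext {
    pub datasource: String,
}

/// Placeholder for the per-datasource upstream pool.
pub struct LazyPool {
    pub datasource: String,
}

#[derive(Default)]
pub struct EngineCache {
    contexts: HashMap<String, Arc<SessionContext>>,
    pools: HashMap<String, Arc<LazyPool>>,
}

impl EngineCache {
    /// Returns the cached context and pool, creating whichever is missing.
    pub fn get_context(&mut self, name: &str) -> (Arc<SessionContext>, Arc<LazyPool>) {
        let pool = self
            .pools
            .entry(name.to_string())
            .or_insert_with(|| Arc::new(LazyPool { datasource: name.to_string() }))
            .clone();
        let ctx = self
            .contexts
            .entry(name.to_string())
            .or_insert_with(|| Arc::new(SessionContext { datasource: name.to_string() }))
            .clone();
        (ctx, pool)
    }

    /// After catalog re-discovery: drop the context, keep the pool.
    pub fn invalidate(&mut self, name: &str) {
        self.contexts.remove(name);
    }

    /// After a connection-parameter edit or a delete: drop both.
    pub fn invalidate_all(&mut self, name: &str) {
        self.contexts.remove(name);
        self.pools.remove(name);
    }
}
```

After `invalidate`, the next `get_context` call returns a fresh context paired with the same `Arc<LazyPool>`; after `invalidate_all`, both are rebuilt.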
| 170 | + |
| 171 | +### Idle Connection Timeout |
| 172 | + |
| 173 | +pgwire 0.38 has no built-in idle timeout — `socket.next().await` blocks indefinitely after authentication. This prevents Fly.io `auto_stop_machines` from ever triggering when a GUI client like TablePlus is open, because the VM only stops when it has zero connections. |
| 174 | + |
| 175 | +`proxy/src/server.rs` replaces pgwire's `process_socket` with a custom message loop (`process_socket_with_idle_timeout`) that adds a `tokio::select!` branch racing each `socket.next()` against a `sleep(idle_timeout)`. The timer resets after every received message — a running query does not count as idle. |
| 176 | + |
| 177 | +Default timeout is 15 minutes (`BR_IDLE_TIMEOUT_SECS=900`). TCP keepalive (60 s time, 10 s interval) is also set on each accepted socket to detect dead connections from crashed clients or network failures. |
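
The "timer resets after every received message" behaviour can be shown with a blocking, standard-library analogue. This sketch swaps the async `tokio::select!` race for `std::sync::mpsc::Receiver::recv_timeout`, so it illustrates the idea rather than the code in `server.rs`; `run_until_idle` is a hypothetical name.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Handles messages until none arrives within `idle`, then returns the
/// number handled. Each loop iteration starts a fresh wait, so the idle
/// timer effectively resets after every received message.
pub fn run_until_idle(rx: &Receiver<String>, idle: Duration) -> usize {
    let mut handled = 0;
    loop {
        match rx.recv_timeout(idle) {
            Ok(_message) => {
                // A real loop would process the protocol message here.
                handled += 1;
            }
            // No message within the idle window: close the connection.
            Err(RecvTimeoutError::Timeout) => return handled,
            // The peer went away: nothing more will ever arrive.
            Err(RecvTimeoutError::Disconnected) => return handled,
        }
    }
}
```

Time spent handling a message is not part of the wait, which matches the note that a running query does not count as idle.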
| 178 | + |
| 179 | +### Background Warmup |
| 180 | + |
| 181 | +After authentication succeeds in `handler.rs`, a background task pre-builds the `SessionContext` (DB queries to load the stored catalog) and eagerly initialises the `LazyPool`. This amortises first-query latency during the window between the client's auth handshake and its first query. |
| 182 | + |
| 183 | +### Performance Regression Testing |
| 184 | + |
| 185 | +There is currently no automated performance regression suite. Meaningful regression detection requires integration-level tests against a real Postgres instance that can verify filter pushdown is still active, connection pool reuse is intact, and end-to-end query latency stays within bounds. This is planned for a future iteration. |
| 186 | + |
| 187 | +## Data Model |
| 188 | + |
| 189 | +All primary keys are UUIDs. The admin store uses SQLite by default (configurable via `DATABASE_URL`). |
| 190 | + |
| 191 | +``` |
| 192 | +proxy_user (id UUID, username, password_hash, tenant, is_admin, is_active, …) |
| 193 | +data_source (id UUID, name, ds_type, config JSON, secure_config encrypted, |
| 194 | + is_active, access_mode, last_sync_at, last_sync_result, …) |
| 195 | +data_source_access (id UUID, user_id?, role_id?, data_source_id, assignment_scope, …) |
| 196 | +role (id UUID, name UNIQUE, description, is_active, …) |
| 197 | +role_member (id UUID, role_id → role, user_id → proxy_user) |
| 198 | +role_inheritance (id UUID, parent_role_id → role, child_role_id → role) |
| 199 | +discovered_schema (id UUID v5, data_source_id, schema_name, is_selected) |
| 200 | +discovered_table (id UUID v5, discovered_schema_id, table_name, table_type, is_selected) |
| 201 | +discovered_column (id UUID v5, discovered_table_id, column_name, ordinal_position, |
| 202 | + data_type, is_nullable, column_default, arrow_type) |
| 203 | + |
| 204 | +policy (id UUID v7, name, description, policy_type, is_enabled, version, targets JSON, definition JSON, …) |
| 205 | +policy_version (id UUID v7, policy_id, version, snapshot JSON, change_type, changed_by) |
| 206 | +policy_assignment (id UUID v7, policy_id, data_source_id, user_id?, role_id?, |
| 207 | + assignment_scope, priority) |
| 208 | +admin_audit_log (id UUID v7, resource_type, resource_id, action, actor_id, changes JSON, created_at) |
| 209 | +query_audit_log (id UUID v7, user_id, username, data_source_id, datasource_name, |
| 210 | + original_query, rewritten_query, policies_applied JSON, |
| 211 | + execution_time_ms, client_ip, client_info, created_at) |
| 212 | +``` |
| 213 | + |
| 214 | +Catalog entity IDs (schemas, tables, columns) are deterministic UUID v5 fingerprints derived from their natural keys. Re-discovering the same upstream object always produces the same ID, so re-syncs are safe upserts. |
| 215 | + |
| 216 | +## Docker (Development) |
| 217 | + |
| 218 | +```bash |
| 219 | +docker compose up # dev (hot reload) |
| 220 | +docker compose -f compose.yaml -f compose.prod.yaml up --build # prod |
| 221 | +``` |