Commit 7b7b666

feat: rebrand to BetweenRows, auto-persist secrets, and restructure docs
Rename QueryProxy → BetweenRows across UI, CLI, Dockerfile, and config. Add auto-persisted encryption key and JWT secret (env var → file → generate) with data directory detection and persistence warnings. Rewrite README as user-facing quickstart, move dev/architecture details to CONTRIBUTING.md. Add startup banner, suppress SQLx logging below DEBUG, support linux/aarch64 for javy, add compose.quickstart.yaml, and document governance workflows in roadmap.
1 parent 182a4b2 commit 7b7b666

17 files changed

Lines changed: 1384 additions & 537 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@ target/
 .env.test.local
 .env.production.local
 
+# Auto-persisted state (secrets, markers)
+.betweenrows/
+
 # local files
 mydocs/
 pythonpoc/

CONTRIBUTING.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# Development Guide

## Build from Source

```bash
# Proxy (Rust)
cargo build -p proxy
cargo test -p proxy

# Admin UI (React)
cd admin-ui && npm install && npm run dev
# → http://localhost:5173

# Production UI bundle
cd admin-ui && npm run build
```

Hot reload:
```bash
cargo watch -x "run -p proxy"
```

## Pre-commit Hook

`.githooks/pre-commit` runs `cargo fmt --check`, `cargo clippy`, and `admin-ui` tests. Enable once per clone:

```bash
git config core.hooksPath .githooks
```

## Project Structure

```
betweenrows/
├── Cargo.toml                  workspace root (proxy, migration crates)
├── migration/src/              SeaORM migrations (41 total)
├── docs/                       User-facing documentation
│   ├── permission-system.md    Policy system user guide
│   ├── security-vectors.md     Security attack vectors & test plan
│   ├── permission-stories.md   Detailed permission use cases
│   └── roadmap.md              Project roadmap and backlog
├── scripts/demo_ecommerce/     Demo schema + seed data
├── admin-ui/                   React admin console
│   └── src/
│       ├── api/                axios + fetch-event-source clients
│       ├── auth/               AuthContext, ProtectedRoute, LoginPage
│       ├── components/         Layout, DataSourceForm, CatalogDiscoveryWizard,
│       │                       PolicyForm, PolicyAssignmentPanel, RoleForm,
│       │                       RoleMemberPanel, RoleInheritancePanel, AuditTimeline, …
│       ├── pages/              Users*, DataSources*, DataSourceCatalogPage,
│       │                       Policies*, Roles*, QueryAuditPage
│       └── types/              TypeScript interfaces
└── proxy/src/
    ├── main.rs                 entry point: CLI, DB init, EngineCache, servers
    ├── server.rs               process_socket_with_idle_timeout (idle + startup timeouts)
    ├── handler.rs              pgwire StartupHandler + query handlers
    ├── auth.rs                 Argon2 auth, user creation
    ├── crypto.rs               AES-256-GCM encrypt/decrypt
    ├── admin/                  REST API: mod, dto, jwt, handlers, discovery_job,
    │                           policy_handlers, role_handlers, audit_handlers,
    │                           admin_audit
    ├── discovery/              DiscoveryProvider trait + Postgres impl
    ├── entity/                 SeaORM entities (proxy_user, data_source, role,
    │                           role_member, role_inheritance, data_source_access,
    │                           policy, policy_assignment, policy_version,
    │                           admin_audit_log, query_audit_log, …)
    ├── role_resolver.rs        BFS role resolution, cycle detection, effective assignments
    ├── engine/mod.rs           EngineCache, VirtualCatalogProvider, build_arrow_schema()
    └── hooks/                  QueryHook trait, ReadOnlyHook, PolicyHook
```

## Architecture

```
psql / app
↓ PostgreSQL wire protocol (port 5434)
BetweenRows (Rust)
├─ Authenticates user (Argon2id)
├─ Checks data source access (data_source_access table — direct, role-based, or all)
├─ Runs query hook pipeline:
│    ReadOnlyHook — blocks writes (SQLSTATE 25006)
│    PolicyHook — row filters, column masks, column access control
└─ Executes via DataFusion + tokio-postgres federation

Upstream PostgreSQL
```

## Tech Stack

| Layer | Library | Version |
|---|---|---|
| Protocol | pgwire | 0.38 |
| Query engine | DataFusion | 52 |
| PG federation | datafusion-table-providers | 0.10 |
| Async runtime | Tokio | 1 |
| Admin store | SeaORM + SQLite/PG | 1 |
| Password hashing | argon2 (Argon2id) | 0.5 |
| Secret encryption | aes-gcm (AES-256-GCM) | 0.10 |
| Admin REST API | axum + tower-http | 0.8 / 0.6 |
| Admin auth | jsonwebtoken (HMAC-SHA256) | 9 |
| CLI | clap | 4 |
| Admin UI | React 19 + Vite 6 + Tailwind 4 + TanStack Query 5 | |

## Security

### Access Control Architecture

Access control is enforced **before** any query reaches the engine:

1. `validate_data_source()` — datasource must exist and be active
2. `check_access(user_id, datasource_name)` — user must have access via `data_source_access` (direct, role-based, or all-scoped)
3. If either check fails → `FATAL` PG error, connection rejected before `get_ctx()` is ever called
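
The ordering of the two checks can be sketched as follows. This is a minimal illustration and not the proxy's actual code: the `Store` type, the `ConnectError` enum, the `authorize` wrapper, and the integer user IDs are all assumptions made for the example (the real store uses UUIDs and a database). Only the names `validate_data_source` and `check_access` come from the text above, and their real signatures may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the admin store.
struct Store {
    /// datasource name -> is_active
    datasources: HashMap<String, bool>,
    /// (user_id, datasource name) pairs that already resolve to "has access"
    access: HashSet<(u32, String)>,
}

#[derive(Debug, PartialEq)]
enum ConnectError {
    /// Step 1 failed: datasource is missing or inactive.
    InvalidDataSource,
    /// Step 2 failed: no matching access entry.
    AccessDenied,
}

impl Store {
    fn validate_data_source(&self, name: &str) -> Result<(), ConnectError> {
        match self.datasources.get(name).copied() {
            Some(true) => Ok(()),
            _ => Err(ConnectError::InvalidDataSource),
        }
    }

    fn check_access(&self, user_id: u32, name: &str) -> Result<(), ConnectError> {
        if self.access.contains(&(user_id, name.to_string())) {
            Ok(())
        } else {
            Err(ConnectError::AccessDenied)
        }
    }

    /// Runs both checks in order; a caller would build the query context
    /// only after this returns Ok.
    fn authorize(&self, user_id: u32, name: &str) -> Result<(), ConnectError> {
        self.validate_data_source(name)?;
        self.check_access(user_id, name)?;
        Ok(())
    }
}
```

Because `authorize` returns early with `?`, a missing or inactive datasource is reported before any access lookup happens, and no context is built on either failure path.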

### Why the Shared Pool Is Safe

The upstream connection pool carries **no user identity** — it is pure TCP connectivity to the upstream Postgres server. All identity and access decisions are made at the pgwire auth layer (steps 1–2 above), not at the pool layer.

Per-user isolation is enforced by:
- **Data plane**: `data_source_access` allowlist (no matching row → connection rejected). Access can be granted directly to a user, via role membership (including inherited roles), or to all users.
- **Policy hook** — per-query row filters, column masks, and access controls injected via DataFusion's logical plan tree, based on the authenticated user's policy assignments (direct, role-based, or wildcard)
- **Virtual catalog** — the stored catalog is an allowlist; tables/columns not explicitly saved are invisible to the engine

The shared pool is safe for all authorized users of a datasource: Pool = "how to talk to upstream". Auth + RLS = "what this user can see". These are orthogonal.

### Policy Enforcement Resistance

`PolicyHook` injects row filters and column transforms at the DataFusion logical plan level via `transform_up`. The filter is applied below the `TableScan` node — it cannot be bypassed by table aliases, CTEs, or subqueries, since DataFusion inlines those into the plan before transformation.

Template variable substitution (`{user.tenant}`, etc.) uses parse-then-substitute: the filter expression is parsed into a `DataFusion Expr` tree first, then placeholder identifiers are replaced with typed `Expr::Literal` values. The user's tenant/username never passes through the SQL parser, preventing injection even if the value contains SQL syntax.
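
A rough sketch of the parse-then-substitute idea, using a toy expression tree rather than DataFusion's real `Expr`. The enum below and the `substitute` function are hypothetical and exist only to show why a hostile value stays inert: it is placed into the tree as a literal node after parsing, so its contents are never interpreted as SQL.

```rust
/// Toy expression tree standing in for DataFusion's `Expr` (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Placeholder(String),
    Literal(String),
    Eq(Box<Expr>, Box<Expr>),
}

/// Walks an already-parsed tree and swaps a placeholder for a typed literal.
/// The value is never concatenated into SQL text.
fn substitute(expr: Expr, key: &str, value: &str) -> Expr {
    match expr {
        Expr::Placeholder(name) if name == key => Expr::Literal(value.to_string()),
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(substitute(*l, key, value)),
            Box::new(substitute(*r, key, value)),
        ),
        other => other,
    }
}
```

Substituting a value such as `acme' OR '1'='1` yields a single `Literal` node holding that exact string, so the comparison stays a plain equality against one value.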

### Permissions Model

BetweenRows enforces a two-layer access control model:

**Management plane** — controlled by `is_admin` flag. Admins manage users, data sources, policies, and catalogs via the Admin API. Non-admins have no Admin API access.

**Data plane** — controlled by two independent mechanisms:
1. *Connection access*: `data_source_access` entries. A user can connect to a datasource via direct assignment, role membership (including inherited roles), or all-user scope. Being an admin does **not** automatically grant data plane access.
2. *Query policy*: `PolicyHook` applies row filters, column masks, and column access controls per-query based on assigned policies (direct, role-based, or all-scoped). If the datasource `access_mode` is `"policy_required"`, tables with no matching permit policy return empty results. Policies can reference built-in identity fields (`{user.tenant}`, `{user.username}`, `{user.id}`) and custom user attributes (`{user.KEY}`) for attribute-based access control (ABAC). Optional decision functions (JavaScript/WASM) provide programmable policy gates.
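
Resolving "role membership (including inherited roles)" amounts to a graph walk. The sketch below is one way to do it with breadth-first search; it is not taken from `role_resolver.rs`. It assumes integer role IDs and assumes that a role inherits from the roles listed as its parents, which is a guess at the direction of `role_inheritance`. The `seen` set makes the walk terminate on cyclic data; it does not report the cycle, which the real resolver is said to detect.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Returns the user's direct roles plus every role reachable through
/// inheritance edges (hypothetical helper, integer IDs for brevity).
fn effective_roles(direct: &[u32], parents_of: &HashMap<u32, Vec<u32>>) -> BTreeSet<u32> {
    let mut seen: BTreeSet<u32> = BTreeSet::new();
    let mut queue: VecDeque<u32> = direct.iter().copied().collect();

    while let Some(role) = queue.pop_front() {
        // `insert` returns false when the role was already visited,
        // which also stops the walk from looping on a cycle.
        if !seen.insert(role) {
            continue;
        }
        if let Some(parents) = parents_of.get(&role) {
            queue.extend(parents.iter().copied());
        }
    }
    seen
}
```

An access or policy lookup could then match a `data_source_access` or `policy_assignment` row against any role in the returned set.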

See `docs/permission-system.md` for the full policy system user guide.

## Performance

### Arrow Type Alignment (query time)

During catalog discovery, column types are captured using `datafusion-table-providers`' own `get_schema()` function rather than a manual PG-to-Arrow mapping. This guarantees that the stored schema matches exactly what the library produces at query time.

**Why it matters:** an earlier hand-written `pg_type_to_arrow()` mapped `numeric` → `Decimal128(38,10)` and `timestamp` → `Timestamp(Microsecond)`, but the library internally uses `Decimal128(38,20)` and `Timestamp(Nanosecond)`. The mismatch triggered a full schema-cast on every result batch, adding 12–23 s to queries returning ~2 k rows. With `get_schema()`, stored types and runtime types are identical — no cast overhead.

**Do not** replace this with a manual PG type map. If new PG types need support, add them to `parse_arrow_type()` / `arrow_type_to_string()` in `engine/mod.rs` alongside a round-trip test.
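
As an illustration of what a round-trip test checks, here is a small self-contained sketch. The `ArrowTy` enum and the `ty_to_string` / `parse_ty` functions are hypothetical stand-ins; they are not the real `parse_arrow_type()` / `arrow_type_to_string()` and do not use Arrow's `DataType`. The point is only that every type must survive serialise-then-parse unchanged, including precision and scale.

```rust
/// Hypothetical subset of column types, standing in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ArrowTy {
    Int64,
    Utf8,
    TimestampNanos,
    Decimal128(u8, i8),
}

/// Serialises a type to the string form stored in the catalog (assumed format).
fn ty_to_string(t: &ArrowTy) -> String {
    match t {
        ArrowTy::Int64 => "Int64".to_string(),
        ArrowTy::Utf8 => "Utf8".to_string(),
        ArrowTy::TimestampNanos => "Timestamp(Nanosecond)".to_string(),
        ArrowTy::Decimal128(p, s) => format!("Decimal128({p},{s})"),
    }
}

/// Parses the stored string back; returns None for anything unrecognised.
fn parse_ty(s: &str) -> Option<ArrowTy> {
    match s {
        "Int64" => Some(ArrowTy::Int64),
        "Utf8" => Some(ArrowTy::Utf8),
        "Timestamp(Nanosecond)" => Some(ArrowTy::TimestampNanos),
        _ => {
            let inner = s.strip_prefix("Decimal128(")?.strip_suffix(')')?;
            let (p, sc) = inner.split_once(',')?;
            Some(ArrowTy::Decimal128(p.trim().parse().ok()?, sc.trim().parse().ok()?))
        }
    }
}
```

A round-trip test would assert `parse_ty(&ty_to_string(&t)) == Some(t)` for every supported type, so a scale drift such as `Decimal128(38,10)` versus `Decimal128(38,20)` fails the test instead of surfacing as a per-batch cast at query time.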

### Lazy Connection Pool

The upstream Postgres connection pool (`LazyPool` in `engine/mod.rs`) is **not** created when a client connects — it is created on the first query that touches a user table. Catalog queries (`pg_catalog`, `information_schema`) work instantly without an upstream connection.

This means:
- TablePlus / psql sidebar population (all `pg_catalog` queries) is instant.
- Clients that never issue user-table queries pay zero upstream connection cost.

**Do not** move pool creation back into `create_session_context_from_catalog()` or `EngineCache::get_context()`.
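
The lazy-initialisation pattern itself can be sketched with the standard library's `OnceLock`. This is an assumption-laden illustration, not the real `LazyPool`: the `LazyPoolSketch` and `Pool` types, the `dsn` field, and the connect counter are invented for the example, and the real pool is asynchronous.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for an upstream connection pool.
struct Pool {
    dsn: String,
}

/// Hypothetical lazy wrapper: holds connection parameters up front,
/// but builds the pool only when `get` is first called.
struct LazyPoolSketch {
    dsn: String,
    inner: OnceLock<Pool>,
    /// Counts how many times the pool was actually built (for demonstration).
    connects: AtomicUsize,
}

impl LazyPoolSketch {
    fn new(dsn: &str) -> Self {
        LazyPoolSketch {
            dsn: dsn.to_string(),
            inner: OnceLock::new(),
            connects: AtomicUsize::new(0),
        }
    }

    /// Called on the first user-table query; later calls reuse the same pool.
    fn get(&self) -> &Pool {
        self.inner.get_or_init(|| {
            self.connects.fetch_add(1, Ordering::SeqCst);
            Pool { dsn: self.dsn.clone() }
        })
    }

    fn connect_count(&self) -> usize {
        self.connects.load(Ordering::SeqCst)
    }
}
```

Constructing the wrapper costs nothing upstream; a session that only runs catalog queries never calls `get`, so the counter stays at zero.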

### Shared Pool Across Context Rebuilds

`EngineCache` stores one `Arc<LazyPool>` per datasource in a separate `pools` map. `invalidate(name)` (called after catalog re-discovery) removes only the `SessionContext`, keeping the pool. The next `get_context()` call reuses the existing pool rather than creating a new one.

`invalidate_all(name)` (called after datasource connection params are edited or the datasource is deleted) removes both the `SessionContext` and the pool.

**Do not** call `invalidate_all` after catalog operations. **Do not** call plain `invalidate` after datasource edit/delete — the pool would be stale.
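
The difference between the two invalidation paths can be shown with two maps. The sketch below is hypothetical: `CacheSketch`, `Context`, and `Pool` are placeholder types, and the method signatures are assumed rather than copied from `EngineCache`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct Pool; // stand-in for the upstream pool
struct Context; // stand-in for DataFusion's SessionContext

/// Hypothetical cache with contexts and pools kept in separate maps.
#[derive(Default)]
struct CacheSketch {
    contexts: HashMap<String, Arc<Context>>,
    pools: HashMap<String, Arc<Pool>>,
}

impl CacheSketch {
    /// Returns the cached context and pool, building whichever is missing.
    fn get_context(&mut self, name: &str) -> (Arc<Context>, Arc<Pool>) {
        let pool = Arc::clone(
            self.pools
                .entry(name.to_string())
                .or_insert_with(|| Arc::new(Pool)),
        );
        let ctx = Arc::clone(
            self.contexts
                .entry(name.to_string())
                .or_insert_with(|| Arc::new(Context)),
        );
        (ctx, pool)
    }

    /// After catalog re-discovery: drop the context, keep the pool.
    fn invalidate(&mut self, name: &str) {
        self.contexts.remove(name);
    }

    /// After connection params change or the datasource is deleted: drop both.
    fn invalidate_all(&mut self, name: &str) {
        self.contexts.remove(name);
        self.pools.remove(name);
    }
}
```

After `invalidate`, the next `get_context` hands back a new context but the same `Arc<Pool>`; after `invalidate_all`, both are rebuilt.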

### Idle Connection Timeout

pgwire 0.38 has no built-in idle timeout — `socket.next().await` blocks indefinitely after authentication. This prevents Fly.io `auto_stop_machines` from ever triggering when a GUI client like TablePlus is open, because the VM only stops when it has zero connections.

`proxy/src/server.rs` replaces pgwire's `process_socket` with a custom message loop (`process_socket_with_idle_timeout`) that adds a `tokio::select!` branch racing each `socket.next()` against a `sleep(idle_timeout)`. The timer resets after every received message — a running query does not count as idle.

Default timeout is 15 minutes (`BR_IDLE_TIMEOUT_SECS=900`). TCP keepalive (60 s time, 10 s interval) is also set on each accepted socket to detect dead connections from crashed clients or network failures.
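
The "race each receive against a timer, reset on every message" loop can be approximated with the standard library alone. The sketch below swaps in a synchronous `std::sync::mpsc` channel and `recv_timeout` for the async socket and `tokio::select!`, so it is an analogy rather than the code in `server.rs`; `serve` and `CloseReason` are names made up for the example.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum CloseReason {
    /// No message arrived within the idle window.
    IdleTimeout,
    /// The peer went away (all senders dropped).
    PeerClosed,
}

/// Handles messages until the connection is idle for `idle` or the peer closes.
/// Each loop iteration starts a fresh wait, so the timer resets after every message.
fn serve(rx: &Receiver<String>, idle: Duration) -> (usize, CloseReason) {
    let mut handled = 0;
    loop {
        match rx.recv_timeout(idle) {
            // A real loop would process the message here before waiting again.
            Ok(_msg) => handled += 1,
            Err(RecvTimeoutError::Timeout) => return (handled, CloseReason::IdleTimeout),
            Err(RecvTimeoutError::Disconnected) => return (handled, CloseReason::PeerClosed),
        }
    }
}
```

Because the wait only starts after a message has been handled, time spent processing a message never counts toward the idle window, which mirrors the "a running query does not count as idle" behaviour described above.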

### Background Warmup

After authentication succeeds in `handler.rs`, a background task pre-builds the `SessionContext` (DB queries to load the stored catalog) and eagerly initialises the `LazyPool`. This amortises first-query latency during the window between the client's auth handshake and its first query.

### Performance Regression Testing

There is currently no automated performance regression suite. Meaningful regression detection requires integration-level tests against a real Postgres instance that can verify filter pushdown is still active, connection pool reuse is intact, and end-to-end query latency stays within bounds. This is planned for a future iteration.

## Data Model

All primary keys are UUIDs. The admin store uses SQLite by default (configurable via `DATABASE_URL`).

```
proxy_user (id UUID, username, password_hash, tenant, is_admin, is_active, …)
data_source (id UUID, name, ds_type, config JSON, secure_config encrypted,
    is_active, access_mode, last_sync_at, last_sync_result, …)
data_source_access (id UUID, user_id?, role_id?, data_source_id, assignment_scope, …)
role (id UUID, name UNIQUE, description, is_active, …)
role_member (id UUID, role_id → role, user_id → proxy_user)
role_inheritance (id UUID, parent_role_id → role, child_role_id → role)
discovered_schema (id UUID v5, data_source_id, schema_name, is_selected)
discovered_table (id UUID v5, discovered_schema_id, table_name, table_type, is_selected)
discovered_column (id UUID v5, discovered_table_id, column_name, ordinal_position,
    data_type, is_nullable, column_default, arrow_type)

policy (id UUID v7, name, description, policy_type, is_enabled, version, targets JSON, definition JSON, …)
policy_version (id UUID v7, policy_id, version, snapshot JSON, change_type, changed_by)
policy_assignment (id UUID v7, policy_id, data_source_id, user_id?, role_id?,
    assignment_scope, priority)
admin_audit_log (id UUID v7, resource_type, resource_id, action, actor_id, changes JSON, created_at)
query_audit_log (id UUID v7, user_id, username, data_source_id, datasource_name,
    original_query, rewritten_query, policies_applied JSON,
    execution_time_ms, client_ip, client_info, created_at)
```

Catalog entity IDs (schemas, tables, columns) are deterministic UUID v5 fingerprints derived from their natural keys. Re-discovering the same upstream object always produces the same ID, so re-syncs are safe upserts.

## Docker (Development)

```bash
docker compose up # dev (hot reload)
docker compose -f compose.yaml -f compose.prod.yaml up --build # prod
```

Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@ COPY --from=ui-builder /app/admin-ui/dist /usr/local/share/admin-ui
 
 ENV BR_PROXY_BIND_ADDR=0.0.0.0:5434
 ENV BR_ADMIN_BIND_ADDR=0.0.0.0:5435
+ENV BR_ADMIN_DATABASE_URL=sqlite:///data/proxy_admin.db?mode=rwc
 
 EXPOSE 5434
 EXPOSE 5435
