Commit 12b3ace

docs(plan): add GraphQL-based pgpm export feature plan
1 parent 5948811 commit 12b3ace

1 file changed

Lines changed: 351 additions & 0 deletions

docs/plan/graphql-export.md

@@ -0,0 +1,351 @@
# Plan: GraphQL-Based `pgpm export`

## Problem

The current `pgpm export` command fetches all data via direct SQL queries against the PostgreSQL database (using `pg-cache` / `pg` pool). Customers who use Constructive's hosted platform do **not** have direct SQL access to their databases; they only have access to the **GraphQL API** served by the Constructive server (`graphql/server`).

This means customers cannot run `pgpm export` today because the command requires a direct database connection.

## Goal

Create an alternative data-fetching backend for `pgpm export` that uses **GraphQL queries** instead of raw SQL, so the export can run against a customer's GraphQL endpoint (e.g. `https://api.example.com/graphql`).

The developer should be able to run something like:

```bash
pgpm export --graphql-endpoint https://api.example.com/graphql --token <auth-token>
```

---

## Current Architecture

### Export Flow Overview

The export command has three layers:

1. **CLI layer**: `pgpm/cli/src/commands/export.ts`
2. **Core orchestrator**: `pgpm/core/src/export/export-migrations.ts`
3. **Meta exporter**: `pgpm/core/src/export/export-meta.ts`

### SQL Queries Used Today

#### CLI layer (`pgpm/cli/src/commands/export.ts`)

| # | Query | Purpose |
|---|-------|---------|
| 1 | `SELECT datname FROM pg_catalog.pg_database ...` | List available Postgres databases |
| 2 | `SELECT id, name FROM metaschema_public.database` | List database entries in metaschema |
| 3 | `SELECT * FROM metaschema_public.schema WHERE database_id = $1` | List schemas for a database |

#### Core orchestrator (`pgpm/core/src/export/export-migrations.ts`)

| # | Query | Purpose |
|---|-------|---------|
| 4 | `SELECT * FROM metaschema_public.database WHERE id = $1` | Get database record |
| 5 | `SELECT * FROM metaschema_public.schema WHERE database_id = $1` | Get schemas |
| 6 | `SELECT * FROM db_migrate.sql_actions WHERE database_id = $1 ORDER BY id` | Get migration SQL actions |

#### Meta exporter (`pgpm/core/src/export/export-meta.ts`)

More than 50 queries of the form:

```sql
SELECT * FROM <schema>.<table> WHERE database_id = $1
```

Across three schemas:
- **`metaschema_public`**: `database`, `schema`, `table`, `field`, `policy`, `index`, `trigger`, `trigger_function`, `rls_function`, `limit_function`, `procedure`, `foreign_key_constraint`, `primary_key_constraint`, `unique_constraint`, `check_constraint`, `full_text_search`, `schema_grant`, `table_grant`
- **`services_public`**: `domains`, `sites`, `apis`, `apps`, `site_modules`, `site_themes`, `site_metadata`, `api_modules`, `api_extensions`, `api_schemas`
- **`metaschema_modules_public`**: `rls_module`, `user_auth_module`, `memberships_module`, `permissions_module`, `limits_module`, `levels_module`, `users_module`, `hierarchy_module`, `membership_types_module`, `invites_module`, `emails_module`, `sessions_module`, `secrets_module`, `profiles_module`, `encrypted_secrets_module`, `connected_accounts_module`, `phone_numbers_module`, `crypto_addresses_module`, `crypto_auth_module`, `field_module`, `table_template_module`, `secure_table_provision`, `uuid_module`, `default_ids_module`, `denormalized_table_field`

Additionally, `export-meta.ts` queries `information_schema.columns` to dynamically discover the columns of each table (via `getTableColumns()`).

---

## GraphQL Availability Analysis

The Constructive GraphQL server (PostGraphile v5) auto-generates a GraphQL schema from the exposed Postgres schemas. The schemas exposed are configured per-API via `services_public.api_schemas`.

### Tables accessible via GraphQL

All tables in these schemas are exposed through PostGraphile and will have auto-generated GraphQL queries:

- `metaschema_public.*`: all 18+ tables
- `services_public.*`: all 10+ tables
- `metaschema_modules_public.*`: all 25+ module tables

For each table, PostGraphile generates:
- `all<TableName>s` (connection query with filtering, pagination)
- `<tableName>ById` (single-row lookup by primary key)

For example, `metaschema_public.schema` becomes:
```graphql
query {
  allSchemas(condition: { databaseId: "..." }) {
    nodes {
      id
      databaseId
      name
      schemaName
      description
      isPublic
    }
  }
}
```

### Tables/queries NOT accessible via GraphQL

| Query | Why | Mitigation |
|-------|-----|------------|
| `pg_catalog.pg_database` | System catalog, never exposed | Not needed: in GraphQL mode the user already knows their database |
| `db_migrate.sql_actions` | Internal migration schema, not in exposed schemas | **Blocker** (see below) |
| `information_schema.columns` | System schema, never exposed | Not needed: field metadata already lives in `metaschema_public.field` |

### Blocker: `db_migrate.sql_actions`

The `db_migrate.sql_actions` table contains the actual migration SQL that was generated. This is the core data for the database-schema portion of the export. This table lives in the `db_migrate` schema, which is **not** exposed via GraphQL.

**Options:**

1. **Expose `db_migrate` schema via a dedicated admin API**: add it as an optional schema in the API configuration (requires server-side change)
2. **Create a custom GraphQL query/mutation**: a Graphile plugin that adds an `exportSqlActions(databaseId)` query returning the migration data
3. **Split the export**: only export metadata via GraphQL (the meta portion) and skip the migration SQL portion. The metadata export is the larger and more complex part anyway.
4. **Provide a REST endpoint**: add a `/api/export/sql-actions` REST route that returns this data

**Recommended:** Option 3 initially (metadata-only export via GraphQL), with Option 2 as a follow-up to support full export.

---

## Proposed Architecture

### Data Source Abstraction

Create an `ExportDataSource` interface that abstracts how data is fetched. The existing SQL path becomes one implementation; the new GraphQL path becomes another.

#### Interface

```typescript
// pgpm/core/src/export/data-source.ts

export interface ExportDataSource {
  /** Fetch all rows from a metaschema/services/modules table filtered by database_id */
  fetchTable(schema: string, table: string, databaseId: string): Promise<Record<string, unknown>[]>;

  /** Fetch the database record by id */
  fetchDatabase(databaseId: string): Promise<Record<string, unknown> | null>;

  /** Fetch schemas for a database */
  fetchSchemas(databaseId: string): Promise<Record<string, unknown>[]>;

  /** Fetch migration SQL actions (may not be available in GraphQL mode) */
  fetchSqlActions?(databaseId: string): Promise<Record<string, unknown>[]>;

  /** List available databases (for interactive selection) */
  listDatabases?(): Promise<{ id: string; name: string }[]>;

  /** Clean up connections */
  close(): Promise<void>;
}
```

153+
#### SQL Implementation (existing behavior)
154+
155+
```typescript
156+
// pgpm/core/src/export/data-source-sql.ts
157+
158+
export class SqlDataSource implements ExportDataSource {
159+
constructor(private pool: Pool) {}
160+
161+
async fetchTable(schema: string, table: string, databaseId: string) {
162+
const result = await this.pool.query(
163+
`SELECT * FROM ${schema}.${table} WHERE database_id = $1`,
164+
[databaseId]
165+
);
166+
return result.rows;
167+
}
168+
// ... etc
169+
}
170+
```
171+
172+
#### GraphQL Implementation (new)
173+
174+
```typescript
175+
// pgpm/core/src/export/data-source-graphql.ts
176+
177+
export class GraphQLDataSource implements ExportDataSource {
178+
constructor(
179+
private endpoint: string,
180+
private token?: string
181+
) {}
182+
183+
async fetchTable(schema: string, table: string, databaseId: string) {
184+
// Build GraphQL query using PostGraphile naming conventions:
185+
// schema: metaschema_public, table: field
186+
// => allFields(condition: { databaseId: "..." }) { nodes { ... } }
187+
const queryName = toGraphQLCollectionName(schema, table);
188+
const query = buildAllNodesQuery(queryName, databaseId);
189+
const result = await this.executeQuery(query);
190+
return extractNodes(result, queryName);
191+
}
192+
// ... etc
193+
}
194+
```
195+
196+
### Naming Convention Mapping
197+
198+
PostGraphile transforms Postgres names to GraphQL names using inflection. The mapping follows:
199+
200+
| Postgres | GraphQL Query | GraphQL Type |
201+
|----------|--------------|--------------|
202+
| `metaschema_public.database` | `allDatabases` | `Database` |
203+
| `metaschema_public.schema` | `allSchemas` | `Schema` |
204+
| `metaschema_public.table` | `allTables` | `Table` |
205+
| `metaschema_public.field` | `allFields` | `Field` |
206+
| `metaschema_public.foreign_key_constraint` | `allForeignKeyConstraints` | `ForeignKeyConstraint` |
207+
| `services_public.domains` | `allDomains` | `Domain` |
208+
| `services_public.apis` | `allApis` | `Api` |
209+
| `metaschema_modules_public.rls_module` | `allRlsModules` | `RlsModule` |
210+
211+
Column names: `database_id` => `databaseId`, `schema_name` => `schemaName`, etc.
212+
213+
**Important:** The actual inflection is controlled by `graphile-settings` (specifically the `InflektPreset`). The developer should use **GraphQL introspection** at runtime to discover the exact names rather than hardcoding them. The existing `QueryBuilder` in `graphql/query` already handles this.
214+
215+
---
216+
217+
## Files to Create / Modify
218+
219+
### New Files
220+
221+
| File | Purpose |
222+
|------|---------|
223+
| `pgpm/core/src/export/data-source.ts` | `ExportDataSource` interface definition |
224+
| `pgpm/core/src/export/data-source-sql.ts` | SQL implementation (refactor existing code) |
225+
| `pgpm/core/src/export/data-source-graphql.ts` | GraphQL implementation (new) |
226+
| `pgpm/core/src/export/graphql-naming.ts` | PostGraphile naming convention helpers (postgres name => GraphQL query/field names) |
227+
| `pgpm/cli/src/commands/export.ts` | Modify to accept `--graphql-endpoint` and `--token` flags |
228+
229+
### Files to Modify
230+
231+
| File | Change |
232+
|------|--------|
233+
| `pgpm/core/src/export/export-migrations.ts` | Refactor `exportMigrationsToDisk` to accept `ExportDataSource` instead of using `pg-cache` directly |
234+
| `pgpm/core/src/export/export-meta.ts` | Refactor `exportMeta` to accept `ExportDataSource` instead of using `pg-cache` directly. The `queryAndParse` helper should use `dataSource.fetchTable()` instead of `pool.query()` |
235+
| `pgpm/cli/src/commands/export.ts` | Add `--graphql-endpoint` / `--token` CLI flags; construct `GraphQLDataSource` when endpoint is provided, `SqlDataSource` otherwise |
236+
237+
### Existing Packages to Leverage
238+
239+
| Package | What to Use |
240+
|---------|-------------|
241+
| `graphql/query` (`@constructive-io/graphql-query`) | `QueryBuilder` for building GraphQL AST queries; `MetaObject` types |
242+
| `graphql/query/src/executor.ts` | `QueryExecutor` for local execution (if connecting via connection string rather than HTTP endpoint) |
243+
| `graphql` (v16) | `print()` for serializing DocumentNode to string; `parse()` for parsing responses |
244+
245+
---
246+
## Implementation Phases

### Phase 1: Data Source Abstraction

1. Define the `ExportDataSource` interface in `pgpm/core/src/export/data-source.ts`
2. Create `SqlDataSource` that wraps the existing `pg` pool logic
3. Refactor `exportMeta()` to accept a data source instead of pool options
4. Refactor `exportMigrationsToDisk()` similarly
5. Verify existing SQL export still works identically (no behavior change)

### Phase 2: GraphQL Data Source

1. Create `GraphQLDataSource` class
2. Implement `fetchTable()`: build a GraphQL `allXxx(condition: { databaseId: "..." }) { nodes { ... } }` query and execute it via HTTP POST to the endpoint
3. Implement `fetchDatabase()` and `fetchSchemas()` as specific instances of `fetchTable()`
4. Handle authentication (Bearer token in `Authorization` header)
5. Handle pagination: PostGraphile uses cursor-based pagination, so the implementation must page through all results whenever a table holds more rows than a single page returns

### Phase 3: CLI Integration

1. Add `--graphql-endpoint <url>` and `--token <token>` flags to the export command
2. When `--graphql-endpoint` is provided:
   - Skip the `pg_catalog.pg_database` query (not available)
   - Fetch databases from `metaschema_public.database` via GraphQL instead
   - Construct `GraphQLDataSource` and pass it through to the core
3. When no endpoint is provided, use `SqlDataSource` (existing behavior)
4. Handle the `db_migrate.sql_actions` gap: in GraphQL mode, only export metadata (skip the database migration portion), or error with a helpful message

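
The branching in Phase 3 could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: `ExportCliOptions`, `resolveExportMode` and `fetchSqlActionsIfAvailable` are hypothetical names, and the interface is a trimmed copy of the `ExportDataSource` proposed earlier. The sketch relies on `fetchSqlActions` being optional on the interface, so a data source that cannot reach `db_migrate.sql_actions` simply omits the method.

```typescript
// Trimmed copy of the ExportDataSource interface proposed in this plan.
type Row = Record<string, unknown>;

interface ExportDataSource {
  fetchTable(schema: string, table: string, databaseId: string): Promise<Row[]>;
  fetchSqlActions?(databaseId: string): Promise<Row[]>;
  close(): Promise<void>;
}

// Hypothetical shape of the parsed CLI flags; the field names are assumptions.
interface ExportCliOptions {
  graphqlEndpoint?: string;
  token?: string;
}

/** GraphQL mode is chosen only when --graphql-endpoint was passed. */
export function resolveExportMode(opts: ExportCliOptions): 'graphql' | 'sql' {
  return opts.graphqlEndpoint ? 'graphql' : 'sql';
}

/**
 * Returns the migration rows, or null when the data source does not
 * implement fetchSqlActions (the metadata-only case). The caller decides
 * whether null means "skip the migration portion" or "fail with a message".
 */
export async function fetchSqlActionsIfAvailable(
  source: ExportDataSource,
  databaseId: string
): Promise<Row[] | null> {
  if (typeof source.fetchSqlActions !== 'function') {
    return null;
  }
  return source.fetchSqlActions(databaseId);
}
```

Returning `null` instead of throwing keeps both behaviors from step 4 open: the orchestrator can skip the migration files or raise an error, depending on what the team settles on.
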
### Phase 4: Field Discovery

The current `export-meta.ts` uses `information_schema.columns` to dynamically discover which columns exist in each table. In GraphQL mode, this information is not directly available. Options:

1. **Use GraphQL introspection**: query the GraphQL schema's `__type` for each type to discover available fields. This is the cleanest approach.
2. **Use the `config` map**: the hardcoded `config` in `export-meta.ts` already lists expected fields per table. In GraphQL mode, these can be cross-referenced with introspection results.

**Recommended:** Use GraphQL introspection (`__schema` / `__type` queries) to discover fields, then map them back to Postgres column names using reverse inflection.

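
A minimal sketch of the introspection route, assuming the endpoint allows introspection. `__type(name:)` is part of the standard GraphQL introspection system; the names `TYPE_FIELDS_QUERY`, `leafKind` and `columnFieldNames` are hypothetical. Because a PostGraphile type also exposes relation and connection fields, the sketch keeps only fields whose underlying kind is `SCALAR` or `ENUM`. Scalar fields that are not table columns (for example a node identifier or a computed column) would still pass this filter, so cross-referencing the `config` map remains useful.

```typescript
/** Standard introspection query for the fields of one named type. */
export const TYPE_FIELDS_QUERY = `
  query TypeFields($name: String!) {
    __type(name: $name) {
      fields {
        name
        type { kind ofType { kind ofType { kind ofType { kind } } } }
      }
    }
  }
`;

/** A (possibly wrapped) type reference as returned by introspection. */
interface TypeRef {
  kind: string;
  ofType?: TypeRef | null;
}

interface IntrospectedField {
  name: string;
  type: TypeRef;
}

/** Strip NON_NULL and LIST wrappers to reach the underlying kind. */
function leafKind(ref: TypeRef): string {
  let current = ref;
  while ((current.kind === 'NON_NULL' || current.kind === 'LIST') && current.ofType) {
    current = current.ofType;
  }
  return current.kind;
}

/** Keep scalar and enum fields only; these are the candidates for columns. */
export function columnFieldNames(fields: IntrospectedField[]): string[] {
  return fields
    .filter((field) => {
      const kind = leafKind(field.type);
      return kind === 'SCALAR' || kind === 'ENUM';
    })
    .map((field) => field.name);
}
```

The returned names are still GraphQL field names; translating them back to Postgres column names is the concern of the Column Name Translation decision.
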

---

## Key Design Decisions

### 1. HTTP vs Direct Execution

Two modes of GraphQL execution should be supported:

- **HTTP mode** (primary): POST queries to a remote GraphQL endpoint. This is what customers will use.
- **Local mode** (optional): Use `QueryExecutor` from `graphql/query` to execute queries directly against a local database via PostGraphile. This could be useful for development/testing.

### 2. Pagination Strategy

PostGraphile uses Relay-style cursor pagination. The `GraphQLDataSource.fetchTable()` must handle tables with many rows by paginating:

```graphql
query {
  allFields(condition: { databaseId: "..." }, first: 100, after: "cursor...") {
    nodes { ... }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}
```

Loop until `hasNextPage` is false.

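
The loop can be sketched independently of the transport. In this hedged example, `fetchAllNodes`, `FetchPage` and `ConnectionPage` are hypothetical names; the caller supplies a function that runs the query above for a given `after` cursor. The `maxPages` guard is an added assumption to stop a misbehaving server from causing an endless loop.

```typescript
/** One page of a connection, as selected by the query above. */
interface ConnectionPage<T> {
  nodes: T[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

/** Caller-supplied function that fetches the page following `after`. */
type FetchPage<T> = (after: string | null) => Promise<ConnectionPage<T>>;

/** Collects every node by following endCursor until hasNextPage is false. */
export async function fetchAllNodes<T>(
  fetchPage: FetchPage<T>,
  maxPages: number = 10000
): Promise<T[]> {
  const rows: T[] = [];
  let after: string | null = null;

  for (let page = 0; page < maxPages; page++) {
    const result: ConnectionPage<T> = await fetchPage(after);
    rows.push(...result.nodes);

    // Stop on the last page, or if the server gave no cursor to continue from.
    if (!result.pageInfo.hasNextPage || result.pageInfo.endCursor === null) {
      return rows;
    }
    after = result.pageInfo.endCursor;
  }

  throw new Error(`Pagination did not finish within ${maxPages} pages`);
}
```

Passing the page fetcher in as a parameter keeps the loop testable without a server: a unit test can feed it canned pages and assert on the cursors it requests.
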
### 3. Column Name Translation

GraphQL field names are camelCase (`databaseId`), but the `csv-to-pg` Parser and the rest of the export pipeline expect snake_case Postgres column names (`database_id`). The GraphQL data source must translate response keys back to snake_case before returning rows.

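
A minimal sketch of that translation, assuming plain camelCase keys. `toSnakeCase` and `toPostgresRow` are hypothetical helper names. Only top-level keys are renamed, so the contents of JSON columns pass through untouched. Column names containing digits or runs of capitals may not round-trip through inflection, so a lookup table built from introspection would be the safer source for such cases.

```typescript
/** camelCase -> snake_case, e.g. "schemaName" -> "schema_name". */
export function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

/**
 * Returns a copy of a GraphQL node with its top-level keys renamed to
 * snake_case. Values are copied as-is, including nested objects.
 */
export function toPostgresRow(node: Record<string, unknown>): Record<string, unknown> {
  const row: Record<string, unknown> = {};
  for (const key of Object.keys(node)) {
    row[toSnakeCase(key)] = node[key];
  }
  return row;
}
```

For example, `toPostgresRow({ databaseId: 'abc', schemaName: 'app_public' })` produces an object with the keys `database_id` and `schema_name`.
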
### 4. Authentication

The GraphQL endpoint requires authentication. The export command should accept:
- `--token <jwt>`: a JWT passed as `Authorization: Bearer <token>`
- The token must have sufficient privileges to read from the `metaschema_public`, `services_public`, and `metaschema_modules_public` schemas

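
The request side could look roughly like this. It is a sketch, assuming the server accepts the conventional GraphQL-over-HTTP JSON body (`query` plus `variables`); `buildGraphQLRequest` and `unwrapGraphQLResponse` are hypothetical names. The second helper is included because GraphQL servers commonly report failures, including authorization failures, in an `errors` array rather than through the HTTP status alone.

```typescript
/** Subset of the fetch() init object used here; declared locally to stay self-contained. */
interface GraphQLRequestInit {
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

/** Builds the POST request, adding the Bearer header only when a token is given. */
export function buildGraphQLRequest(
  query: string,
  variables: Record<string, unknown>,
  token?: string
): GraphQLRequestInit {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (token) {
    headers['Authorization'] = `Bearer ${token}`;
  }
  return { method: 'POST', headers, body: JSON.stringify({ query, variables }) };
}

/** Returns `data`, or throws with the server's error messages joined together. */
export function unwrapGraphQLResponse<T>(payload: {
  data?: T | null;
  errors?: { message: string }[];
}): T {
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors.map((e) => e.message).join('; '));
  }
  if (payload.data === undefined || payload.data === null) {
    throw new Error('GraphQL response contained no data');
  }
  return payload.data;
}
```

In HTTP mode the result of `buildGraphQLRequest(...)` would be passed as the second argument to `fetch(endpoint, ...)`, and the parsed JSON body handed to `unwrapGraphQLResponse`.
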

---

## Testing Strategy

1. **Unit tests**: Mock the `ExportDataSource` interface and verify `exportMeta`/`exportMigrationsToDisk` work with both implementations
2. **Integration tests**: Stand up a local GraphQL server, run the GraphQL export against it, and compare the output to the SQL export output; they should be identical
3. **E2E test**: Run `pgpm export --graphql-endpoint http://localhost:5678/graphql` against a running dev server

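
For the unit-test item, one possible mock is an in-memory data source. This is a sketch only: `InMemoryDataSource` is a hypothetical class, and the interface shown is a trimmed copy of the proposed `ExportDataSource` (the full interface also requires `fetchDatabase` and `fetchSchemas`). Rows use snake_case keys because that is what the export pipeline expects from any data source.

```typescript
type Row = Record<string, unknown>;

// Trimmed copy of the ExportDataSource interface proposed in this plan.
interface ExportDataSource {
  fetchTable(schema: string, table: string, databaseId: string): Promise<Row[]>;
  close(): Promise<void>;
}

/** Test double that serves rows from a map keyed by "schema.table". */
export class InMemoryDataSource implements ExportDataSource {
  public closed = false;

  constructor(private readonly tables: Record<string, Row[]>) {}

  async fetchTable(schema: string, table: string, databaseId: string): Promise<Row[]> {
    const rows = this.tables[`${schema}.${table}`] || [];
    // Mirror the SQL path's "WHERE database_id = $1" filter.
    return rows.filter((row) => row['database_id'] === databaseId);
  }

  async close(): Promise<void> {
    this.closed = true;
  }
}
```

A test could seed it with a handful of rows, run the exporter against it, and assert on the generated files and on `closed` afterwards.
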
---

## Open Questions

1. **Should `db_migrate.sql_actions` be exposed via GraphQL?** If so, this requires a server-side change (adding `db_migrate` to the exposed schemas or creating a custom Graphile plugin). Without this, GraphQL export can only produce the metadata/service portion, not the database-schema migration SQL.
2. **Should the GraphQL export produce the same output format?** The SQL export produces pgpm plan files + deploy/revert/verify SQL files. The metadata-only export is a subset of this.
3. **Should we support introspection-based field discovery or rely on the hardcoded config map?** Introspection is cleaner but adds complexity; the config map is already maintained.

---

## Summary

| What | Where |
|------|-------|
| New interface | `pgpm/core/src/export/data-source.ts` |
| SQL adapter (refactor) | `pgpm/core/src/export/data-source-sql.ts` |
| GraphQL adapter (new) | `pgpm/core/src/export/data-source-graphql.ts` |
| Naming helpers | `pgpm/core/src/export/graphql-naming.ts` |
| CLI changes | `pgpm/cli/src/commands/export.ts` |
| Core refactors | `pgpm/core/src/export/export-meta.ts`, `pgpm/core/src/export/export-migrations.ts` |
| Leverage | `graphql/query` (QueryBuilder), `graphql` v16 (print/parse) |
