| 1 | +# Plan: GraphQL-Based `pgpm export`
| 2 | + |
| 3 | +## Problem |
| 4 | + |
| 5 | +The current `pgpm export` command fetches all data via direct SQL queries against the PostgreSQL database (using `pg-cache` / `pg` pool). Customers who use Constructive's hosted platform do **not** have direct SQL access to their databases — they only have access to the **GraphQL API** served by the Constructive server (`graphql/server`). |
| 6 | + |
| 7 | +This means customers cannot run `pgpm export` today because the command requires a direct database connection. |
| 8 | + |
| 9 | +## Goal |
| 10 | + |
| 11 | +Create an alternative data-fetching backend for `pgpm export` that uses **GraphQL queries** instead of raw SQL, so the export can run against a customer's GraphQL endpoint (e.g. `https://api.example.com/graphql`). |
| 12 | + |
| 13 | +The developer should be able to run something like: |
| 14 | + |
| 15 | +```bash |
| 16 | +pgpm export --graphql-endpoint https://api.example.com/graphql --token <auth-token> |
| 17 | +``` |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Current Architecture |
| 22 | + |
| 23 | +### Export Flow Overview |
| 24 | + |
| 25 | +The export command has three layers: |
| 26 | + |
| 27 | +1. **CLI layer** — `pgpm/cli/src/commands/export.ts` |
| 28 | +2. **Core orchestrator** — `pgpm/core/src/export/export-migrations.ts` |
| 29 | +3. **Meta exporter** — `pgpm/core/src/export/export-meta.ts` |
| 30 | + |
| 31 | +### SQL Queries Used Today |
| 32 | + |
| 33 | +#### CLI layer (`pgpm/cli/src/commands/export.ts`) |
| 34 | + |
| 35 | +| # | Query | Purpose | |
| 36 | +|---|-------|---------| |
| 37 | +| 1 | `SELECT datname FROM pg_catalog.pg_database ...` | List available Postgres databases | |
| 38 | +| 2 | `SELECT id, name FROM metaschema_public.database` | List database entries in metaschema | |
| 39 | +| 3 | `SELECT * FROM metaschema_public.schema WHERE database_id = $1` | List schemas for a database | |
| 40 | + |
| 41 | +#### Core orchestrator (`pgpm/core/src/export/export-migrations.ts`) |
| 42 | + |
| 43 | +| # | Query | Purpose | |
| 44 | +|---|-------|---------| |
| 45 | +| 4 | `SELECT * FROM metaschema_public.database WHERE id = $1` | Get database record | |
| 46 | +| 5 | `SELECT * FROM metaschema_public.schema WHERE database_id = $1` | Get schemas | |
| 47 | +| 6 | `SELECT * FROM db_migrate.sql_actions WHERE database_id = $1 ORDER BY id` | Get migration SQL actions | |
| 48 | + |
| 49 | +#### Meta exporter (`pgpm/core/src/export/export-meta.ts`) |
| 50 | + |
| 51 | +More than 50 queries (one per table listed below), each of the form:
| 52 | + |
| 53 | +```sql |
| 54 | +SELECT * FROM <schema>.<table> WHERE database_id = $1 |
| 55 | +``` |
| 56 | + |
| 57 | +Across three schemas: |
| 58 | +- **`metaschema_public`** — `database`, `schema`, `table`, `field`, `policy`, `index`, `trigger`, `trigger_function`, `rls_function`, `limit_function`, `procedure`, `foreign_key_constraint`, `primary_key_constraint`, `unique_constraint`, `check_constraint`, `full_text_search`, `schema_grant`, `table_grant` |
| 59 | +- **`services_public`** — `domains`, `sites`, `apis`, `apps`, `site_modules`, `site_themes`, `site_metadata`, `api_modules`, `api_extensions`, `api_schemas` |
| 60 | +- **`metaschema_modules_public`** — `rls_module`, `user_auth_module`, `memberships_module`, `permissions_module`, `limits_module`, `levels_module`, `users_module`, `hierarchy_module`, `membership_types_module`, `invites_module`, `emails_module`, `sessions_module`, `secrets_module`, `profiles_module`, `encrypted_secrets_module`, `connected_accounts_module`, `phone_numbers_module`, `crypto_addresses_module`, `crypto_auth_module`, `field_module`, `table_template_module`, `secure_table_provision`, `uuid_module`, `default_ids_module`, `denormalized_table_field` |
| 61 | + |
| 62 | +Additionally, `export-meta.ts` queries `information_schema.columns` to dynamically discover table columns for each table (via `getTableColumns()`). |
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +## GraphQL Availability Analysis |
| 67 | + |
| 68 | +The Constructive GraphQL server (PostGraphile v5) auto-generates a GraphQL schema from the exposed Postgres schemas. The schemas exposed are configured per-API via `services_public.api_schemas`. |
| 69 | + |
| 70 | +### Tables accessible via GraphQL |
| 71 | + |
| 72 | +All tables in these schemas are exposed through PostGraphile and will have auto-generated GraphQL queries: |
| 73 | + |
| 74 | +- `metaschema_public.*` — All 18+ tables |
| 75 | +- `services_public.*` — All 10+ tables |
| 76 | +- `metaschema_modules_public.*` — All 25+ module tables
| 77 | + |
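| 78 | +For each table, PostGraphile generates the following (the exact names depend on the server's inflection settings; see Naming Convention Mapping):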
| 79 | +- `all<TableName>s` (connection query with filtering, pagination) |
| 80 | +- `<tableName>ById` (single-row lookup by primary key) |
| 81 | + |
| 82 | +For example, `metaschema_public.schema` becomes: |
| 83 | +```graphql |
| 84 | +query { |
| 85 | + allSchemas(condition: { databaseId: "..." }) { |
| 86 | + nodes { |
| 87 | + id |
| 88 | + databaseId |
| 89 | + name |
| 90 | + schemaName |
| 91 | + description |
| 92 | + isPublic |
| 93 | + } |
| 94 | + } |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +### Tables/queries NOT accessible via GraphQL |
| 99 | + |
| 100 | +| Query | Why | Mitigation | |
| 101 | +|-------|-----|------------| |
| 102 | +| `pg_catalog.pg_database` | System catalog, never exposed | Not needed — in GraphQL mode the user already knows their database | |
| 103 | +| `db_migrate.sql_actions` | Internal migration schema, not in exposed schemas | **Blocker** — see below | |
| 104 | +| `information_schema.columns` | System schema, never exposed | Needs a substitute: discover columns via GraphQL introspection or the hardcoded `config` map (see Phase 4) |
| 105 | + |
| 106 | +### Blocker: `db_migrate.sql_actions` |
| 107 | + |
| 108 | +The `db_migrate.sql_actions` table contains the actual migration SQL that was generated. This is the core data for the database-schema portion of the export. The table lives in the `db_migrate` schema, which is **not** exposed via GraphQL.
| 109 | + |
| 110 | +**Options:** |
| 111 | + |
| 112 | +1. **Expose `db_migrate` schema via a dedicated admin API** — Add it as an optional schema in the API configuration (requires server-side change) |
| 113 | +2. **Create a custom GraphQL query** — A Graphile plugin that adds an `exportSqlActions(databaseId)` query returning the migration data
| 114 | +3. **Split the export** — Only export metadata via GraphQL (the meta portion), skip the migration SQL portion. The metadata export is the larger and more complex part anyway. |
| 115 | +4. **Provide a REST endpoint** — Add a `/api/export/sql-actions` REST route that returns this data |
| 116 | + |
| 117 | +**Recommended:** Option 3 initially (metadata-only export via GraphQL), with Option 2 as a follow-up to support full export. |
| 118 | + |
| 119 | +--- |
| 120 | + |
| 121 | +## Proposed Architecture |
| 122 | + |
| 123 | +### Data Source Abstraction |
| 124 | + |
| 125 | +Create an `ExportDataSource` interface that abstracts how data is fetched. The existing SQL path becomes one implementation; the new GraphQL path becomes another. |
| 126 | + |
| 127 | +#### Interface |
| 128 | + |
| 129 | +```typescript |
| 130 | +// pgpm/core/src/export/data-source.ts |
| 131 | + |
| 132 | +export interface ExportDataSource { |
| 133 | + /** Fetch all rows from a metaschema/services/modules table filtered by database_id */ |
| 134 | + fetchTable(schema: string, table: string, databaseId: string): Promise<Record<string, unknown>[]>; |
| 135 | + |
| 136 | + /** Fetch the database record by id */ |
| 137 | + fetchDatabase(databaseId: string): Promise<Record<string, unknown> | null>; |
| 138 | + |
| 139 | + /** Fetch schemas for a database */ |
| 140 | + fetchSchemas(databaseId: string): Promise<Record<string, unknown>[]>; |
| 141 | + |
| 142 | + /** Fetch migration SQL actions (may not be available in GraphQL mode) */ |
| 143 | + fetchSqlActions?(databaseId: string): Promise<Record<string, unknown>[]>; |
| 144 | + |
| 145 | + /** List available databases (for interactive selection) */ |
| 146 | + listDatabases?(): Promise<{ id: string; name: string }[]>; |
| 147 | + |
| 148 | + /** Clean up connections */ |
| 149 | + close(): Promise<void>; |
| 150 | +} |
| 151 | +``` |
| 152 | + |
| 153 | +#### SQL Implementation (existing behavior) |
| 154 | + |
| 155 | +```typescript |
| 156 | +// pgpm/core/src/export/data-source-sql.ts |
| 157 | + |
| 158 | +export class SqlDataSource implements ExportDataSource { |
| 159 | + constructor(private pool: Pool) {} |
| 160 | + |
| 161 | + async fetchTable(schema: string, table: string, databaseId: string) { |
| 162 | + const result = await this.pool.query( |
| 163 | + `SELECT * FROM ${schema}.${table} WHERE database_id = $1`, |
| 164 | + [databaseId] |
| 165 | + ); |
| 166 | + return result.rows; |
| 167 | + } |
| 168 | + // ... etc |
| 169 | +} |
| 170 | +``` |
| 171 | + |
| 172 | +#### GraphQL Implementation (new) |
| 173 | + |
| 174 | +```typescript |
| 175 | +// pgpm/core/src/export/data-source-graphql.ts |
| 176 | + |
| 177 | +export class GraphQLDataSource implements ExportDataSource { |
| 178 | + constructor( |
| 179 | + private endpoint: string, |
| 180 | + private token?: string |
| 181 | + ) {} |
| 182 | + |
| 183 | + async fetchTable(schema: string, table: string, databaseId: string) { |
| 184 | + // Build GraphQL query using PostGraphile naming conventions: |
| 185 | + // schema: metaschema_public, table: field |
| 186 | + // => allFields(condition: { databaseId: "..." }) { nodes { ... } } |
| 187 | + const queryName = toGraphQLCollectionName(schema, table); |
| 188 | + const query = buildAllNodesQuery(queryName, databaseId); |
| 189 | + const result = await this.executeQuery(query); |
| 190 | + return extractNodes(result, queryName); |
| 191 | + } |
| 192 | + // ... etc |
| 193 | +} |
| 194 | +``` |
| 195 | + |
| 196 | +### Naming Convention Mapping |
| 197 | + |
| 198 | +PostGraphile transforms Postgres names to GraphQL names using inflection. The mapping follows: |
| 199 | + |
| 200 | +| Postgres | GraphQL Query | GraphQL Type | |
| 201 | +|----------|--------------|--------------| |
| 202 | +| `metaschema_public.database` | `allDatabases` | `Database` | |
| 203 | +| `metaschema_public.schema` | `allSchemas` | `Schema` | |
| 204 | +| `metaschema_public.table` | `allTables` | `Table` | |
| 205 | +| `metaschema_public.field` | `allFields` | `Field` | |
| 206 | +| `metaschema_public.foreign_key_constraint` | `allForeignKeyConstraints` | `ForeignKeyConstraint` | |
| 207 | +| `services_public.domains` | `allDomains` | `Domain` | |
| 208 | +| `services_public.apis` | `allApis` | `Api` | |
| 209 | +| `metaschema_modules_public.rls_module` | `allRlsModules` | `RlsModule` | |
| 210 | + |
| 211 | +Column names: `database_id` => `databaseId`, `schema_name` => `schemaName`, etc. |
| 212 | + |
| 213 | +**Important:** The actual inflection is controlled by `graphile-settings` (specifically the `InflektPreset`). The developer should use **GraphQL introspection** at runtime to discover the exact names rather than hardcoding them. The existing `QueryBuilder` in `graphql/query` already handles this. |
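
As a rough illustration of the mappings in the table above, the helpers planned for `graphql-naming.ts` might look like the sketch below. The pluralizer is deliberately naive and is an assumption, not the server's actual inflector, so names discovered through introspection should take precedence over anything these helpers produce.

```typescript
// Hypothetical helpers for graphql-naming.ts. The real inflection is decided by
// the server's graphile-settings, so treat these as a fallback only.

/** snake_case -> camelCase, e.g. database_id -> databaseId */
function snakeToCamel(name: string): string {
  return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

/** snake_case -> PascalCase, e.g. rls_module -> RlsModule */
function snakeToPascal(name: string): string {
  const camel = snakeToCamel(name);
  return camel.charAt(0).toUpperCase() + camel.slice(1);
}

/** Naive pluralizer (assumption): leaves names already ending in "s" untouched. */
function naivePlural(word: string): string {
  return word.endsWith("s") ? word : `${word}s`;
}

/** Table name -> collection query name, e.g. field -> allFields */
function toGraphQLCollectionName(_schema: string, table: string): string {
  return `all${naivePlural(snakeToPascal(table))}`;
}
```

This reproduces the rows in the table (for example `foreign_key_constraint` gives `allForeignKeyConstraints` and `domains` gives `allDomains`), but irregular plurals such as `policy` or `index` are not handled, which is one more reason to confirm names via introspection.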
| 214 | + |
| 215 | +--- |
| 216 | + |
| 217 | +## Files to Create / Modify |
| 218 | + |
| 219 | +### New Files |
| 220 | + |
| 221 | +| File | Purpose | |
| 222 | +|------|---------| |
| 223 | +| `pgpm/core/src/export/data-source.ts` | `ExportDataSource` interface definition | |
| 224 | +| `pgpm/core/src/export/data-source-sql.ts` | SQL implementation (refactor existing code) | |
| 225 | +| `pgpm/core/src/export/data-source-graphql.ts` | GraphQL implementation (new) | |
| 226 | +| `pgpm/core/src/export/graphql-naming.ts` | PostGraphile naming convention helpers (postgres name => GraphQL query/field names) | |
| 228 | + |
| 229 | +### Files to Modify |
| 230 | + |
| 231 | +| File | Change | |
| 232 | +|------|--------| |
| 233 | +| `pgpm/core/src/export/export-migrations.ts` | Refactor `exportMigrationsToDisk` to accept `ExportDataSource` instead of using `pg-cache` directly | |
| 234 | +| `pgpm/core/src/export/export-meta.ts` | Refactor `exportMeta` to accept `ExportDataSource` instead of using `pg-cache` directly. The `queryAndParse` helper should use `dataSource.fetchTable()` instead of `pool.query()` | |
| 235 | +| `pgpm/cli/src/commands/export.ts` | Add `--graphql-endpoint` / `--token` CLI flags; construct `GraphQLDataSource` when endpoint is provided, `SqlDataSource` otherwise | |
| 236 | + |
| 237 | +### Existing Packages to Leverage |
| 238 | + |
| 239 | +| Package | What to Use | |
| 240 | +|---------|-------------| |
| 241 | +| `graphql/query` (`@constructive-io/graphql-query`) | `QueryBuilder` for building GraphQL AST queries; `MetaObject` types | |
| 242 | +| `graphql/query/src/executor.ts` | `QueryExecutor` for local execution (if connecting via connection string rather than HTTP endpoint) | |
| 243 | +| `graphql` (v16) | `print()` for serializing a `DocumentNode` to a query string; `parse()` for turning query strings into a `DocumentNode` (responses are plain JSON and need no GraphQL parsing) |
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## Implementation Phases |
| 248 | + |
| 249 | +### Phase 1: Data Source Abstraction |
| 250 | + |
| 251 | +1. Define the `ExportDataSource` interface in `pgpm/core/src/export/data-source.ts` |
| 252 | +2. Create `SqlDataSource` that wraps the existing `pg` pool logic |
| 253 | +3. Refactor `exportMeta()` to accept a data source instead of pool options |
| 254 | +4. Refactor `exportMigrationsToDisk()` similarly |
| 255 | +5. Verify existing SQL export still works identically (no behavior change) |
| 256 | + |
| 257 | +### Phase 2: GraphQL Data Source |
| 258 | + |
| 259 | +1. Create `GraphQLDataSource` class |
| 260 | +2. Implement `fetchTable()` — builds a GraphQL `allXxx(condition: { databaseId: "..." }) { nodes { ... } }` query and executes it via HTTP POST to the endpoint |
| 261 | +3. Implement `fetchDatabase()` and `fetchSchemas()` on top of the same query builder as `fetchTable()`; note that `fetchDatabase()` filters on `id` rather than `databaseId`
| 262 | +4. Handle authentication (Bearer token in `Authorization` header) |
| 263 | +5. Handle pagination: PostGraphile connections are cursor-based, so the implementation must keep requesting pages (via `pageInfo.hasNextPage` and `endCursor`) until every row has been fetched, in case the server caps the number of rows returned per request
| 264 | + |
| 265 | +### Phase 3: CLI Integration |
| 266 | + |
| 267 | +1. Add `--graphql-endpoint <url>` and `--token <token>` flags to the export command |
| 268 | +2. When `--graphql-endpoint` is provided: |
| 269 | + - Skip the `pg_catalog.pg_database` query (not available) |
| 270 | + - Fetch databases from `metaschema_public.database` via GraphQL instead |
| 271 | + - Construct `GraphQLDataSource` and pass it through to the core |
| 272 | +3. When no endpoint is provided, use `SqlDataSource` (existing behavior) |
| 273 | +4. Handle the `db_migrate.sql_actions` gap — in GraphQL mode, only export metadata (skip the database migration portion), or error with a helpful message |
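
One way to handle the `db_migrate.sql_actions` gap in step 4 is to treat `fetchSqlActions` as an optional capability and fall back to a metadata-only export when it is missing. The sketch below assumes this behavior; the function name and the warning text are hypothetical.

```typescript
// Minimal slice of the ExportDataSource interface needed for this sketch.
interface SqlActionsSource {
  fetchSqlActions?(databaseId: string): Promise<Record<string, unknown>[]>;
}

/** Returns the SQL actions, or null when the data source cannot provide them. */
async function fetchSqlActionsIfAvailable(
  source: SqlActionsSource,
  databaseId: string,
  warn: (message: string) => void
): Promise<Record<string, unknown>[] | null> {
  if (!source.fetchSqlActions) {
    // GraphQL mode today: the db_migrate schema is not exposed.
    warn("db_migrate.sql_actions is not available from this data source; exporting metadata only.");
    return null;
  }
  return source.fetchSqlActions(databaseId);
}
```

A `null` result lets the orchestrator skip the migration portion explicitly. If a hard failure is preferred, the same check can throw instead of warning.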
| 274 | + |
| 275 | +### Phase 4: Field Discovery |
| 276 | + |
| 277 | +The current `export-meta.ts` uses `information_schema.columns` to dynamically discover which columns exist in each table. In GraphQL mode, this information is not directly available. Options: |
| 278 | + |
| 279 | +1. **Use GraphQL introspection** — Query the GraphQL schema's `__type` for each type to discover available fields. This is the cleanest approach. |
| 280 | +2. **Use the `config` map** — The hardcoded `config` in `export-meta.ts` already lists expected fields per table. In GraphQL mode, these can be cross-referenced with introspection results. |
| 281 | + |
| 282 | +**Recommended:** Use GraphQL introspection (`__schema` / `__type` queries) to discover fields, then map them back to Postgres column names using reverse inflection. |
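
A minimal sketch of the introspection approach follows. The `__type(name:)` query is standard GraphQL introspection; the helper names and the response type are assumptions made for illustration.

```typescript
// Standard GraphQL introspection query for the fields of one named type.
const TYPE_FIELDS_QUERY = `
  query TypeFields($typeName: String!) {
    __type(name: $typeName) {
      fields { name }
    }
  }
`;

// Assumed shape of the JSON response to the query above.
interface TypeFieldsResponse {
  data?: { __type: { fields: { name: string }[] | null } | null };
}

/** Request payload for discovering the fields of a GraphQL type such as "Field". */
function buildTypeFieldsRequest(typeName: string): { query: string; variables: { typeName: string } } {
  return { query: TYPE_FIELDS_QUERY, variables: { typeName } };
}

/** Field names from the response; empty when the type is unknown to the server. */
function extractFieldNames(response: TypeFieldsResponse): string[] {
  const fields = response.data?.__type?.fields ?? [];
  return fields.map((f) => f.name);
}
```

The returned list will also contain non-column fields (relations, for instance), so it likely needs to be intersected with the `config` map before being mapped back to Postgres column names.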
| 283 | + |
| 284 | +--- |
| 285 | + |
| 286 | +## Key Design Decisions |
| 287 | + |
| 288 | +### 1. HTTP vs Direct Execution |
| 289 | + |
| 290 | +Two modes of GraphQL execution should be supported: |
| 291 | + |
| 292 | +- **HTTP mode** (primary): POST queries to a remote GraphQL endpoint. This is what customers will use. |
| 293 | +- **Local mode** (optional): Use `QueryExecutor` from `graphql/query` to execute queries directly against a local database via PostGraphile. This could be useful for development/testing. |
| 294 | + |
| 295 | +### 2. Pagination Strategy |
| 296 | + |
| 297 | +PostGraphile uses Relay-style cursor pagination. The `GraphQLDataSource.fetchTable()` must handle tables with many rows by paginating: |
| 298 | + |
| 299 | +```graphql |
| 300 | +query { |
| 301 | + allFields(condition: { databaseId: "..." }, first: 100, after: "cursor...") { |
| 302 | + nodes { ... } |
| 303 | + pageInfo { |
| 304 | + hasNextPage |
| 305 | + endCursor |
| 306 | + } |
| 307 | + } |
| 308 | +} |
| 309 | +``` |
| 310 | + |
| 311 | +Loop, passing each page's `endCursor` as the next `after` argument, until `hasNextPage` is false.
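
The loop can be sketched independently of the transport by passing in a function that fetches a single page. The `Page` shape mirrors the `nodes` / `pageInfo` selection above; `fetchAllNodes` and `FetchPage` are hypothetical names.

```typescript
// Shape of one page as selected in the query above (assumed).
interface Page<T> {
  nodes: T[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Caller-supplied function that runs the query for a single page.
type FetchPage<T> = (after: string | null) => Promise<Page<T>>;

/** Collect every node by following endCursor until hasNextPage is false. */
async function fetchAllNodes<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const rows: T[] = [];
  let cursor: string | null = null;
  for (;;) {
    const page: Page<T> = await fetchPage(cursor);
    rows.push(...page.nodes);
    // Stop on the last page, or if the server gives no cursor to continue from.
    if (!page.pageInfo.hasNextPage || page.pageInfo.endCursor === null) break;
    cursor = page.pageInfo.endCursor;
  }
  return rows;
}
```

Keeping the page fetcher as a parameter means the same loop works for HTTP mode and local mode, and it can be unit tested with canned pages.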
| 312 | + |
| 313 | +### 3. Column Name Translation |
| 314 | + |
| 315 | +GraphQL field names are camelCase (`databaseId`), but the `csv-to-pg` Parser and the rest of the export pipeline expect snake_case Postgres column names (`database_id`). The GraphQL data source must translate response keys back to snake_case before returning rows. |
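
A minimal sketch of that translation, assuming a purely mechanical camelCase to snake_case rule (the helper names are hypothetical):

```typescript
/** camelCase -> snake_case, e.g. databaseId -> database_id */
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

/** Return a copy of a GraphQL node with every top-level key in snake_case. */
function rowToSnakeCase(node: Record<string, unknown>): Record<string, unknown> {
  const row: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(node)) {
    // Only keys are renamed; values (including JSON columns) are left untouched.
    row[camelToSnake(key)] = value;
  }
  return row;
}
```

A mechanical rule is not a guaranteed inverse of the server's inflection (column names containing digits or acronyms are the usual trouble spots), so a lookup table built from the known column names may be more robust.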
| 316 | + |
| 317 | +### 4. Authentication |
| 318 | + |
| 319 | +The GraphQL endpoint requires authentication:
| 320 | +- `--token <jwt>` supplies a JWT, sent as `Authorization: Bearer <token>`
| 321 | +- The token must have sufficient privileges to read from the `metaschema_public`, `services_public`, and `metaschema_modules_public` schemas
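
The request construction could look like the sketch below. `buildGraphQLRequest` is a hypothetical helper; the JSON body with `query` and `variables` follows the usual GraphQL-over-HTTP convention.

```typescript
interface GraphQLRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/** Build a GraphQL POST request; the Bearer header is added only when a token is given. */
function buildGraphQLRequest(
  query: string,
  variables: Record<string, unknown>,
  token?: string
): GraphQLRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return { method: "POST", headers, body: JSON.stringify({ query, variables }) };
}
```

On Node 18 or later the result can be passed directly as the second argument to the built-in `fetch(endpoint, request)`.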
| 322 | + |
| 323 | +--- |
| 324 | + |
| 325 | +## Testing Strategy |
| 326 | + |
| 327 | +1. **Unit tests**: Mock the `ExportDataSource` interface and verify that `exportMeta` and `exportMigrationsToDisk` produce the expected output regardless of which implementation backs them
| 328 | +2. **Integration tests**: Stand up a local GraphQL server, run the GraphQL export against it, compare output to SQL export output — they should be identical |
| 329 | +3. **E2E test**: Run `pgpm export --graphql-endpoint http://localhost:5678/graphql` against a running dev server |
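
For the unit tests, an in-memory data source is probably enough. The class below is a hypothetical test double that structurally matches the required methods of `ExportDataSource`; it assumes rows are stored with snake_case keys, as the SQL path returns them.

```typescript
type MockRow = Record<string, unknown>;

/** In-memory stand-in for ExportDataSource, keyed by "schema.table". */
class InMemoryDataSource {
  private readonly tables: Record<string, MockRow[]>;

  constructor(tables: Record<string, MockRow[]>) {
    this.tables = tables;
  }

  async fetchTable(schema: string, table: string, databaseId: string): Promise<MockRow[]> {
    const rows = this.tables[`${schema}.${table}`] ?? [];
    return rows.filter((row) => row["database_id"] === databaseId);
  }

  async fetchDatabase(databaseId: string): Promise<MockRow | null> {
    // The database table is looked up by id, not database_id.
    const rows = this.tables["metaschema_public.database"] ?? [];
    return rows.find((row) => row["id"] === databaseId) ?? null;
  }

  async fetchSchemas(databaseId: string): Promise<MockRow[]> {
    return this.fetchTable("metaschema_public", "schema", databaseId);
  }

  async close(): Promise<void> {
    // Nothing to release for in-memory data.
  }
}
```

Because `fetchSqlActions` is left undefined, the same double also exercises the metadata-only path.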
| 330 | + |
| 331 | +--- |
| 332 | + |
| 333 | +## Open Questions |
| 334 | + |
| 335 | +1. **Should `db_migrate.sql_actions` be exposed via GraphQL?** If so, this requires a server-side change (adding `db_migrate` to the exposed schemas or creating a custom Graphile plugin). Without this, GraphQL export can only produce the metadata/service portion, not the database-schema migration SQL. |
| 336 | +2. **Should the GraphQL export produce the same output format?** The SQL export produces pgpm plan files + deploy/revert/verify SQL files. The metadata-only export is a subset of this. |
| 337 | +3. **Should we support introspection-based field discovery or rely on the hardcoded config map?** Introspection is cleaner but adds complexity; the config map is already maintained. |
| 338 | + |
| 339 | +--- |
| 340 | + |
| 341 | +## Summary |
| 342 | + |
| 343 | +| What | Where | |
| 344 | +|------|-------| |
| 345 | +| New interface | `pgpm/core/src/export/data-source.ts` | |
| 346 | +| SQL adapter (refactor) | `pgpm/core/src/export/data-source-sql.ts` | |
| 347 | +| GraphQL adapter (new) | `pgpm/core/src/export/data-source-graphql.ts` | |
| 348 | +| Naming helpers | `pgpm/core/src/export/graphql-naming.ts` | |
| 349 | +| CLI changes | `pgpm/cli/src/commands/export.ts` | |
| 350 | +| Core refactors | `pgpm/core/src/export/export-meta.ts`, `pgpm/core/src/export/export-migrations.ts` | |
| 351 | +| Leverage | `graphql/query` (QueryBuilder), `graphql` v16 (print/parse) | |