Add postgres_synced_tables bundle resource #5268

Open
pietern wants to merge 46 commits into main from postgres-synced-table

Conversation

@pietern pietern commented May 19, 2026

Changes

New postgres_synced_tables resource that syncs a Unity Catalog Delta table into a Postgres table on a Lakebase Autoscaling branch. Supported on both direct and terraform deployment engines.

Tests

Acceptance coverage: basic and recreate exercise each engine, and the existing no_drift and migrate invariants pick up the new resource. Both engines produce identical human-readable output and identical wire bodies; the captured request streams differ only in filename (out.requests.{direct,terraform}.json).

Verified end to end on a live workspace: the bundle deploys a project, lakebase catalog, pipeline-storage schema, and synced table; the pipeline materializes in under a minute; SELECT against the destination through the UC federated view returns the rows from the source Delta table; and bundle destroy cleans up the full chain.

This pull request and its description were written by Isaac.

pietern added 30 commits May 18, 2026 11:32
Register postgres_synced_tables in apitypes.yml, add a testserver route
for the operation-polling URL, and wire up the TestAll fixture entry.

Co-authored-by: Isaac
…/bind/mock allow-lists

Add postgres_synced_tables to the per-resource allow-lists and mock fixtures
that enumerate all bundle resource types:
- unsupportedResources in apply_bundle_permissions_test (no ACL API)
- allResourceTypes expected list and allowList in run_as_test (no run_as concept)
- mockBundle in apply_target_mode_test + notUserNamed carve-out
- TestResourcesBindSupport fixture + GetSyncedTable mock expectation
- Refresh acceptance/bundle/refschema/out.fields.txt snapshot

Co-authored-by: Isaac
Running ./task generate-direct-resources populates the missing
ignore_remote_changes block for postgres_synced_tables. Every spec
field is now marked spec:input_only, so the planner stops flagging
the empty spec returned by GET as drift.

The manual recreate_on_changes block in resources.yml is unchanged on
purpose: it covers the intent side (a user editing databricks.yml must
still trigger delete+create because no UpdateSyncedTable endpoint
exists). Added a comment at the top of the block explaining how the
two declarations cooperate.

Same pattern as secret_scopes, which is the other no-Update resource.
Adds a no_drift invariant test config for postgres_synced_tables.
This is the regression guard for the V12 forever-recreate bug — if
RemapState ever drops the ignore_remote_changes coverage on a spec
field, this test will catch the bug at CI time instead of customer
deploy time.

Excluded from the Cloud variant for the same reason as the other
postgres_* configs: Lakebase Autoscaling is AWS-only and the
production fixture used by the cloud variant doesn't have a Lakebase
project bound to the test workspace.
## Changes

New `postgres_catalogs` resource binding a Unity Catalog catalog to a Postgres database on a Lakebase Autoscaling branch. Supported on both direct and terraform deployment engines.

The spec fields are classified as both `recreate_on_changes` and `ignore_remote_changes: input_only`. The two cover orthogonal diffs the planner runs — recreate fires on local edits to an immutable field, and ignore_remote silences the phantom drift from GET not echoing spec back today. Lift the `input_only` entries once the backend starts returning spec.
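
The interplay of the two classifications can be sketched in Go. This is a minimal illustration, not the planner's actual code: `fieldRule`, `planField`, and the action names are hypothetical, and the three-way comparison (config, saved state, remote GET) is an assumption about how the two diffs are evaluated.

```go
package main

import "fmt"

// fieldRule is a hypothetical per-field classification, loosely modeled on
// recreate_on_changes and ignore_remote_changes.
type fieldRule struct {
	RecreateOnChange   bool
	IgnoreRemoteChange bool
}

// planField compares three views of one field: the local config, the value
// saved in state at the last deploy, and the value returned by GET.
func planField(rule fieldRule, config, state, remote string) string {
	// Diff 1: a local edit in the bundle configuration.
	changed := config != state
	// Diff 2: remote drift, only considered when the field is not ignored.
	if !changed && !rule.IgnoreRemoteChange {
		changed = remote != state
	}
	switch {
	case !changed:
		return "skip"
	case rule.RecreateOnChange:
		return "recreate"
	default:
		return "update"
	}
}

func main() {
	both := fieldRule{RecreateOnChange: true, IgnoreRemoteChange: true}
	recreateOnly := fieldRule{RecreateOnChange: true}

	// GET returns an empty spec; with both classifications nothing happens.
	fmt.Println(planField(both, "pg_db", "pg_db", ""))
	// A local edit to an immutable field still forces delete+create.
	fmt.Println(planField(both, "other_db", "pg_db", ""))
	// Without the ignore entry, the empty GET looks like drift on every deploy.
	fmt.Println(planField(recreateOnly, "pg_db", "pg_db", ""))
}
```

Under these assumptions, dropping the ignore entry turns the empty GET response into a recreate on every deploy, while dropping the recreate entry would turn a local edit into an update that the API cannot serve.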

## Tests

Acceptance coverage: `basic` and `recreate` exercise each engine, and the existing `no_drift` and `migrate` invariants pick up the new resource. Both engines produce identical human-readable output and identical wire bodies; the captured request streams differ only in filename (`out.requests.{direct,terraform}.json`).

Verified end to end on a live workspace: the bundle deploys a project and catalog, a row written directly into the bound Postgres database becomes visible through the UC federated view, and a follow-up write shows up on re-read.

_This PR was written by Claude Code._
Resolves sibling-add conflicts across:
- Bundle config registration (Resources struct, AllResources, SupportedResources)
- Direct engine all.go + apitypes.yml + resources.yml
- Testserver fake_workspace, postgres CRUD switch, handler routes
- All-resources allow-lists (type_test, statemgmt fixtures, mutator tests)
- No-drift invariant matrix
- Changelog

All conflicts were the same shape: both branches added a new entry next
to the existing postgres_* siblings. Kept both, ordered catalogs before
synced_tables to match the production sequencing (catalog must exist
before a synced table can reference it).
Mirrors what postgres-catalog did: the resource is now produced for
both the direct and terraform engines.

- New tfdyn converter in bundle/deploy/terraform/tfdyn/, with unit
  tests that lock in the wire shape (spec block, scheduling_policy
  enum, primary_key_columns list, nested new_pipeline_spec).
- Wired into GroupToTerraformName (databricks_postgres_synced_table),
  the postgres-resource set in interpolate.go and util.go, and removed
  from lifecycle_test.go's direct-only ignore list.
- Acceptance test test.toml now runs both engines (direct + terraform)
  on AWS only, matching the catalog config. basic/script writes
  out.requests.$DATABRICKS_BUNDLE_ENGINE.json so the captured wire
  bodies are visible per engine.
- Renamed the existing single-engine out.requests.json to
  out.requests.direct.json and generated out.requests.terraform.json.
- Regenerated affected baselines.

The migrate invariant test now passes for postgres_synced_tables too,
since the resource is no longer direct-only.
The bundle now declares its own postgres_project + postgres_catalog
chain alongside the synced table, so the cloud variant can deploy
against a real workspace without out-of-band setup.

- Source table is samples.nyctaxi.trips directly (ships on every
  UC-enabled workspace; no intermediate CREATE TABLE needed).
- A single UC schema is still created in main for the pipeline's
  internal storage (storage_catalog/storage_schema), which must
  pre-exist on the workspace.
- recreate test toggles timeseries_key instead of scheduling_policy,
  so the second deploy doesn't require CDF on samples.nyctaxi.trips
  (which is read-only).
- Cross-resource references go through the catalog's catalog_id
  (synced_table_id) and the project's id (branch path), exercising the
  interpolate-postgres-resources path on both engines.
- test.toml gains [[Server]] stubs for the SQL statements API and the
  UC tables-delete API so the local variant can run the schema create.
- Regenerated baselines for both engines.
Drop the schema-create / schema-delete shell commands from the test
scripts and declare the storage schema as a schemas resource in the
bundle. Same lifecycle as everything else — bundle destroy walks the
dependency graph and tears it down in order, so a partial failure
leaks one fewer thing.

new_pipeline_spec now references the schema via:

  storage_catalog: ${resources.schemas.pipeline_storage.catalog_name}
  storage_schema:  ${resources.schemas.pipeline_storage.name}

which exercises one more piece of cross-resource interpolation.

Also drops the SQL / UC tables-delete server stubs from test.toml
since the local scripts no longer hit those endpoints.
Found on aws-prod-ucws: a deploy targeting samples.nyctaxi.trips as
the synced-table source returns

  Cannot create more than 20 synced database table(s) per source
  table. (400 BAD_REQUEST)

There's a hard server-side limit of 20 synced tables per source table, and
on shared workspaces samples.nyctaxi.trips has already reached it. The original
script created a per-test source table for this reason (see
synced_database_tables/basic for the same workaround). I removed it
while chasing simplicity; this restores it.

The pipeline-storage schema stays bundle-managed (the schemas
resource added in the previous commit); only the source-table side
goes back to being script-managed.
The previous jq filter deleted only the random fields (timestamps,
uid, pipeline_id, message). It left detailed_state, which is timing-
dependent on cloud: real workspaces are still in
SYNCED_TABLE_PROVISIONING_PIPELINE_RESOURCES at the GET, while the
fake testserver always returns SYNCED_TABLE_ONLINE. The cloud
response also carries ongoing_sync_progress and project which the
fake doesn't.

Switch to projecting just the deterministic identity + UC
provisioning state, which is ACTIVE in both environments.
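
The idea of projecting instead of deleting can be illustrated in Go rather than jq. This is only a sketch: the JSON field names (`name`, `status.unity_catalog_provisioning_state`) and the sample payloads are assumptions, not the recorded wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// projection keeps only fields expected to match between the fake
// testserver and a real workspace. The field names are assumptions.
type projection struct {
	Name   string `json:"name"`
	Status struct {
		ProvisioningState string `json:"unity_catalog_provisioning_state"`
	} `json:"status"`
}

// project decodes a GET response and re-encodes only the projected fields.
// Unknown fields (timestamps, detailed_state, ...) are dropped by Unmarshal.
func project(raw []byte) (string, error) {
	var p projection
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	out, err := json.Marshal(p)
	return string(out), err
}

func main() {
	// Hypothetical responses: the cloud one is still provisioning.
	cloud := []byte(`{"name":"synced_tables/main.public.trips_synced","uid":"abc",
		"status":{"detailed_state":"SYNCED_TABLE_PROVISIONING_PIPELINE_RESOURCES",
		"unity_catalog_provisioning_state":"ACTIVE"}}`)
	fake := []byte(`{"name":"synced_tables/main.public.trips_synced",
		"status":{"detailed_state":"SYNCED_TABLE_ONLINE",
		"unity_catalog_provisioning_state":"ACTIVE"}}`)

	for _, raw := range [][]byte{cloud, fake} {
		out, err := project(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

An allow-list like this stays stable when the backend adds new output fields, whereas a delete-list has to be extended each time.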
@pietern pietern temporarily deployed to test-trigger-is May 19, 2026 12:08 — with GitHub Actions Inactive
Comment thread acceptance/bundle/resources/postgres_synced_tables/recreate/output.txt Outdated
Comment thread bundle/config/resources/postgres_synced_table.go Outdated
Comment thread libs/testserver/handlers.go
pietern added 2 commits May 20, 2026 13:17
…-synced-table

# Conflicts:
#	NEXT_CHANGELOG.md
#	acceptance/bundle/invariant/test.toml
#	acceptance/bundle/refschema/out.fields.txt
#	acceptance/bundle/resources/postgres_catalogs/basic/script
#	bundle/config/mutator/resourcemutator/apply_bundle_permissions_test.go
#	bundle/config/mutator/resourcemutator/apply_target_mode_test.go
#	bundle/config/resources.go
#	bundle/config/resources_test.go
#	bundle/deploy/terraform/interpolate.go
#	bundle/deploy/terraform/pkg.go
#	bundle/deploy/terraform/util.go
#	bundle/direct/dresources/all.go
#	bundle/direct/dresources/all_test.go
#	bundle/direct/dresources/apitypes.yml
#	bundle/direct/dresources/postgres_catalog.go
#	bundle/statemgmt/state_load_test.go
#	libs/testserver/fake_workspace.go
#	libs/testserver/handlers.go
#	libs/testserver/postgres.go
- Drop the duplicate postgres_catalogs block that the merge pulled in
  alongside the existing one in resources.yml.
- Remove postgres_catalogs from knownMissingInRemoteType now that the
  new PostgresCatalogRemote shim from #5265 surfaces the spec fields.

Co-authored-by: Isaac
Base automatically changed from postgres-catalog to main May 20, 2026 11:24
pietern added 4 commits May 20, 2026 13:29
Adopt the same embedded-spec Remote pattern that #5273 / #5265 introduced
for postgres_catalogs: PostgresSyncedTableRemote embeds SyncedTableSyncedTableSpec
plus output-only fields, so every StateType path is also a valid RemoteType
path. RemapState just copies the embedded shape; drift on spec fields is
suppressed via the spec:input_only classifications generated from the
OpenAPI schema until GET starts echoing the spec.

Drop the now-empty postgres_synced_tables entry from
knownMissingInRemoteType, and regenerate acceptance/bundle/refschema/out.fields.txt
so the embedded spec fields show up as ALL rather than INPUT|STATE.

Co-authored-by: Isaac
The bundle materializes 4 resources (catalog, project, synced table,
schema). After toggling timeseries_key, plan output is "1 to add, 0 to
change, 1 to delete, 3 unchanged", not "0 unchanged". Update the
contains.py assertion and regenerate output.

Co-authored-by: Isaac
UC's explore/data path expects /{catalog}/{schema}/{table}, not a single
dotted segment. Match the vector_search_indexes precedent (#5123): split
the three-part name on '.' and join with '/'.

Before: /explore/data/main.public.trips_synced (404s)
After:  /explore/data/main/public/trips_synced

Co-authored-by: Isaac
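
A minimal Go sketch of the split-and-join described above; `explorePath` is a hypothetical helper name and not necessarily how the resource builds its URL.

```go
package main

import (
	"fmt"
	"strings"
)

// explorePath turns a three-part table name ("catalog.schema.table") into
// the path form "/explore/data/catalog/schema/table".
func explorePath(fullName string) string {
	return "/explore/data/" + strings.Join(strings.Split(fullName, "."), "/")
}

func main() {
	fmt.Println(explorePath("main.public.trips_synced"))
	// prints "/explore/data/main/public/trips_synced"
}
```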
@pietern pietern temporarily deployed to test-trigger-is May 20, 2026 14:07 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is May 20, 2026 15:11 — with GitHub Actions Inactive
Postgres synced tables:
  my_table:
    Name: ${resources.postgres_catalogs.my_catalog.catalog_id}.public.trips_synced
    URL: [DATABRICKS_URL]/explore/data/$%7Bresources/postgres_catalogs/my_catalog.catalog_id%7D.public.trips_synced
Contributor

Per our discussion, Name and URL can be fixed, right?

[[Repls]]
# Normalize postgres operation IDs (unique per operation).
Old = '/operations/[A-Za-z0-9+/=-]+'
New = '/operations/[OPERATION_ID]'
Contributor

Is that ever used anywhere?

@@ -0,0 +1 @@
# All configuration inherited from parent test.toml
Contributor

this file is not needed, can be cleaned up

@@ -0,0 +1,54 @@
{
"method": "POST",
Contributor

I cannot spot any difference between direct and terraform requests files, can be merged?

Contributor Author

Fixed in latest.

- Use literal lakebase_test_$UNIQUE_NAME in synced_table_id so bundle
  summary renders resolved Name/URL instead of raw ${resources....} refs.
  Cross-resource string interpolation is only resolved at deploy time
  (see resolve_variable_references.go), so the previous reference made
  Name/URL useless in summary output.
- Drop the unused /operations/[OPERATION_ID] Repl in test.toml.
- Delete recreate/test.toml (was just an inheritance comment).
- Merge the identical out.requests.{direct,terraform}.json into a single
  out.requests.json.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 20, 2026 18:30 — with GitHub Actions Inactive
After deploy, BaseResource.ID holds the API resource name
"synced_tables/{catalog}.{schema}.{table}". Strip the prefix and use the
trailing three-part name for GetName and InitializeURL; fall back to
SyncedTableId when ID is not yet populated.

This makes bundle summary show resolved values even when SyncedTableId
in the YAML references another resource field (e.g.
${resources.postgres_catalogs.X.catalog_id}.foo.bar), which is otherwise
never resolved at summary time because cross-resource string
interpolation only runs at deploy time. Revert the test templates to use
the cross-resource ref so the test exercises this path.

Co-authored-by: Isaac
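
The "prefer ID, fall back to SyncedTableId" rule can be sketched in Go as follows. The `syncedTable` struct and `displayName` method are hypothetical stand-ins for the real resource type and its GetName; only the prefix and the fallback order come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// syncedTable is a hypothetical stand-in for the bundle resource type.
type syncedTable struct {
	// ID is set after deploy, e.g. "synced_tables/main.public.trips_synced".
	ID string
	// SyncedTableId comes from the bundle config and may still contain an
	// unresolved ${resources...} reference at summary time.
	SyncedTableId string
}

// displayName returns the three-part name from ID when it is populated and
// falls back to the configured SyncedTableId otherwise.
func (s syncedTable) displayName() string {
	if name, ok := strings.CutPrefix(s.ID, "synced_tables/"); ok && name != "" {
		return name
	}
	return s.SyncedTableId
}

func main() {
	deployed := syncedTable{
		ID:            "synced_tables/main.public.trips_synced",
		SyncedTableId: "${resources.postgres_catalogs.my_catalog.catalog_id}.public.trips_synced",
	}
	notDeployed := syncedTable{SyncedTableId: "main.public.trips_synced"}

	fmt.Println(deployed.displayName())    // resolved name taken from ID
	fmt.Println(notDeployed.displayName()) // falls back to the configured value
}
```

`strings.CutPrefix` requires Go 1.20 or later.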
@pietern pietern temporarily deployed to test-trigger-is May 20, 2026 18:41 — with GitHub Actions Inactive
pietern added 2 commits May 20, 2026 20:51
GetName already encapsulates the "prefer ID, fall back to SyncedTableId"
logic and returns the three-part id. Call it from InitializeURL instead
of repeating CutPrefix, and rename the misleading "name" local to align
with the comment update.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 20, 2026 18:52 — with GitHub Actions Inactive
@pietern pietern requested review from denik and janniklasrose May 20, 2026 19:14
@pietern pietern enabled auto-merge May 20, 2026 19:14
3 participants