Skip to content

Clean up direct-engine state entries for remotely deleted resources removed from config#5496

Merged
simonfaltum merged 7 commits into
mainfrom
simonfaltum/b22-direct-stale-state
Jul 2, 2026
Merged

Clean up direct-engine state entries for remotely deleted resources removed from config#5496
simonfaltum merged 7 commits into
mainfrom
simonfaltum/b22-direct-stale-state

Conversation

@simonfaltum

Copy link
Copy Markdown
Member

Why

Found during a full-repo review of the CLI. With the direct engine, a resource that was deleted remotely and then removed from the bundle config could never leave the deployment state. The plan dropped its Delete entry, so the state cleanup never ran and the stale entry came back on every deploy. For pipelines and dashboards the same scenario was worse: the deploy hard-failed in metadata computation with "failed to compute relative path", and since the state never got cleaned, every following deploy failed the same way.

Changes

Before, deploying after a resource was deleted remotely and removed from config either left a stale state entry forever (jobs) or failed the deploy outright (pipelines, dashboards); now the deploy plans a delete for the stale entry, cleans it from state, and completes.

  • bundle/direct/bundle_plan.go: when planning a Delete and the remote read reports the resource is gone, keep the Delete entry instead of removing it from the plan. Delete() already tolerates missing resources and removes the entry from state when applied.
  • bundle/deployplan/plan.go: remove RemoveEntry, whose only caller was the code path above.
  • bundle/deploy/metadata/compute.go: add the missing l.File == "" guard to the pipelines and dashboards loops, matching the guard the jobs loop received in direct: handle remotely deleted resources #3710. State-only resources have no config location, and filepath.Rel errors on an empty path.

A side effect visible in test output: bundle destroy with the direct engine now lists a remotely deleted resource under "The following resources will be deleted", since its state entry is now explicitly cleaned instead of silently dropped.

Test plan

  • New acceptance test acceptance/bundle/resources/jobs/remote_delete/removed_from_config covering deploy, remote delete of a job and a pipeline, config removal, deploy twice, and summary, for both engines. With the direct engine the first deploy cleans the stale entries and the second deploy and summary are clean. The terraform variant also exercises the new metadata guard, since terraform keeps the gone resources in its state.
  • New unit test TestComputeMetadataMutatorStateOnlyResources verifying state-only jobs, pipelines, and dashboards are skipped without error.
  • Unit tests for bundle/direct, bundle/deployplan, and bundle/deploy/metadata.
  • Existing acceptance tests around the changed paths (jobs/remote_delete, volumes/remote-delete, volumes/remote-change-name, vector_search_indexes/drift, permissions, deployment/bind, deployment/unbind, bundle/destroy, bundle/generate, bundle/invariant, clusters, quality_monitors, registered_models, synced tables). Only jobs/remote_delete/destroy changed output (direct destroy now lists the gone job) and was regenerated with -update.
  • ./task fmt-q, ./task lint-q, ./task checks.

This pull request and its description were written by Isaac.

… from config

When a resource was deleted remotely and also removed from the bundle config, CalculatePlan dropped the Delete entry from the plan, so the stale state entry was never removed and reappeared on every deploy. Keep the Delete entry instead; Delete() already tolerates resources that are gone and removes the entry from state. Also add the missing location guard in metadata.Compute for pipelines and dashboards (jobs got it in #3710), which made the same scenario hard-fail the deploy.

Co-authored-by: Isaac
@simonfaltum simonfaltum requested a review from denik June 9, 2026 19:59
@simonfaltum simonfaltum temporarily deployed to test-trigger-is June 9, 2026 19:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is June 9, 2026 19:59 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Integration test report

Commit: 2b34822

Run: 28579395758

Env 🟨​KNOWN 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 3 13 230 1041 5:19
🟨​ aws windows 7 3 13 232 1039 6:50
💚​ aws-ucws linux 10 13 314 959 4:51
💚​ aws-ucws windows 10 13 316 957 4:00
💚​ azure linux 4 15 230 1040 4:18
💚​ azure windows 4 15 232 1038 3:36
💚​ azure-ucws linux 4 15 316 956 4:49
💚​ azure-ucws windows 4 15 318 954 3:48
💚​ gcp linux 4 15 229 1042 3:55
💚​ gcp windows 4 15 231 1040 3:31
23 interesting tests: 13 SKIP, 7 KNOWN, 3 RECOVERED
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestFetchRepositoryInfoAPI_FromRepo 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
💚​ TestFetchRepositoryInfoAPI_FromRepo/root 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
💚​ TestFetchRepositoryInfoAPI_FromRepo/subdir 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
Top 4 slowest tests (at least 2 minutes):
duration env testname
2:53 azure-ucws windows TestAccept
2:43 azure windows TestAccept
2:42 gcp windows TestAccept
2:38 aws-ucws windows TestAccept

Co-authored-by: Isaac
@simonfaltum

Copy link
Copy Markdown
Member Author

The direct engine now keeps the delete entry for a remotely deleted resource so its stale state entry gets cleaned up during destroy, which makes the destroy output of acceptance/bundle/resources/dashboards/delete-trashed-out-of-band legitimately diverge per engine. Split that section into out.destroy.terraform.txt and out.destroy.direct.txt per the repo convention, same as the existing jobs/remote_delete/destroy test.

@denik denik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, thanks!

Comment thread bundle/deploy/metadata/compute.go Outdated
Comment thread bundle/direct/bundle_plan.go Outdated
Per review feedback: planning now sets gone on a Delete entry when the
remote read returns 404. Applying such an entry removes it from state
without calling the delete API, and deploy/destroy approval prompts no
longer list it as a deletion or ask for approval, since nothing is
actually deleted.

With that the destroy output for remotely deleted resources matches the
terraform engine again, so the per-engine out.destroy split is reverted
for delete-trashed-out-of-band and the removed_from_config test now also
asserts that no delete API calls are made for gone resources.

Also reword the state-only resource comments in metadata.Compute to
explain the statemgmt.Load injection that makes the guards necessary.

Co-authored-by: Isaac
@simonfaltum

Copy link
Copy Markdown
Member Author

Addressed the review feedback in a01eb5d:

  • Delete entries for resources that no longer exist remotely are now marked gone: true in the plan. Apply removes them from state without calling the delete API, and deploy/destroy approvals skip them (no deletion listing, no approval prompt, no prevent_destroy error), since nothing is actually deleted.
  • With gone entries filtered from the destroy listing, the destroy output for delete-trashed-out-of-band no longer diverges per engine, so the out.destroy.<engine>.txt split from the earlier revision is reverted, and the jobs/remote_delete/destroy output is back to what main has.
  • removed_from_config now records requests and asserts that no delete API calls are made for the gone resources.
  • Reworded the state-only resource comments in metadata.Compute to explain the statemgmt.Load injection that makes the guards necessary.

Validation: ./task fmt-q, ./task lint-q, ./task checks, unit tests for bundle/deployplan, bundle/direct, bundle/deploy/metadata, bundle/phases, and the full TestAccept/bundle acceptance suite pass locally.

@simonfaltum simonfaltum requested a review from denik June 12, 2026 10:38
@simonfaltum simonfaltum requested review from janniklasrose and removed request for denik June 22, 2026 10:34
@simonfaltum simonfaltum requested a review from denik July 2, 2026 09:09
…t-stale-state

# Conflicts:
#	bundle/direct/bundle_apply.go
#	bundle/direct/bundle_plan.go
@simonfaltum simonfaltum temporarily deployed to test-trigger-is July 2, 2026 09:21 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is July 2, 2026 09:21 — with GitHub Actions Inactive
@simonfaltum simonfaltum enabled auto-merge July 2, 2026 09:56
@simonfaltum simonfaltum added this pull request to the merge queue Jul 2, 2026
Merged via the queue into main with commit b4fe518 Jul 2, 2026
25 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/b22-direct-stale-state branch July 2, 2026 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants