Commit 2225149

engine/direct: recover from a failed Create during Recreate (#5173)
## Changes

* `bundle/direct/apply.go`: `Recreate` now drops the deployment state entry (`db.DeleteState`) between `DoDelete` and the follow-up `Create`, instead of `db.SaveState(key, "", nil, nil)`.
* `bundle/direct/bundle_plan.go`: Treat an existing state entry whose `__id__` is empty as missing, so the next plan re-plans `Create` instead of erroring with `invalid state: empty id`. This covers state files written by pre-fix CLIs.
* `acceptance/bundle/resources/vector_search_endpoints/recreate/create-fails/`: New test that triggers the failure path end-to-end by renaming `my_endpoint` onto a sibling endpoint's name and switching its `endpoint_type`. The first Recreate's `Create` 409s on the conflict; the next `bundle plan` recovers cleanly.

## Why

A direct-engine `Recreate` was a `DoDelete` → `SaveState(key, "", nil, nil)` → `Create` sequence. If the follow-up `Create` failed for any reason (in our reproducer: a name collision against another bundle resource), `Finalize` persisted a state row with `__id__ == ""`. Every subsequent `bundle plan` then refused to proceed (`invalid state: empty id`) and `bundle destroy` couldn't recover either, leaving the bundle in a broken state until the user hand-edited `resources.json`.

Dropping the state entry up front means a failed `Create` simply looks like "no state for this resource" on the next plan, which is the natural recovery path. The planner-side tolerance handles state files already written by older CLIs.

## Tests

* New acceptance test `bundle/resources/vector_search_endpoints/recreate/create-fails` exercises the full path: initial deploy, Recreate triggered by `endpoint_type` change, `Create` 409 from a name collision with `blocker_endpoint`, then `bundle plan` showing `create my_endpoint` and `bundle destroy` cleaning up.
* `go test ./bundle/...` passes.
* `./task lint` passes.
* `./task test` had unrelated local failures (Python `databricks-bundles` module not installed in the fresh worktree's venv, surfacing in pydabs/invariant tests); CI should not hit that.

_PR description drafted with Claude Code._
1 parent 5853c9a commit 2225149

8 files changed

Lines changed: 90 additions & 7 deletions

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@
 * JSON output for single objects now uses standard `"key": "value"` spacing (matching list output and `encoding/json` defaults).
 
 ### Bundles
+* engine/direct: Drop the deployment state entry on a recreate before the follow-up `Create`, so a `Create` failure no longer leaves a broken state with `invalid state: empty id` on the next `bundle plan` ([#5173](https://github.com/databricks/cli/pull/5173)).
 
 ### Dependency updates
 

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+bundle:
+  name: recreate-create-fails-$UNIQUE_NAME
+
+sync:
+  paths: []
+
+resources:
+  vector_search_endpoints:
+    my_endpoint:
+      name: vs-endpoint-a-$UNIQUE_NAME
+      endpoint_type: STANDARD
+    blocker_endpoint:
+      name: vs-endpoint-b-$UNIQUE_NAME
+      endpoint_type: STORAGE_OPTIMIZED

acceptance/bundle/resources/vector_search_endpoints/recreate/create-fails/out.test.toml

Lines changed: 4 additions & 0 deletions
Generated file; not rendered by default.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+
+=== Initial deploy creates two endpoints with distinct names
+>>> [CLI] bundle deploy
+Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/recreate-create-fails-[UNIQUE_NAME]/default/files...
+Deploying resources...
+Updating deployment state...
+Deployment complete!
+
+=== Edit my_endpoint: rename onto blocker_endpoint's name and switch endpoint_type to trigger Recreate
+>>> update_file.py databricks.yml vs-endpoint-a-[UNIQUE_NAME] vs-endpoint-b-[UNIQUE_NAME]
+
+>>> update_file.py databricks.yml endpoint_type: STANDARD endpoint_type: STORAGE_OPTIMIZED
+
+=== Deploy: Recreate of my_endpoint runs Delete (ok) then Create (409, name taken by blocker)
+>>> [CLI] bundle deploy --auto-approve
+Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/recreate-create-fails-[UNIQUE_NAME]/default/files...
+Deploying resources...
+Error: cannot recreate resources.vector_search_endpoints.my_endpoint: Vector search endpoint with name vs-endpoint-b-[UNIQUE_NAME] already exists (409 RESOURCE_ALREADY_EXISTS)
+
+Endpoint: POST [DATABRICKS_URL]/api/2.0/vector-search/endpoints
+HTTP Status: 409 Conflict
+API error_code: RESOURCE_ALREADY_EXISTS
+API message: Vector search endpoint with name vs-endpoint-b-[UNIQUE_NAME] already exists
+
+Updating deployment state...
+
+Exit code: 1
+
+=== Subsequent plan recovers: my_endpoint state was dropped, replan as Create
+>>> [CLI] bundle plan
+create vector_search_endpoints.my_endpoint
+
+Plan: 1 to add, 0 to change, 0 to delete, 1 unchanged
+
+>>> [CLI] bundle destroy --auto-approve
+The following resources will be deleted:
+delete resources.vector_search_endpoints.blocker_endpoint
+
+All files and directories at the following location will be deleted: /Workspace/Users/[USERNAME]/.bundle/recreate-create-fails-[UNIQUE_NAME]/default
+
+Deleting files...
+Destroy complete!
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+envsubst < databricks.yml.tmpl > databricks.yml
+
+cleanup() {
+    trace $CLI bundle destroy --auto-approve || true
+    rm -f out.requests.txt
+}
+trap cleanup EXIT
+
+title "Initial deploy creates two endpoints with distinct names"
+trace $CLI bundle deploy
+
+title "Edit my_endpoint: rename onto blocker_endpoint's name and switch endpoint_type to trigger Recreate"
+trace update_file.py databricks.yml "vs-endpoint-a-$UNIQUE_NAME" "vs-endpoint-b-$UNIQUE_NAME"
+trace update_file.py databricks.yml " endpoint_type: STANDARD" " endpoint_type: STORAGE_OPTIMIZED"
+
+title "Deploy: Recreate of my_endpoint runs Delete (ok) then Create (409, name taken by blocker)"
+errcode trace $CLI bundle deploy --auto-approve
+
+title "Subsequent plan recovers: my_endpoint state was dropped, replan as Create"
+trace $CLI bundle plan
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Cloud = false

bundle/direct/apply.go

Lines changed: 3 additions & 1 deletion
@@ -86,7 +86,9 @@ func (d *DeploymentUnit) Recreate(ctx context.Context, db *dstate.DeploymentStat
 		return fmt.Errorf("deleting old id=%s: %w", oldID, err)
 	}
 
-	err = db.SaveState(d.ResourceKey, "", nil, nil)
+	// Drop the state entry so a subsequent failure of Create leaves no malformed
+	// (empty-id) entry behind. The next plan will see "no state" and retry as Create.
+	err = db.DeleteState(d.ResourceKey)
 	if err != nil {
 		return fmt.Errorf("deleting state: %w", err)
 	}

bundle/direct/bundle_plan.go

Lines changed: 5 additions & 6 deletions
@@ -181,16 +181,15 @@ func (b *DeploymentBundle) CalculatePlan(ctx context.Context, client *databricks
 		}
 
 		dbentry, hasEntry := b.StateDB.GetResourceEntry(resourceKey)
-		if !hasEntry {
+		// Tolerate empty-id entries from older partial-recreate failures
+		// (apply.Recreate now deletes state on the way through, but pre-fix
+		// state files may still carry a malformed entry). Treat as missing
+		// and let the resource be re-created on this plan.
+		if !hasEntry || dbentry.ID == "" {
 			entry.Action = deployplan.Create
 			return true
 		}
 
-		if dbentry.ID == "" {
-			logdiag.LogError(ctx, fmt.Errorf("%s: invalid state: empty id", errorPrefix))
-			return false
-		}
-
 		savedState, err := parseState(adapter.StateType(), dbentry.State)
 		if err != nil {
 			logdiag.LogError(ctx, fmt.Errorf("%s: interpreting state: %w", errorPrefix, err))
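
The planner-side change in this hunk boils down to one merged condition. As a rough illustration only, the sketch below isolates that decision in a standalone function; `stateEntry`, `action`, `actionCompare`, and `initialAction` are hypothetical names invented for the example and do not appear in `bundle_plan.go`.

```go
package main

import "fmt"

// action is a hypothetical stand-in for the planner's action type.
type action string

const (
	actionCreate  action = "create"
	actionCompare action = "compare" // placeholder for "continue with the normal diff"
)

// stateEntry is a hypothetical stand-in for a saved state row.
type stateEntry struct {
	ID string
}

// initialAction treats both a missing entry and an entry with an empty ID
// as "no state", so the resource is planned as a create. Any other entry
// continues to the regular state comparison.
func initialAction(e stateEntry, hasEntry bool) action {
	if !hasEntry || e.ID == "" {
		return actionCreate
	}
	return actionCompare
}

func main() {
	cases := []struct {
		name     string
		e        stateEntry
		hasEntry bool
	}{
		{"no entry", stateEntry{}, false},
		{"legacy empty-id entry", stateEntry{ID: ""}, true},
		{"healthy entry", stateEntry{ID: "abc"}, true},
	}
	for _, c := range cases {
		fmt.Printf("%s: %s\n", c.name, initialAction(c.e, c.hasEntry))
	}
}
```

Folding the empty-id case into the "missing" branch is what lets state files written by pre-fix CLIs recover without manual edits.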
