Commit 05335fc

Fix task renaming in config-remote-sync when tasks are defined in multiple files (#4459)
## Changes
Account for the index shifting that occurs when tasks (or other sequences that may have merging behavior) are merged from multiple files. The previous logic incorrectly assumed that such a sequence could only be defined in one file. The new logic also handles the general ordering problem that arises when mutators sort tasks, which leaves their order out of sync with the order in which they are defined in the YAML. I also updated the tests to simulate these issues.

## Why
Sync was failing when tasks were defined in multiple files, and the task order after the mutators ran differed from the order in the config.

## Tests
Added more tests for tasks and updated the multi-file test case.
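The ordering problem can be illustrated with a minimal Python sketch. It is not the CLI's implementation; it only assumes, based on the description above, that a mutator sorts tasks by `task_key`, and it uses the task keys from the `job_multiple_tasks` fixture. The variable names are hypothetical.

```python
# Task keys in the order they appear in the YAML fixture of this commit.
yaml_order = ["b_task", "d_task", "c_task", "a_task"]

# Assumption: a mutator sorts tasks by task_key before they are compared.
sorted_order = sorted(yaml_order)

# Map each task_key back to its position in the YAML, so that an edit
# addressed by sorted index can be applied to the right YAML node.
yaml_index = {key: i for i, key in enumerate(yaml_order)}

for sorted_index, key in enumerate(sorted_order):
    print(f"sorted tasks[{sorted_index}] ({key}) -> YAML tasks[{yaml_index[key]}]")
```

Addressing tasks by `task_key` rather than by position avoids this mismatch, which is presumably why the fixture keys were chosen to be out of alphabetical order.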
1 parent 1bdf869 commit 05335fc

11 files changed

Lines changed: 305 additions & 102 deletions

acceptance/bundle/config-remote-sync/job_multiple_tasks/databricks.yml.tmpl

Lines changed: 32 additions & 16 deletions
@@ -5,56 +5,72 @@ resources:
   jobs:
     my_job:
       tasks:
-        - task_key: task1
+        - task_key: b_task
           notebook_task:
-            notebook_path: /Users/{{workspace_user_name}}/task1
+            notebook_path: /Users/{{workspace_user_name}}/b_task
           new_cluster:
             spark_version: $DEFAULT_SPARK_VERSION
             node_type_id: $NODE_TYPE_ID
             num_workers: 1
-        - task_key: task2
+        - task_key: d_task
           notebook_task:
-            notebook_path: /Users/{{workspace_user_name}}/task2
+            notebook_path: /Users/{{workspace_user_name}}/d_task
           new_cluster:
             spark_version: $DEFAULT_SPARK_VERSION
             node_type_id: $NODE_TYPE_ID
             num_workers: 2
           depends_on:
-            - task_key: task1
-        - task_key: task3
+            - task_key: b_task
+        - task_key: c_task
           notebook_task:
-            notebook_path: /Users/{{workspace_user_name}}/task3
+            notebook_path: /Users/{{workspace_user_name}}/c_task
           new_cluster:
             spark_version: $DEFAULT_SPARK_VERSION
             node_type_id: $NODE_TYPE_ID
             num_workers: 2
           depends_on:
-            - task_key: task2
-        - task_key: task4
+            - task_key: d_task
+        - task_key: a_task
           notebook_task:
-            notebook_path: /Users/{{workspace_user_name}}/task4
+            notebook_path: /Users/{{workspace_user_name}}/a_task
           new_cluster:
             spark_version: $DEFAULT_SPARK_VERSION
             node_type_id: $NODE_TYPE_ID
             num_workers: 1
           depends_on:
-            - task_key: task3
+            - task_key: c_task
 
     rename_task_job:
       tasks:
-        - task_key: task_rename_1
+        - task_key: b_task
           notebook_task:
-            notebook_path: /Users/{{workspace_user_name}}/rename_task_1
+            notebook_path: /Users/{{workspace_user_name}}/b_task
           new_cluster:
             spark_version: $DEFAULT_SPARK_VERSION
             node_type_id: $NODE_TYPE_ID
             num_workers: 1
-        - task_key: task_rename_2
+        - task_key: d_task
+          depends_on:
+            - task_key: b_task
           notebook_task:
-            notebook_path: /Users/{{workspace_user_name}}/rename_task_2
+            notebook_path: /Users/{{workspace_user_name}}/d_task
           new_cluster:
             spark_version: $DEFAULT_SPARK_VERSION
             node_type_id: $NODE_TYPE_ID
             num_workers: 1
+        - task_key: c_task
           depends_on:
-            - task_key: task_rename_1
+            - task_key: b_task
+          notebook_task:
+            notebook_path: /Users/{{workspace_user_name}}/c_task
+          new_cluster:
+            spark_version: $DEFAULT_SPARK_VERSION
+            node_type_id: $NODE_TYPE_ID
+            num_workers: 1
+        - task_key: a_task
+          notebook_task:
+            notebook_path: /Users/{{workspace_user_name}}/a_task
+          new_cluster:
+            spark_version: $DEFAULT_SPARK_VERSION
+            node_type_id: $NODE_TYPE_ID
+            num_workers: 1

acceptance/bundle/config-remote-sync/job_multiple_tasks/output.txt

Lines changed: 37 additions & 30 deletions
@@ -3,20 +3,20 @@ Deploying resources...
 Updating deployment state...
 Deployment complete!
 
-=== Modify only 'process' task num_workers and add timeout
+=== Modify c_task, remove d_task, add e_task
 === Detect and save changes
 Detected changes in 1 resource(s):
 
 Resource: resources.jobs.my_job
-tasks[task_key='new_task']: add
-tasks[task_key='task2']: remove
-tasks[task_key='task3'].depends_on[0].task_key: replace
-tasks[task_key='task3'].new_cluster.num_workers: replace
-tasks[task_key='task3'].timeout_seconds: add
+tasks[task_key='c_task'].depends_on[0].task_key: replace
+tasks[task_key='c_task'].new_cluster.num_workers: replace
+tasks[task_key='c_task'].timeout_seconds: add
+tasks[task_key='d_task']: remove
+tasks[task_key='e_task']: add
 
 
 
-=== Configuration changes for new task
+=== Configuration changes
 
 >>> diff.py databricks.yml.backup databricks.yml
 --- databricks.yml.backup
@@ -30,37 +30,37 @@ Resource: resources.jobs.my_job
 @@ -13,13 +12,11 @@
              node_type_id: [NODE_TYPE_ID]
              num_workers: 1
--        - task_key: task2
+-        - task_key: d_task
 +        - new_cluster:
 +            node_type_id: [NODE_TYPE_ID]
 +            num_workers: 1
 +            spark_version: 13.3.x-snapshot-scala2.12
            notebook_task:
--            notebook_path: /Users/{{workspace_user_name}}/task2
+-            notebook_path: /Users/{{workspace_user_name}}/d_task
 -          new_cluster:
 -            spark_version: 13.3.x-snapshot-scala2.12
 -            node_type_id: [NODE_TYPE_ID]
 -            num_workers: 2
 -          depends_on:
--            - task_key: task1
-+            notebook_path: /Users/[USERNAME]/new_task
-+          task_key: new_task
-         - task_key: task3
+-            - task_key: b_task
++            notebook_path: /Users/[USERNAME]/e_task
++          task_key: e_task
+         - task_key: c_task
            notebook_task:
 @@ -28,7 +25,8 @@
              spark_version: 13.3.x-snapshot-scala2.12
              node_type_id: [NODE_TYPE_ID]
 -            num_workers: 2
 +            num_workers: 5
            depends_on:
--            - task_key: task2
-+            - task_key: task1
+-            - task_key: d_task
++            - task_key: b_task
 +          timeout_seconds: 3600
-         - task_key: task4
+         - task_key: a_task
            notebook_task:
 @@ -40,5 +38,4 @@
            depends_on:
-             - task_key: task3
+             - task_key: c_task
 -
      rename_task_job:
        tasks:
@@ -69,14 +69,15 @@ Deploying resources...
 Updating deployment state...
 Deployment complete!
 
-=== Rename task_rename_1 to task_rename_new
+=== Rename b_task to b_task_renamed (4 tasks, 2 with depends_on, 1 without)
 === Detect task key rename
 Detected changes in 1 resource(s):
 
 Resource: resources.jobs.rename_task_job
-tasks[task_key='task_rename_1']: remove
-tasks[task_key='task_rename_2'].depends_on[0].task_key: replace
-tasks[task_key='task_rename_new']: add
+tasks[task_key='b_task']: remove
+tasks[task_key='b_task_renamed']: add
+tasks[task_key='c_task'].depends_on[0].task_key: replace
+tasks[task_key='d_task'].depends_on[0].task_key: replace
 
 
 
@@ -85,25 +86,31 @@ Resource: resources.jobs.rename_task_job
 >>> diff.py databricks.yml.backup2 databricks.yml
 --- databricks.yml.backup2
 +++ databricks.yml
-@@ -40,11 +40,11 @@
+@@ -40,14 +40,14 @@
      rename_task_job:
        tasks:
--        - task_key: task_rename_1
+-        - task_key: b_task
 +        - new_cluster:
 +            node_type_id: [NODE_TYPE_ID]
 +            num_workers: 1
 +            spark_version: 13.3.x-snapshot-scala2.12
            notebook_task:
-             notebook_path: /Users/{{workspace_user_name}}/rename_task_1
+             notebook_path: /Users/{{workspace_user_name}}/b_task
 -          new_cluster:
 -            spark_version: 13.3.x-snapshot-scala2.12
 -            node_type_id: [NODE_TYPE_ID]
 -            num_workers: 1
-+          task_key: task_rename_new
-         - task_key: task_rename_2
++          task_key: b_task_renamed
+         - task_key: d_task
+           depends_on:
+-            - task_key: b_task
++            - task_key: b_task_renamed
            notebook_task:
-@@ -55,3 +55,3 @@
-             num_workers: 1
+             notebook_path: /Users/{{workspace_user_name}}/d_task
+@@ -58,5 +58,5 @@
+         - task_key: c_task
            depends_on:
--            - task_key: task_rename_1
-+            - task_key: task_rename_new
+-            - task_key: b_task
++            - task_key: b_task_renamed
+           notebook_task:
+             notebook_path: /Users/{{workspace_user_name}}/c_task
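The "Detected changes" list for the rename case can be reproduced with a small Python sketch. This is a simplified illustration, not the CLI's diffing logic: `task_changes` is a hypothetical helper that compares only task presence and `depends_on` entries, addresses tasks by `task_key`, and sorts the resulting lines as strings.

```python
def task_changes(local, remote):
    """Return sorted 'path: action' lines for two task lists keyed by task_key."""
    local_by_key = {t["task_key"]: t for t in local}
    remote_by_key = {t["task_key"]: t for t in remote}
    changes = []
    # Tasks present only locally were removed remotely; the reverse were added.
    for key in local_by_key.keys() - remote_by_key.keys():
        changes.append(f"tasks[task_key='{key}']: remove")
    for key in remote_by_key.keys() - local_by_key.keys():
        changes.append(f"tasks[task_key='{key}']: add")
    # For tasks present on both sides, compare depends_on entries pairwise.
    for key in local_by_key.keys() & remote_by_key.keys():
        old = local_by_key[key].get("depends_on", [])
        new = remote_by_key[key].get("depends_on", [])
        for i, (a, b) in enumerate(zip(old, new)):
            if a["task_key"] != b["task_key"]:
                changes.append(
                    f"tasks[task_key='{key}'].depends_on[{i}].task_key: replace"
                )
    return sorted(changes)


# Local config mirrors rename_task_job; remote has b_task renamed.
local = [
    {"task_key": "b_task"},
    {"task_key": "d_task", "depends_on": [{"task_key": "b_task"}]},
    {"task_key": "c_task", "depends_on": [{"task_key": "b_task"}]},
    {"task_key": "a_task"},
]
remote = [
    {"task_key": "b_task_renamed"},
    {"task_key": "d_task", "depends_on": [{"task_key": "b_task_renamed"}]},
    {"task_key": "c_task", "depends_on": [{"task_key": "b_task_renamed"}]},
    {"task_key": "a_task"},
]

result = task_changes(local, remote)
for line in result:
    print(line)
```

Because tasks are matched by key, a rename surfaces as one `remove`, one `add`, and a `replace` for every dependent task, regardless of where each task sits in the list.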

acceptance/bundle/config-remote-sync/job_multiple_tasks/script

Lines changed: 15 additions & 18 deletions
@@ -7,20 +7,20 @@ $CLI bundle deploy
 job_id="$(read_id.py my_job)"
 
 
-title "Modify only 'process' task num_workers and add timeout"
+title "Modify c_task, remove d_task, add e_task"
 edit_resource.py jobs $job_id <<EOF
 for task in r["tasks"]:
-    if task["task_key"] == "task3":
+    if task["task_key"] == "c_task":
         task["new_cluster"]["num_workers"] = 5
         task["timeout_seconds"] = 3600
-        task["depends_on"] = [{"task_key": "task1"}]
+        task["depends_on"] = [{"task_key": "b_task"}]
 
-r["tasks"] = [task for task in r["tasks"] if task["task_key"] != "task2"]
+r["tasks"] = [task for task in r["tasks"] if task["task_key"] != "d_task"]
 
 r["tasks"].append({
-    "task_key": "new_task",
+    "task_key": "e_task",
     "notebook_task": {
-        "notebook_path": "/Users/${CURRENT_USER_NAME}/new_task"
+        "notebook_path": "/Users/${CURRENT_USER_NAME}/e_task"
     },
     "new_cluster": {
         "spark_version": "${DEFAULT_SPARK_VERSION}",
@@ -35,7 +35,7 @@ echo
 cp databricks.yml databricks.yml.backup
 $CLI bundle config-remote-sync --save
 
-title "Configuration changes for new task"
+title "Configuration changes"
 echo
 trace diff.py databricks.yml.backup databricks.yml
 rm databricks.yml.backup
@@ -47,27 +47,24 @@ mv databricks.yml.resolved databricks.yml
 # Deploy the updated configuration to sync state
 $CLI bundle deploy
 
-title "Rename task_rename_1 to task_rename_new"
+title "Rename b_task to b_task_renamed (4 tasks, 2 with depends_on, 1 without)"
 rename_job_id="$(read_id.py rename_task_job)"
 edit_resource.py jobs $rename_job_id <<'EOF'
 for task in r["tasks"]:
-    if task["task_key"] == "task_rename_1":
-        task["task_key"] = "task_rename_new"
-    # Update dependencies that reference the old key
+    if task["task_key"] == "b_task":
+        task["task_key"] = "b_task_renamed"
     if "depends_on" in task:
         for dep in task["depends_on"]:
-            if dep["task_key"] == "task_rename_1":
-                dep["task_key"] = "task_rename_new"
+            if dep["task_key"] == "b_task":
+                dep["task_key"] = "b_task_renamed"
 EOF
 
 title "Detect task key rename"
 echo
 cp databricks.yml databricks.yml.backup2
-$CLI bundle config-remote-sync --save || true
+$CLI bundle config-remote-sync --save
 
 title "Configuration changes for task key rename"
 echo
-if [ -f databricks.yml.backup2 ]; then
-    trace diff.py databricks.yml.backup2 databricks.yml || true
-    rm databricks.yml.backup2
-fi
+trace diff.py databricks.yml.backup2 databricks.yml
+rm databricks.yml.backup2
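The rename heredoc above can be tried outside the test harness with the following Python sketch. It assumes that `r` in the script is the job settings as a plain dict, which the source does not confirm; `rename_task` and the sample job are hypothetical and only mirror the task keys of `rename_task_job`.

```python
import copy


def rename_task(job, old_key, new_key):
    """Return a copy of job with old_key renamed and depends_on references updated."""
    r = copy.deepcopy(job)
    for task in r["tasks"]:
        if task["task_key"] == old_key:
            task["task_key"] = new_key
        # Update dependencies that reference the old key.
        for dep in task.get("depends_on", []):
            if dep["task_key"] == old_key:
                dep["task_key"] = new_key
    return r


# Sample job with the same keys and dependencies as rename_task_job.
job = {
    "tasks": [
        {"task_key": "b_task"},
        {"task_key": "d_task", "depends_on": [{"task_key": "b_task"}]},
        {"task_key": "c_task", "depends_on": [{"task_key": "b_task"}]},
        {"task_key": "a_task"},
    ]
}

renamed = rename_task(job, "b_task", "b_task_renamed")
print([t["task_key"] for t in renamed["tasks"]])
```

Working on a deep copy keeps the original dict intact, which makes it easy to compare the state before and after the rename.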

acceptance/bundle/config-remote-sync/multiple_files/databricks.yml.tmpl

Lines changed: 15 additions & 0 deletions
@@ -3,3 +3,18 @@ bundle:
 
 include:
   - "resources/*.yml"
+
+targets:
+  dev:
+    resources:
+      jobs:
+        job_one:
+          max_concurrent_runs: 1
+          tasks:
+            - task_key: b_task
+              notebook_task:
+                notebook_path: /Users/{{workspace_user_name}}/b_task
+              new_cluster:
+                spark_version: $DEFAULT_SPARK_VERSION
+                node_type_id: $NODE_TYPE_ID
+                num_workers: 1
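With `b_task` defined under `targets.dev` in `databricks.yml` and the job's other tasks coming from an included file, an index in the merged task list no longer matches an index in any single file. The Python sketch below illustrates that shift. Only `b_task` in `databricks.yml` comes from this fixture; the file name `resources/job_one.yml`, its task keys, and the merge-then-sort behavior are assumptions made for illustration.

```python
# Hypothetical per-file task lists for job_one.
tasks_by_file = {
    "resources/job_one.yml": ["c_task", "a_task"],
    "databricks.yml": ["b_task"],
}


def merged_view(tasks_by_file):
    """Return (task_key, file, index_in_file) triples in merged, sorted order."""
    entries = [
        (key, name, i)
        for name, keys in tasks_by_file.items()
        for i, key in enumerate(keys)
    ]
    # Assumption: the merged sequence ends up ordered by task_key.
    return sorted(entries)


view = merged_view(tasks_by_file)
for merged_index, (key, name, local_index) in enumerate(view):
    print(f"merged tasks[{merged_index}] ({key}) -> {name} tasks[{local_index}]")
```

Keeping the originating file and local index alongside each task is one way to write a change back to the file that actually defines it.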
