Commit bd97738

feat: add generated databricks delta autocdc pipeline (#27)
* feat: add generated databricks delta autocdc pipeline
* fix: correct autocdc pipeline env override name
* fix: pass shared bundle vars to delta pipeline commands
* fix: define silver schema in shared delta commands
* fix: keep release builds locked without cargo version churn
1 parent 7ca81a0 commit bd97738

15 files changed

Lines changed: 352 additions & 40 deletions

README.md

Lines changed: 5 additions & 0 deletions
@@ -105,6 +105,11 @@ The Delta path creates and updates:

 The silver schema is expected to stay empty until you stand up a Lakeflow `AUTO CDC` pipeline for the tables you actually want to materialize there.

+```bash
+just databricks-delta-deploy-pipeline DEFAULT prod
+just databricks-delta-run-pipeline DEFAULT prod
+```
+
 Reference Databricks over S3 flow:

 ```bash

docs/monitoring.md

Lines changed: 14 additions & 12 deletions
@@ -26,26 +26,28 @@ The first dashboard focuses on:
 - recent checkpoint history
 - bronze and silver table inventory

-## Why Silver Is Empty
+## AUTO CDC Status

-Silver is still expected to be empty until a real Lakeflow `AUTO CDC` pipeline
-is deployed.
+The current example source now has a real generated Lakeflow `AUTO CDC`
+pipeline. Deploy and run it with:
+
+```bash
+just databricks-delta-deploy-pipeline DEFAULT prod
+just databricks-delta-run-pipeline DEFAULT prod
+```
+
+For a newly onboarded source, silver will stay empty until you run that same
+deploy/run sequence for that source.

 What exists today:

 - control schema and checkpoint tables
 - bronze CDC landing
 - Delta extractor job
 - Lakeflow SQL template for per-table `AUTO CDC`
+- generated per-source pipeline scripts

 What is still missing:

-- a source-aware pipeline generation and deploy path that turns the bronze
-  tables for a specific source into a real deployed Lakeflow pipeline
-
-The generic blocker is not Lakeflow itself. It is the missing code path that:
-
-1. enumerates the bronze tables for one source
-2. decides the silver target names consistently
-3. renders the per-table `AUTO CDC` SQL
-4. deploys or updates the pipeline repeatably
+- only the deploy/run step for any additional source you onboard beyond the
+  current example profile
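
The render step named in this diff (turning one bronze table into per-table `AUTO CDC` SQL) might look roughly like the sketch below. It is not the repository's `render-databricks-delta-pipeline.sh`, whose contents are not shown here. The helper name `render_autocdc_sql`, the `users` table, and the `id` and `ts` columns are hypothetical placeholders. The sketch also assumes that the silver table reuses the bronze table name and that an unqualified target resolves against the pipeline's configured `catalog` and `schema`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: emit one AUTO CDC flow for a single bronze table.
# key_column and sequence_column are placeholders, not the real bronze columns.
render_autocdc_sql() {
  local catalog="$1" bronze_schema="$2" table="$3" key_column="$4" sequence_column="$5"
  cat <<EOF
CREATE OR REFRESH STREAMING TABLE ${table};

CREATE FLOW ${table}_autocdc AS AUTO CDC INTO ${table}
FROM STREAM(${catalog}.${bronze_schema}.${table})
KEYS (${key_column})
SEQUENCE BY ${sequence_column}
STORED AS SCD TYPE 1;
EOF
}

# Example call with assumed names; a real renderer would loop over the
# bronze tables it discovers for the source.
rendered="$(render_autocdc_sql workspace convex_sync_kit_meshix_api_delta_bronze users id ts)"
printf '%s\n' "$rendered"
```

Writing one statement pair per table into a single `.sql` file would match the layout the bundle expects, since the pipeline resource references exactly one generated file.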

justfile

Lines changed: 9 additions & 0 deletions
@@ -87,6 +87,15 @@ databricks-delta-render-dashboard output_file:
 databricks-delta-publish-dashboard profile warehouse_id dashboard_id="":
     ./scripts/publish-databricks-delta-dashboard.sh {{profile}} {{warehouse_id}} {{dashboard_id}}

+databricks-delta-render-pipeline profile output_file:
+    ./scripts/render-databricks-delta-pipeline.sh {{profile}} {{output_file}}
+
+databricks-delta-deploy-pipeline profile="DEFAULT" target="prod":
+    ./scripts/deploy-databricks-delta-pipeline.sh {{profile}} {{target}}
+
+databricks-delta-run-pipeline profile="DEFAULT" target="prod":
+    ./scripts/run-databricks-delta-pipeline.sh {{profile}} {{target}}
+
 databricks-delta-deploy profile="DEFAULT" target="dev":
     ./scripts/deploy-databricks-delta.sh {{profile}} {{target}}

platform/databricks/delta/README.md

Lines changed: 5 additions & 0 deletions
@@ -42,6 +42,9 @@ Bundle lifecycle:
 - `scripts/bootstrap-databricks-delta.sh <profile> <warehouse_id>`
 - `scripts/render-databricks-delta-dashboard.sh <output_file>`
 - `scripts/publish-databricks-delta-dashboard.sh <profile> <warehouse_id> [dashboard_id]`
+- `scripts/render-databricks-delta-pipeline.sh <profile> <output_file>`
+- `scripts/deploy-databricks-delta-pipeline.sh <profile> <target>`
+- `scripts/run-databricks-delta-pipeline.sh <profile> <target>`
 - `scripts/deploy-databricks-delta.sh <profile> <target>`
 - `scripts/run-databricks-delta-job.sh <profile> <target> [job_key]`
 - `scripts/run-databricks-delta-smoke.sh <profile> <target> <warehouse_id>`
@@ -109,6 +112,8 @@ Recommended operator entrypoints:
 just databricks-delta-sync-secret
 just databricks-delta-bootstrap <warehouse_id>
 just databricks-delta-publish-dashboard DEFAULT <warehouse_id>
+just databricks-delta-deploy-pipeline DEFAULT prod
+just databricks-delta-run-pipeline DEFAULT prod
 just databricks-delta-deploy
 just databricks-delta-run
 just databricks-delta-smoke <warehouse_id>

platform/databricks/delta/databricks.yml

Lines changed: 16 additions & 0 deletions
@@ -5,6 +5,10 @@ bundle:
 include:
   - resources/*.yml

+sync:
+  include:
+    - generated/*.sql
+
 variables:
   source_slug:
     description: Stable source slug used for deployment names and workspace paths.
@@ -28,8 +32,14 @@ variables:
     description: Schema that owns the checkpoint table.
   bronze_schema:
     description: Schema that owns bronze CDC tables.
+  silver_schema:
+    description: Schema that owns silver current-state tables.
   checkpoint_table:
     description: Checkpoint table name.
+  autocdc_pipeline_name:
+    description: Databricks Lakeflow pipeline name for bronze-to-silver AUTO CDC.
+  autocdc_pipeline_file:
+    description: Generated SQL file name for the Lakeflow AUTO CDC pipeline.

 targets:
   dev:
@@ -49,7 +59,10 @@ targets:
       catalog: workspace
       control_schema: convex_sync_kit_meshix_api_delta_control
       bronze_schema: convex_sync_kit_meshix_api_delta_bronze
+      silver_schema: convex_sync_kit_meshix_api_delta_silver
       checkpoint_table: connector_checkpoint
+      autocdc_pipeline_name: convex-sync-kit-meshix-api-dev-autocdc
+      autocdc_pipeline_file: meshix-api-dev-bronze-to-silver.sql
   prod:
     mode: production
     workspace:
@@ -66,4 +79,7 @@ targets:
       catalog: workspace
       control_schema: convex_sync_kit_meshix_api_delta_control
       bronze_schema: convex_sync_kit_meshix_api_delta_bronze
+      silver_schema: convex_sync_kit_meshix_api_delta_silver
       checkpoint_table: connector_checkpoint
+      autocdc_pipeline_name: convex-sync-kit-meshix-api-prod-autocdc
+      autocdc_pipeline_file: meshix-api-prod-bronze-to-silver.sql
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+*.sql
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+resources:
+  pipelines:
+    convex_delta_autocdc:
+      name: ${var.autocdc_pipeline_name}
+      catalog: ${var.catalog}
+      schema: ${var.silver_schema}
+      serverless: true
+      photon: true
+      continuous: false
+      root_path: ${workspace.file_path}
+      libraries:
+        - file:
+            path: ${workspace.file_path}/generated/${var.autocdc_pipeline_file}

release-please-config.json

Lines changed: 1 addition & 28 deletions
@@ -10,34 +10,7 @@
       "release-type": "simple",
       "package-name": "convex-sync-kit",
       "changelog-path": "CHANGELOG.md",
-      "include-component-in-tag": false,
-      "extra-files": [
-        {
-          "type": "toml",
-          "path": "Cargo.toml",
-          "jsonpath": "$.workspace.package.version"
-        },
-        {
-          "type": "toml",
-          "path": "apps/convex-sync/Cargo.toml",
-          "jsonpath": "$.package.version"
-        },
-        {
-          "type": "toml",
-          "path": "apps/convex-inspect/Cargo.toml",
-          "jsonpath": "$.package.version"
-        },
-        {
-          "type": "toml",
-          "path": "crates/convex-sync-core/Cargo.toml",
-          "jsonpath": "$.package.version"
-        },
-        {
-          "type": "toml",
-          "path": "crates/convex-export-s3/Cargo.toml",
-          "jsonpath": "$.package.version"
-        }
-      ]
+      "include-component-in-tag": false
     }
   }
 }
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ "$#" -ne 2 ]]; then
+  echo "usage: $0 <profile> <target>" >&2
+  exit 1
+fi
+
+profile="$1"
+target="$2"
+bundle_engine="${DATABRICKS_BUNDLE_ENGINE:-direct}"
+
+repo_root="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+bundle_root="$repo_root/platform/databricks/delta"
+
+# shellcheck source=/dev/null
+source "$repo_root/scripts/load-source-config.sh"
+load_convex_sync_source_config "$repo_root"
+
+read_env_file_value() {
+  local key="$1"
+  local env_file="$repo_root/.env"
+  if [[ ! -f "$env_file" ]]; then
+    return 1
+  fi
+  local line
+  line="$(grep -E "^${key}=" "$env_file" | tail -n 1 || true)"
+  if [[ -z "$line" ]]; then
+    return 1
+  fi
+  printf '%s' "${line#*=}"
+}
+
+deployment_url="${CONVEX_DEPLOYMENT_URL:-$(read_env_file_value CONVEX_DEPLOYMENT_URL || true)}"
+if [[ -z "$deployment_url" ]]; then
+  echo "CONVEX_DEPLOYMENT_URL is required" >&2
+  exit 1
+fi
+
+source_id="${CONVEX_SOURCE_ID:-$deployment_url}"
+source_slug="${CONVEX_SYNC_SOURCE_SLUG:-default}"
+source_slug_sql="${CONVEX_SYNC_SOURCE_SQL:-${source_slug//-/_}}"
+deployment_slug="${DATABRICKS_DELTA_DEPLOYMENT_SLUG:-${source_slug}-${target}}"
+job_name="${DATABRICKS_DELTA_JOB_NAME:-convex-sync-kit-${deployment_slug}-delta-extract}"
+pipeline_name="${DATABRICKS_DELTA_AUTOCDC_PIPELINE_NAME:-convex-sync-kit-${deployment_slug}-autocdc}"
+pipeline_file="${DATABRICKS_DELTA_AUTOCDC_PIPELINE_FILE:-${deployment_slug}-bronze-to-silver.sql}"
+generated_file="$bundle_root/generated/$pipeline_file"
+
+"$repo_root/scripts/render-databricks-delta-pipeline.sh" "$profile" "$generated_file" >/dev/null
+
+catalog="${DATABRICKS_DELTA_CATALOG:-workspace}"
+table_name="${CONVEX_TABLE_NAME:-}"
+secret_scope="${DATABRICKS_DELTA_SECRET_SCOPE:-convex-sync-kit}"
+secret_key="${DATABRICKS_DELTA_SECRET_KEY:-convex-deploy-key}"
+control_schema="${DATABRICKS_DELTA_CONTROL_SCHEMA:-convex_sync_kit_${source_slug_sql}_delta_control}"
+bronze_schema="${DATABRICKS_DELTA_BRONZE_SCHEMA:-convex_sync_kit_${source_slug_sql}_delta_bronze}"
+silver_schema="${DATABRICKS_DELTA_SILVER_SCHEMA:-convex_sync_kit_${source_slug_sql}_delta_silver}"
+checkpoint_table="${DATABRICKS_DELTA_CHECKPOINT_TABLE:-connector_checkpoint}"
+
+"$repo_root/scripts/ensure-databricks-delta-secret.sh" "$profile" "$secret_scope" "$secret_key"
+
+bundle_args=(
+  --var "convex_deployment_url=$deployment_url"
+  --var "source_slug=$source_slug"
+  --var "job_name=$job_name"
+  --var "convex_deploy_key_secret_scope=$secret_scope"
+  --var "convex_deploy_key_secret_key=$secret_key"
+  --var "source_id=$source_id"
+  --var "table_name=$table_name"
+  --var "catalog=$catalog"
+  --var "control_schema=$control_schema"
+  --var "bronze_schema=$bronze_schema"
+  --var "silver_schema=$silver_schema"
+  --var "checkpoint_table=$checkpoint_table"
+  --var "autocdc_pipeline_name=$pipeline_name"
+  --var "autocdc_pipeline_file=$pipeline_file"
+  --var "deployment_slug=$deployment_slug"
+)
+
+(
+  cd "$bundle_root"
+  DATABRICKS_BUNDLE_ENGINE="$bundle_engine" databricks bundle validate -p "$profile" -t "$target" "${bundle_args[@]}"
+  DATABRICKS_BUNDLE_ENGINE="$bundle_engine" databricks bundle deploy -p "$profile" -t "$target" "${bundle_args[@]}"
+)
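
The `read_env_file_value` helper in the script above can be tried in isolation. The sketch below reproduces its lookup logic under a different name, `read_env_value`, and takes the env file path as a second argument so it runs outside the repository. That signature, the temporary file, and the `example.invalid` URLs are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same lookup as read_env_file_value, with the file path passed in (assumption).
read_env_value() {
  local key="$1" env_file="$2" line
  [[ -f "$env_file" ]] || return 1
  # The last matching assignment wins; `|| true` keeps pipefail from aborting on no match.
  line="$(grep -E "^${key}=" "$env_file" | tail -n 1 || true)"
  [[ -n "$line" ]] || return 1
  # Strip only up to the first '=', so values may themselves contain '='.
  printf '%s' "${line#*=}"
}

demo_env="$(mktemp)"
cat >"$demo_env" <<'EOF'
CONVEX_DEPLOYMENT_URL=https://old.example.invalid
CONVEX_DEPLOYMENT_URL=https://new.example.invalid?a=b
EOF

# With nothing exported, the file supplies the value.
unset CONVEX_DEPLOYMENT_URL
from_file="${CONVEX_DEPLOYMENT_URL:-$(read_env_value CONVEX_DEPLOYMENT_URL "$demo_env" || true)}"

# An exported (non-empty) variable takes precedence over the file.
CONVEX_DEPLOYMENT_URL="https://exported.example.invalid"
from_env="${CONVEX_DEPLOYMENT_URL:-$(read_env_value CONVEX_DEPLOYMENT_URL "$demo_env" || true)}"

# A key that is absent yields an empty string once the failure is swallowed.
missing="$(read_env_value NOT_SET "$demo_env" || true)"

echo "file fallback: $from_file"
echo "env override:  $from_env"
rm -f "$demo_env"
```

Because the value is everything after the first `=`, surrounding quotes in the `.env` file are kept as part of the value rather than stripped.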

scripts/deploy-databricks-delta.sh

Lines changed: 6 additions & 0 deletions
@@ -48,7 +48,10 @@ secret_key="${DATABRICKS_DELTA_SECRET_KEY:-convex-deploy-key}"
 catalog="${DATABRICKS_DELTA_CATALOG:-workspace}"
 control_schema="${DATABRICKS_DELTA_CONTROL_SCHEMA:-convex_sync_kit_${source_slug_sql}_delta_control}"
 bronze_schema="${DATABRICKS_DELTA_BRONZE_SCHEMA:-convex_sync_kit_${source_slug_sql}_delta_bronze}"
+silver_schema="${DATABRICKS_DELTA_SILVER_SCHEMA:-convex_sync_kit_${source_slug_sql}_delta_silver}"
 checkpoint_table="${DATABRICKS_DELTA_CHECKPOINT_TABLE:-connector_checkpoint}"
+pipeline_name="${DATABRICKS_DELTA_AUTOCDC_PIPELINE_NAME:-convex-sync-kit-${deployment_slug}-autocdc}"
+pipeline_file="${DATABRICKS_DELTA_AUTOCDC_PIPELINE_FILE:-${deployment_slug}-bronze-to-silver.sql}"

 "$repo_root/scripts/ensure-databricks-delta-secret.sh" "$profile" "$secret_scope" "$secret_key"

@@ -64,7 +67,10 @@ bundle_args=(
   --var "catalog=$catalog"
   --var "control_schema=$control_schema"
   --var "bronze_schema=$bronze_schema"
+  --var "silver_schema=$silver_schema"
   --var "checkpoint_table=$checkpoint_table"
+  --var "autocdc_pipeline_name=$pipeline_name"
+  --var "autocdc_pipeline_file=$pipeline_file"
 )

 (
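
To see how the default names added in this hunk resolve, the sketch below repeats the same parameter expansions with fixed inputs. The `meshix-api` slug and `prod` target are assumed values, chosen because they reproduce the names hard-coded for the `prod` target in `databricks.yml`. The `analytics_silver` override is purely hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed inputs; the real scripts take these from the environment and argv.
source_slug="meshix-api"
target="prod"

# Hyphens are not valid in the schema names used here, so the SQL slug swaps them for underscores.
source_slug_sql="${source_slug//-/_}"
deployment_slug="${source_slug}-${target}"
pipeline_name="convex-sync-kit-${deployment_slug}-autocdc"
pipeline_file="${deployment_slug}-bronze-to-silver.sql"
silver_schema="convex_sync_kit_${source_slug_sql}_delta_silver"

printf '%s\n' "$pipeline_name" "$pipeline_file" "$silver_schema"
# convex-sync-kit-meshix-api-prod-autocdc
# meshix-api-prod-bronze-to-silver.sql
# convex_sync_kit_meshix_api_delta_silver

# Any DATABRICKS_DELTA_* override replaces the derived default outright.
DATABRICKS_DELTA_SILVER_SCHEMA="analytics_silver"   # hypothetical override
overridden_schema="${DATABRICKS_DELTA_SILVER_SCHEMA:-$silver_schema}"
echo "$overridden_schema"
# analytics_silver
```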
