[WIP] Add databricks dbconnect init / sync commands#5690

Draft: rugpanov wants to merge 34 commits into main from dbconnect-init-sync

@rugpanov

Draft / do-not-merge: opened for early review; not ready to merge.

Changes

Adds a new databricks dbconnect command namespace with two subcommands:

  • databricks dbconnect init — create a fresh pyproject.toml and provision a matched .venv.
  • databricks dbconnect sync — merge managed dependencies into an existing pyproject.toml and re-provision.

From the selected Databricks compute target (serverless / cluster / job), the command derives and provisions a local Python environment matched to the runtime: the right Python version, the right databricks-connect pin, and dependency constraints so local resolution matches the Databricks runtime. It runs a phase pipeline: discover uv → resolve target → fetch the per-environment constraints (configurable base URL, with an offline cache) → plan → apply → ensure Python → uv sync → seed pip → validate.
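
The phase pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `phase` type, `runPipeline`, `demoPhases`, and the `mutates` flag are hypothetical names, and treating `apply` and every later phase as disk-mutating is an assumption used to model the `--check` dry run.

```go
package main

import (
	"errors"
	"fmt"
)

// phase is one named pipeline step. mutates marks steps that write to disk;
// the flag is an assumption of this sketch, used to model --check.
type phase struct {
	name    string
	mutates bool
	run     func() error
}

// runPipeline runs phases in order and stops at the first error. In check
// mode it stops before the first mutating phase, so nothing is changed.
// It returns the names of the phases that completed.
func runPipeline(phases []phase, check bool) ([]string, error) {
	var ran []string
	for _, p := range phases {
		if check && p.mutates {
			break
		}
		if err := p.run(); err != nil {
			return ran, fmt.Errorf("phase %q: %w", p.name, err)
		}
		ran = append(ran, p.name)
	}
	return ran, nil
}

// demoPhases builds no-op phases named after the PR description. The phase
// whose name equals failAt returns a simulated error.
func demoPhases(failAt string) []phase {
	names := []string{"discover uv", "resolve target", "fetch constraints", "plan",
		"apply", "ensure Python", "uv sync", "seed pip", "validate"}
	phases := make([]phase, 0, len(names))
	for i, name := range names {
		name := name // capture per iteration for older Go versions
		phases = append(phases, phase{
			name:    name,
			mutates: i >= 4, // assumption: "apply" onward touches disk
			run: func() error {
				if name == failAt {
					return errors.New("simulated failure")
				}
				return nil
			},
		})
	}
	return phases
}

func main() {
	ran, err := runPipeline(demoPhases(""), true)
	fmt.Println(ran, err)
}
```

Keeping each phase behind a plain function value is one way to make the orchestration unit-testable without a real `uv` binary or network access.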

Implementation notes:

  • Thin Cobra layer (cmd/dbconnect/) over a unit-testable pipeline (libs/dbconnect/), with a PackageManager interface seam (uv implemented; pip/conda can follow).
  • Surgical, formatting-preserving pyproject.toml merge that touches only three managed regions and preserves the user's comments, ordering, and their own [tool.uv] keys; idempotent.
  • Target resolution via the SDK (cluster GetByClusterId → DBR → envKey, serverless, and job compute) with three-state messaging.
  • Honors the corporate PyPI proxy by bridging the ~/.config/pip/pip.conf index-url → UV_INDEX_URL (uv ignores pip.conf).
  • --check dry-run prints the plan + diff and changes nothing; --output json emits a stable structured schema, and --debug adds diagnostic logging for troubleshooting on machines we can't access.
  • No new third-party dependencies.
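
As an illustration of the pip.conf bridge mentioned in the notes above, here is a hedged sketch. `indexURLFromPipConf` and `uvEnv` are hypothetical helpers, not the PR's functions. The parser is a simplified INI reader that only understands `key = value` lines in the `[global]` section, and leaving an already-set `UV_INDEX_URL` untouched is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// indexURLFromPipConf returns the index-url value from the [global] section
// of pip.conf content, or "" when none is set. Simplified: only "=" is
// accepted as the key/value delimiter.
func indexURLFromPipConf(content string) string {
	section := ""
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue // blank line or comment
		}
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.ToLower(strings.TrimSpace(line[1 : len(line)-1]))
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || section != "global" {
			continue
		}
		if strings.TrimSpace(key) == "index-url" {
			return strings.TrimSpace(value)
		}
	}
	return ""
}

// uvEnv appends UV_INDEX_URL to env when pip.conf defines an index and the
// caller has not already set the variable (assumed precedence).
func uvEnv(env []string, pipConf string) []string {
	for _, kv := range env {
		if strings.HasPrefix(kv, "UV_INDEX_URL=") {
			return env
		}
	}
	if url := indexURLFromPipConf(pipConf); url != "" {
		return append(env, "UV_INDEX_URL="+url)
	}
	return env
}

func main() {
	conf := "[global]\nindex-url = https://pypi.example.com/simple\n"
	fmt.Println(uvEnv(nil, conf))
}
```

The returned slice could then be assigned to the `Env` field of an `exec.Cmd` that runs `uv`.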

Why

Promotes a proven proof-of-concept shell script into a real CLI command so the VS Code extension (and users directly) can set up a local environment matched to their compute, instead of guessing Python and databricks-connect versions. Doing the version/constraint resolution from the compute target avoids local/remote drift.

Tests

  • Table-driven unit tests across libs/dbconnect/: merge edge cases (single/multi-line arrays, quote styles, CRLF, idempotency, preserving user [tool.uv] keys), envKey mapping + Python-version parsing, target resolution (precedence + three-state), constraint fetch with offline-cache fallback, and pipeline orchestration incl. --check gating and validation.
  • Acceptance cases under acceptance/dbconnect/: serverless --check, --output json shape, no-target error, cluster-unsupported, flag conflict, and JSON-mode error exit code.
  • Verified end-to-end against a real serverless-v4 target: provisions a Python 3.12 .venv with databricks-connect 17.x and the injected constraints.

Out of scope for this first cut: pip/conda package managers (interface only) and the nearest-supported envKey fallback.

This pull request and its description were written by Isaac.

@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 23, 2026

Collaborator

Integration test report

Commit: d787bcb

Run: 28615829285

Env 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
💚​ aws linux 10 13 230 1050 4:09
💚​ aws windows 10 13 232 1048 4:12
💚​ aws-ucws linux 10 13 314 968 4:58
💚​ aws-ucws windows 10 13 316 966 4:07
💚​ azure linux 4 15 230 1049 4:13
💚​ azure windows 4 15 232 1047 3:50
💚​ azure-ucws linux 4 15 316 965 5:05
💚​ azure-ucws windows 4 15 318 963 3:35
💚​ gcp linux 4 15 229 1051 4:00
💚​ gcp windows 4 15 231 1049 3:40
23 interesting tests: 13 SKIP, 10 RECOVERED
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
💚​ TestAccept 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestFetchRepositoryInfoAPI_FromRepo 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
💚​ TestFetchRepositoryInfoAPI_FromRepo/root 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
💚​ TestFetchRepositoryInfoAPI_FromRepo/subdir 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
Top 5 slowest tests (at least 2 minutes):
duration env testname
2:54 aws windows TestAccept
2:52 gcp windows TestAccept
2:45 azure windows TestAccept
2:38 aws-ucws windows TestAccept
2:36 azure-ucws windows TestAccept

@rugpanov rugpanov temporarily deployed to test-trigger-is July 2, 2026 18:18 — with GitHub Actions Inactive
rugpanov added 22 commits July 2, 2026 21:12
Brainstormed design for porting the dbconnect-init.sh demo into a real
CLI subcommand namespace with init + sync commands, a shared phase
pipeline, full target resolution, a surgical TOML merge, and a stable
--json schema.

Co-authored-by: Isaac
Bite-sized, TDD task breakdown (11 tasks) covering the command scaffold,
result types, envKey mapping, constraint fetch+cache, surgical TOML merge,
target resolution, uv package manager, the phase pipeline, Cobra wiring,
acceptance tests, and changelog.

Co-authored-by: Isaac
Regenerate the golden from the built binary; the prior hand-written
version showed the command Short text instead of the rendered Long help.

Co-authored-by: Isaac
- Remove noise doc comments from Error() and Unwrap() (idiomatic for standard interface methods)
- Replace thin NewError doc comment with meaningful info about fmt.Sprintf and nil handling
- Remove YAGNI default case from Mode.String(), use if/return instead

Co-authored-by: Isaac
- Replace double TrimPrefix calls with simpler strings.TrimPrefix(strings.ToLower(version), "v")
- Hoist pythonVersionRe to package-level var to avoid repeated compilation
- Remove noise comment that restated the code

Co-authored-by: Isaac
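
The two refactors in the commit above might look roughly like this. It is a sketch under assumptions: `normalizeVersion` and `pythonMinor` are hypothetical stand-ins for the PR's helpers, and the regular expression is a guess at what `pythonVersionRe` matches.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pythonVersionRe is compiled once at package level instead of on every
// call. The pattern itself is an assumption of this sketch.
var pythonVersionRe = regexp.MustCompile(`(\d+)\.(\d+)`)

// normalizeVersion lowercases the input and strips one leading "v", which
// covers both "v4" and "V4" with a single TrimPrefix call.
func normalizeVersion(version string) string {
	return strings.TrimPrefix(strings.ToLower(version), "v")
}

// pythonMinor extracts the first "major.minor" pair from a requires-python
// specifier such as ">=3.12,<3.13".
func pythonMinor(requires string) (string, error) {
	m := pythonVersionRe.FindStringSubmatch(requires)
	if m == nil {
		return "", fmt.Errorf("no Python version in %q", requires)
	}
	return m[1] + "." + m[2], nil
}

func main() {
	minor, err := pythonMinor(">=3.12,<3.13")
	fmt.Println(normalizeVersion("V4"), minor, err)
}
```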
The PythonMinorFromRequires call happens after a successful network fetch,
so wrapping its error with ErrConstraintFetchFailed was a misattribution.
Use ErrValidationFailed instead, which correctly signals that the constraint
file content failed to parse rather than that the fetch itself failed.

Co-authored-by: Isaac
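
The distinction drawn in the commit above can be illustrated with sentinel errors and `errors.Is`. This is a sketch, not the PR's code: the sentinel names come from the commit message, while `loadConstraints`, `classify`, and `errBadContent` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors named in the commit message. Their messages are assumed.
var (
	ErrConstraintFetchFailed = errors.New("constraint fetch failed")
	ErrValidationFailed      = errors.New("validation failed")
)

// errBadContent is a hypothetical parse failure used by the example.
var errBadContent = errors.New("requires-python not found")

// loadConstraints models "fetch, then parse". A parse error is wrapped with
// ErrValidationFailed because the network fetch has already succeeded.
func loadConstraints(fetch func() (string, error), parse func(string) (string, error)) (string, error) {
	body, err := fetch()
	if err != nil {
		return "", fmt.Errorf("%w: %v", ErrConstraintFetchFailed, err)
	}
	minor, err := parse(body)
	if err != nil {
		return "", fmt.Errorf("%w: %v", ErrValidationFailed, err)
	}
	return minor, nil
}

// classify reports which sentinel, if any, an error wraps.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrConstraintFetchFailed):
		return "fetch"
	case errors.Is(err, ErrValidationFailed):
		return "validation"
	default:
		return "unknown"
	}
}

func main() {
	_, err := loadConstraints(
		func() (string, error) { return "not a constraints file", nil },
		func(string) (string, error) { return "", errBadContent },
	)
	fmt.Println(classify(err), err)
}
```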
rugpanov added 12 commits July 2, 2026 21:12
Co-authored-by: Isaac
- Add json tags to PipelineError (code/message/-) so --output json emits
  the documented contract instead of Go field names
- Change uv version probe from "version" subcommand to --version flag to
  avoid project-scoped failure when no pyproject.toml exists in cwd
- Guard renderResult against nil res: synthesize a minimal Result with
  error populated so JSON mode always emits a structured object
- Use i+1 for 1-based phase numbering in text output
- Add comment explaining why ValidateTargetFlags is kept alongside
  MarkFlagsMutuallyExclusive

Co-authored-by: Isaac
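
To show what the `code/message/-` tags in the commit above change, here is a hedged sketch. The field names, the `render` and `sample` helpers, and the code string are hypothetical. Only the tag layout is taken from the commit message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// PipelineError is a sketch of the error type. Without tags, encoding/json
// would emit the Go field names ("Code", "Message", "Err").
type PipelineError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	Err     error  `json:"-"` // wrapped cause, kept out of the JSON output
}

func (e *PipelineError) Error() string { return e.Code + ": " + e.Message }
func (e *PipelineError) Unwrap() error { return e.Err }

// render marshals the error for JSON output mode.
func render(e *PipelineError) string {
	b, err := json.Marshal(e)
	if err != nil {
		return "{}"
	}
	return string(b)
}

// sample builds an example error. The code value is made up for this sketch.
func sample() *PipelineError {
	return &PipelineError{
		Code:    "example_code",
		Message: "could not download constraints",
		Err:     errors.New("dial tcp: i/o timeout"),
	}
}

func main() {
	fmt.Println(render(sample()))
}
```

The `json:"-"` tag keeps the wrapped cause available to `errors.Is` and `errors.As` through `Unwrap` while excluding it from the serialized contract.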
Add acceptance tests for the dbconnect init/sync feature:
- flag-conflict: verifies Cobra mutual exclusion of --cluster/--serverless/--job
- no-target: verifies error when no compute target is selected
- serverless-check: verifies --serverless v4 --check with stubbed constraint server
- serverless-json: verifies --output json with full Result struct
- cluster-unsupported: verifies constraint fetch failure for unsupported DBR version
- help/test.toml: opts out of bundle-engine matrix for the help case

Each case stubs the test server via [[Server]] in test.toml and uses
DATABRICKS_DBCONNECT_CONSTRAINT_SOURCE=$DATABRICKS_HOST to point the
constraint fetch at the local test server.

Co-authored-by: Grigory Panov
no-target and cluster-unsupported tests use commands that must fail;
musterr asserts this and fails the test if the command unexpectedly
succeeds. errcode is for tolerated failures only.

Co-authored-by: Isaac
Also standardize the serverless-json acceptance uv-version replacement
regex to the unwrapped form used by the sibling cases.

Co-authored-by: Isaac
…d cluster-unsupported scaffolding

Co-authored-by: Isaac
These are internal process artifacts and don't belong in the databricks/cli tree.

Co-authored-by: Isaac
…taxonomy, camelCase JSON

Aligns `databricks dbconnect` with the reconciled cli-spec:
- Collapse `init`+`sync` into a single `dbconnect sync` that auto-detects
  greenfield (no pyproject.toml) vs. merge; command path lives in one constant.
- Add `--constraints-only` (Python + constraints, no databricks-connect pin;
  still builds .venv, omits dbconnectVersion, skips the DB Connect assertion).
- Rewrite the `--output json` contract to the camelCase schema: schemaVersion,
  command, ok, mode, dryRun, target, resolved, greenfield, plan, phases[] (all
  six phases with pending), warnings[], error{code,failurePhase,diskMutated}.
- Rename error codes to the E_* set; report failurePhase at the phase that
  detects the failure so it always matches the errored phase in phases[].
- Detect non-uv managers (conda/pip) in preflight and exit cleanly with
  E_MANAGER_UNSUPPORTED; a plain PEP 621 pyproject.toml resolves to uv.
- Classify a 404 for a resolved env key as E_ENV_UNSUPPORTED (latest-LTS hint,
  no cache fallback) vs. transport failure as E_FETCH; add a writable preflight.
- Default the constraint repo to rugpanov/databricks-environments.

Fixes two bugs the real `uv sync`/validate path exposed (both masked by the
fake package manager and --check in tests):
- uvManager.Validate no longer requires databricks-connect to be importable
  (constraints-only left it uninstalled), so validate stops failing after it
  has already provisioned the venv.
- Greenfield render now emits project.version, which uv requires for a
  [project] table; without it every real greenfield `uv sync` failed.

Co-authored-by: Isaac
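
The 404-versus-transport split described in the commit above could be modelled as below. This is a sketch under assumptions: `classifyFetch` and `errTimeout` are hypothetical, the two code strings come from the commit message, and mapping other non-2xx responses to `E_FETCH` with cache fallback is a guess rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

const (
	codeEnvUnsupported = "E_ENV_UNSUPPORTED"
	codeFetch          = "E_FETCH"
)

// errTimeout is a hypothetical transport error used by the example.
var errTimeout = errors.New("dial tcp: i/o timeout")

// classifyFetch maps the outcome of a constraint download to an error code
// and reports whether falling back to the offline cache makes sense.
// An empty code means the download succeeded.
func classifyFetch(status int, transportErr error) (code string, useCache bool) {
	switch {
	case transportErr != nil:
		// No response at all: a cached copy may still be valid.
		return codeFetch, true
	case status == http.StatusNotFound:
		// The env key resolved but has no constraints file: a cached copy
		// would not help, so no fallback.
		return codeEnvUnsupported, false
	case status >= 200 && status < 300:
		return "", false
	default:
		// Assumption: other HTTP failures are treated like transport errors.
		return codeFetch, true
	}
}

func main() {
	code, useCache := classifyFetch(http.StatusNotFound, nil)
	fmt.Println(code, useCache)
}
```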
@rugpanov rugpanov force-pushed the dbconnect-init-sync branch from 8d184bf to d787bcb Compare July 2, 2026 19:21
@rugpanov rugpanov temporarily deployed to test-trigger-is July 2, 2026 19:21 — with GitHub Actions Inactive
rugpanov added a commit that referenced this pull request Jul 3, 2026
First of a stacked series splitting the databricks dbconnect feature
(umbrella branch dbconnect-init-sync / PR #5690) into small, single-
concern PRs. Each layer is independently reviewable and adds no
user-facing surface until the final PR wires the command in.

This PR is the foundation the rest of the stack builds on:

- result.go: the result types and the --json / E_* error contract that
  every phase reports through (Result, PipelineError, ErrorCode, PhaseName,
  PhaseStatus, Mode, TargetInfo, ResolvedInfo, Plan, Warning).
- envkey.go: mapping a compute target to an environment key
  (EnvKeyForServerless, EnvKeyForSparkVersion, NormalizeServerless) and
  parsing the Python minor from a requires-python specifier.

Nothing imports this package yet, so the CLI is unchanged. The unexported
filesystem/artifact constants and the canonical phase-order slice live with
the pipeline that consumes them (a later PR in the stack), keeping this
layer to just the contract types.

Co-authored-by: Isaac