acceptance: schema-driven invariant fuzzing for bundle configs #5686

Draft
radakam wants to merge 24 commits into main from deco-25361-fuzz-create-payload

Conversation

@radakam radakam commented Jun 23, 2026

Contributor

Changes

A schema-driven config fuzzer for acceptance/bundle/invariant/. gen_fuzz_config.py walks the bundle schema and emits a random, schema-valid databricks.yml for any resource type; the invariant/fuzz variant deploys each one under the direct engine. Seeds are deterministic, so failures reproduce from the printed seed.
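
As a rough illustration of the seeded schema walk described above (not the actual gen_fuzz_config.py), here is a minimal Python sketch. The toy schema, the names `gen_value` and `gen_config`, and the 50% chance of emitting an optional field are all assumptions made for the example; the real bundle schema is far larger, and the output is printed as JSON here only to stay stdlib-only.

```python
import json
import random

# Hypothetical miniature schema; the real bundle schema is much larger.
TOY_SCHEMA = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "max_concurrent_runs": {"type": "integer"},
        "paused": {"type": "boolean"},
        "format": {"type": "string", "enum": ["SINGLE_TASK", "MULTI_TASK"]},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}


def gen_value(schema, rng):
    """Return a random value that conforms to a (simplified) schema node."""
    if "enum" in schema:
        return rng.choice(schema["enum"])
    kind = schema.get("type")
    if kind == "object":
        required = set(schema.get("required", []))
        out = {}
        # Visit keys in sorted order so the RNG is consumed deterministically.
        for key in sorted(schema.get("properties", {})):
            # Required fields are always emitted; optional ones half the time.
            if key in required or rng.random() < 0.5:
                out[key] = gen_value(schema["properties"][key], rng)
        return out
    if kind == "array":
        return [gen_value(schema["items"], rng) for _ in range(rng.randint(0, 3))]
    if kind == "integer":
        return rng.randint(0, 100)
    if kind == "boolean":
        return rng.random() < 0.5
    if kind == "string":
        return "s" + str(rng.randint(0, 9999))
    raise ValueError(f"unsupported schema node: {schema!r}")


def gen_config(seed):
    """Generate one config; the same seed always yields the same config."""
    rng = random.Random(seed)  # private RNG, never the global one
    return gen_value(TOY_SCHEMA, rng)


if __name__ == "__main__":
    print(json.dumps(gen_config(29), indent=2))
```

Using a private `random.Random(seed)` instead of the module-level RNG is what makes a printed seed sufficient to reproduce a failure in this sketch.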

Every PR runs a small fixed seed window asserting only that the CLI never panics. task test-fuzz and the nightly job sweep a wider window with the no-drift invariant on (FUZZ_CHECK_DRIFT).

Why

Random schema-valid configs exercise the full deploy path and surface panics the CLI would otherwise hit only on unusual real-world configs. The no-panic invariant is the reliable, backend-independent signal. Drift checking against the local fake server is best-effort: the fake server accepts invalid configs that the real backend rejects, so re-plans can show phantom drift (confirmed by re-running generated configs against a real workspace). Drift checking therefore stays opt-in.

Testing

  • go test ./acceptance -run TestAccept/bundle/invariant/fuzz — generate, deploy, destroy; assert no panic.
  • task test-fuzz — wide-window run with drift invariant (nightly).
  • Curated no_drift still passes; a drifting seed now fails instead of passing silently; verify_no_drift.py errors cleanly on empty plan input.

@radakam radakam temporarily deployed to test-trigger-is June 23, 2026 07:44 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 23, 2026 08:06 — with GitHub Actions Inactive

eng-dev-ecosystem-bot commented Jun 23, 2026

Collaborator

Integration test report

Commit: 898efb7

Run: 28450202310

| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | 1 | 13 | 232 | 1038 | 4:48 |
| 🟨 aws windows | 7 | 1 | 13 | 234 | 1036 | 5:41 |
| 💚 aws-ucws linux | | 8 | 13 | 316 | 956 | 4:39 |
| 💚 aws-ucws windows | | 8 | 13 | 318 | 954 | 3:17 |
| 💚 azure linux | | 2 | 15 | 232 | 1037 | 3:33 |
| 💚 azure windows | | 2 | 15 | 234 | 1035 | 2:46 |
| 💚 azure-ucws linux | | 2 | 15 | 318 | 953 | 4:31 |
| 💚 azure-ucws windows | | 2 | 15 | 320 | 951 | 3:08 |
| 💚 gcp linux | | 2 | 15 | 231 | 1039 | 3:25 |
| 💚 gcp windows | | 2 | 15 | 233 | 1037 | 2:26 |

21 interesting tests: 13 SKIP, 7 KNOWN, 1 RECOVERED
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestFetchRepositoryInfoAPI_FromRepo 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R

@radakam radakam temporarily deployed to test-trigger-is June 23, 2026 09:47 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 24, 2026 08:29 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 24, 2026 12:05 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 24, 2026 13:27 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 24, 2026 13:35 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 25, 2026 11:42 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 25, 2026 11:54 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 25, 2026 17:47 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 26, 2026 07:42 — with GitHub Actions Inactive
@radakam radakam temporarily deployed to test-trigger-is June 26, 2026 08:22 — with GitHub Actions Inactive
@radakam radakam marked this pull request as ready for review June 26, 2026 08:34

github-actions Bot commented Jun 26, 2026

Contributor

Approval status: pending

/acceptance/bundle/ - needs approval

7 files changed
Suggested: @pietern
Also eligible: @denik, @janniklasrose, @shreyas-goenka, @andrewnester, @anton-107, @lennartkats-db

General files (require maintainer)

4 files changed
Based on git history:

  • @pietern -- recent work in .github/workflows/, ./, acceptance/bundle/invariant/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

Comment thread bundle/config/mutator/resourcemutator/cluster_fixups.go Outdated
Comment thread bundle/fuzz/generate_test.go Outdated
@radakam radakam temporarily deployed to test-trigger-is June 26, 2026 15:45 — with GitHub Actions Inactive
radakam added 16 commits June 30, 2026 08:34
The nightly test-fuzz job is intentionally excluded from test-result, so
a failure was only visible in the Actions tab. Add a failure step that
opens (or comments on) a single deduped GitHub issue with a one-command
repro.

Also correct the jobsCreatePath comment: a different API version shows up
as a capture failure (the testserver registers only this route, so a
mismatched version 404s and the deploy fails), not as a payload diff.

…ion test

Rename the capture/deploy/recorder helpers to *_test.go so the parity
harness compiles only under `go test` instead of into the package's
regular build, and add a committed regression test (cluster_fixups_test.go)
covering the single-node task-cluster num_workers force-send fix so the
divergence is guarded at PR time, not just in the nightly suite.

…ting

Move the remaining generator/diff/rand implementation into _test.go files
(keeping only a doc.go for the package comment) so nothing in the harness
compiles into the regular build, since no product code imports it.

Distinguish deploy/capture failures from create-payload divergences in
checkJobParity: skip when neither engine deploys the generated config, fail
distinctly when exactly one engine accepts it (an acceptance divergence, not
a payload diff), and only diff payloads when both deploys succeed. This keeps
nightly triage from misdirecting a deploy failure into regressionSeeds.

Also document the unique-identity-key assumption in diffKeyedSlice.

Use strings.SplitSeq instead of ranging over strings.Split (modernize
stringsseq) and require.Positivef instead of require.Greaterf(t, n, 0)
(testifylint negative-positive).

The failure-reporting step used `gh issue list --jq '.[0].number'`, which
prints the literal "null" when no open issue exists, so it always took the
comment branch and tried to comment on issue "null" instead of creating one.
Use `// empty` so the create branch runs on the first divergence.

Revert the num_workers single-node task-cluster fix along with its unit
test and acceptance updates so this PR adds only the parity harness.

Both terraform/direct divergences the harness found are now documented and
suppressed via DefaultIgnorePaths rather than fixed (fixes follow
separately): num_workers on single-node task clusters (seed 29) and the
spark.databricks.delta.preview.enabled spark conf key.

Address review feedback on the create-payload parity harness:

- Replace the path-only ignore list with value-conditional ignore rules so
  the documented num_workers divergence (direct omits, terraform force-sends
  0) is suppressed only for that exact shape; a real value mismatch at the
  same path now fails again.
- Unexport package-internal identifiers (generateJob, diffPayloads,
  difference, defaultIgnoreRules) that are only used within the package.
- Document why TestCaptureJobCreateDirect is intentionally not opt-in.
- Reword the one-sided-deploy failures as deploy/capture differences rather
  than asserting one engine "rejected" the config.
- Make TestParitySeeds hermetic against ambient FUZZ_* env vars.
- Correct the seed 29 comment to reflect that the divergence is suppressed.
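
To picture the difference between a path-only ignore entry and a value-conditional rule, here is a small Python sketch. The harness in this commit is Go and its actual rule type is not shown in this conversation, so `IgnoreRule`, `is_ignored`, and the `MISSING` sentinel are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any

MISSING = object()  # sentinel: the field is absent from the payload


@dataclass(frozen=True)
class IgnoreRule:
    path: str
    direct: Any     # value the direct payload must have for the rule to apply
    terraform: Any  # value the terraform payload must have


# Hypothetical rule for the documented shape: direct omits, terraform sends 0.
RULES = [IgnoreRule("tasks[*].new_cluster.num_workers", MISSING, 0)]


def _same(expected, actual):
    """Compare values, treating MISSING as equal only to itself."""
    if expected is MISSING or actual is MISSING:
        return expected is actual
    # Require matching types so that 0 does not match False.
    return type(expected) is type(actual) and expected == actual


def is_ignored(path, direct_value, terraform_value, rules=RULES):
    """Suppress a difference only when the path AND both values match a rule."""
    return any(
        r.path == path
        and _same(r.direct, direct_value)
        and _same(r.terraform, terraform_value)
        for r in rules
    )
```

With a rule like this, `is_ignored(path, MISSING, 0)` is suppressed while a genuine mismatch such as `is_ignored(path, 2, 4)` at the same path still surfaces as a failure.
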
The terraform provider force-sends num_workers:0 for a single-node
new_cluster on task-level clusters too, not just shared job_clusters, but
prepareJobSettingsForUpdate only applied initializeNumWorkers to
job_clusters. The direct engine therefore omitted num_workers on task
clusters and the two engines produced divergent create payloads (found by
the bundle/fuzz parity harness, seed 29).

Apply initializeNumWorkers to task new_cluster too so the direct engine
matches terraform, drop the now-obsolete tasks[*].new_cluster.num_workers
ignore entry, and simplify the fuzz ignore list to a plain []string now
that value-conditional matching is no longer needed.

Switch the fuzz suite from comparing terraform and direct create payloads to
asserting invariants on the direct engine's payload. Terraform and direct can
disagree for legitimate reasons, so a payload diff is noisy; an invariant has no
legitimate reason to fail, so a failure is a real bug. This drops the payload
diff and its ignore-list of documented divergences, and removes terraform from
the harness (each seed is now one in-process direct deploy).

Gate on `bundle validate` so the suite distinguishes the two fuzzing outcomes:
an invalid config skips (it can't violate an invariant), while a validated config
that fails to deploy or breaks an invariant fails. This is the distinction a
looser, schema-driven generator will rely on.
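
The gating described above can be summarized as a small decision function. This is only a Python sketch of the classification logic under the assumption that each step reduces to a boolean; `Outcome` and `classify_seed` are illustrative names, not identifiers from the harness.

```python
from enum import Enum


class Outcome(Enum):
    SKIP = "skip"
    PASS = "pass"
    FAIL = "fail"


def classify_seed(validate_ok, deploy_ok, invariant_ok):
    """Classify one fuzz seed from three boolean observations."""
    if not validate_ok:
        # An invalid config cannot violate an invariant, so it is skipped.
        return Outcome.SKIP
    if not deploy_ok:
        # A config that validated but fails to deploy is a failure.
        return Outcome.FAIL
    if not invariant_ok:
        return Outcome.FAIL
    return Outcome.PASS
```
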

Revert the num_workers:0 force-send for single-node task clusters (and its
acceptance goldens): it only matched terraform's payload, with no demonstrated
behavior benefit, and direct has shipped without it. If a real backend
requirement is confirmed, it can return as a standalone change.

…uzzing

Drop the terraform/direct create-payload parity package in favor of fuzzing
the existing acceptance/bundle/invariant framework, which already checks
invariants across all resource types and is prepped for fuzzing via its
INPUT_CONFIG_OK contract.

- add acceptance/bin/gen_fuzz_config.py: a seeded generator that walks the
  bundle schema and emits a random databricks.yml for any resource type
- add acceptance/bundle/invariant/fuzz: generates configs over a seed window
  and asserts the CLI never panics; the no-drift invariant is opt-in
  (FUZZ_CHECK_DRIFT) for the nightly wide-window run
- point task test-fuzz and the nightly job at the new variant
- remove bundle/fuzz and its parity harness

- Correct misleading comments: the nightly test-fuzz job runs the same
  local harness against the fake server (wider seed window + drift on),
  not a real workspace.
- Run config generation inside the per-seed subshell so a generator
  crash also prints the "reproduce with" hint.
- Document the schema-driven fuzz subdir in the invariant README,
  including that a failure is a real CLI bug and how to reproduce it.
- Drop the unused name hint in gen_config (objects ignore it).

Make the comments across the schema fuzz harness more concise while keeping
the non-obvious "why" context.

Mirror the integration-test flow: comment on the PR that introduced the
failing commit rather than opening/deduping a tracking issue.

…ating it

Extract the no_drift deploy/drift/destroy body into a shared no_drift.sh
sourced by both the no_drift test and the fuzzer, so the invariant lives in
one place and other invariant tests can be fuzzed the same way.

A rejected config never deploys, so emitting the marker made the fuzzer
read the re-plan's "needs create" as drift.

The fuzzer runs the shared no_drift.sh body with errexit off and classifies
each seed from the captured exit code. The drift block ended with a no-panic
check that reset $? to 0, so a config that deployed cleanly but drifted was
silently treated as a pass. Accumulate the drift assertions into drift_rc and
return it instead. The curated no_drift test (errexit on) is unaffected.

Also make verify_no_drift.py fail cleanly on empty/unparseable plan output
(when bundle plan itself failed) instead of crashing with a traceback, and
tighten the fuzz harness comments.
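
A minimal sketch of what "fail cleanly on empty/unparseable plan output" could look like, assuming the plan is passed as JSON text. This is not the contents of verify_no_drift.py; `parse_plan` and `check_plan_text` are hypothetical names, and the JSON-object shape is an assumption.

```python
import json
import sys


def parse_plan(text):
    """Return the parsed plan, or raise ValueError with a short message."""
    if not text.strip():
        raise ValueError("plan output is empty (did `bundle plan` fail?)")
    try:
        plan = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan output is not valid JSON: {exc}") from exc
    if not isinstance(plan, dict):
        raise ValueError("plan output is not a JSON object")
    return plan


def check_plan_text(text):
    """Return a process exit code instead of letting a traceback escape."""
    try:
        parse_plan(text)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

Returning a nonzero code with a one-line message keeps the fuzzer's per-seed classification based on exit codes, rather than on whether a traceback happened to appear in a log.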
@radakam radakam force-pushed the deco-25361-fuzz-create-payload branch from 5c1e25d to 314f4ee June 30, 2026 08:35
@radakam radakam temporarily deployed to test-trigger-is June 30, 2026 08:36 — with GitHub Actions Inactive

cp databricks.yml LOG.config

# We redirect output rather than record it because some configs that are being tested may produce warnings

Contributor

Curious, why do we need to modify the test? Can't we just place generated config into config/generated_.yaml.tmpl and then run go test -run /path/to/test/INPUT_CONFIG=generated_?

The idea is to run any invariant test, not just no drift. For example, stress testing migrate is very valuable as well.

Contributor Author

If I understand correctly, -run can only filter INPUT_CONFIG values that are already listed in test.toml, and the test diffs against a fixed output.txt. That means it can't inject runtime-generated configs or tolerate the rejections that fuzzing produces. Instead, I extracted no_drift.sh and added a FUZZ_INVARIANT matrix: the fuzzer loops over seeds itself and asserts via exit code on LOG.*, while still being able to fuzz any invariant. What do you think?

@radakam radakam marked this pull request as draft June 30, 2026 09:58
Extract the migrate invariant body into a shared migrate.sh (mirroring
no_drift.sh) and have the fuzzer source ../$FUZZ_INVARIANT.sh so it can
exercise any invariant. Wire up FUZZ_INVARIANT=[no_drift, migrate] so the
schema fuzzer now also stress-tests the Terraform->direct migration on
random configs. The fuzzer's panic scan now globs LOG.* rather than naming
LOG.validate/LOG.deploy, since different bodies write different logs.
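
For illustration, a panic scan that globs `LOG.*` instead of naming individual logs might look like the Python sketch below. The actual scan lives in the fuzzer's shell script and is not shown here; `find_panics` is a hypothetical name, and the marker strings are an assumption based on how the Go runtime prefixes crash output.

```python
from pathlib import Path

# Assumed markers: the Go runtime prints "panic: ..." and "fatal error: ...".
PANIC_MARKERS = ("panic:", "fatal error:")


def find_panics(directory):
    """Return (file name, line number, line) for each suspicious log line."""
    hits = []
    # Glob every LOG.* file so each invariant body can write its own logs.
    for log in sorted(Path(directory).glob("LOG.*")):
        text = log.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(marker in line for marker in PANIC_MARKERS):
                hits.append((log.name, lineno, line))
    return hits
```
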
@radakam radakam temporarily deployed to test-trigger-is June 30, 2026 09:59 — with GitHub Actions Inactive
CatalogsCreate only echoed a subset of the create request, so a re-read
returned null for connection_name, managed_encryption_settings, and
custom_max_retention_hours. Because connection_name is recreate_on_changes
(immutable), the schema fuzzer's no_drift invariant saw a perpetual
recreate; the others showed as update drift. Persist these fields on create
so the re-read matches the deployed config.

Also clamp the fuzzer's custom_max_retention_hours to UC-valid values
(0 or 168-720 hours) so generated catalog configs deploy.
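
One possible way to clamp a generated value onto the range stated above (0, or 168 to 720 hours) is sketched below. How the fuzzer actually clamps is not shown in this conversation, so `clamp_retention_hours` and its rounding choices are assumptions.

```python
def clamp_retention_hours(raw):
    """Map any integer onto the assumed valid set: 0, or 168..720 inclusive."""
    if raw <= 0:
        return 0  # non-positive values collapse to 0
    # Positive values are pulled into the closed interval [168, 720].
    return min(max(raw, 168), 720)
```
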
@radakam radakam temporarily deployed to test-trigger-is June 30, 2026 12:43 — with GitHub Actions Inactive
…variants

Broaden the fuzz invariant matrix beyond no_drift/migrate with four more
schema-driven invariant bodies, each selectable via FUZZ_INVARIANT and
following the existing INPUT_CONFIG_OK / SKIP_DRIFT_CHECK contract:

- redeploy.sh: deploy twice; the second deploy must be a clean no-op, which
  exercises the write path twice and catches create handlers that don't
  round-trip their inputs.
- canonical.sh: `bundle validate -o json` must be byte-identical across two
  runs; guards against nondeterministic serialization. Cloud-independent, so
  it always runs (not gated behind SKIP_DRIFT_CHECK).
- update.sh: edit a comment/description and assert the redeploy is an in-place
  update (not a recreate) that converges with no drift. Configs without an
  editable field are skipped before the marker (treated as a rejection).
- destroy_recreate.sh: deploy then destroy; a re-plan must want to create
  everything again, proving destroy left no orphaned state.
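
The canonical check in the list above boils down to rendering the same output twice and comparing bytes. A minimal Python sketch, assuming the CLI is on PATH as `databricks`; `is_canonical` and `validate_json` are hypothetical names and the real canonical.sh is a shell script.

```python
import subprocess


def is_canonical(render, runs=2):
    """True when `render()` returns identical bytes on every run."""
    first = render()
    return all(render() == first for _ in range(runs - 1))


def validate_json():
    """Hypothetical renderer: raw stdout of `bundle validate -o json`."""
    # Assumes the CLI binary is available on PATH as `databricks`.
    return subprocess.run(
        ["databricks", "bundle", "validate", "-o", "json"],
        capture_output=True,
        check=True,
    ).stdout
```

Inside a bundle directory this would be called as `is_canonical(validate_json)`. Comparing raw bytes rather than parsed JSON is deliberate in this sketch, since parsing would hide key-ordering differences.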

Add two stdlib-only helpers: edit_fuzz_config.py (flips one comment/description
scalar via a line match, no YAML dependency) and verify_plan_action.py (asserts
a plan shows the expected action, mirroring bundle/deployplan/action.go).
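
A line-match edit with no YAML dependency, in the spirit of the edit_fuzz_config.py helper described above, could be sketched as follows. This is not the helper's actual code: `flip_first_scalar`, the appended suffix, and the choice to skip quoted or block scalars are assumptions, and a trailing YAML comment on the matched line is not handled.

```python
import re

# Matches "<indent>comment: value" or "<indent>description: value".
_FIELD = re.compile(r"^(\s*(?:comment|description):\s*)(.*)$")

# First characters that indicate a non-plain scalar (quoted, block, flow, ...).
_SKIP_FIRST = "|>'\"[{&*!"


def flip_first_scalar(text, suffix=" (edited)"):
    """Return (new_text, changed); edits the first plain comment/description."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        body = line.rstrip("\r\n")
        match = _FIELD.match(body)
        if match and match.group(2) and match.group(2)[0] not in _SKIP_FIRST:
            newline = line[len(body):]  # preserve the original line ending
            lines[i] = match.group(1) + match.group(2) + suffix + newline
            return "".join(lines), True
    # No editable field: the caller can treat the config as skipped.
    return text, False
```
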
@radakam radakam temporarily deployed to test-trigger-is June 30, 2026 14:01 — with GitHub Actions Inactive