Skip to content

Drive CLI codegen from a checked-in cli.json#5484

Open
pietern wants to merge 8 commits into
mainfrom
cligen
Open

Drive CLI codegen from a checked-in cli.json#5484
pietern wants to merge 8 commits into
mainfrom
cligen

Conversation

@pietern

@pietern pietern commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Generates the workspace and account command stubs from a checked-in .codegen/cli.json instead of reflecting over a private API spec, so codegen is reproducible from data committed to this repo. The same cli.json also feeds the bundle schema's API annotations, removing the CLI's runtime dependency on an OpenAPI spec being present.

  • internal/cligen: a self-contained, data-driven generator (model, templates, name/casing derivation) that renders the command stubs from cli.json. The templates are ported verbatim from genkit's generator, so the generated cmd/** files are unchanged.
  • bundle/internal/schema: read API annotations (descriptions, enums, field behaviors) directly from cli.json.
  • Taskfile: task generate now runs the cli.json-based generators and needs no upstream checkout. Refreshing cli.json itself is the separate generate-cli-json task, the one step that still requires an upstream checkout.
  • Regenerated the committed codegen artifacts from cli.json. The cmd/** stubs are identical; the schema, docs, and direct-engine artifacts pick up a few correctness changes from the annotation-source switch.

Changes to generated outputs

  • Consolidated preview signal: Hidden/preview state now derives from launch_stage everywhere instead of the inconsistent preview flag (set on 151 fields vs 865 actually PRIVATE_PREVIEW) — adds doNotSuggest/:meta private: markers and drops fields deprecated during private preview.
  • DEVELOPMENT filtering: Codegen now reads the cli.json filtered at min-stage = PRIVATE_PREVIEW instead of the unfiltered spec, so DEVELOPMENT-stage surface no longer leaks (e.g. unreleased ingestion sources).
  • Schema names match the Go SDK: package.PascalName keys resolve descriptions that lossy lowercased keys (e.g. compute.kind) previously left blank.

Test plan

  • ./task generate is reproducible and produces an empty diff
  • internal/cligen unit tests (name/casing derivation, render)

This pull request and its description were written by Isaac.

Generate the workspace and account command stubs from a checked-in
.codegen/cli.json instead of reflecting over an external API spec, so
codegen is reproducible from data committed to this repo.

- internal/cligen: a self-contained, data-driven generator (model,
  templates, name/casing derivation) that renders the command stubs
  from cli.json
- bundle/internal/schema: read API annotations (descriptions, enums,
  field behaviors) directly from cli.json; drop the OpenAPI pseudo-spec
  reconstruction and the raw-spec environment path
- Taskfile: wire the cli.json-based generators into `task generate`
- regenerate the committed codegen artifacts from cli.json

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 15:13 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 15:13 — with GitHub Actions Inactive
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Approval status: pending

/bundle/ - needs approval

12 files changed
Suggested: @denik
Also eligible: @shreyas-goenka, @janniklasrose, @andrewnester, @anton-107, @lennartkats-db

/internal/ - needs approval

15 files changed
Suggested: @simonfaltum
Also eligible: @tanmay-db, @parthban-db, @renaudhartert-db, @hectorcast-db, @Divyansh-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

General files (require maintainer)

40 files changed
Based on git history:

  • @simonfaltum -- recent work in bundle/internal/schema/, python/databricks/bundles/pipelines/_models/, bundle/schema/

Any maintainer (@andrewnester, @anton-107, @denik, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

splitASCII drives every casing function and is a faithful port of genkit's
Named.splitASCII, whose nearest-letter scanning produces several non-obvious
splits (acronym head/tail handling, digits never starting a word). Pin those
as a curated table so the port can't drift.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 15:49 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 15:49 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Commit: ea21ed6

Run: 27233264789

Env 🟨​KNOWN 🔄​flaky 💚​RECOVERED 🙈​SKIP ✅​pass 🙈​skip Time
🟨​ aws linux 7 15 261 928 8:14
🟨​ aws windows 7 15 263 926 21:14
💚​ aws-ucws linux 7 15 357 842 9:59
💚​ aws-ucws windows 7 15 359 840 13:18
🔄​ azure linux 1 1 17 263 926 8:00
💚​ azure windows 1 17 266 924 10:24
💚​ azure-ucws linux 1 17 362 838 9:25
💚​ azure-ucws windows 1 17 364 836 12:46
💚​ gcp linux 1 17 260 929 8:01
💚​ gcp windows 1 17 262 927 12:41
23 interesting tests: 15 SKIP, 7 KNOWN, 1 flaky
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestFetchRepositoryInfoAPI_FromRepo ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p ✅​p ✅​p
Top 28 slowest tests (at least 2 minutes):
duration env testname
7:22 gcp windows TestAccept
6:11 aws-ucws windows TestAccept
5:41 azure-ucws windows TestAccept
5:06 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
5:02 azure windows TestAccept
4:50 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:38 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:19 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:53 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:44 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:41 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:30 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:28 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:18 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:18 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:15 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:15 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:07 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:01 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:58 azure-ucws linux TestAccept
2:58 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:56 aws-ucws linux TestAccept
2:54 gcp linux TestAccept
2:49 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48 azure linux TestAccept
2:40 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:31 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:30 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

pietern added 2 commits June 9, 2026 19:04
Pure move of the name-splitting engine (splitASCII and its
condAtNearestLetters/searchNearest helpers) out of names.go so it sits
next to split_test.go. The casing functions stay in names.go.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 17:06 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 17:06 — with GitHub Actions Inactive
Add table cases for the special-cased early returns in camelName/snakeName
and the '$'-skip branch in splitASCII, bringing the pure name/casing logic
to full statement coverage.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 17:07 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 17:07 — with GitHub Actions Inactive
Switch the CLI to genkit's native cli_v1 producer (`genkit update-sdk`,
mode cli_v1). The cli.json schema keys/refs now use the Go SDK's
package.PascalName rendering and the file carries every message —
including request/response wrappers, which are inert for the CLI since it
reads only the schemas its commands reference.

- derive DoNotSuggest from launch_stage == PRIVATE_PREVIEW instead of the
  dropped preview flag (cli_json.go, parser.go), sourcing the signal from
  the single launch-stage source of truth
- Taskfile generate-cli-json: build genkit from universe HEAD and run
  update-sdk, since the cli_v1 producer is decoupled from the spec
- regenerate annotations, jsonschema (+ for-docs), and docs; compute.Kind's
  description now resolves via the package.PascalName key
- cli_json_test: assert the spec sha via the _openapi_sha file (genkit no
  longer writes it into cli.json metadata)

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 19:58 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 19:58 — with GitHub Actions Inactive
The cli_v1 producer emits go_v0 package.PascalName schema keys that match
the Go SDK type names directly (e.g. serving.OpenAiConfig), so the
Go-name-to-spec-name fixup is never reached. Regenerating the annotations,
json schemas, and docs is byte-identical with it removed.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 20:07 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is June 9, 2026 20:07 — with GitHub Actions Inactive
reason: spec:input_only
- field: source_code_path
reason: spec:input_only

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were erroneously added in #5361 because of:

  1. app.App aliasing fields we use in the bundle config
  2. those fields having launch stage DEVELOPMENT and not being filtered out previously

This is a latent issue that needs to be investigated later.

- drop the .codegen/cli.json entry from the test-task sources; cli.json
  changes are already covered by the dedicated cli.json/cligen tasks
- restore the "Generating CLI code..." echo to match origin/main

Co-authored-by: Isaac

@simonfaltum simonfaltum left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a deep pass on this since the diff is mostly generated artifacts and the interesting parts are easy to miss. The core claims check out when actually re-run:

  • task generate-cligen round-trips byte-for-byte at the PR head (generator + ws), and cmd/** is identical to main.
  • Regenerating annotations_openapi.yml and jsonschema.json from the committed cli.json reproduces both exactly.
  • jsonschema.json has zero structural changes (no properties added or removed, including oneOf/anyOf branches). The changed lines are all descriptions, enums and markers.
  • serviceFilename's unconditional TrimPrefix("account") matches genkit's cliv0/service.go exactly, and genkit's update-sdk doesn't prune stale generated files either, so behavior is at parity there.

A few things I couldn't anchor inline:

  • .agent/rules/auto-generated-files.md still tells agents to run ./task generate-genkit, which this PR removes. Worth updating to generate-cligen / generate-cli-json in the same PR.
  • The pydabs Privilege enums (volumes/schemas/catalogs) drop ~19 values and the Workday report-parameters model is deleted. I think the DEVELOPMENT filtering is correct, but databricks-bundles is published, so consider a NEXT_CHANGELOG.md entry for the removals.
  • Now that command generation is hermetic, a CI step running task generate-cligen + git diff --exit-code becomes possible for the first time. That would catch cli.json / cmd/** drift, which is exactly the failure class the silent paths in Resolve() can create. Fine as a follow-up.

Inline comments below for the rest. No blockers from my side.

Comment thread internal/cligen/model.go
Comment on lines +312 to +328
s.Subservices = s.Subservices[:0]
for _, id := range s.SubserviceIDs {
if sub := c.byID[id]; sub != nil {
s.Subservices = append(s.Subservices, sub)
}
}
for _, m := range s.Methods {
if m.RequestBodyField == nil {
continue
}
for _, f := range m.AllFields {
if f.Name == m.RequestBodyField.Name {
m.RequestBodyField = f
break
}
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both re-link paths here fail silently on a bad cli.json: an unknown subservice id is dropped, and a RequestBodyField name with no AllFields match keeps the decoded duplicate pointer, so the template's eq . $method.RequestBodyField identity check silently misfires and request-body flags get generated wrong. The contract test catches the subservice count mismatch but not the request-body identity.

I think Resolve() should return an error on any unresolved reference (and TestCliJSONIsInterpretable could assert the pointer identity as a cheap extra guard). Right now task generate-cligen on a bad refresh would happily emit broken output and you'd only find out when CI runs the unit test.

apps:

ignore_remote_changes:
- field: git_source

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real behavior change for the direct engine that the description doesn't mention: pure remote drift on git_source / source_code_path for apps was previously skipped, now it'll classify as an update.

From what I can tell it's actually a fix: these two were never App proto fields (they're CLI-side config additions in bundle/config/resources/apps.go), the old entries came from a name collision with same-named INPUT_ONLY fields in the internal spec via the 0.141 SDK bump (#5361), and dresources/app.go deliberately maps DefaultSourceCodePath to enable exactly this drift detection. But neither direction is covered by the config-drift acceptance test (it only exercises command/env_vars), so I'd confirm it's intended and call it out in the description.

Comment on lines +187 to +193
{{- if not .Required -}}
{{- if and $method.Pagination (or
(and (and $method.Pagination.Token $method.Pagination.Token.PollField) (eq .Name $method.Pagination.Token.PollField.Name) (not (eq $method.Path "/api/2.1/clusters/events")))
(and $method.Pagination.Offset (eq .Name $method.Pagination.Offset.Name))
(and $method.Pagination.Limit (eq .Name $method.Pagination.Limit.Name))
(and $method.Pagination.MaxResults (eq .Name $method.Pagination.MaxResults.Name))
(and (eq $method.Path "/api/2.1/clusters/events") (or (eq .Name "offset") (eq .Name "limit")))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says the templates are ported verbatim, but this clusters/events special-case (here and in the hidden-flags block at line 223) is new. It compensates for the pagination model changing between the old spec and cli.json. Regeneration proves it's byte-identical today, but there's no comment explaining why it exists, and it silently stops applying if the path ever changes.

Can we get a short comment here on why, or better, compensate in the cli_v1 producer so the template stays actually verbatim?


// Wait for completion.
opts := api.WithTimeout({{.CamelName}}Timeout)
{{ if not .LongRunningOperation.ResponseType.IsEmptyResponse -}}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small latent divergence: genkit used only IsGoogleEmpty for the LRO response type, but or IsGoogleEmpty IsLegacyEmptyResponse for plain responses. With both folded into one producer-computed IsEmptyResponse, an LRO whose response type is a legacy named-but-fieldless message would render differently than genkit. No current data hits it (regen is byte-identical), but worth a comment or a producer-side assertion so it doesn't surprise anyone later.

Comment thread internal/cligen/cligen.go
// @latest, so generation is deterministic and works offline.
modfile := filepath.Join(dir, "tools", "go.mod")
goimports := append([]string{"tool", "-modfile=" + modfile, "goimports", "-w"}, abs...)
if out, err := exec.Command("go", goimports...).CombinedOutput(); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goimports gets all ~190 absolute paths in a single exec; on a deep checkout path on Windows this can approach the 32k CreateProcess limit. The old .codegen.json formatter had the same shape, so this is parity, not a regression, but chunking the file list is cheap if we ever care.

Comment thread internal/cligen/names.go
Comment on lines +8 to +12
// This file ports genkit's pure name-casing functions (codegen/code/named.go)
// so the CLI can derive kebab/snake/pascal/camel/constant/title casings from a
// single stored `name`, instead of the producer denormalizing every variant
// into cli.json. The port must match genkit byte-for-byte; names_test.go
// asserts that against a cli.json that still carries the stored casings.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment: the committed cli.json no longer carries the stored casings, and names_test.go pins 7 hardcoded samples (validated against genkit before the casings were dropped, per its own comment). Worth rewording so the next reader doesn't go looking for casing fields in cli.json.

Comment thread Taskfile.yml
Comment on lines +777 to 780
- git checkout -- .gitattributes
- go test -timeout 240s -run TestConsistentDatabricksSdkVersion github.com/databricks/cli/internal/build
- rm .github/workflows/next-changelog.yml
- mv tagging.py internal/genkit/tagging.py

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two assumptions here worth double-checking before merge: (1) this needs the cli_v1 producer merged in universe first, otherwise .codegen.json's mode: cli_v1 and this task dangle; (2) the rm / mv steps assume cli_v1 still emits next-changelog.yml and tagging.py into the repo root, so if it stopped emitting them the task fails midway. Also note git checkout -- .gitattributes discards any uncommitted .gitattributes changes. Fine for the documented flow, just sharp.

Comment thread tools/go.mod
github.com/google/yamlfmt/cmd/yamlfmt
golang.org/x/tools/cmd/deadcode
golang.org/x/tools/cmd/goimports
golang.org/x/vuln/cmd/govulncheck

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

govulncheck isn't referenced by the Taskfile or anything else in this PR as far as I can tell (goimports is what cligen needs). Intentional?

@simonfaltum simonfaltum left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a second, deeper pass with the diff split into buckets (generator core, names, templates, schema pipeline, regenerated artifacts, python surface, plumbing, and cli.json itself), each finding independently re-verified, plus a full local re-run at ea21ed6. Inline comments carry the per-file findings; the cross-cutting results:

Verified by execution

  • go build ./..., go vet, and the unit tests pass. Re-running every cli.json-driven generator (cligen, refschema, annotations, both schemas, validation, docs, direct apitypes/resources) reproduces the committed tree byte-for-byte, git status empty afterwards. Skipped only generate-tf-schema (network; its inputs are unaffected by this PR) and pydabs-codegen (uv env; its only PR-relevant input, jsonschema.json, regenerated identically).
  • Determinism holds: two generator runs are byte-identical, and I found no map iteration affecting output order.
  • All 1804 names in the commands block derive identically under the ported name functions and genkit's strings.Title originals. There is one divergent input class with zero current members, see the names.go comment; that one and the compute.Kind duplicate are my two pre-merge asks, plus the changelog below.
  • 4 of 5 templates differ from genkit cliv0 only by trailing newline. cli.json is well-formed, has zero DEVELOPMENT entries across 10,011 launch_stage values, and matches model.go key-for-key.

Changelog, stronger case than my earlier comment: 14 of the 19 removed Privilege values, GPU_1xH100, and the Workday QueryKeyValue class shipped in released databricks-bundles 0.299.2 through 1.2.1 (the 5 STREAM/MEMORY_STORE values landed after v1.2.1 and were never released). databricks-bundles is GA on PyPI and python/ has no package-local changelog, so NEXT_CHANGELOG.md is the only channel. I think the removals need an entry before merge.

Resolve() hardening, extending my earlier comment: the same silent-trust pattern also covers duplicate service ids (last-win in byID, mis-wires ParentService and with it the account-vs-workspace import path), serviceFilename collisions (second service silently overwrites the first's file), and CliComment assuming producer-normalized summaries (genkit normalized at render time). All four invariants are clean in the current cli.json (187 services, 0 violations); asserting them in cli_json_test.go is cheap.

Checked and fine, so nobody needs to re-derive these:

  • The dropped {{else if .Entity.IsEmpty}} template branch is dead code in genkit too (IsEmpty implies IsObject, which matches four branches earlier). Not a regression.
  • The IngestionSourceType enum prune (~100 down to 20 values) has no reachable user impact: the $def is orphaned in both base and head (source_type is OUTPUT_ONLY and stripped by removeOutputOnlyFields), and the kept connector options all carry doNotSuggest.
  • The apps drift-rule removal in resources.generated.yml is the DEVELOPMENT filter working as claimed (the proto fields are stage DEVELOPMENT from day one), consistent with Pieter's comment; the plan-time behavior change is what my earlier inline comment covers.

Stale references for this PR or a fast follow: CLAUDE.md:44 and .agent/rules/auto-generated-files.md still say generate-genkit (mentioned before), the generate task desc still leads with "genkit", the all task comment explains skipping a task that no longer exists, and .gitignore:73 still ignores the no-longer-produced .codegen/openapi.json. Also enum_launch_stages (445 schemas) and enum_descriptions (30) in the schemas block have no consumer in this repo; either trim them from the export or note the intended consumer.

cli.json hygiene (GitHub won't render a diff for the 195k-line file, so no inline anchor): descriptions carry internal references: go/acl/service-identity (L23035), src.dev.databricks.com universe links with commit SHAs (L8098, L41942, L49205), 8 internal Jira IDs in TODO notes, and one Google Doc design link (L47551). I cross-checked all of them: nearly everything already ships verbatim in the public Go/Python/Java SDK models, so this PR doesn't create a new disclosure class, but three strings aren't in any public artifact yet (the TSE-3937 etag TODO among them). A description-scrub pass in the cli_v1 producer would be a nice follow-up.

Still no blockers from my side.

Comment thread internal/cligen/names.go
// title mirrors strings.Title for the ASCII words splitASCII emits: it
// uppercases each letter that follows a non-letter (or the start). Implemented
// directly to avoid the deprecated strings.Title and stay lint-clean.
func title(s string) string {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diverges from genkit for digit-then-letter words. genkit derives PascalName/TitleName with strings.Title, which treats digits as non-separators (letters after digits stay lowercase); this title() uppercases any letter following a non-letter, digits included:

  • ai21labs_api_key: port gives Ai21LabsApiKey, genkit/SDK give Ai21labsApiKey (real field today: serving.Ai21labsApiKey)
  • oauth2client: Oauth2Client vs Oauth2client
  • sha256checksum: Sha256Checksum vs Sha256checksum

I compared both implementations over all 1804 names in the commands block: zero diverge today, so current output is identical. But the first cli.json refresh that adds a digit-letter name as a request field renders a reference like req.Ai21LabsApiKey against an SDK that spells it Ai21labsApiKey, i.e. non-compiling stubs (and TitleName drift in the drop-down strings would be silent). Since the header here promises a byte-for-byte port, I think title() should only uppercase after non-letter-non-digit, and names_test.go should pin the divergent classes (ai21labs_api_key, plus a consecutive-uppercase acronym like HTMLParser); the current 7-row table misses exactly these.

if err != nil {
return fmt.Errorf("failed to load names for {{$.TitleName}} drop-down. Please manually specify required arguments. Original error: %w", err)
}
id, err := cmdio.Select(ctx, names, "{{range $request.RequiredFields}}{{.Summary | trimSuffix "."}}{{end}}")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summaries land in plain double-quoted literals here and in the fmt.Errorf at line 378, with no escaping (the backtick contexts go through without, these don't). A " or \ in a field summary produces unparseable Go, which at least fails loudly in format() and TestCliJSONIsInterpretable; but a syntactically valid injection compiles and passes every existing check, buried in a 195k-line cli.json refresh plus regenerated stubs. The current data is clean (202 descriptions contain double quotes, none reach these sites), so a strconv.Quote-style helper would be byte-identical today. Same class: .Name in the Errorf strings at 323/325, GroupID at 76, the launch-stage strings at 91-92/313-314, and Key/DisplayName in the two groups templates. I'd add the quoting helper before cli.json refreshes become routine.

{{- range $arg, $field := .RequiredPositionalArguments}}
{{- template "args-scan" (dict "Arg" $arg "Field" $field "Method" $method "HasIdPrompt" $hasIdPrompt "ExcludeFromJson" $excludeFromJson)}}
{{- end -}}
{{- range .AllFields -}}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latent and inherited byte-for-byte from genkit, so fine as a follow-up: this parse loop (the only consumer of the *Param vars) sits inside the $hasPosArgs block opened at line 359, while the flag registration at 209-210 is unconditional. A method with an optional duration/timestamp/fieldmask field and no required positional args registers the flag and then silently discards whatever the user passes. Unexercised today (only secrets-uc has such params and both methods have positional args). Moving the loop outside the guard changes no current output.

Comment thread internal/cligen/model.go
IsInt bool `json:"is_int,omitempty"`
IsInt64 bool `json:"is_int64,omitempty"`
IsFloat64 bool `json:"is_float64,omitempty"`
IsDuration bool `json:"is_duration,omitempty"`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_duration is the only modeled key with zero occurrences in the checked-in cli.json (the WKT siblings are live: is_timestamp 74x, is_field_mask 90x), and the cli_v1 producer isn't visible from this repo, so nothing pins that its key for duration fields is actually is_duration. If it ever differs, optional duration flags fail loudly (/* NOT PRIMITIVE */), but a required positional duration falls into the fmt.Sscan branch, which compiles and silently misparses at runtime. A small synthetic decode/render fixture pinning the key would close this before durations first appear in the API surface.

Comment thread internal/cligen/model.go
// the cliv0 templates (templates/*.tmpl, copied verbatim from genkit) access.
//
// Design rules that keep output byte-for-byte identical to genkit:
// - Casings (Kebab/Pascal/Snake/Camel/Constant/Title) are PRE-RESOLVED by the

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale bullet: casings are derived at render time via names.go (lines 102-107 of this file say so themselves), not pre-resolved by the producer; only the leaf refs (PascalRef, FieldRef, NamedIdMapJSON, LROMethodRef) carry stored strings. Worth fixing since this header documents the producer contract, and right now it contradicts both the implementation and names_test.go's comment.

Comment thread .codegen.json
@@ -1,5 +1,5 @@
{
"mode": "cli_v0",
"mode": "cli_v1",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on my Taskfile comment about sequencing with a concrete check: genkit on universe master as of 2026-06-09 still has no cli_v1 (codegen.go dispatches cli_v0 only, same in cmd/update and cmd/generate), so genkit update-sdk --dir against this repo fails with unknown mode until the producer merges. Day-to-day task generate is unaffected since cli.json is checked in. Worth noting the minimum universe version in the generate-cli-json comment once the producer lands.

"type": "string",
"description": "The kind of compute described by this compute specification.\n\nDepending on `kind`, different validations and default values will be applied.\n\nClusters with `kind = CLASSIC_PREVIEW` support the following fields, whereas clusters with no specified `kind` do not.\n* [is_single_node](/api/workspace/clusters/create#is_single_node)\n* [use_ml_runtime](/api/workspace/clusters/create#use_ml_runtime)\n\nBy using the [simple form](https://docs.databricks.com/compute/simple-form.html), your clusters are automatically using `kind = CLASSIC_PREVIEW`.",
"enum": [
"CLASSIC_PREVIEW",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New duplicate: compute.Kind's enum is now ["CLASSIC_PREVIEW", "CLASSIC_PREVIEW"] here and in jsonschema_for_docs.json. The PascalName fix makes annotations_openapi.yml carry this enum for the first time, and annotations_openapi_overrides.yml:892-896 still has the manual workaround entry from when the lossy key left it empty (#3717); LoadAndMerge concatenates sequences, so both sources land. Deleting the now-redundant override entry fixes it. Same dual-source cause as the pre-existing catalog.SseEncryptionDetailsAlgorithm duplicate on main, so a dedupe in the enum merge would fix both.

- Sequence
- Resources for the app. See [\_](#appsnameresources).

- - `source_code_path`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This row publishes source_code_path with an empty description cell, and the git_source section above is newly public too. Both fields lost their PRIVATE markers because apps.App dropped them from cli.json (DEVELOPMENT stage upstream), and source_code_path's only annotation source is a PLACEHOLDER override, which docsgen renders blank. Since source_code_path is a required DABs field, un-hiding is probably right, but then it needs a real description in the annotation files; if un-hiding wasn't intended, the hidden state has to be restored manually since the spec no longer covers these CLI-side fields.


HardwareAcceleratorTypeParam = (
Literal["GPU_1xA10", "GPU_8xH100", "GPU_1xH100"] | HardwareAcceleratorType
Literal["GPU_1xA10", "GPU_8xH100"] | HardwareAcceleratorType

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging because this is the most user-adjacent enum trim in the PR: unlike the other trimmed enums (all :meta private: experimental), HardwareAcceleratorType carries no experimental marker, is exported from databricks.bundles.jobs, and backs the documented Compute.hardware_accelerator field for serverless GPU workloads. GPU_1xH100 shipped in released databricks-bundles 0.299.2 through 1.2.1, so Compute(hardware_accelerator="GPU_1xH100") goes from valid to a ValueError on upgrade. The value was DEVELOPMENT stage and never existed in the Go SDK, so dropping it is correct; it just deserves an explicit mention in the changelog entry and PR description.

MODIFY_CLEAN_ROOM = "MODIFY_CLEAN_ROOM"
EXECUTE_CLEAN_ROOM_TASK = "EXECUTE_CLEAN_ROOM_TASK"
EXTERNAL_USE_SCHEMA = "EXTERNAL_USE_SCHEMA"
VIEW_OBJECT = "VIEW_OBJECT"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 of these 19 values (VIEW_OBJECT through READ_FEATURE, added by the v0.132.0 SDK bump in #5237) shipped in released databricks-bundles 0.299.2 through 1.2.1 as plain public exports (Privilege/PrivilegeParam in __all__, plus the CatalogGrantPrivilege alias, no experimental marker). The 5 STREAM/MEMORY_STORE values landed after v1.2.1 and were never released. Same 19-value removal in schemas/ and volumes/. They're DEVELOPMENT stage and unusable server-side, so removing them is correct, but Privilege.VIEW_OBJECT raising AttributeError after a minor upgrade of a GA package is exactly what a changelog entry is for (see review body).

Comment thread .codegen.json
@@ -1,5 +1,5 @@
{
"mode": "cli_v0",
"mode": "cli_v1",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we align format in cli.json? There, it's

  "metadata": {
    "generator_version": "cliv1"
  },

Comment on lines +49 to +51
if len(cli.Schemas) == 0 {
return nil, fmt.Errorf("no schemas found in %s", path)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should belong into test rather than runtime validation

Comment thread Taskfile.yml
Comment on lines -752 to -766
- |
if [ -z "$UNIVERSE_SKIP_CHECKOUT" ]; then
if ! git -C {{.UNIVERSE_DIR}} diff --quiet || ! git -C {{.UNIVERSE_DIR}} diff --cached --quiet; then
echo "Error: universe repo at {{.UNIVERSE_DIR}} has uncommitted changes; commit or stash them, or set UNIVERSE_SKIP_CHECKOUT=1 to skip checkout"
exit 1
fi
echo "Checking out universe at SHA: $(cat .codegen/_openapi_sha)"
cd {{.UNIVERSE_DIR}}
if ! git cat-file -e $(cat {{.ROOT_DIR}}/.codegen/_openapi_sha) 2>/dev/null; then
git fetch --filter=blob:none origin master
fi
git checkout $(cat {{.ROOT_DIR}}/.codegen/_openapi_sha)
else
echo "UNIVERSE_SKIP_CHECKOUT set; using current {{.UNIVERSE_DIR}} HEAD"
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this removed? just cd-ing into universe below and executing has the potential for mistakes if the repo is dirty, we should at least check that

Comment thread Taskfile.yml
- "{{.UNIVERSE_DIR}}/bazel-bin/openapi/genkit/genkit_/genkit update-sdk"
- "cat .gitattributes.manual .gitattributes > .gitattributes.tmp && mv .gitattributes.tmp .gitattributes"
- '{{.UNIVERSE_DIR}}/bazel-bin/openapi/genkit/genkit_/genkit update-sdk --dir {{.ROOT_DIR}}'
- git checkout -- .gitattributes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can potentially override the updates in generate-cli-json no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants