
Only show installable projects in 'databricks labs list' #5560

Merged
janniklasrose merged 5 commits into main from janniklasrose/labs-list-installable on Jun 30, 2026

@janniklasrose janniklasrose commented Jun 11, 2026

Contributor

Changes

databricks labs list showed every non-archived, non-fork repository in the databrickslabs GitHub org — currently 39 — but only repositories that ship a labs.yml manifest can actually be installed (currently 8: blueprint, dlt-meta, dqx, lakebridge, lsql, pylint-plugin, sandbox, ucx). Picking anything else from the list failed databricks labs install with Error: remote: read labs.yml from GitHub: not found (error message improved separately in #5559), so the listing mostly advertised projects that cannot be installed.

With this change, the filter is narrowed to labs that are tagged as databricks-cli-installable, currently 3.
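A minimal sketch of the topic-based filter described above, assuming the repository listing is decoded from GitHub's REST response for organization repositories (which carries `name`, `description`, `archived`, `fork`, and `topics`). The `repo` type and the `installable` and `hasTopic` functions are hypothetical names for illustration, not the CLI's actual code, and the sample payload is hand-written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repo mirrors a few fields of GitHub's repository JSON. The Go type name is
// hypothetical; the JSON field names are GitHub's.
type repo struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Archived    bool     `json:"archived"`
	Fork        bool     `json:"fork"`
	Topics      []string `json:"topics"`
}

// installableTopic is the opt-in topic named in this PR.
const installableTopic = "databricks-cli-installable"

// hasTopic reports whether the repository carries the given topic.
func hasTopic(r repo, topic string) bool {
	for _, t := range r.Topics {
		if t == topic {
			return true
		}
	}
	return false
}

// installable keeps repositories that are neither archived nor forks and that
// carry the opt-in topic. Input order is preserved.
func installable(repos []repo) []repo {
	var out []repo
	for _, r := range repos {
		if r.Archived || r.Fork {
			continue
		}
		if hasTopic(r, installableTopic) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Hand-written sample payload, not real API output.
	payload := `[
		{"name": "ucx", "description": "Automated migrations to Unity Catalog", "topics": ["databricks-cli-installable"]},
		{"name": "untagged-lab", "description": "No topic set", "topics": []},
		{"name": "archived-lab", "archived": true, "topics": ["databricks-cli-installable"]}
	]`
	var repos []repo
	if err := json.Unmarshal([]byte(payload), &repos); err != nil {
		panic(err)
	}
	for _, r := range installable(repos) {
		fmt.Printf("%s\t%s\n", r.Name, r.Description)
	}
}
```

Because the topics arrive in the same response as the repository list, a filter of this shape needs no extra HTTP requests and no second cache file.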

Output before (39 entries, abridged) / after (3 entries):

Name           Description
blueprint      Baseline for Databricks Labs projects written in P...
lakebridge     Accelerates migrations to Databricks by automating...
ucx            Automated migrations to Unity Catalog

Tests

Unit tests

This pull request and its description were written by Isaac, an AI coding agent.

'databricks labs list' showed every non-archived, non-fork repository in the databrickslabs GitHub org (currently 39), but only repositories that ship a labs.yml manifest at the root of their release tag can actually be installed (currently 8). Everything else failed 'databricks labs install' with a not-found error. Filter the listing to repositories that have a root labs.yml on their default branch, checked concurrently via raw.githubusercontent.com (not subject to the low unauthenticated GitHub API rate limit) and cached for 24 hours like the repository list itself.

Co-authored-by: Isaac

eng-dev-ecosystem-bot commented Jun 11, 2026

Collaborator

Integration test report

Commit: 1ff24cb

Run: 28436899542

Env                    🟨 KNOWN  🔄 flaky  💚 RECOVERED  🙈 SKIP  ✅ pass  🙈 skip  Time
🟨 aws linux           7         -         1             13       231      1037     5:03
🟨 aws windows         7         -         1             13       233      1035     5:39
💚 aws-ucws linux      -         -         8             13       315      955      4:25
💚 aws-ucws windows    -         -         8             13       317      953      3:34
🔄 azure linux         -         2         2             15       229      1036     5:38
🔄 azure windows       -         2         2             15       231      1034     4:28
💚 azure-ucws linux    -         -         2             15       317      952      4:21
💚 azure-ucws windows  -         -         2             15       319      950      3:20
💚 gcp linux           -         -         2             15       230      1038     3:27
💚 gcp windows         -         -         2             15       232      1036     2:32
23 interesting tests: 13 SKIP, 7 KNOWN, 2 flaky, 1 RECOVERED
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestFetchRepositoryInfoAPI_FromRepo 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🔄​ TestFetchRepositoryInfoAPI_FromRepo/root ✅​p ✅​p ✅​p ✅​p 🔄​f 🔄​f ✅​p ✅​p ✅​p ✅​p
🔄​ TestFetchRepositoryInfoAPI_FromRepo/subdir ✅​p ✅​p ✅​p ✅​p 🔄​f 🔄​f ✅​p ✅​p ✅​p ✅​p

@simonfaltum simonfaltum left a comment

Member

Reviewed the full diff plus the supporting packages (localcache, cmd/labs/github, clear_cache.go), and ran an independent second-model pass over the same diff; both converged on the same two issues, so requesting changes for those (details inline):

  1. labs clear-cache does not know about the new cache file.
  2. An offline cold start writes an empty installable cache that then sticks for 24h, and (1) means clear-cache cannot fix it.

Both fixes are small. Two smaller notes inline (changelog wording given #5559 is still open, and a test nit).

Checked and found sound: the errgroup filter (writes to distinct slice elements, first-error semantics, limit 10, ctx propagation), preserved ordering and archived/fork semantics, graceful offline behavior when caches exist, the raw.githubusercontent choice and its failure mode (failing loudly beats caching a partial list for 24h), no stale-cache hazard on default_branch (it has been in ghRepo since #914, so old on-disk caches have it), and the test design (the blueprint fixture proof in TestListingWorks is a nice touch). Unit tests for cmd/labs, cmd/labs/github, and cmd/labs/localcache pass locally, including a -race run of the new test.

This review was written by Isaac, an AI coding agent, with an independent second pass by another model.

Comment thread cmd/labs/list.go Outdated
Comment thread cmd/labs/list.go Outdated
Comment thread NEXT_CHANGELOG.md Outdated
Comment thread cmd/labs/list_test.go Outdated

@asnare asnare left a comment

Collaborator

I agree that labs list should only list installable projects, and considered the problem back when I addressed the paging issue.

The problem with testing for labs.yml is the additional REST calls that are required:

  • Before this PR: 1 request (results cached for 24h)
  • After this PR: 1 + N requests, where N is currently 39.

Although this implementation avoids the REST API quota of 60 requests per hour per IP by hitting the CDN directly, for a lighter touch I'd prefer to filter projects based on repository "topics". I've tagged a few with databricks-cli-installable for testing purposes. Filtering on this has a few benefits:

  • No additional HTTP requests necessary, repository topics are already included in the response we get.
  • Caching remains simple.
  • On the labs/maintainer side, things become opt-in. At 8/39 I think opt-in is preferable to opt-out.
  • On the labs/maintainer side, turning up on labs list becomes an admin operation.

Before reviewing the technical implementation I'd like to get consensus on this.

P.S. I also rejected using the GraphQL API as a solution to detect the presence of labs.yml: calls need to be authenticated, and the quota system would still make it costly.

@janniklasrose

Contributor Author

@asnare I like the databricks-cli-installable tag idea, it's simple (the diff now looks much cleaner). Any concern with tagging the other 5 installable repos before we proceed here? Do we need alignment from repo owners, or is the fact that these repos already support labs install enough to warrant the tag?

@asnare asnare left a comment

Collaborator

This looks fine to me.

@janniklasrose: the installable repositories all have the magic tag now.

@asnare asnare requested a review from simonfaltum June 22, 2026 08:44
@janniklasrose janniklasrose added this pull request to the merge queue Jun 30, 2026
Merged via the queue into main with commit 53dceca Jun 30, 2026
25 checks passed
@janniklasrose janniklasrose deleted the janniklasrose/labs-list-installable branch June 30, 2026 11:25