
Commit 3204e67

Make secure token storage the default storage mode (#5272)
## Why

Part of CLI GA. Storing long-lived OAuth refresh tokens for interactive logins (`auth_type = databricks-cli`) in a plain JSON file in the user's home directory is a security weakness: any process with home-directory access can read them. The CLI is increasingly the entry point for local agent workflows, so we want tokens in the OS-native secure store by default.

The flip has to handle one big "but": not every system has a usable OS keyring. Linux containers, headless SSH sessions, WSL1, and some CI runners do not have a D-Bus session bus. On those systems the keyring is not merely empty; it is not reachable at all. If we shipped the default flip alone, every command on those systems would fail with a cryptic backend message ("Cannot autolaunch D-Bus without X11 $DISPLAY", etc.) until the user manually set `DATABRICKS_AUTH_STORAGE=plaintext` or re-ran `databricks auth login`. That is a real support burden for the GA window.

This PR ships the default flip together with the supporting UX. Three pieces, each scoped narrowly:

1. **Pin-on-success** after a successful login on the default-secure path, so a later transient keyring failure cannot silently demote a working secure-storage user back to plaintext.
2. **Read-path fallback for unavailable keyrings only.** When the keyring backend itself is unreachable (no D-Bus, no Secret Service daemon, etc.), reads fall through to the file cache so pre-upgrade `token-cache.json` entries stay accessible without manual configuration. **This does not fire when the keyring is reachable but empty.** That is the normal post-upgrade case, where we still surface a clear "run `databricks auth login` to sign in" nudge so the user moves their tokens into the secure store rather than silently keeping them in plaintext.
3. **Friendlier `ErrNotFound` messages** that tell the user what to do, with upgrade-specific copy when a legacy `token-cache.json` is present.
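The read-path rule in item 2 (and the behavior matrix that follows) reduces to a small decision once the keyring probe has been classified. The sketch below is a minimal illustration, assuming the probe outcome is already known; `ProbeResult`, `Source`, and `chooseReadBackend` are hypothetical names, not the signature of the PR's `applyReadFallback`.

```go
package main

import "fmt"

// ProbeResult is a hypothetical classification of the read-only keyring probe.
type ProbeResult int

const (
	KeyringReachable   ProbeResult = iota // backend answered (hit or clean miss)
	KeyringUnavailable                    // backend unreachable, e.g. no D-Bus session bus
	KeyringTimedOut                       // probe did not finish, e.g. keyring waiting to be unlocked
)

// Source is a hypothetical record of where the secure storage mode came from.
type Source int

const (
	SourceDefault  Source = iota // resolver default (now secure)
	SourceExplicit               // DATABRICKS_AUTH_STORAGE or [__settings__].auth_storage
)

// chooseReadBackend returns "keyring" or "file" for a read in secure mode.
// Only an unreachable backend on the default path falls back to the file
// cache. A reachable-but-empty keyring stays on the keyring so Lookup can
// surface the "run databricks auth login" nudge; an explicit choice and a
// timeout also stay on the keyring.
func chooseReadBackend(src Source, probe ProbeResult) string {
	if src == SourceDefault && probe == KeyringUnavailable {
		return "file"
	}
	return "keyring"
}

func main() {
	fmt.Println(chooseReadBackend(SourceDefault, KeyringUnavailable))  // file
	fmt.Println(chooseReadBackend(SourceDefault, KeyringReachable))    // keyring
	fmt.Println(chooseReadBackend(SourceExplicit, KeyringUnavailable)) // keyring
	fmt.Println(chooseReadBackend(SourceDefault, KeyringTimedOut))     // keyring
}
```

Keeping the timeout case on the keyring is deliberate: treating a slow unlock prompt as "unavailable" would silently route reads to a different backend.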
## Behavior matrix

The distinction between "keyring not available" and "keyring available but empty" drives most of the design:

| Scenario | Read-path behavior |
| --- | --- |
| Keyring reachable, has token for profile | Return token from keyring. |
| Keyring reachable, no token for profile | `ErrNotFound` wrapped with "no cached credentials; run `databricks auth login` to sign in" (or upgrade copy if `token-cache.json` has entries). User re-authenticates and writes the new token to the keyring. |
| Keyring NOT reachable, default-secure mode | Silent fall-back to file cache. Pre-upgrade tokens keep working. **No nudge, no error.** |
| Keyring NOT reachable, user explicitly chose secure (env / config) | Return the keyring cache anyway. The actual `Lookup` surfaces the unreachability rather than being silently downgraded against the user's stated intent. |
| Keyring probe times out | Stay on the keyring. A locked keyring being unlocked is the common timeout case; misdiagnosing it as "unavailable" would silently route reads to a different backend. |

## Changes

Before: tokens were written to `~/.databricks/token-cache.json`. Setting `DATABRICKS_AUTH_STORAGE=secure` opted in to the OS keyring.

Now:

- Default storage is the OS keyring (Keychain / Credential Manager / Secret Service). Users re-run `databricks auth login` once after upgrade.
- `DATABRICKS_AUTH_STORAGE=plaintext` or `[__settings__].auth_storage = plaintext` opts back to the file cache. Env wins over config.
- After a successful login on the default-secure path, the CLI writes `auth_storage = secure` to `[__settings__]`. This pins the choice so a later transient probe failure cannot silently demote the user.
- Read paths cheap-probe the keyring with a read-only `Get` on a non-existent account. If the backend is unreachable on the default-secure path, the file cache is returned instead. The probe and fall-back are scoped strictly to backend unavailability, not to empty results.
  Read-path fallback does NOT pin; pinning stays exclusive to login, which has the stronger write-probe signal and an explicit user action.
- `ErrNotFound` from `Lookup` is wrapped with actionable copy:
  - generic case: "no cached credentials; run `databricks auth login` to sign in";
  - upgrade case (mode=secure AND `~/.databricks/token-cache.json` has entries): "stored credentials from older CLI versions are no longer used; run `databricks auth login` to sign in again, or set `DATABRICKS_AUTH_STORAGE=plaintext` to keep using the file cache".
- Non-`ErrNotFound` keyring errors get wrapped with a similar actionable hint, so users on no-keyring systems who somehow bypass the probe (e.g. explicit-secure callers) see "OS keyring unreachable: ... (set `DATABRICKS_AUTH_STORAGE=plaintext` or run `databricks auth login`)" instead of a raw D-Bus message.
- Login-time silent fallback (already on `main` as dormant infrastructure) activates and pins.

Implementation:

- `libs/auth/storage/mode.go`: resolver default flips from `StorageModePlaintext` to `StorageModeSecure`. Constant doc comments updated.
- `libs/auth/storage/cache.go`: drops "dormant today" comments. New `PinSecureMode` (login-side pin) and `applyReadFallback` (read-side fallback). `cacheFactories` gains `probeKeyringRead`. `persistPlaintextFallback` now logs internally at debug level for consistency with `PinSecureMode`.
- `libs/auth/storage/keyring.go`: new `ProbeKeyringRead` (read-only probe). `Lookup` wraps non-`ErrNotFound` errors with the unreachability hint.
- `libs/auth/storage/not_found_hint.go` (new): `notFoundHintCache` wraps `ResolveCache` / `ResolveCacheForLogin` so `ErrNotFound` from `Lookup` carries an actionable hint without getting sandwiched between the SDK's `cache:` prefix and `ErrNotFound`'s tail.
- `cmd/auth/login.go`, `cmd/auth/token.go`: call `storage.PinSecureMode` after each `persistentAuth.Challenge()`.
  `login.go` also moves `ResolveCacheForLogin` to run after input validation so trivially invalid commands no longer probe the keyring.
- Unit tests cover all of the above (`PinSecureMode` cases, `applyReadFallback` cases, `ProbeKeyringRead`, `notFoundHintCache`, `legacyCacheHasTokens`).
- `acceptance/script.prepare` forces `DATABRICKS_AUTH_STORAGE=plaintext` at the root so existing auth acceptance tests keep exercising the file-backed path. Tests that want the resolver default override it.
- `acceptance/cmd/auth/describe/u2m-plaintext-default` renamed to `u2m-secure-default`; its `test.toml` adds a `[[Repls]]` regex normalizing the platform-dependent keyring lookup error.
- `acceptance/cmd/auth/describe/u2m-json-output`, `u2m-plaintext-env`, `u2m-plaintext-config`: regenerated to match the new default storage mode and error copy.
- `cmd/auth/auth_test.go`: `TestProfileHostCompatibleViaCobra` copies the fixture into a temp directory so the resolver's writes can never dirty the checked-in file.
- `NEXT_CHANGELOG.md`: breaking-change entry under Notable Changes covering the flip, the re-login requirement, both opt-out paths, and the read-path fallback for systems without a usable keyring.

## Test plan

- [x] `task checks` clean
- [x] `task lint-q` clean
- [x] `go test ./libs/auth/... ./cmd/auth/... ./libs/databrickscfg/...` passes
- [x] `go test ./acceptance -run 'TestAccept/cmd/auth'` passes on macOS
- [x] `go test ./acceptance -run 'TestAccept/cmd/configure'` passes (covers a `databricks-cli` auth path outside `cmd/auth`)
- [ ] Linux CI is the real test for the `[[Repls]]` regex in `u2m-secure-default/test.toml` (macOS clean miss vs. Linux backend error).
- [ ] Manual: with `DATABRICKS_AUTH_STORAGE` unset, `databricks auth login --profile X` writes to the keyring and persists `auth_storage = secure` to `[__settings__]`.
- [ ] Manual: `DATABRICKS_AUTH_STORAGE=plaintext databricks auth login --profile X` continues to write to `~/.databricks/token-cache.json` with the host-key dual-write entry; `[__settings__]` is not modified.
- [ ] Manual: with the keyring reachable but empty for the current profile, an auth command produces the "run `databricks auth login` to sign in" nudge (not a silent fall-back).
- [ ] Manual: with the keyring NOT reachable (Linux container, headless SSH), an auth command silently uses the file cache; a populated pre-upgrade `token-cache.json` keeps working.

This pull request and its description were written by Isaac.
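The pin-on-success rule (item 1 under Why, exercised by the manual test-plan checks above) can be read as a single predicate. This is a minimal sketch of our reading of "default-secure path", not the PR's `PinSecureMode`; the `Mode`/`Source` types and `shouldPinSecure` are hypothetical, and writing the `[__settings__]` section itself is out of scope.

```go
package main

import "fmt"

// Mode and Source are hypothetical stand-ins for the resolver's outputs.
type Mode string

const (
	ModeSecure    Mode = "secure"
	ModePlaintext Mode = "plaintext"
)

type Source string

const (
	SourceDefault Source = "default"
	SourceEnv     Source = "env"
	SourceConfig  Source = "config"
)

// shouldPinSecure reports whether a login should persist
// auth_storage = secure under [__settings__]. Only a successful login on the
// default-secure path pins: an explicit env/config choice already expresses
// the user's intent, and a failed login must not touch the settings.
func shouldPinSecure(loginOK bool, mode Mode, src Source) bool {
	return loginOK && mode == ModeSecure && src == SourceDefault
}

func main() {
	fmt.Println(shouldPinSecure(true, ModeSecure, SourceDefault))  // true
	fmt.Println(shouldPinSecure(true, ModePlaintext, SourceEnv))   // false
	fmt.Println(shouldPinSecure(false, ModeSecure, SourceDefault)) // false
	fmt.Println(shouldPinSecure(true, ModeSecure, SourceConfig))   // false
}
```

Pinning only after a successful login means the write probe has already proven the keyring works, which is why the read path never pins.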
1 parent 193be7f commit 3204e67

27 files changed

Lines changed: 915 additions & 88 deletions


NEXT_CHANGELOG.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@
 
 ### Notable Changes
 
+* Breaking change: OAuth tokens for interactive logins (`auth_type = databricks-cli`) are now stored in the OS-native secure store by default (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux) instead of `~/.databricks/token-cache.json`. After upgrading, run `databricks auth login` once per profile to re-authenticate; cached tokens from older versions are not migrated. To keep the previous file-backed storage, set `DATABRICKS_AUTH_STORAGE=plaintext` or add `auth_storage = plaintext` under `[__settings__]` in `~/.databrickscfg` (the env var takes precedence over the config setting), then re-run `databricks auth login`. On systems where the OS keyring is not reachable (e.g. Linux containers without a D-Bus session bus), the CLI transparently falls back to the file cache when reading tokens so legacy `token-cache.json` entries remain accessible without manual configuration.
+
 ### CLI
 
 * Added `databricks aitools` command group for installing Databricks skills into your coding agents (Claude Code, Cursor, Codex CLI, OpenCode, GitHub Copilot, Antigravity). Skills are fetched from [github.com/databricks/databricks-agent-skills](https://github.com/databricks/databricks-agent-skills) and either symlinked into each agent's skills directory or copied into the current project. Use `databricks aitools install` to set up, `update` to pull newer versions, `list` to see what's available, and `uninstall` to remove them.
```

acceptance/cmd/auth/describe/u2m-json-output/output.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -2,7 +2,7 @@
 >>> [CLI] auth describe --profile u2m-profile --output json
 Warn: [hostmetadata] failed to fetch host metadata for https://u2m-profile.databricks.test, will skip for 1m0s
 {
-  "mode": "plaintext",
-  "location": "~/.databricks/token-cache.json",
+  "mode": "secure",
+  "location": "OS keyring (service: databricks-cli)",
   "source": "default"
 }
```

acceptance/cmd/auth/describe/u2m-plaintext-config/output.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 
 >>> [CLI] auth describe --profile u2m-profile
 Warn: [hostmetadata] failed to fetch host metadata for https://u2m-profile.databricks.test, will skip for 1m0s
-Unable to authenticate: error getting token: cache: token not found
+Unable to authenticate: error getting token: cache: no cached credentials; run `databricks auth login` to sign in
 Token storage: plaintext, ~/.databricks/token-cache.json (from auth_storage in [__settings__] section of [TEST_TMP_DIR]/home/.databrickscfg)
 -----
 Current configuration:
```

acceptance/cmd/auth/describe/u2m-plaintext-default/test.toml

Lines changed: 0 additions & 3 deletions
This file was deleted.

acceptance/cmd/auth/describe/u2m-plaintext-env/output.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 
 >>> [CLI] auth describe --profile u2m-profile
 Warn: [hostmetadata] failed to fetch host metadata for https://u2m-profile.databricks.test, will skip for 1m0s
-Unable to authenticate: error getting token: cache: token not found
+Unable to authenticate: error getting token: cache: no cached credentials; run `databricks auth login` to sign in
 Token storage: plaintext, ~/.databricks/token-cache.json (from DATABRICKS_AUTH_STORAGE environment variable)
 -----
 Current configuration:
```

acceptance/cmd/auth/describe/u2m-plaintext-default/out.test.toml renamed to acceptance/cmd/auth/describe/u2m-secure-default/out.test.toml

File renamed without changes.

acceptance/cmd/auth/describe/u2m-plaintext-default/output.txt renamed to acceptance/cmd/auth/describe/u2m-secure-default/output.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,8 +1,8 @@
 
 >>> [CLI] auth describe --profile u2m-profile
 Warn: [hostmetadata] failed to fetch host metadata for https://u2m-profile.databricks.test, will skip for 1m0s
-Unable to authenticate: error getting token: cache: token not found
-Token storage: plaintext, ~/.databricks/token-cache.json (from default)
+Unable to authenticate: error getting token: [KEYRING_LOOKUP_ERROR]
+Token storage: secure, OS keyring (service: databricks-cli) (from default)
 -----
 Current configuration:
 ✓ host: https://u2m-profile.databricks.test (from [TEST_TMP_DIR]/home/.databrickscfg config file)
```

acceptance/cmd/auth/describe/u2m-plaintext-default/script renamed to acceptance/cmd/auth/describe/u2m-secure-default/script

File renamed without changes.

acceptance/cmd/auth/describe/u2m-secure-default/test.toml
Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+Ignore = [
+  "home"
+]
+
+# This test runs against the real OS keyring at Lookup time (no writes).
+# macOS produces a clean miss; Linux without a usable D-Bus session bus
+# produces a backend error. Normalize both so the assertion stays on the
+# resolved storage mode, not the lookup outcome.
+[[Repls]]
+Old = 'Unable to authenticate: error getting token: .*'
+New = 'Unable to authenticate: error getting token: [KEYRING_LOOKUP_ERROR]'
```
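To see why one `[[Repls]]` entry covers both platforms, the sketch below applies the same `Old`/`New` pair with Go's `regexp` package. It assumes the acceptance harness treats `Old` as a Go (RE2) pattern; in RE2, `.` does not match a newline by default, so the replacement stops at the end of the error line and the following `Token storage:` line survives for the assertion. The two input strings are illustrative, not captured from real runs.

```go
package main

import (
	"fmt"
	"regexp"
)

// Old/New pair from the [[Repls]] entry above.
var keyringLookupRepl = regexp.MustCompile(`Unable to authenticate: error getting token: .*`)

const replacement = "Unable to authenticate: error getting token: [KEYRING_LOOKUP_ERROR]"

// normalize rewrites the platform-dependent error line. The replacement has
// no "$" references, so literal and expanding replacement behave the same.
func normalize(out string) string {
	return keyringLookupRepl.ReplaceAllLiteralString(out, replacement)
}

func main() {
	const tail = "\nToken storage: secure, OS keyring (service: databricks-cli) (from default)"
	macOS := "Unable to authenticate: error getting token: cache: no cached credentials; run `databricks auth login` to sign in" + tail
	linux := "Unable to authenticate: error getting token: OS keyring unreachable: Cannot autolaunch D-Bus without X11 $DISPLAY" + tail
	fmt.Println(normalize(macOS) == normalize(linux)) // true
}
```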

acceptance/cmd/auth/logout/stale-account-id-workspace-host/output.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -36,4 +36,4 @@ logfood (Default) [DATABRICKS_URL] NO
 
 === Logged out profile should no longer return a token
 >>> musterr [CLI] auth token --profile logfood
-Error: cache: databricks OAuth is not configured for this host. Try logging in again with `databricks auth login --profile logfood` before retrying. If this fails, please report this issue to the Databricks CLI maintainers at https://github.com/databricks/cli/issues/new
+Error: cache: databricks OAuth is not configured for this host. no cached credentials; run `databricks auth login` to sign in
```
