Skip to content

Commit 6115d17

Browse files
authored
auth: silently fall back to plaintext when keyring is unreachable on login (#5181)
## Why Today, when secure storage is selected but the OS keyring is unreachable (no D-Bus on Linux, headless SSH, WSL1, locked keychain that hangs for 3s), `databricks auth login` errors out and tells the user to set `DATABRICKS_AUTH_STORAGE=plaintext`. That is a hard wall for users who do not know in advance whether their environment has a working keyring, and the failure typically lands after the user has already completed the browser flow. The team agreed to aim for security by default, but do not block users when the keyring is not available. Scope of this PR: silent-fallback wiring for the auth login path, plus probe and resolver-with-source plumbing. Pin-on-success across modes is the correct end state but lands with **MS4** alongside the default flip from plaintext to secure. Pinning today (default = plaintext) would freeze every user into plaintext and neutralize MS4. Telemetry and the `databricks auth storage <mode>` command are intentionally out of scope and tracked separately. ## Changes **Before:** `databricks auth login` with secure storage on a machine without a keyring fails with an error after OAuth, regardless of how secure was selected. **Now:** - **Default mode** today resolves to plaintext (unchanged). The silent-fallback wiring in `applyLoginFallback` is dormant: the `(mode=Secure, explicit=false, probe fail)` branch is unreachable through the resolver until MS4 flips the default to secure. When that flip happens, default users on a broken keyring fall back to file silently and the fallback persists `auth_storage = plaintext` so subsequent commands skip the (slow/blocking) probe. - **Explicit secure** (env var, config, or override flag) + probe fail: return a clear error. "I want secure" is honored strictly, never silently downgraded. This avoids the divergence GPT 5.5 review caught: writing the token to file while leaving `auth_storage = secure` in config would make `auth token` and bundle commands fail on the next call because they would still resolve to secure and hit the unreachable keyring. Implementation: - `storage.ProbeKeyring()` performs a write+delete cycle with the existing 3s timeout to detect a usable keyring without leaving stray entries. - `storage.ResolveStorageModeWithSource()` returns the resolved mode plus whether it came from an explicit user choice (override / env / config) versus the default. - `storage.ResolveCacheForLogin()` wraps the resolver. For default-secure + probe failure it falls back; for explicit-secure + probe failure it returns an error; for any non-secure mode it skips the probe entirely. - `databrickscfg.SetConfiguredAuthStorage()` writes the key under `[__settings__]`, mirroring `SetDefaultProfile`. Used by the silent-fallback persist. - `cmd/auth/login.go` swaps `ResolveCache` for `ResolveCacheForLogin`. Read paths (`auth token`, bundle commands) keep the original keyring error so they do not silently mint plaintext copies of tokens that live in the keyring on another machine. ## Test plan - [x] Unit: `ProbeKeyring` success cleans up after itself; Set/Delete error and Set timeout each propagate. - [x] Unit: `ResolveStorageModeWithSource` returns `explicit=false` for default and `explicit=true` for override / env / config. - [x] Unit: `applyLoginFallback` falls back and persists `auth_storage = plaintext` for default-secure + probe fail. - [x] Unit: `applyLoginFallback` returns a "secure storage was requested" error for explicit-secure + probe fail, and does not write config. - [x] Unit: `resolveCacheForLoginWith` errors out for explicit secure (env, config, override) when the probe fails. - [x] Unit: `SetConfiguredAuthStorage` creates the file/section as needed and preserves `default_profile`. - [x] `./task checks` clean - [x] `./task lint-q` 0 issues - [x] All `cmd/auth`, `libs/auth/storage`, `libs/databrickscfg` unit tests pass - [x] All `acceptance/cmd/auth/storage-modes` and `acceptance/cmd/auth/login` acceptance tests pass This pull request and its description were written by Isaac.
1 parent ef90f53 commit 6115d17

9 files changed

Lines changed: 481 additions & 14 deletions

File tree

cmd/auth/login.go

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -146,7 +146,11 @@ a new profile is created.
146146
ctx := cmd.Context()
147147
profileName := cmd.Flag("profile").Value.String()
148148

149-
tokenCache, mode, err := storage.ResolveCache(ctx, "")
149+
// Resolve the cache before the browser step so a missing/locked keyring
150+
// surfaces here rather than after the user completes OAuth. When secure
151+
// is selected but the keyring is unreachable, this silently falls back
152+
// to plaintext and persists auth_storage = plaintext for next time.
153+
tokenCache, mode, err := storage.ResolveCacheForLogin(ctx, "")
150154
if err != nil {
151155
return err
152156
}

libs/auth/storage/cache.go

Lines changed: 96 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,9 @@ import (
44
"context"
55
"fmt"
66

7+
"github.com/databricks/cli/libs/databrickscfg"
8+
"github.com/databricks/cli/libs/env"
9+
"github.com/databricks/cli/libs/log"
710
"github.com/databricks/databricks-sdk-go/credentials/u2m"
811
"github.com/databricks/databricks-sdk-go/credentials/u2m/cache"
912
)
@@ -12,15 +15,17 @@ import (
1215
// so unit tests can inject stubs without hitting the real OS keyring or
1316
// filesystem. Production code uses defaultCacheFactories().
1417
type cacheFactories struct {
15-
newFile func(context.Context) (cache.TokenCache, error)
16-
newKeyring func() cache.TokenCache
18+
newFile func(context.Context) (cache.TokenCache, error)
19+
newKeyring func() cache.TokenCache
20+
probeKeyring func() error
1721
}
1822

1923
// defaultCacheFactories returns the production factory set.
2024
func defaultCacheFactories() cacheFactories {
2125
return cacheFactories{
22-
newFile: func(ctx context.Context) (cache.TokenCache, error) { return NewFileTokenCache(ctx) },
23-
newKeyring: NewKeyringCache,
26+
newFile: func(ctx context.Context) (cache.TokenCache, error) { return NewFileTokenCache(ctx) },
27+
newKeyring: NewKeyringCache,
28+
probeKeyring: ProbeKeyring,
2429
}
2530
}
2631

@@ -38,6 +43,30 @@ func ResolveCache(ctx context.Context, override StorageMode) (cache.TokenCache,
3843
return resolveCacheWith(ctx, override, defaultCacheFactories())
3944
}
4045

46+
// ResolveCacheForLogin resolves the cache like ResolveCache with extra rules
47+
// for the auth login path:
48+
//
49+
// 1. When the resolved mode is secure and the user did not explicitly ask
50+
// for it (no override flag, no env var, no config), and the OS keyring
51+
// is unreachable, fall back silently to plaintext and persist
52+
// auth_storage = plaintext to [__settings__] so subsequent commands
53+
// skip the (slow/blocking) probe and route directly to the file cache.
54+
// 2. When the user explicitly asked for secure (override, env var, or
55+
// config) but the keyring is unreachable, return an error. An explicit
56+
// "I want secure" is honored strictly: never silently downgrade.
57+
//
58+
// Both rules are dormant today: the resolver default is plaintext, so
59+
// (mode=Secure, explicit=false) is unreachable. They activate when the
60+
// default flips to secure (MS4 / cli-ga). Wiring lands now so MS4 is a
61+
// single-line default flip plus pin-on-success additions.
62+
//
63+
// Login-specific. Read paths (auth token, bundle commands) keep the original
64+
// keyring error so they don't silently mint plaintext copies of tokens that
65+
// were stored in the keyring on another machine.
66+
func ResolveCacheForLogin(ctx context.Context, override StorageMode) (cache.TokenCache, StorageMode, error) {
67+
return resolveCacheForLoginWith(ctx, override, defaultCacheFactories())
68+
}
69+
4170
// WrapForOAuthArgument wraps tokenCache so SDK-side writes (Challenge, refresh)
4271
// dual-write to the legacy host-based cache key when mode is plaintext. Other
4372
// modes return tokenCache unchanged: secure mode never writes a host-key entry,
@@ -73,3 +102,66 @@ func resolveCacheWith(ctx context.Context, override StorageMode, f cacheFactorie
73102
return nil, "", fmt.Errorf("unsupported storage mode %q", string(mode))
74103
}
75104
}
105+
106+
// resolveCacheForLoginWith is the pure form of ResolveCacheForLogin. It takes
107+
// the factory set as a parameter so tests can inject stubs.
108+
func resolveCacheForLoginWith(ctx context.Context, override StorageMode, f cacheFactories) (cache.TokenCache, StorageMode, error) {
109+
mode, explicit, err := ResolveStorageModeWithSource(ctx, override)
110+
if err != nil {
111+
return nil, "", err
112+
}
113+
return applyLoginFallback(ctx, mode, explicit, f)
114+
}
115+
116+
// applyLoginFallback realizes the login-time fallback rules given an already-
117+
// resolved mode and whether the user explicitly asked for it. Split out so
118+
// tests can drive the (mode, explicit) input space directly without depending
119+
// on whatever the resolver's default mode happens to be at any point in time.
120+
//
121+
// Pin-on-success across modes (locking in the first working behavior to
122+
// insulate users from keyring flakiness) is intentionally not implemented
123+
// here. It lands with MS4 alongside the default flip; pinning before the
124+
// flip would freeze every default user into plaintext and make the flip a
125+
// no-op for them.
126+
func applyLoginFallback(ctx context.Context, mode StorageMode, explicit bool, f cacheFactories) (cache.TokenCache, StorageMode, error) {
127+
switch mode {
128+
case StorageModePlaintext:
129+
c, err := f.newFile(ctx)
130+
if err != nil {
131+
return nil, "", fmt.Errorf("open file token cache: %w", err)
132+
}
133+
return c, mode, nil
134+
case StorageModeSecure:
135+
if probeErr := f.probeKeyring(); probeErr != nil {
136+
if explicit {
137+
return nil, "", fmt.Errorf("secure storage was requested but the OS keyring is not reachable: %w", probeErr)
138+
}
139+
log.Debugf(ctx, "secure storage unavailable (%v), falling back to plaintext", probeErr)
140+
fileCache, fileErr := f.newFile(ctx)
141+
if fileErr != nil {
142+
return nil, "", fmt.Errorf("open file token cache: %w", fileErr)
143+
}
144+
if err := persistPlaintextFallback(ctx); err != nil {
145+
log.Debugf(ctx, "persisting auth_storage fallback failed: %v", err)
146+
}
147+
return fileCache, StorageModePlaintext, nil
148+
}
149+
return f.newKeyring(), mode, nil
150+
default:
151+
return nil, "", fmt.Errorf("unsupported storage mode %q", string(mode))
152+
}
153+
}
154+
155+
// persistPlaintextFallback writes auth_storage = plaintext to [__settings__]
156+
// in .databrickscfg so subsequent commands skip the (slow/blocking) keyring
157+
// probe and route straight to the file cache.
158+
//
159+
// Only called on the (mode=Secure, explicit=false) probe-failure branch. That
160+
// branch is unreachable today (resolver default is plaintext), so this is
161+
// dormant infrastructure: it activates when the default flips to secure
162+
// (MS4) and lets default-on-broken-keyring users avoid a 3s probe on every
163+
// command.
164+
func persistPlaintextFallback(ctx context.Context) error {
165+
configPath := env.Get(ctx, "DATABRICKS_CONFIG_FILE")
166+
return databrickscfg.SetConfiguredAuthStorage(ctx, string(StorageModePlaintext), configPath)
167+
}

libs/auth/storage/cache_test.go

Lines changed: 120 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3,9 +3,11 @@ package storage
33
import (
44
"context"
55
"errors"
6+
"os"
67
"path/filepath"
78
"testing"
89

10+
"github.com/databricks/cli/libs/databrickscfg"
911
"github.com/databricks/cli/libs/env"
1012
"github.com/databricks/databricks-sdk-go/credentials/u2m"
1113
"github.com/databricks/databricks-sdk-go/credentials/u2m/cache"
@@ -24,8 +26,9 @@ func (stubCache) Lookup(string) (*oauth2.Token, error) { return nil, cache.ErrNo
2426
func fakeFactories(t *testing.T) cacheFactories {
2527
t.Helper()
2628
return cacheFactories{
27-
newFile: func(context.Context) (cache.TokenCache, error) { return stubCache{source: "file"}, nil },
28-
newKeyring: func() cache.TokenCache { return stubCache{source: "keyring"} },
29+
newFile: func(context.Context) (cache.TokenCache, error) { return stubCache{source: "file"}, nil },
30+
newKeyring: func() cache.TokenCache { return stubCache{source: "keyring"} },
31+
probeKeyring: func() error { return nil },
2932
}
3033
}
3134

@@ -106,8 +109,9 @@ func TestResolveCache_FileFactoryErrorPropagates(t *testing.T) {
106109
ctx := t.Context()
107110
boom := errors.New("disk full")
108111
factories := cacheFactories{
109-
newFile: func(context.Context) (cache.TokenCache, error) { return nil, boom },
110-
newKeyring: func() cache.TokenCache { return stubCache{source: "keyring"} },
112+
newFile: func(context.Context) (cache.TokenCache, error) { return nil, boom },
113+
newKeyring: func() cache.TokenCache { return stubCache{source: "keyring"} },
114+
probeKeyring: func() error { return nil },
111115
}
112116

113117
_, _, err := resolveCacheWith(ctx, StorageModePlaintext, factories)
@@ -116,6 +120,118 @@ func TestResolveCache_FileFactoryErrorPropagates(t *testing.T) {
116120
assert.ErrorIs(t, err, boom)
117121
}
118122

123+
func TestResolveCacheForLogin_PlaintextSkipsProbe(t *testing.T) {
124+
hermetic(t)
125+
ctx := t.Context()
126+
probed := false
127+
f := fakeFactories(t)
128+
f.probeKeyring = func() error {
129+
probed = true
130+
return nil
131+
}
132+
133+
got, mode, err := resolveCacheForLoginWith(ctx, StorageModePlaintext, f)
134+
135+
require.NoError(t, err)
136+
assert.Equal(t, StorageModePlaintext, mode)
137+
assert.Equal(t, "file", got.(stubCache).source)
138+
assert.False(t, probed, "probe must not run when mode is already plaintext")
139+
}
140+
141+
func TestResolveCacheForLogin_SecureProbeOK(t *testing.T) {
142+
hermetic(t)
143+
ctx := env.Set(t.Context(), EnvVar, "secure")
144+
145+
got, mode, err := resolveCacheForLoginWith(ctx, "", fakeFactories(t))
146+
147+
require.NoError(t, err)
148+
assert.Equal(t, StorageModeSecure, mode)
149+
assert.Equal(t, "keyring", got.(stubCache).source)
150+
}
151+
152+
func TestResolveCacheForLogin_ExplicitEnvSecure_ProbeFail_Errors(t *testing.T) {
153+
hermetic(t)
154+
ctx := env.Set(t.Context(), EnvVar, "secure")
155+
configPath := env.Get(ctx, "DATABRICKS_CONFIG_FILE")
156+
157+
f := fakeFactories(t)
158+
f.probeKeyring = func() error { return errors.New("no keyring") }
159+
160+
_, _, err := resolveCacheForLoginWith(ctx, "", f)
161+
require.Error(t, err)
162+
assert.ErrorContains(t, err, "secure storage was requested")
163+
164+
persisted, gerr := databrickscfg.GetConfiguredAuthStorage(ctx, configPath)
165+
require.NoError(t, gerr)
166+
assert.Equal(t, "", persisted, "env-set secure must not be persisted as plaintext")
167+
}
168+
169+
func TestResolveCacheForLogin_ExplicitConfigSecure_ProbeFail_Errors(t *testing.T) {
170+
hermetic(t)
171+
ctx := t.Context()
172+
configPath := env.Get(ctx, "DATABRICKS_CONFIG_FILE")
173+
require.NoError(t, os.WriteFile(configPath, []byte("[__settings__]\nauth_storage = secure\n"), 0o600))
174+
175+
f := fakeFactories(t)
176+
f.probeKeyring = func() error { return errors.New("no keyring") }
177+
178+
_, _, err := resolveCacheForLoginWith(ctx, "", f)
179+
require.Error(t, err)
180+
assert.ErrorContains(t, err, "secure storage was requested")
181+
182+
persisted, gerr := databrickscfg.GetConfiguredAuthStorage(ctx, configPath)
183+
require.NoError(t, gerr)
184+
assert.Equal(t, "secure", persisted, "config-set secure must not be silently rewritten")
185+
}
186+
187+
func TestResolveCacheForLogin_ExplicitOverrideSecure_ProbeFail_Errors(t *testing.T) {
188+
hermetic(t)
189+
ctx := t.Context()
190+
191+
f := fakeFactories(t)
192+
f.probeKeyring = func() error { return errors.New("no keyring") }
193+
194+
_, _, err := resolveCacheForLoginWith(ctx, StorageModeSecure, f)
195+
require.Error(t, err)
196+
assert.ErrorContains(t, err, "secure storage was requested")
197+
}
198+
199+
func TestApplyLoginFallback_DefaultSecure_ProbeFail_FallsBackAndPersists(t *testing.T) {
200+
hermetic(t)
201+
ctx := t.Context()
202+
configPath := env.Get(ctx, "DATABRICKS_CONFIG_FILE")
203+
204+
f := fakeFactories(t)
205+
f.probeKeyring = func() error { return errors.New("no keyring") }
206+
207+
got, mode, err := applyLoginFallback(ctx, StorageModeSecure, false, f)
208+
209+
require.NoError(t, err)
210+
assert.Equal(t, StorageModePlaintext, mode)
211+
assert.Equal(t, "file", got.(stubCache).source)
212+
213+
persisted, err := databrickscfg.GetConfiguredAuthStorage(ctx, configPath)
214+
require.NoError(t, err)
215+
assert.Equal(t, "plaintext", persisted, "default-mode fallback must persist auth_storage = plaintext")
216+
}
217+
218+
func TestApplyLoginFallback_ExplicitSecure_ProbeFail_Errors(t *testing.T) {
219+
hermetic(t)
220+
ctx := t.Context()
221+
configPath := env.Get(ctx, "DATABRICKS_CONFIG_FILE")
222+
223+
f := fakeFactories(t)
224+
f.probeKeyring = func() error { return errors.New("no keyring") }
225+
226+
_, _, err := applyLoginFallback(ctx, StorageModeSecure, true, f)
227+
require.Error(t, err)
228+
assert.ErrorContains(t, err, "secure storage was requested")
229+
230+
persisted, gerr := databrickscfg.GetConfiguredAuthStorage(ctx, configPath)
231+
require.NoError(t, gerr)
232+
assert.Equal(t, "", persisted, "explicit-secure error must not write config")
233+
}
234+
119235
func TestWrapForOAuthArgument(t *testing.T) {
120236
const (
121237
host = "https://example.com"

libs/auth/storage/keyring.go

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ import (
88
"time"
99

1010
"github.com/databricks/databricks-sdk-go/credentials/u2m/cache"
11+
"github.com/google/uuid"
1112
"github.com/zalando/go-keyring"
1213
"golang.org/x/oauth2"
1314
)
@@ -17,6 +18,14 @@ import (
1718
// cache key the SDK passes through TokenCache.Store / Lookup.
1819
const keyringServiceName = "databricks-cli"
1920

21+
// keyringProbeAccountPrefix is prefixed onto a per-call random suffix to form
22+
// the account name ProbeKeyring writes and deletes. A fixed name like
23+
// "__probe__" could collide with a user profile of the same name (which is
24+
// what keyringCache uses as the account field), so the probe would clobber
25+
// and delete that user's stored token. Per-call randomness also means
26+
// concurrent probes don't step on each other.
27+
const keyringProbeAccountPrefix = "__probe_"
28+
2029
// defaultKeyringTimeout is how long a single keyring operation is allowed
2130
// to run before the wrapper returns a TimeoutError. Matches the value used
2231
// by GitHub CLI.
@@ -79,6 +88,35 @@ func NewKeyringCache() cache.TokenCache {
7988
}
8089
}
8190

91+
// ProbeKeyring returns nil if the OS keyring is reachable and accepts a
92+
// write+delete cycle within the standard timeout. A non-nil error means the
93+
// keyring cannot be used in this environment (no backend, headless Linux
94+
// session waiting on a UI prompt, locked keychain refusing access, etc.).
95+
//
96+
// Used by databricks auth login to decide whether to silently fall back to
97+
// plaintext storage before opening the browser, so the user does not
98+
// complete an OAuth flow only to fail at the final Store call.
99+
func ProbeKeyring() error {
100+
return probeWithBackend(zalandoBackend{}, defaultKeyringTimeout)
101+
}
102+
103+
func probeWithBackend(backend keyringBackend, timeout time.Duration) error {
104+
c := &keyringCache{
105+
backend: backend,
106+
timeout: timeout,
107+
keyringSvcName: keyringServiceName,
108+
}
109+
account := keyringProbeAccountPrefix + uuid.NewString()
110+
tok := &oauth2.Token{AccessToken: "probe"}
111+
if err := c.Store(account, tok); err != nil {
112+
return fmt.Errorf("write: %w", err)
113+
}
114+
if err := c.Store(account, nil); err != nil {
115+
return fmt.Errorf("delete: %w", err)
116+
}
117+
return nil
118+
}
119+
82120
// Store stores t under key. Nil t deletes the entry; deleting a missing
83121
// entry is not an error.
84122
func (k *keyringCache) Store(key string, t *oauth2.Token) error {

0 commit comments

Comments
 (0)