Skip to content

Commit e03dc7a

Browse files
Benehikoclaude
andcommitted
fix(keychain): make Linux write path and dedup tests relock-aware
The real-keyring tests added by this PR fail intermittently in CI with "Cannot create an item in a locked collection" and a duplicate backlog that never collapses. Both are the gnome-keyring relock race: the store dials a fresh D-Bus connection per operation, and a prior op's closing connection relocks the collection asynchronously — after the unlock but before the call against the collection runs. IsLocked cannot guard it because the state changes between the check and the call. React to the authoritative signal instead. Add withRelockRetry, which retries an operation with bounded exponential backoff when it fails with the org.freedesktop.Secret.Error.IsLocked D-Bus error, re-unlocking each attempt and aborting immediately if Unlock itself fails (e.g. a dismissed prompt). Wrap the lock-gated mutations with it: - Save: the create-when-absent CreateItem, the in-place SetItemSecret, and the best-effort duplicate-collapse DeleteItem loop. The collapse delete stays best-effort but is now relock-aware: a swallowed locked error would otherwise leave the duplicates this feature exists to drain (issue #446). - Delete: DeleteItem. Reads (Get/Filter GetSecret) are equally race-prone but are left to the dedicated relock-retry change (#560); this commit covers the write path the new tests exercise. Also make the tests' own direct-daemon helpers relock-aware, since they talk to the daemon outside the store: seedRealDuplicates' CreateItem and purgeRealItems' DeleteItem now go through withRelockRetry. Tests: fake-backed TestKeychainSaveRetriesWhenCreateRelocks, TestKeychainSaveRetriesWhenSetSecretRelocks, TestKeychainSaveCollapseRetriesWhenDeleteRelocks and TestKeychainSaveStopsRetryingAfterMaxRelocks. Verified green against real gnome-keyring on the fedora-43 and ubuntu-24 CI harnesses; lint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 759357d commit e03dc7a

2 files changed

Lines changed: 224 additions & 9 deletions

File tree

store/keychain/keychain_linux.go

Lines changed: 92 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ import (
2323
"fmt"
2424
"maps"
2525
"slices"
26+
"time"
2627

2728
dbus "github.com/godbus/dbus/v5"
2829

@@ -148,6 +149,81 @@ func isCollectionUnlocked(collectionPath dbus.ObjectPath, service secretService)
148149
return errCollectionLocked
149150
}
150151

152+
// secretServiceIsLockedError is the D-Bus error name the secret service returns
153+
// when a mutating call (e.g. CreateItem) targets a locked collection.
154+
//
155+
// https://specifications.freedesktop.org/secret-service-spec/latest/errors.html
156+
const secretServiceIsLockedError = "org.freedesktop.Secret.Error.IsLocked"
157+
158+
// isLockedDBusError reports whether err is the secret service's "collection is
159+
// locked" D-Bus error. The lock state is matched on the structured D-Bus error
160+
// name rather than the human-readable message so it is stable across backends
161+
// and locales.
162+
func isLockedDBusError(err error) bool {
163+
var dbusErr dbus.Error
164+
return errors.As(err, &dbusErr) && dbusErr.Name == secretServiceIsLockedError
165+
}
166+
167+
// Relock retry tuning. An operation that hits a relocked collection is retried
168+
// with exponential backoff: the relock is a brief race that settles on its own,
169+
// and spacing the attempts out avoids hammering the secret service (or, on a
170+
// password-protected keyring, re-issuing Unlock fast enough to spam the user
171+
// with authentication prompts).
172+
//
173+
// relockRetryMaxDelay caps the backoff growth; with the current
174+
// maxRelockRetries the slept delays are 20,40,80,160,320ms (the cap only takes
175+
// effect once maxRelockRetries reaches 6, where the sixth delay would otherwise
176+
// be 640ms).
177+
const (
178+
maxRelockRetries = 5
179+
relockRetryBaseDelay = 20 * time.Millisecond
180+
relockRetryMaxDelay = 500 * time.Millisecond
181+
)
182+
183+
// sleepFn is the sleep seam used by the relock backoff so tests can exercise the
184+
// retry loop without real delays. It is a package-level var with no
185+
// synchronisation, so tests that swap it must not run in parallel.
186+
var sleepFn = time.Sleep
187+
188+
// withRelockRetry runs a collection operation, retrying it with exponential
189+
// backoff when the secret service rejects it because the collection is locked.
190+
//
191+
// The store dials a fresh D-Bus connection for every operation and closes it on
192+
// return. gnome-keyring scopes an unlock to the session that performed it, so
193+
// when a previous operation's connection closes the daemon relocks the
194+
// collection — and that relock can land asynchronously in the middle of a later
195+
// operation, after we have already observed the collection as unlocked but
196+
// before the call against the collection runs. The result is an intermittent
197+
// "Cannot create an item in a locked collection" error even though we unlocked
198+
// moments earlier. IsLocked cannot guard against this because the state changes
199+
// between the check and the call, so we react to the authoritative signal — the
200+
// operation's own locked error — by unlocking again and retrying.
201+
//
202+
// In the common case this is the passwordless auto-unlock path (e.g. the
203+
// PAM-unlocked login keyring), where Unlock returns the null prompt and asks
204+
// the user for nothing. withRelockRetry cannot itself prove the keyring is
205+
// passwordless, so on a password-protected keyring a retry could surface an
206+
// authentication prompt; the bounded retry count and backoff keep that to a
207+
// handful of spaced-out prompts at worst, and a dismissed prompt makes Unlock
208+
// return an error that aborts the loop immediately rather than re-prompting.
209+
func withRelockRetry(service secretService, collectionPath dbus.ObjectPath, op func() error) error {
210+
err := op()
211+
delay := relockRetryBaseDelay
212+
for attempt := 0; attempt < maxRelockRetries && isLockedDBusError(err); attempt++ {
213+
sleepFn(delay)
214+
delay = min(delay*2, relockRetryMaxDelay)
215+
if unlockErr := service.Unlock([]dbus.ObjectPath{collectionPath}); unlockErr != nil {
216+
// Surface why the retry stopped while preserving errors.Is on the
217+
// underlying Unlock error (e.g. a dismissed prompt). The original
218+
// locked error is intentionally dropped: the failed unlock is the
219+
// actionable cause once we have decided to stop retrying.
220+
return fmt.Errorf("unlock after relock: %w", unlockErr)
221+
}
222+
err = op()
223+
}
224+
return err
225+
}
226+
151227
type keychainStore[T store.Secret] struct {
152228
serviceGroup string
153229
serviceName string
@@ -198,7 +274,9 @@ func (k *keychainStore[T]) Delete(_ context.Context, id store.ID) error {
198274
return nil
199275
}
200276

201-
return service.DeleteItem(items[0])
277+
return withRelockRetry(service, objectPath, func() error {
278+
return service.DeleteItem(items[0])
279+
})
202280
}
203281

204282
func (k *keychainStore[T]) Get(ctx context.Context, id store.ID) (store.Secret, error) {
@@ -406,8 +484,10 @@ func (k *keychainStore[T]) Save(_ context.Context, id store.ID, secret store.Sec
406484
// Nothing stored yet: create a fresh item.
407485
if len(items) == 0 {
408486
properties := kc.NewSecretProperties(label, attributes)
409-
_, err = service.CreateItem(objectPath, properties, sessSecret, kc.ReplaceBehaviorReplace)
410-
return err
487+
return withRelockRetry(service, objectPath, func() error {
488+
_, createErr := service.CreateItem(objectPath, properties, sessSecret, kc.ReplaceBehaviorReplace)
489+
return createErr
490+
})
411491
}
412492

413493
// Update the first match in place. Its object path is preserved, so the
@@ -416,13 +496,20 @@ func (k *keychainStore[T]) Save(_ context.Context, id store.ID, secret store.Sec
416496
// the attributes and label and collapsing any pre-existing duplicates are
417497
// best-effort (the secret is already stored) and must not flip the result.
418498
primary := items[0]
419-
if err := service.SetItemSecret(primary, sessSecret); err != nil {
499+
if err := withRelockRetry(service, objectPath, func() error {
500+
return service.SetItemSecret(primary, sessSecret)
501+
}); err != nil {
420502
return err
421503
}
422504
_ = service.SetItemAttributes(primary, attributes)
423505
_ = service.SetItemLabel(primary, label)
424506
for _, dup := range items[1:] {
425-
_ = service.DeleteItem(dup)
507+
// Best-effort, but still relock-aware: a collection that relocks
508+
// mid-collapse would otherwise leave the duplicates the whole feature
509+
// exists to drain (see withRelockRetry and issue #446).
510+
_ = withRelockRetry(service, objectPath, func() error {
511+
return service.DeleteItem(dup)
512+
})
426513
}
427514

428515
return nil

store/keychain/keychain_linux_test.go

Lines changed: 132 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -48,13 +48,34 @@ type fakeService struct {
4848
// concurrency-safe: the tests that read them drive a single sequential
4949
// operation through the fake.
5050
createCalls int
51+
setSecretCalls int
52+
deleteCalls int
5153
setSecretItems []dbus.ObjectPath
5254
deletedItems []dbus.ObjectPath
5355

56+
// {createItem,setSecret,deleteItem}LockedErrs is how many leading calls of
57+
// each kind fail with the secret service "collection is locked" D-Bus error
58+
// before one succeeds, simulating a collection that relocks underneath the
59+
// store (see withRelockRetry). The error is wrapped exactly as the real
60+
// service wraps it, so the tests exercise the errors.As-through-wrap path
61+
// isLockedDBusError depends on. unlockCalls counts the re-unlocks the retry
62+
// issues; unlockErr, when set, makes Unlock fail (e.g. a dismissed prompt).
63+
createItemLockedErrs int
64+
setSecretLockedErrs int
65+
deleteItemLockedErrs int
66+
unlockCalls int
67+
unlockErr error
68+
5469
opened atomic.Int64
5570
closed atomic.Int64
5671
}
5772

73+
// lockedErr mirrors how the real SecretService wraps a locked-collection D-Bus
74+
// error (see secretservice.go), so isLockedDBusError must unwrap to detect it.
75+
func lockedErr(op string) error {
76+
return fmt.Errorf("failed to %s: %w", op, dbus.Error{Name: secretServiceIsLockedError})
77+
}
78+
5879
func (f *fakeService) Collections() ([]dbus.ObjectPath, error) {
5980
return []dbus.ObjectPath{loginKeychainObjectPath}, nil
6081
}
@@ -65,24 +86,39 @@ func (f *fakeService) OpenSession(kc.AuthenticationMode) (*kc.Session, error) {
6586
// lets the Save path run end-to-end against the fake.
6687
return &kc.Session{Mode: kc.AuthenticationInsecurePlain}, nil
6788
}
68-
func (f *fakeService) CloseSession(*kc.Session) {}
69-
func (f *fakeService) Unlock([]dbus.ObjectPath) error { return nil }
89+
func (f *fakeService) CloseSession(*kc.Session) {}
90+
func (f *fakeService) Unlock([]dbus.ObjectPath) error {
91+
f.unlockCalls++
92+
return f.unlockErr
93+
}
94+
7095
func (f *fakeService) SearchCollection(dbus.ObjectPath, kc.Attributes) ([]dbus.ObjectPath, error) {
7196
return f.items, nil
7297
}
7398

7499
func (f *fakeService) CreateItem(dbus.ObjectPath, map[string]dbus.Variant, kc.Secret, kc.ReplaceBehavior) (dbus.ObjectPath, error) {
75100
f.createCalls++
101+
if f.createCalls <= f.createItemLockedErrs {
102+
return "", lockedErr("create item")
103+
}
76104
return "/created", nil
77105
}
78106

79107
func (f *fakeService) DeleteItem(item dbus.ObjectPath) error {
108+
f.deleteCalls++
109+
if f.deleteCalls <= f.deleteItemLockedErrs {
110+
return lockedErr("delete item")
111+
}
80112
f.deletedItems = append(f.deletedItems, item)
81113
return nil
82114
}
83115
func (f *fakeService) GetAttributes(dbus.ObjectPath) (kc.Attributes, error) { return nil, nil }
84116
func (f *fakeService) GetSecret(dbus.ObjectPath, kc.Session) ([]byte, error) { return nil, nil }
85117
func (f *fakeService) SetItemSecret(item dbus.ObjectPath, _ kc.Secret) error {
118+
f.setSecretCalls++
119+
if f.setSecretCalls <= f.setSecretLockedErrs {
120+
return lockedErr("set item secret")
121+
}
86122
f.setSecretItems = append(f.setSecretItems, item)
87123
return nil
88124
}
@@ -105,6 +141,16 @@ func withFakeService(t *testing.T, fake *fakeService) {
105141
}
106142
}
107143

144+
// stubRelockSleep replaces the relock backoff sleep with a no-op so retry tests
145+
// run without real delays, restoring it on cleanup. It mutates the package-level
146+
// sleepFn var, so tests using it must not run in parallel.
147+
func stubRelockSleep(t *testing.T) {
148+
t.Helper()
149+
orig := sleepFn
150+
t.Cleanup(func() { sleepFn = orig })
151+
sleepFn = func(time.Duration) {}
152+
}
153+
108154
// TestKeychainGetNotFound exercises the full Get path against the fake — open,
109155
// resolve collection, search — and asserts an empty search maps to
110156
// ErrCredentialNotFound, all without a live keyring.
@@ -180,6 +226,79 @@ func TestKeychainSaveCollapsesDuplicatesInPlace(t *testing.T) {
180226
"the remaining duplicates must be collapsed, leaving only the first match")
181227
}
182228

229+
// TestKeychainSaveRetriesWhenCreateRelocks covers the create path: gnome-keyring
230+
// can relock the collection between the unlock and the CreateItem, so a fresh
231+
// Save must react to the locked error by unlocking and retrying rather than
232+
// failing.
233+
func TestKeychainSaveRetriesWhenCreateRelocks(t *testing.T) {
234+
stubRelockSleep(t)
235+
fake := &fakeService{} // no items -> create path
236+
fake.createItemLockedErrs = 2
237+
withFakeService(t, fake)
238+
239+
ks := setupKeychain(t, nil)
240+
require.NoError(t, ks.Save(t.Context(), store.MustParseID("com.test.test/test/bob"),
241+
&mocks.MockCredential{Username: "bob", Password: "bob-password"}))
242+
243+
assert.Equal(t, 3, fake.createCalls, "two locked failures then one success")
244+
assert.Equal(t, 2, fake.unlockCalls, "exactly one Unlock per relock retry")
245+
}
246+
247+
// TestKeychainSaveRetriesWhenSetSecretRelocks covers the in-place update path:
248+
// the SetItemSecret that rewrites the surviving item must survive a relock.
249+
func TestKeychainSaveRetriesWhenSetSecretRelocks(t *testing.T) {
250+
stubRelockSleep(t)
251+
fake := &fakeService{items: []dbus.ObjectPath{"/item/a"}}
252+
fake.setSecretLockedErrs = 2
253+
withFakeService(t, fake)
254+
255+
ks := setupKeychain(t, nil)
256+
require.NoError(t, ks.Save(t.Context(), store.MustParseID("com.test.test/test/bob"),
257+
&mocks.MockCredential{Username: "bob", Password: "bob-password"}))
258+
259+
assert.Equal(t, []dbus.ObjectPath{"/item/a"}, fake.setSecretItems,
260+
"the secret must be written in place once the relock clears")
261+
assert.Equal(t, 2, fake.unlockCalls, "exactly one Unlock per relock retry")
262+
}
263+
264+
// TestKeychainSaveCollapseRetriesWhenDeleteRelocks is the unit-level counterpart
265+
// of the real-keyring backlog test: collapsing a duplicate must drain it even if
266+
// the collection relocks mid-delete. The collapse delete is best-effort, but a
267+
// silently swallowed locked error would leave the duplicate behind — the exact
268+
// #446 symptom — so it is still relock-aware.
269+
func TestKeychainSaveCollapseRetriesWhenDeleteRelocks(t *testing.T) {
270+
stubRelockSleep(t)
271+
fake := &fakeService{items: []dbus.ObjectPath{"/item/a", "/item/b"}}
272+
fake.deleteItemLockedErrs = 2
273+
withFakeService(t, fake)
274+
275+
ks := setupKeychain(t, nil)
276+
require.NoError(t, ks.Save(t.Context(), store.MustParseID("com.test.test/test/bob"),
277+
&mocks.MockCredential{Username: "bob", Password: "bob-password"}))
278+
279+
assert.Equal(t, []dbus.ObjectPath{"/item/b"}, fake.deletedItems,
280+
"the duplicate must be collapsed once the relock clears")
281+
assert.Equal(t, 3, fake.deleteCalls, "two locked failures then one success")
282+
assert.Equal(t, 2, fake.unlockCalls, "exactly one Unlock per relock retry")
283+
}
284+
285+
// TestKeychainSaveStopsRetryingAfterMaxRelocks asserts the retry is bounded: a
286+
// persistently locked collection surfaces the locked error to the caller instead
287+
// of looping forever.
288+
func TestKeychainSaveStopsRetryingAfterMaxRelocks(t *testing.T) {
289+
stubRelockSleep(t)
290+
fake := &fakeService{} // no items -> create path
291+
fake.createItemLockedErrs = 1 << 30 // never recovers
292+
withFakeService(t, fake)
293+
294+
ks := setupKeychain(t, nil)
295+
err := ks.Save(t.Context(), store.MustParseID("com.test.test/test/bob"),
296+
&mocks.MockCredential{Username: "bob", Password: "bob-password"})
297+
require.Error(t, err)
298+
assert.True(t, isLockedDBusError(err), "the persistent locked error must reach the caller")
299+
assert.Equal(t, maxRelockRetries+1, fake.createCalls, "initial attempt plus the bounded retries")
300+
}
301+
183302
// The real-keychain dedup tests use their own service group/name so their items
184303
// are namespace-isolated from TestKeychain (which shares com.test.test/test).
185304
// GetAllMetadata/Filter search by {service:group, service:name}, so a leaked
@@ -293,7 +412,14 @@ func seedRealDuplicates(t *testing.T, serviceGroup, serviceName string, id store
293412
safelySetMetadata(serviceGroup, serviceName, attrs)
294413
safelySetID(id, attrs)
295414

296-
_, err = svc.CreateItem(collection, kc.NewSecretProperties(label, attrs), sessSecret, kc.ReplaceBehaviorDoNotReplace)
415+
// Seed directly against the daemon, but stay relock-aware: a prior op's
416+
// closing connection can relock the collection between the unlock above
417+
// and this create (see withRelockRetry), which would otherwise fail the
418+
// seed with "Cannot create an item in a locked collection".
419+
err = withRelockRetry(svc, collection, func() error {
420+
_, createErr := svc.CreateItem(collection, kc.NewSecretProperties(label, attrs), sessSecret, kc.ReplaceBehaviorDoNotReplace)
421+
return createErr
422+
})
297423
require.NoError(t, err)
298424
}
299425
}
@@ -318,7 +444,9 @@ func purgeRealItems(t *testing.T, serviceGroup, serviceName string, id store.ID)
318444
items, err := svc.SearchCollection(collection, attrs)
319445
require.NoError(t, err)
320446
for _, item := range items {
321-
require.NoError(t, svc.DeleteItem(item))
447+
require.NoError(t, withRelockRetry(svc, collection, func() error {
448+
return svc.DeleteItem(item)
449+
}))
322450
}
323451
}
324452

0 commit comments

Comments
 (0)