fix: access-key token expiry parsed as relative, so tokens never refresh (CIP-3233); release 2.2.4 #408
…-auth (CIP-3233)

stack-auth's AccessKeyRefresher computed expires_at as `now + auth_resp.expiry`, but CTS /api/authorise returns `expiry` as an ABSOLUTE Unix epoch (the JWT `exp` claim), not a relative duration. The sum landed decades in the future, so AutoRefresh never considered the token expired and never refreshed it; ZeroKMS enforced the real ~15-min exp, so encrypt/decrypt failed ~15 min after startup until the pod restarted.

Use the value as-is: `expires_at: auth_resp.expiry`.

Also corrects the access-key test fixtures, which mocked `expiry` as a small relative value (e.g. 3600) and thereby hid the bug; they now model an absolute epoch (now + N) like the real CTS. Adds a regression test asserting that an absolute `expiry` yields expires_in ~= the intended TTL (fails under the pre-fix `now + expiry` arithmetic).

This is the actual root cause of the customer's 15-minute failures; the 2.2.3 CancelGuard backport (CIP-3159) is unrelated hardening and did not help.

Confirmed against a live production token: response.expiry == JWT exp (absolute), exp - iat == 900.
Patch release carrying the access-key token-expiry fix (CIP-3233): bump workspace version 2.2.3 -> 2.2.4 and promote the Unreleased CHANGELOG entry to [2.2.4].
Summary
Fixes the actual root cause of the customer's "ZeroKMS auth fails ~15 minutes after startup" issue, which the 2.2.3 CancelGuard backport (CIP-3159) did not resolve. Ships as 2.2.4.
Linear: CIP-3233.
NOTE: this fix is for the vendored `stack-auth`; the permanent fix is in https://github.com/cipherstash/cipherstash-suite/pull/2036

Root cause
stack-auth's access-key refresher computed the token's local expiry as `now + auth_resp.expiry`. But CTS `/api/authorise` returns `expiry` as an absolute Unix epoch (it is literally the JWT `exp` claim), not a relative duration. The sum lands decades in the future, so `AutoRefresh` never considers the token expired and never refreshes it. ZeroKMS enforces the JWT's real ~15-minute `exp`, so every encrypt/decrypt returns 401 from ~15 minutes after a process starts and stays broken until restart.

Confirmed against a live production token: `response.expiry` equals the JWT `exp` (absolute), and `exp - iat == 900`.
Pre-fix, stack-auth computed `expires_at = 1781743221 + 1781744121` ≈ year 2083.

The fix

Use the value as-is: `expires_at: auth_resp.expiry`.
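The two calculations can be contrasted in a minimal Rust sketch using the live-token values quoted above. `AuthResponse`, `expires_at_buggy`, `expires_at_fixed`, and `expires_in` are hypothetical stand-ins for illustration, not the actual stack-auth types or functions.

```rust
/// Hypothetical stand-in for the CTS `/api/authorise` response body.
struct AuthResponse {
    /// Absolute Unix epoch in seconds (the JWT `exp` claim).
    expiry: u64,
}

/// Pre-fix arithmetic: wrongly treats `expiry` as a relative duration.
fn expires_at_buggy(now: u64, resp: &AuthResponse) -> u64 {
    now + resp.expiry
}

/// Post-fix: `expiry` is already an absolute epoch, so use it as-is.
fn expires_at_fixed(resp: &AuthResponse) -> u64 {
    resp.expiry
}

/// Seconds remaining until `expires_at`, saturating at zero once expired.
fn expires_in(now: u64, expires_at: u64) -> u64 {
    expires_at.saturating_sub(now)
}
```

With `now = 1781743221` and `expiry = 1781744121`, the fixed path leaves 900 seconds of lifetime, while the pre-fix path reports roughly 1.78e9 seconds, so a refresher comparing against that value would never see the token as expired.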
(The OAuth path is untouched; it correctly uses the relative `expires_in`.)

Tests
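The fixture shape and the regression check described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the test that ships in the PR: `MockAuthResponse`, `mock_auth_response`, `expires_in`, and `within_tolerance` are hypothetical names, and the tolerance is an arbitrary choice.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical mock of the CTS `/api/authorise` response.
struct MockAuthResponse {
    /// Absolute Unix epoch in seconds, as the real CTS returns it.
    expiry: u64,
}

/// Current Unix time in whole seconds.
fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs()
}

/// Fixture modeled on the real CTS: an absolute epoch (`now + ttl_secs`),
/// rather than a small relative value such as 3600.
fn mock_auth_response(now: u64, ttl_secs: u64) -> MockAuthResponse {
    MockAuthResponse { expiry: now + ttl_secs }
}

/// Remaining lifetime when `expiry` is used as-is (the post-fix behaviour).
fn expires_in(now: u64, resp: &MockAuthResponse) -> u64 {
    resp.expiry.saturating_sub(now)
}

/// True when `actual` is within `tolerance` seconds of `intended`.
fn within_tolerance(actual: u64, intended: u64, tolerance: u64) -> bool {
    actual.abs_diff(intended) <= tolerance
}
```

A check built on these helpers would assert that `within_tolerance(expires_in(now, &resp), 900, 5)` holds for a fixture created with a 900-second TTL. Under the pre-fix `now + expiry` arithmetic the same assertion fails, because the remaining lifetime comes out near the epoch value itself.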
- Corrected the access-key test fixtures, which mocked `expiry` as a small relative value (3600); that is exactly what hid the bug. They now model an absolute epoch (`now + N`) like the real CTS.
- Added a regression test (`access_key_expiry_is_absolute_epoch_not_relative`): it mocks an absolute `expiry` and asserts that the resulting `expires_in()` ≈ the intended TTL. Verified that it fails under the pre-fix `now + expiry` arithmetic (`expires_in()` ≈ 1.7e9) and passes with the fix.
- `cargo check -p cipherstash-proxy` is clean. The release version/changelog drift guard is satisfied locally.

Notes
- The bug lives in `stack-auth` itself (HEAD/0.37.0) and affects any long-running access-key consumer. It is tracked in CIP-3233 with a cross-repo bump checklist; an upstream fix PR follows.
- Once Proxy moves to a `cipherstash-client` built against a fixed stack-auth, drop the `vendor/stack-auth` + `[patch.crates-io]` workaround.