
fix: access-key token expiry parsed as relative — tokens never refresh (CIP-3233); release 2.2.4 #408

Merged
freshtonic merged 2 commits into main from james/cip-3233-access-key-expiry on Jun 18, 2026


freshtonic commented Jun 18, 2026


Summary

Fixes the actual root cause of the customer's "ZeroKMS auth fails ~15 minutes after startup" issue, which the 2.2.3 CancelGuard backport (CIP-3159) did not resolve. Ships as 2.2.4.

Linear: CIP-3233.

NOTE: this fix applies to the vendored copy of stack-auth; the permanent fix is in https://github.com/cipherstash/cipherstash-suite/pull/2036

Root cause

stack-auth's access-key refresher computed the token's local expiry as now + auth_resp.expiry. But CTS /api/authorise returns expiry as an absolute Unix epoch (it is literally the JWT exp claim), not a relative duration. The sum lands several decades in the future, so AutoRefresh never considers the token expired and never refreshes it. ZeroKMS, meanwhile, enforces the JWT's real exp (~15 minutes after issue), so every encrypt/decrypt returns 401 roughly 15 minutes after a process starts and keeps failing until the process is restarted.
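To make the failure mode concrete, here is a minimal sketch of an expiry-driven refresh check. `Token`, `needs_refresh`, and the 60-second `REFRESH_MARGIN_SECS` are hypothetical stand-ins, not stack-auth's actual AutoRefresh API; all times are Unix seconds.

```rust
/// Hypothetical cached token; `expires_at` is an absolute Unix epoch in seconds.
struct Token {
    expires_at: u64,
}

/// Hypothetical margin: renew this many seconds before the token expires.
const REFRESH_MARGIN_SECS: u64 = 60;

/// True once `now` is within REFRESH_MARGIN_SECS of the token's expiry, or past it.
fn needs_refresh(token: &Token, now: u64) -> bool {
    now + REFRESH_MARGIN_SECS >= token.expires_at
}
```

With the production values quoted below (issued at 1781743221, expiry 1781744121), a token whose `expires_at` is the raw expiry becomes due for refresh 840 seconds after issue, while one built with now + expiry is not due for roughly 56 years, so a check of this shape never fires.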

Confirmed against a live production token:

response.expiry = 1781744121   == JWT exp = 1781744121   (absolute epoch)
JWT iat         = 1781743221
exp - iat       = 900          (the default token lifetime)

Pre-fix, with now ≈ iat, stack-auth computed expires_at = 1781743221 + 1781744121 = 3563487342, which falls in December 2082 (≈ year 2083).
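The arithmetic above can be reproduced directly from the quoted token values. The constants come from the production token; using `iat` as a stand-in for "now" and a 365.25-day year for the date conversion are approximations for this sketch.

```rust
// Values from the live production token quoted above (Unix seconds).
const IAT: u64 = 1_781_743_221; // JWT `iat`; stands in for "now" at refresh time
const EXPIRY: u64 = 1_781_744_121; // CTS `expiry`, identical to the JWT `exp`

// Approximate seconds per year (365.25-day Julian year).
const SECS_PER_YEAR: f64 = 365.25 * 24.0 * 3600.0;

/// Token lifetime implied by the JWT claims: exp - iat.
fn lifetime_secs() -> u64 {
    EXPIRY - IAT
}

/// Pre-fix local expiry: the absolute `expiry` mistakenly added to "now".
fn pre_fix_expires_at() -> u64 {
    IAT + EXPIRY
}

/// Approximate fractional calendar year in which a Unix timestamp falls.
fn approx_year(unix_secs: u64) -> f64 {
    1970.0 + unix_secs as f64 / SECS_PER_YEAR
}
```

`lifetime_secs()` gives the 900-second default lifetime, and `approx_year(pre_fix_expires_at())` comes out at about 2082.9, roughly 56 years after the token was issued.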

The fix

// vendor/stack-auth/src/access_key_refresher.rs
- expires_at: now + auth_resp.expiry,
+ expires_at: auth_resp.expiry,        // CTS `expiry` is already an absolute epoch

(The OAuth path is untouched — it correctly uses the relative expires_in.)
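One way to keep the two unit conventions apart is to make each token-construction path say which one it expects. The sketch below assumes hypothetical types (`AccessKeyAuthResponse`, `OAuthResponse`, `Token`) and constructor names; they are not stack-auth's real API. The access-key `expiry` is stored unchanged, while the OAuth `expires_in` is anchored to the current time.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical CTS /api/authorise response: `expiry` is an absolute Unix epoch (the JWT `exp`).
struct AccessKeyAuthResponse {
    access_token: String,
    expiry: u64,
}

/// Hypothetical OAuth token response: `expires_in` is a relative duration in seconds.
struct OAuthResponse {
    access_token: String,
    expires_in: u64,
}

/// Hypothetical token; `expires_at` is always an absolute Unix epoch in seconds.
struct Token {
    access_token: String,
    expires_at: u64,
}

impl Token {
    /// Access-key path: `expiry` is already absolute, so it is used as-is.
    fn from_access_key(resp: AccessKeyAuthResponse) -> Self {
        Token { access_token: resp.access_token, expires_at: resp.expiry }
    }

    /// OAuth path: `expires_in` is relative, so it is added to `now`.
    fn from_oauth(resp: OAuthResponse, now: u64) -> Self {
        Token { access_token: resp.access_token, expires_at: now + resp.expires_in }
    }

    /// Seconds until expiry, saturating at zero once the token has expired.
    fn expires_in(&self, now: u64) -> u64 {
        self.expires_at.saturating_sub(now)
    }
}

/// Current Unix time in whole seconds.
fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs()
}
```

Given an access-key response with `expiry = now + 900` and an OAuth response with `expires_in = 900`, both constructors produce the same `expires_at`, which is the invariant the bug violated.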

Tests

  • Corrected the access-key test fixtures. They mocked expiry as a small relative value (3600), which is exactly what hid the bug. They now model an absolute epoch (now + N) like the real CTS.
  • Added a regression test (access_key_expiry_is_absolute_epoch_not_relative): mocks an absolute expiry and asserts the resulting expires_in() ≈ the intended TTL. Verified it fails under the pre-fix now + expiry arithmetic (expires_in() ≈ 1.7e9) and passes with the fix.
  • Full vendored crate suite: 107 passed. cargo check -p cipherstash-proxy clean. Release version/changelog drift guard satisfied locally.
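The idea behind the fixture change and the regression test can be sketched without the HTTP mock. The helper names and the 5-second tolerance below are hypothetical; the real test (access_key_expiry_is_absolute_epoch_not_relative) goes through the mocked /api/authorise endpoint. The sketch also shows why the old relative fixture hid the bug: with `expiry = 3600`, the buggy `now + expiry` arithmetic happens to produce the right answer.

```rust
/// Hypothetical fixture: a realistic mock returns `expiry` as an absolute epoch.
fn mock_cts_expiry(now: u64, ttl_secs: u64) -> u64 {
    now + ttl_secs
}

/// Pre-fix arithmetic (buggy): adds an absolute epoch to `now`.
fn expires_at_pre_fix(now: u64, expiry: u64) -> u64 {
    now + expiry
}

/// Fixed arithmetic: `expiry` is already absolute.
fn expires_at_fixed(expiry: u64) -> u64 {
    expiry
}

/// True if `expires_in` is within `tolerance` seconds of `ttl`.
fn close_to_ttl(expires_in: u64, ttl: u64, tolerance: u64) -> bool {
    expires_in.abs_diff(ttl) <= tolerance
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn expiry_is_absolute_epoch_not_relative() {
        let now = 1_781_743_221;
        let ttl = 900;
        let expiry = mock_cts_expiry(now, ttl);

        // Fixed: expires_in is ~900 s.
        assert!(close_to_ttl(expires_at_fixed(expiry) - now, ttl, 5));
        // Pre-fix: expires_in is ~1.78e9 s, so the same check fails.
        assert!(!close_to_ttl(expires_at_pre_fix(now, expiry) - now, ttl, 5));
        // Old relative fixture (3600): the buggy arithmetic looks correct.
        assert!(close_to_ttl(expires_at_pre_fix(now, 3600) - now, 3600, 5));
    }
}
```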

Notes

  • The 2.2.3 CancelGuard change stays — it's legitimate hardening for a real (but different) cancellation race; it just wasn't this bug.
  • This bug is live upstream in cipherstash-suite stack-auth (HEAD/0.37.0) and affects any long-running access-key consumer. It is tracked in CIP-3233 with a cross-repo bump checklist; an upstream fix PR follows. Once Proxy moves to a cipherstash-client built against the fixed stack-auth, drop the vendor/stack-auth + [patch.crates-io] workaround.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed ZeroKMS authentication failures occurring approximately 15 minutes after startup. Access tokens now renew correctly before expiry, eliminating the need for manual restart recovery.
  • Chores

    • Version bumped to v2.2.4

…-auth (CIP-3233)

stack-auth's AccessKeyRefresher computed expires_at as `now + auth_resp.expiry`, but CTS /api/authorise returns `expiry` as an ABSOLUTE Unix epoch (the JWT `exp` claim), not a relative duration. The sum landed ~decades in the future, so AutoRefresh never considered the token expired and never refreshed it; ZeroKMS enforced the real ~15-min exp, so encrypt/decrypt failed ~15 min after startup until the pod restarted.

Use the value as-is: `expires_at: auth_resp.expiry`. Also corrects the access-key test fixtures, which mocked `expiry` as a small relative value (e.g. 3600) and thereby hid the bug — they now model an absolute epoch (now + N) like the real CTS. Adds a regression test asserting an absolute `expiry` yields expires_in ~= the intended TTL (fails under the pre-fix `now + expiry` arithmetic).

This is the actual root cause of the customer's 15-minute failures; the 2.2.3 CancelGuard backport (CIP-3159) is unrelated hardening and did not help. Confirmed against a live production token: response.expiry == JWT exp (absolute), exp - iat == 900.

Patch release carrying the access-key token-expiry fix (CIP-3233): bump workspace version 2.2.3 -> 2.2.4 and promote the Unreleased CHANGELOG entry to [2.2.4].

coderabbitai (bot) commented Jun 18, 2026


📝 Walkthrough


AccessKeyRefresher::refresh is corrected to assign auth_resp.expiry directly to Token.expires_at instead of computing now + expiry, treating the API field as an absolute Unix epoch. Test mocks are updated accordingly, a regression test is added, and the workspace is bumped to v2.2.4 with a changelog entry.

Changes

Access Key Token Expiry Fix and v2.2.4 Release

  • Fix expiry arithmetic in AccessKeyRefresher::refresh (vendor/stack-auth/src/access_key_refresher.rs): removes the std::time::{SystemTime, UNIX_EPOCH} import and rewrites refresh to set Token.expires_at directly from auth_resp.expiry (absolute epoch), replacing the previous now + expiry computation.
  • Update mocks and add regression test (vendor/stack-auth/src/access_key_refresher.rs): reworks the /api/authorise mock to emit now + expires_in_secs as the expiry field, adds access_key_expiry_is_absolute_epoch_not_relative asserting ~900 s expires_in() and a non-expired token, and fixes the Axum delayed-auth handler to use now + 3600.
  • Version bump and changelog (Cargo.toml, CHANGELOG.md): bumps the workspace version from 2.2.3 to 2.2.4, adds the v2.2.4 Fixed entry for the token renewal bug, and updates the compare reference links.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hoppity-hop through the epoch gate,
No more now + expiry to miscalculate!
The token arrives with its timestamp true,
Renews before midnight — no restart overdue.
Absolute epochs make the bunny smile,
v2.2.4 lands with proper time-style! 🕐

🚥 Pre-merge checks: ✅ 5 passed

  • Title check: ✅ Passed. The title directly addresses the core fix: access-key token expiry parsing bug causing tokens to never refresh, with the release version clearly indicated.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.


freshtonic requested review from coderdan and tobyhede on Jun 18, 2026 02:08
freshtonic merged commit 4facf29 into main on Jun 18, 2026
6 checks passed
freshtonic deleted the james/cip-3233-access-key-expiry branch on Jun 18, 2026 03:08