feat: add cacheMaxAgeFallback option for graceful JWKS degradation during outages #502

Open
cschetan77 wants to merge 1 commit into master from feat/cache-resiliency

@cschetan77
Contributor

Description

When the JWKS endpoint is unavailable and the cache entry for a kid has expired, the library currently hard-evicts the key and begins rejecting all tokens. The result is a full service outage, even for tokens that remain valid.

This PR introduces a cacheMaxAgeFallback option: a second TTL that defines how long beyond cacheMaxAge the library will continue serving a stale cached key when the authorization server (AS) is unreachable.

Changes

  • src/wrappers/cache.js: wraps the memoizer's load function with stale-fallback logic and maintains a bounded LRUCache (same max as the main cache) for stale entries.
  • index.d.ts: adds cacheMaxAgeFallback?: number to OptionsBase.
  • package.json: adds lru-cache ^11.0.0 as a direct dependency (previously only a transitive dependency via lru-memoizer).

How it works

  • Normal cache behavior (cacheMaxAge) is unchanged.
  • When a cache entry expires and the refresh fetch fails, the SDK checks whether a stale copy exists and whether Date.now() - lastFetchedAt < cacheMaxAge + cacheMaxAgeFallback.
  • If the entry is within the fallback window, the SDK serves the stale key and logs a warning.
  • If the entry is beyond the fallback window, the SDK throws as before.
  • cacheMaxAgeFallback is undefined by default, so the feature is opt-in and existing behavior does not change.
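
The flow above can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: withStaleFallback, the now and warn parameters, and the callback-style load signature are all assumptions made for the example, and a plain Map stands in for the bounded LRUCache.

```javascript
// Hypothetical sketch: none of these names are taken from the PR's diff.
// A plain Map stands in for the bounded LRUCache described above.
function withStaleFallback(load, options) {
  const {
    cacheMaxAge,
    cacheMaxAgeFallback,
    now = Date.now,      // injectable clock (assumption, eases testing)
    warn = console.warn  // injectable logger (assumption)
  } = options;

  // Opt-in only: without the option, the loader is returned untouched.
  if (cacheMaxAgeFallback === undefined) return load;

  const stale = new Map(); // kid -> { key, lastFetchedAt }

  return function loadWithFallback(kid, callback) {
    load(kid, (err, key) => {
      if (!err) {
        // Successful fetch: remember the key and when it was fetched.
        stale.set(kid, { key, lastFetchedAt: now() });
        return callback(null, key);
      }

      const entry = stale.get(kid);
      const withinWindow = entry !== undefined &&
        now() - entry.lastFetchedAt < cacheMaxAge + cacheMaxAgeFallback;

      if (withinWindow) {
        warn(`JWKS refresh failed for kid "${kid}"; serving stale key`);
        return callback(null, entry.key);
      }

      // No stale copy, or past the fallback window: fail as before.
      stale.delete(kid);
      return callback(err);
    });
  };
}
```

Injecting the clock and logger keeps the window check deterministic under test; the real wrapper may well read Date.now() and the library's logger directly.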

Security consideration

This is an explicit availability vs. security tradeoff. In a key-compromise scenario where the AS rotates keys and simultaneously becomes briefly unreachable, stale keys would continue to be trusted for the duration of the fallback window. For this reason, cacheMaxAgeFallback has no default: callers must set it deliberately, based on their threat model and expected AS recovery time.
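
To make the exposure concrete, the small calculation below shows the latest point at which a stale key could still be served. It assumes both options are in milliseconds; staleTrustDeadline, the fetch time, and the 10 and 30 minute values are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper for illustration; not part of the library.
// Assumes cacheMaxAge and cacheMaxAgeFallback are both in milliseconds.
function staleTrustDeadline(lastFetchedAt, cacheMaxAge, cacheMaxAgeFallback) {
  // Exclusive upper bound: the stale key is served only while
  // now - lastFetchedAt < cacheMaxAge + cacheMaxAgeFallback.
  return lastFetchedAt + cacheMaxAge + cacheMaxAgeFallback;
}

const MINUTE = 60 * 1000;
const fetchedAt = Date.UTC(2026, 0, 1, 12, 0, 0); // assumed last successful fetch
const deadline = staleTrustDeadline(fetchedAt, 10 * MINUTE, 30 * MINUTE);

console.log(new Date(deadline).toISOString()); // 2026-01-01T12:40:00.000Z
```

With these example values, a key rotated out at 12:05 could remain trusted for up to 35 more minutes if the AS stays unreachable, which is the window callers need to weigh against their threat model.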

…resilient when JWKS endpoint is down and cache for kid has expired
@cschetan77 cschetan77 requested a review from a team as a code owner April 29, 2026 10:48