feat(auth): forward access token via Bearer header + stale-token re-fetch #53
Merged
- Add shared-storage RWX PVC volume support with per-group subPaths mounted at /shared/<group> inside user pods
- Add init container that creates group directories with chmod 2775 (setgid bit) so new files inherit the group and are group-writable
- Add libnss_wrapper.so configuration so whoami/id report the real username instead of 'jovyan', with NB_UMASK=0002
- Refactor pre_spawn_hook into focused single-responsibility functions: _get_user_groups, _setup_shared_storage, _setup_nss_wrapper
- Orchestrator _pre_spawn_hook chains Nebi auth, shared storage, and NSS wrapper; always registered (NSS runs even without shared storage)
- Add sharedStorage.groups allowlist and mountPathPrefix values
- Add jupyterhub.custom.shared-storage-* config keys
- gid default was 100 but z2jh sets pod securityContext GID to 1000; add jovyan:x:1000: group entry so 'groups' command resolves the name
- when shared PVC is disabled, mkdir -p ~/shared instead of removing it so users always see the directory regardless of storage configuration
- EnvoyOIDCAuthenticator now stores parsed groups in auth_state so the spawner can read them at spawn time (JupyterHub groups table is empty when manage_groups is not enabled)
- refresh_user also re-parses groups from the refreshed IdToken to keep auth_state current
- _pre_spawn_hook always resolves user groups, not only when shared PVC is enabled
- _setup_nss_wrapper creates local ~/shared/<group> dirs per group when no shared PVC is configured, so users always see their group dirs
Deploys quay.io/nebari/volume-nfs backed by a single RWO PVC and re-exports
it as RWX NFS, enabling shared group directories on providers like Hetzner
that only provide ReadWriteOnce storage (hcloud-volumes).
- templates/nfs-server.yaml: NFS Deployment, Service, backend RWO PVC
- templates/shared-pvc.yaml: StorageClass + PV (NFS path) + PVC when
nfsServer.enabled; falls back to external RWX PVC otherwise
- values.yaml: sharedStorage.nfsServer.{enabled,storageClass,image} fields
k3s worker nodes on minimal OS images (Hetzner) ship without nfs-common, causing NFS PV mounts to fail with 'bad option'. The DaemonSet uses nsenter to install nfs-common on the host via apt-get, skipping if already present. Gated on sharedStorage.nfsServer.installClient (default false).
… try-except _get_user_groups accessed spawner.user.groups (SQLAlchemy lazy-loaded relationship) from an async pre_spawn_hook, causing DetachedInstanceError which silently aborted _setup_shared_storage and _setup_nss_wrapper. Groups are now read only from auth_state (stored by EnvoyOIDCAuthenticator). Each step is individually wrapped in try-except so failures are logged and don't prevent subsequent steps from running.
I1: set c.KubeSpawner.fs_gid=100 explicitly so shared dir file ownership
is deterministic (GID 100 = users group) rather than relying on z2jh default
I2: add Helm validation in _helpers.tpl that fails at template time if
sharedStorage.enabled and jupyterhub.custom.shared-storage-enabled diverge
I3: use Path(g).name like classic Nebari so /projects/myproj -> myproj,
not projects/myproj; deduplicate groups to prevent duplicate mountPaths
I4: add nodeSelector/nodeAffinity support to NFS server Deployment so
deployers can pin it to worker nodes and avoid slow RWO PVC reattachment
I7: add argocd.argoproj.io/sync-options: Prune=false to StorageClass and
PersistentVolume to prevent accidental deletion during ArgoCD force sync
C1: add chown 0:100 before chmod 2775 in initialize-shared-mounts init
container so shared dirs are explicitly owned by GID 100 (users)
C2: use printf instead of echo '...' for NSS file writes to safely handle
special characters in usernames without shell quoting issues
C3: deduplicate groups in _get_user_groups (via Path.name already handles
most cases; added explicit dedup set for belt-and-suspenders)
M1: log exception with exc_info=True in refresh_user JWT parse failure
N3: merge into existing lifecycle_hooks instead of replacing; warn if
a postStart hook already exists before overwriting
Logging: added comprehensive info/debug/warning logging throughout all
pre-spawn hook functions for both happy and failure paths
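The group handling described in I3 and C3 (leaf name via Path.name, explicit dedup, groups read only from auth_state) can be sketched roughly as below. This is an illustration, not the chart's actual _get_user_groups: the function name, the "groups" key inside auth_state, and the use of PurePosixPath instead of Path are assumptions.

```python
from pathlib import PurePosixPath


def get_user_groups(auth_state):
    """Return de-duplicated leaf group names, preserving first-seen order.

    Assumes (hypothetically) that auth_state carries a "groups" list of
    Keycloak-style paths such as "/projects/myproj".
    """
    raw_groups = (auth_state or {}).get("groups") or []
    seen = set()
    names = []
    for group in raw_groups:
        # "/projects/myproj" -> "myproj", so the mount path is /shared/myproj
        name = PurePosixPath(group).name
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

Keeping the result ordered and unique means each group maps to exactly one mountPath, which is the property the dedup in C3 protects.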
This reverts commit 56c22cf.
Exposes a profile selector in JupyterHub matching the classic Nebari experience. Profiles are defined under jupyterhub.custom.profiles in values.yaml and passed directly to c.KubeSpawner.profile_list via get_config().
Default profiles:
- Small: 1 CPU / 2 GB RAM (default)
- Medium: 4 CPU / 8 GB RAM
kubespawner_override accepts any KubeSpawner trait so GPU profiles, custom images, and node selectors work without code changes in the future. When profiles list is empty, no selector is shown (single-instance mode).
Update default profile display_name and description to be more user-friendly (e.g. "Small Instance" with "Stable environment with 1 CPU / 2 GB RAM" instead of just "Small" / "1 CPU / 2 GB RAM").
fix: add descriptive names to default server profiles
# Conflicts:
# templates/_helpers.tpl
# values.yaml
Profiles feature (#31) is out of scope for this PR. Moved to local branch feat/jupyterlab-profiles for a follow-up PR.
Replaces the inline-bash test workflow with a pytest-based suite that manages the k3d cluster, helm install, and pod-wait lifecycle. Conftest exposes a 'cluster' session fixture and a 'hub_url' fixture that port-forwards proxy-public. CI runs uvx pytest tests/e2e -v with PYTHONUNBUFFERED=1 so live logs stream into the workflow output.
Locally:
uvx pytest tests/e2e -v # fresh cluster
K3D_CLUSTER=k3d-nebari-dev uvx pytest tests/e2e -v # reuse
Switches the e2e harness to kind (k3d's busybox-on-scratch nodes lack a package manager, so the chart's nfs-common installer DaemonSet can't provision NFS client tools).
New tests in tests/e2e/test_shared_storage.py exercise the full PR #30 spawn path against a real cluster:
- test_user_in_group_can_write: alice-data writes /shared/data/...
- test_shared_dir_is_group_owned: dir mode is 2775 (setgid)
The DummyAuth shim in tests/e2e/fixtures/test-values.yaml maps the login username to user+groups (alice-data -> alice in [data]) so we can fake auth_state without running Keycloak. Everything else (spawner hook, init container, NSS wrapper, NFS mount) is real.
Chart changes:
- sharedStorage.nfsServer.mountOptions added (default []). Tests pass [nfsvers=3] because kind nodes use overlayfs which fails the volume-nfs image's NFSv4 root export. Production unchanged.
Conftest infrastructure:
- kind cluster fixture with KIND_KEEP=1 reuse
- hosts-entry workaround so kubelet's host mount.nfs can resolve the cluster-internal NFS service FQDN (kind nodes have no cluster DNS in their host resolv.conf)
- structured logging + step counter + per-cycle pod state, events from kubectl describe, and node-level kubelet journal lines
- autouse failure-dump fixture (kubectl get pods/events + hub and singleuser logs)
Conftest had grown to 577 lines mixing five concerns. Extracted into focused modules each with a small interface and large hidden impl:
_process.py       subprocess + kubectl helpers + step counter
_hub.py           HubClient (cookie/login/spawn/stop session)
_pod_observer.py  wait_for_pod_ready + dedup'd pod-state polling
_cluster.py       kind lifecycle + helm install + NFS hosts workaround
conftest.py shrinks to 218 lines holding only fixtures that compose the modules.
Eliminates duplication of:
- cookie-jar login flow (was repeated in _login_and_spawn + _stop_server)
- two parallel subprocess wrappers (_run + _kctl)
- inline pod-state polling loop in the spawn flow
Tests still pass locally (3 passed in 35s, cluster reused).
prePuller hooks pre-pull singleuser images on every node before helm install completes. On a single-node test cluster this is pure overhead (~30s of blocking wait). Disable in test-values.yaml. kindest/node image pull was the largest variable cost in CI: 9s on a fast runner, 130s on a slow one. Cache it as a docker tarball keyed on the kind version so subsequent runs are deterministic and fast. Expected: total CI time drops from variable 3-5min to ~90-120s.
Previous heuristic used `[ -n "$img" ] && docker save` which exits 1 when grep finds no image, killing the whole step. Hardcode the v1.32.2 tag (fixed by kind v0.27.0) and use plain commands so set -e only triggers on real failures.
GH Actions ubuntu-latest runners come with kindest/node preinstalled (tagged "<none>"). The actual image fetch on cache miss was only ~12s because docker just verifies the digest. The cache step was earning ~5s in the best case and breaking the workflow when docker save couldn't find the v1.32.2 tag (image is referenced by digest, not tag). prePuller-disable change is keeping its ~30s saving — sufficient win without the cache complexity.
9 tests (was 2) covering the per-group /shared/<group> contract end-to-end:
- dir is root:users 2775 (parametrized over groups)
- pod is member of users group; NB_UMASK=0002 in env
- new files inherit gid=100 mode 0664; new subdirs gid=100 mode 02775
(setgid propagation)
- multi-group user sees + writes every group dir
- user does not see groups they don't belong to (mount-time isolation)
- file written by one user is readable + appendable by a groupmate
from a separate pod (cross-user collaboration)
Conftest adds PathStat + SpawnedUser.stat()/path_exists() so tests assert
against typed fields (mode/uid/gid) instead of parsing stat strings —
keeps tests short and behavior-focused.
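A typed stat result of the kind described above might look like the following sketch. The field names mode/uid/gid come from the commit message; the parse() classmethod, the setgid property, and the assumption that the pod runs `stat -c '%a %u %g' <path>` are illustrative, not the suite's actual PathStat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStat:
    mode: int  # permission bits including setuid/setgid, e.g. 0o2775
    uid: int
    gid: int

    @classmethod
    def parse(cls, line: str) -> "PathStat":
        """Parse one line of `stat -c '%a %u %g'` output, e.g. '2775 0 100'."""
        mode, uid, gid = line.split()
        return cls(mode=int(mode, 8), uid=int(uid), gid=int(gid))

    @property
    def setgid(self) -> bool:
        """True when the setgid bit (02000) is set."""
        return bool(self.mode & 0o2000)
```

A test can then assert `stat.gid == 100` and `stat.setgid` directly instead of string-matching the raw stat output.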
Singleuser image (multi-GB) is currently pulled by kubelet inside the kind node on first user spawn, costing ~73s of every CI run. Pull it once on the runner host, save as tar, cache it (key = image ref so a values.yaml bump auto-invalidates), and side-load with `kind load image-archive`. Pre-create the cluster in the workflow so the side-load happens before any pod is scheduled — the pytest fixture's ensure_cluster() reuses the existing cluster. Cache hit: skips the ~90s registry pull entirely; only kind-load (~20s) remains. Cache miss: pull + save once (~120s), then every subsequent run benefits.
…er NFS as transitional
Addresses comment on issue #29: the bundled `nfsServer.enabled=true` path relies on `quay.io/nebari/volume-nfs:0.8-repack`, a manifest-schema repack of an abandoned upstream image (nebari-dev/nebari-docker-images#230). We should not be carrying that workaround image as the recommended path for a greenfield chart.
The chart already supported bringing your own RWX StorageClass; this change makes that path the documented primary:
- values.yaml: reframe the sharedStorage block. Recommend an external RWX class with provider-specific examples (Longhorn, EFS, Filestore, Azure Files, nfs-subdir-external-provisioner). Add a deprecation note on the nfsServer.image block linking to issue #29.
- README: add a "Shared Storage" section with the same matrix and an explicit pointer to the issue tracking removal of the in-cluster NFS path.
No template changes: the external-RWX path was already rendered when nfsServer.enabled=false. Verified via `helm template` that setting only `sharedStorage.storageClass=longhorn` produces a single RWX PVC and no nfs-server pod.
…ovisioned
Previous commit listed EFS/Filestore/Azure Files as recommended RWX backends. NIC does not provision those: they are separate cloud-managed services no one in NIC has wired up.
NIC's actual storage reality:
hetzner  : longhorn (longhorn.Install in pkg/provider/hetzner)
aws      : longhorn (longhorn.Install in pkg/provider/aws)
existing : longhorn (longhorn.Install in pkg/provider/existing)
gcp      : standard-rwo (no RWX provisioned)
azure    : managed-csi (no RWX provisioned)
local    : (no storage layer)
So the accurate recommendation is just longhorn. Updated values.yaml and README to say so directly. The in-cluster NFS fallback stays (it covers the providers where NIC has not yet wired up an RWX class) with a pointer to issue #29 for tracking removal once that lands everywhere.
EnvoyOIDCAuthenticator stores no refresh_token (Envoy keeps only access_token + id_token in cookies) and the access_token lifetime is ~5 minutes. jhub-apps calls the env-listing callable on every Create App page render, often well after the token captured at login has expired, producing `token-exchange step 2 FAILED: HTTP 400 invalid_request "Invalid token"` and a silent empty selector. Mirror 01-spawner.py: when access_token has <30s remaining, re-fetch auth_state via the hub API (which refresh_user keeps current with fresh Envoy cookies on browser activity) before exchanging.
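The "<30s remaining" check can be sketched as below. This is a minimal illustration under assumptions: the helper names are hypothetical, the token is assumed to be a standard three-part JWT with an `exp` claim, and the payload is decoded without signature verification because the result only decides whether to re-fetch, not whether to trust the token.

```python
import base64
import json
import time


def seconds_remaining(jwt_token, now=None):
    """Seconds until the token's exp claim. The signature is NOT verified."""
    payload_b64 = jwt_token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time() if now is None else now
    return claims["exp"] - now


def is_expiring(jwt_token, threshold=30, now=None):
    """True when the token has fewer than `threshold` seconds left."""
    try:
        return seconds_remaining(jwt_token, now) < threshold
    except (IndexError, KeyError, ValueError):
        # Unparsable or exp-less tokens are treated as stale (an assumption).
        return True
```

When is_expiring() returns True, the caller would re-fetch auth_state via the hub API before attempting the exchange, as the commit message describes.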
The temporary nebari-dev/jhub-apps#676 git pin (bearer fall-through + Starlette 1.0 TemplateResponse fix) plus pyjwt>=2.10 floor were only needed while Envoy was the OAuth client and could inject an RS256 Bearer that confused jhub-apps. With hub doing its own OAuth, Envoy no longer injects to /services/japps/* and neither workaround is needed.
- jhub-apps: git@5d86277 -> ==2025.11.1 (conda-forge release)
- pyjwt: >=2.10,<3 -> >=2.9,<2.10 (matches jhub-apps 2025.11.1 constraint)
Lock regenerated via pixi 0.68.1 in a linux/arm64 container.
96d48c1 to
161ab2d
…ps 2025.11.1) Carries the GenericOAuthenticator switch (config/jupyterhub/00-gateway-auth.py) and jhub-apps 2025.11.1 release from conda-forge. Digest: sha256:e9b481657f34c16b367d402ca1cce79ac64b177dc9eba48f85f35be363958126
Without explicit scope, GenericOAuthenticator sends no scope= param in the authorize redirect; KC then issues a token that lacks the openid scope, and /userinfo returns 403 at token_to_user. Symptom: 500 on /hub/oauth_callback after the user signs in at KC. Add a unit test that fails if openid drops out of the scope list.
Digest: sha256:4be08f31306c4da35ceccc390688e02947f02d7eab5fbd1efddca90af8bd00fb
…Response Starlette 1.0 reordered TemplateResponse positional args to (request, name, ...); jhub-apps 2025.11.1 still calls the 2-arg form, which makes /services/japps/create-app 500 at handle_apps. Pin to the last 0.x release until jhub-apps ships a fix.
Digest: sha256:8e007b6dc55ffe5f451d016610f0733462de323df5bb2af3235a6f17b22e5ddf
…tch)
Tested end-to-end on hetzner via Playwright headless:
- /hub/oauth_callback returns 302 to /hub/home (no 500)
- /services/japps/create-app renders (starlette<1 cap)
- /services/japps/conda-environments/ returns 200
- After 6-min idle, refresh_token grant fires, token stays fresh
- 3-step KC -> Nebi token exchange succeeds end-to-end
Two UX fixes:
1. auto_login=True
   Hub now 302s /hub/login directly to KC instead of rendering the local form with a 'Sign in with OAuth 2.0' link. Single IdP, so there is no point making the user click through.
2. KeyCloakLogoutHandler
   KC v18+ rejects /protocol/openid-connect/logout when post_logout_redirect_uri is present without id_token_hint. The static logout_redirect_url can't include it (per-user), so install a handler that reads auth_state.id_token at request time and builds the URL. Falls back to no-hint URL if auth_state is missing (legacy session).
Digest: sha256:93f7139b8775b7a22ac4db313583c83cded449ac77302229e1060f27bce3d6c1
Base LogoutHandler.get() short-circuits to authenticator.logout_redirect_url when auto_login=True, so the prior override of render_logout_page never fired. Move the per-user URL building into get() itself, with default_handle_logout + handle_logout still called so hub's local session state is cleared.
…dler Authenticator-supplied handlers are appended after jupyterhub's defaults in init_handlers, so tornado's first-match routing picks the default LogoutHandler at /logout — our override via get_handlers is a dead route. Monkey-patch the base LogoutHandler.get instead.
OAuthenticator.get_handlers reads the class-level logout_handler attribute when registering the /logout route. Swap it to our subclass (class attr on KeyCloakOAuthenticator) instead of monkey-patching LogoutHandler.get or duplicating the /logout entry — the latter just appends a second tuple after oauthenticator's own (r'/logout', OAuthLogoutHandler), and tornado's first-match keeps picking the base class. KeyCloakLogoutHandler subclasses OAuthLogoutHandler and overrides render_logout_page (not get) so the inherited LogoutHandler.get still runs default_handle_logout + handle_logout (token revocation, cookie clear). For that to happen, authenticator.logout_redirect_url is left empty — otherwise LogoutHandler.get short-circuits when auto_login is True and never calls render_logout_page.
Traitlets' config loader rejects unknown attribute names with a warning and never sets the value, so c.KeyCloakOAuthenticator._kc_end_session_url was a no-op — _kc_end_session_url stayed empty on the class default, making the logout URL relative and causing a redirect loop.
… cleared LogoutHandler.get sets self._jupyterhub_user = None BEFORE calling render_logout_page (jupyterhub/handlers/login.py:89), so reading auth_state from render_logout_page always sees current_user=None. Move the id_token capture into get() before the cleanup runs.
KubeSpawner's default `pvc_name_template` for a *named* server is
`claim-{username}--{servername}`, but the chart's home volume mount is
hardcoded to `claim-{username}` (so all of a user's servers share a single
RWO PVC, co-located on one node via the pod-affinity rule).
Without an explicit override the names diverge: KubeSpawner creates a fresh
per-server PVC and the pod tries to mount a different per-user PVC. Users
who'd previously launched the default JupyterLab server still had the
per-user PVC sitting around and survived; fresh users (e.g. anyone who
first interacts with the platform via jhub-apps Create App) hit
FailedScheduling: 'persistentvolumeclaim claim-<user> not found' and the
pod sits Pending until the hub spawn-timeout (5 min) fires.
Lock the template to `claim-{username}` so ensure + mount converge.
Test in tests/unit/test_spawner_storage.py.
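The name divergence can be illustrated with plain string formatting. This is only a sketch: KubeSpawner's real template expansion also escapes names, and the expand() helper and constant names below are hypothetical. The two template strings are the ones quoted in the commit message.

```python
# Templates quoted in the commit message above.
DEFAULT_NAMED_TEMPLATE = "claim-{username}--{servername}"
CHART_MOUNT_TEMPLATE = "claim-{username}"


def expand(template, username, servername=""):
    """Simplified stand-in for KubeSpawner's template expansion (no escaping)."""
    return template.format(username=username, servername=servername)


# Before the fix: the PVC that gets created and the PVC the pod mounts differ.
created = expand(DEFAULT_NAMED_TEMPLATE, "alice", "myapp")
mounted = expand(CHART_MOUNT_TEMPLATE, "alice", "myapp")

# After the fix, the hub config pins the template so both sides agree:
#   c.KubeSpawner.pvc_name_template = "claim-{username}"
created_after_fix = expand(CHART_MOUNT_TEMPLATE, "alice", "myapp")
```

With the default template a named server would look for `claim-alice--myapp` while the pod mounts `claim-alice`; pinning the template makes both resolve to the same per-user claim.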
…state
The earlier switch to KeyCloakOAuthenticator (GenericOAuthenticator) set
`auth_refresh_age = 240`, expecting JupyterHub to keep auth_state fresh
via its built-in refresh_user. But JupyterHub's Authenticator.refresh_user
is a no-op stub (returns True) and oauthenticator's GenericOAuthenticator
does not override it. So auth_state.refresh_token stays frozen at
OAuth-callback time and expires after KC's SSO idle timeout (~30 min by
default), at which point nebi-envs's 3-step token exchange fails at
step 1 with:
invalid_grant: Token is not active
and the jhub-apps Create-App "Software Environment" dropdown silently
disappears (env list is empty when the exchange fails).
Implement refresh_user on KeyCloakOAuthenticator: POST grant_type=
refresh_token to KC's token endpoint, persist the rotated tokens back
to auth_state via the {"auth_state": ...} return shape, return False on
invalid_grant to force re-login, and return True (no-op) on transient
HTTP errors.
Tests in tests/unit/test_refresh_user.py cover the four return-shape
contracts: success, invalid_grant, transient error, no-refresh-token.
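The return-shape contract can be sketched as a pure function over the token endpoint's response. This is an assumption-laden illustration, not the actual refresh_user: the function name and arguments are hypothetical, the HTTP call itself is left out, and the no-refresh-token case is omitted because the text above does not say what it returns.

```python
def refresh_result(auth_state, status, body):
    """Map a Keycloak token-endpoint response to a refresh_user return value.

    auth_state: current auth_state dict
    status:     HTTP status code of the refresh_token grant
    body:       decoded JSON body of the response
    """
    if status == 200:
        # Success: persist the rotated tokens via the {"auth_state": ...} shape.
        new_state = dict(auth_state)
        for key in ("access_token", "refresh_token", "id_token"):
            if key in body:
                new_state[key] = body[key]
        return {"auth_state": new_state}
    if status == 400 and body.get("error") == "invalid_grant":
        # The refresh token is dead: force a re-login.
        return False
    # Transient HTTP error: keep the current session (no-op).
    return True
```

Separating the mapping from the HTTP request keeps each of the documented return shapes testable without a Keycloak instance.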
z2jh's hub waits ~10s for each managed service's HTTP port to bind. The
default jhub-apps service_workers is 4 and four uvicorn workers take
~12s to bind on CI runners, so hub crashes with
Cannot connect to managed service japps at http://hub:10202
restarts, hits the same timeout, restarts, etc. By the time the e2e
fixture's port-forward starts polling /hub/login, hub is still in this
crash-loop. The first urlopen with timeout=15s eventually raises
TimeoutError unwrapped through HubClient._request (which only catches
HTTPError), aborting every test fixture in setup.
Mirror the production overlay (gitops/apps/data-science-pack.yaml in
openteams-ai/nebari-hetzner) which pins service_workers: 1. The hub
boots cleanly within seconds and the e2e suite proceeds.
…h_user The `if access_token and not refresh_token` branch in 01-spawner.py's _nebi_pre_spawn_hook and 03-nebi-envs.py's get_nebi_environments was a fallback for EnvoyOIDCAuthenticator's auth_state, which never carried a refresh_token (Envoy only stored access_token + id_token in cookies). With KeyCloakOAuthenticator + the new refresh_user() override, auth_state always has a rotating refresh_token, so the branch never fires. Also remove the now-orphan `_fetch_fresh_auth_state` helper it called, and update two docstrings/comments that still referenced EnvoyOIDCAuthenticator as the source of the groups claim or the hub's OAuth client. Reword values.yaml comment on `forwardAccessToken: false` to drop the "avoids confusing dual-token paths" framing — there is no Envoy-injected Bearer in the current architecture.
The KC migration left two related concerns scattered:
* endpoint URLs (authorize/token/userdata/end_session) derived from
the issuer were assigned individually to traitlets in configure(),
via a small `_kc_urls` dict helper.
* the per-user logout URL had its inputs spread across the free
function `_build_logout_url(end_session_url=, id_token=,
post_logout_redirect_uri=)` and TWO stray class attributes
(`KeyCloakOAuthenticator._kc_end_session_url`,
`KeyCloakOAuthenticator._kc_post_logout_redirect_uri`) that
configure() stashed at startup so the logout handler could read
them at request time.
Replace both with a single `KeyCloakConfig` frozen dataclass that
holds every KC string the chart needs and owns the logout-URL
composition as a method.
* `KeyCloakConfig.build(issuer=..., post_logout_redirect_uri=...)`
derives every endpoint URL from the realm issuer.
* `cfg.build_logout_url(id_token)` composes the end-session URL,
omitting `id_token_hint` for legacy sessions (KC v18+ rejects
logout without it when `post_logout_redirect_uri` is set).
* configure() builds one KeyCloakConfig and stashes it on
`KeyCloakOAuthenticator.kc_config`; the logout handler reads it
via `self.authenticator.kc_config`.
Net effect: one cohesive object instead of two stray class attrs +
one free function + one dict helper. The endpoint derivation becomes
trivially testable in isolation.
Tests updated in test_keycloak_authenticator.py:
* `test_configure_attaches_kc_config_to_authenticator_class`
replaces the pair of stray-attribute assertions.
* `test_kc_config_build_logout_url_*` cover the method directly.
* `test_kc_config_from_issuer_is_pure_and_doesnt_need_configure`
pins the classmethod's pure-function semantics.
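A rough sketch of what such a frozen dataclass could look like follows. The field names are assumptions, and the endpoint paths are Keycloak's standard OIDC paths under the realm issuer rather than values copied from the chart. Dropping post_logout_redirect_uri together with id_token_hint for legacy sessions is also an assumption, made because KC v18+ rejects the redirect parameter without the hint.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class KeyCloakConfig:
    authorize_url: str
    token_url: str
    userdata_url: str
    end_session_url: str
    post_logout_redirect_uri: str

    @classmethod
    def build(cls, issuer, post_logout_redirect_uri):
        """Derive every endpoint URL from the realm issuer (pure function)."""
        base = issuer.rstrip("/") + "/protocol/openid-connect"
        return cls(
            authorize_url=base + "/auth",
            token_url=base + "/token",
            userdata_url=base + "/userinfo",
            end_session_url=base + "/logout",
            post_logout_redirect_uri=post_logout_redirect_uri,
        )

    def build_logout_url(self, id_token=None):
        """Compose the per-user end-session URL."""
        if not id_token:
            # Legacy session without id_token: bare end-session URL (assumption).
            return self.end_session_url
        query = urlencode(
            {
                "id_token_hint": id_token,
                "post_logout_redirect_uri": self.post_logout_redirect_uri,
            }
        )
        return self.end_session_url + "?" + query
```

Because build() touches no global state, the endpoint derivation can be unit-tested without running configure().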
HANDOFF*.md files are in-flight working notes between agent sessions; they should never have been checked in. Remove the one that snuck in during the auth saga and add a .gitignore rule so the next one doesn't.
End-to-end fix for the stale-token chain that broke the jhub-apps env selector and Nebi auto-auth past 5 min after login.
Root cause
Envoy Gateway v1.6 stores the OAuth2 access token in the `AccessToken-*` cookie at OIDC login but does not rotate the cookie content when its internal refresh token rotates the access token. The hub reads a frozen-at-login access token from the cookie; downstream calls that need a fresh JWT (jhub-apps env selector → Keycloak Standard Token Exchange v2 → Nebi) fail with `400 invalid_request "Invalid token"` ~5 min after login.
Verified mechanism
Confirmed end-to-end against Envoy Gateway v1.6 oidc translator and Envoy proxy v1.36 oauth2 filter:
- `forwardAccessToken: true` ⇒ Envoy oauth2 filter `forward_bearer_token: true`.
- … `asyncRefreshAccessToken`) and injects the fresh access token as `Authorization: Bearer <token>` on the upstream request (`finishRefreshAccessTokenFlow` → `setBearerToken`).
Changes
- `templates/nebariapp.yaml`: pass `enforceAtGateway`, `forwardAccessToken`, and `tokenExchange` through to the NebariApp CRD so the chart drives the operator-managed SecurityPolicy.
- `values.yaml`: defaults are `enforceAtGateway: true`, `forwardAccessToken: false`. Token forwarding stays off by default until consumers (jhub-apps) can tolerate a non-HS256 token in the `Authorization: Bearer` header. See Dependency below.
- `config/jupyterhub/00-gateway-auth.py`: `_extract_envoy_cookies` prefers the `Authorization: Bearer` header over the `AccessToken-*` cookie. Header is the only always-current source post-refresh; cookie fallback retained for deployments where `forwardAccessToken` is off.
- `config/jupyterhub/03-nebi-envs.py`: defensive. When access_token in `auth_state` is expiring (<30s), re-fetch via the hub API. Catches residual cases where `refresh_user` has not run since the last browser request.
Dependency / follow-up
Flipping the chart default to `forwardAccessToken: true` requires upstream jhub-apps to stop treating the `Authorization: Bearer` header as guaranteed to be its own HS256 wrapper JWT. Tracked in `jhub-apps#676`: the decoder returns `None` for non-wrapper tokens (e.g. Keycloak RS256 forwarded by Envoy) and `get_current_user` falls through to the cookie. Once #676 ships in a tagged jhub-apps release and `images/jupyterhub/pixi.toml` is bumped, this PR can flip the default to `true`.
For the "user idles on Create-App page for 6 min without hitting `/hub/*`" gap (`refresh_user` only fires on hub-routed requests, so jhub-apps page traffic alone cannot keep auth_state fresh), see HANDOFF-stale-token.md for the longer-term `GenericOAuthenticator`-based mitigation.
Companion PRs
- `nebari-operator#116`: V2 standard-token-exchange attribute + audience mapper on peer clients.
- `jhub-apps#676`: fall-through to cookie when Bearer is not the jhub-apps wrapper.
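For reviewers, the header-over-cookie preference described under Changes for `_extract_envoy_cookies` can be sketched as follows. This is a simplified illustration, not the code in 00-gateway-auth.py: the function name is hypothetical, and headers and cookies are assumed to be plain dicts (a real request object would offer case-insensitive header lookup).

```python
def extract_access_token(headers, cookies):
    """Prefer the Envoy-injected Bearer header; fall back to the login cookie."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        # Only always-current source once Envoy has refreshed the token.
        return token.strip()
    for name, value in cookies.items():
        # Fallback for deployments where forwardAccessToken is off.
        if name.startswith("AccessToken-") and value:
            return value
    return None
```

With `forwardAccessToken: false` no Bearer header arrives, so the function degrades to the previous cookie-only behaviour.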