
feat(auth): forward access token via Bearer header + stale-token re-fetch#53

Merged: aktech merged 66 commits into main from fix/nebi-envs-stale-token on May 15, 2026


aktech (Member) commented May 6, 2026

End-to-end fix for the stale-token chain that broke the jhub-apps env selector and Nebi auto-auth more than 5 minutes after login.

Root cause

Envoy Gateway v1.6 stores the OAuth2 access token in the AccessToken-* cookie at OIDC login but does not update the cookie when it later uses its internal refresh token to rotate the access token. The hub therefore reads an access token frozen at login time from the cookie, and downstream calls that need a fresh JWT (jhub-apps env selector → Keycloak Standard Token Exchange v2 → Nebi) fail with 400 invalid_request "Invalid token" about 5 minutes after login.

Verified mechanism

Confirmed end-to-end against the Envoy Gateway v1.6 OIDC translator and the Envoy proxy v1.36 oauth2 filter:

  • forwardAccessToken: true ⇒ Envoy oauth2 filter forward_bearer_token: true.
  • On every upstream request, if the access token has expired and the refresh token is still valid, Envoy transparently refreshes against Keycloak (asyncRefreshAccessToken) and injects the fresh access token as Authorization: Bearer <token> on the upstream request (finishRefreshAccessTokenFlow → setBearerToken).
  • Browser-side cookies are rotated only on the OAuth-callback path, not on background refresh. The upstream Authorization header always carries the fresh token, and that header is what the hub reads.

Changes

  1. templates/nebariapp.yaml — pass enforceAtGateway, forwardAccessToken, and tokenExchange through to the NebariApp CRD so the chart drives the operator-managed SecurityPolicy.
  2. values.yaml — defaults: enforceAtGateway: true, forwardAccessToken: false. Token forwarding stays off by default until consumers (jhub-apps) can tolerate a non-HS256 token in the Authorization: Bearer header. See Dependency below.
  3. config/jupyterhub/00-gateway-auth.py: _extract_envoy_cookies prefers the Authorization: Bearer header over the AccessToken-* cookie. The header is the only source that stays current after a refresh; the cookie fallback is retained for deployments where forwardAccessToken is off.
  4. config/jupyterhub/03-nebi-envs.py — defensive: when access_token in auth_state is expiring (<30s), re-fetch via the hub API. Catches residual cases where refresh_user has not run since the last browser request.
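The header-over-cookie preference in change 3 can be sketched as below. This is a minimal illustration, not the code in 00-gateway-auth.py: `pick_access_token` is a hypothetical name, it reads raw request headers (a real hub handler would likely use its request object), and it simply takes the first cookie whose name starts with `AccessToken-`.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional

# Envoy names its cookie AccessToken-<suffix>; the suffix is deployment-specific.
ACCESS_TOKEN_COOKIE_PREFIX = "AccessToken-"


def pick_access_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the freshest access token available on an upstream request.

    1. Prefer "Authorization: Bearer <token>": with forwardAccessToken on,
       Envoy rewrites this header after every background refresh.
    2. Fall back to the first AccessToken-* cookie, which is frozen at login
       and only useful when forwardAccessToken is off.
    """
    # HTTP header names are case-insensitive; normalise once.
    lowered = {k.lower(): v for k, v in headers.items()}

    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()

    jar = SimpleCookie()
    jar.load(lowered.get("cookie", ""))
    for name, morsel in jar.items():
        if name.startswith(ACCESS_TOKEN_COOKIE_PREFIX):
            return morsel.value
    return None
```

Checking the header first means a stale cookie can never shadow a token Envoy has just refreshed.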

Dependency / follow-up

Flipping the chart default to forwardAccessToken: true requires upstream jhub-apps to stop treating the Authorization: Bearer header as guaranteed to be its own HS256 wrapper JWT. Tracked in jhub-apps#676: the decoder returns None for non-wrapper tokens (e.g. Keycloak RS256 forwarded by Envoy) and get_current_user falls through to the cookie. Once #676 ships in a tagged jhub-apps release and images/jupyterhub/pixi.toml is bumped, this PR can flip the default to true.

For the gap where a user idles on the Create-App page for 6 minutes without hitting /hub/*, see HANDOFF-stale-token.md for the longer-term GenericOAuthenticator-based mitigation. refresh_user only fires on hub-routed requests, so jhub-apps page traffic alone cannot keep auth_state fresh.

Companion PRs

  • nebari-operator#116 — V2 standard-token-exchange attribute + audience mapper on peer clients.
  • jhub-apps#676 — fall-through to cookie when Bearer is not the jhub-apps wrapper.

aktech and others added 30 commits March 27, 2026 13:29
- Add shared-storage RWX PVC volume support with per-group subPaths
  mounted at /shared/<group> inside user pods
- Add init container that creates group directories with chmod 2775
  (setgid bit) so new files inherit the group and are group-writable
- Add libnss_wrapper.so configuration so whoami/id report the real
  username instead of 'jovyan', with NB_UMASK=0002
- Refactor pre_spawn_hook into focused single-responsibility functions:
  _get_user_groups, _setup_shared_storage, _setup_nss_wrapper
- Orchestrator _pre_spawn_hook chains Nebi auth, shared storage, and
  NSS wrapper; always registered (NSS runs even without shared storage)
- Add sharedStorage.groups allowlist and mountPathPrefix values
- Add jupyterhub.custom.shared-storage-* config keys
- gid default was 100 but z2jh sets pod securityContext GID to 1000;
  add jovyan:x:1000: group entry so 'groups' command resolves the name
- when shared PVC is disabled, mkdir -p ~/shared instead of removing it
  so users always see the directory regardless of storage configuration
- EnvoyOIDCAuthenticator now stores parsed groups in auth_state so the
  spawner can read them at spawn time (JupyterHub groups table is empty
  when manage_groups is not enabled)
- refresh_user also re-parses groups from the refreshed IdToken to keep
  auth_state current
- _pre_spawn_hook always resolves user groups, not only when shared PVC
  is enabled
- _setup_nss_wrapper creates local ~/shared/<group> dirs per group when
  no shared PVC is configured, so users always see their group dirs
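The nss_wrapper setup described above comes down to writing a passwd line and a group file and pointing the library at them. The sketch below only builds the file contents and environment. The UID/GID defaults follow the values mentioned in these commits. `NssIdentity` and the helper names are hypothetical, and the real hook writes these files with printf in a lifecycle hook rather than from Python. `LD_PRELOAD`, `NSS_WRAPPER_PASSWD` and `NSS_WRAPPER_GROUP` are nss_wrapper's standard variables, but the library path is image-specific.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NssIdentity:
    username: str
    uid: int = 1000          # z2jh's default singleuser UID
    gid: int = 100           # "users" group, matching fs_gid=100
    home: str = "/home/jovyan"
    shell: str = "/bin/bash"

    def __post_init__(self) -> None:
        # ':' or a newline would corrupt the colon-separated passwd/group formats.
        if not self.username or any(c in self.username for c in ":\n"):
            raise ValueError(f"unsafe username for NSS files: {self.username!r}")


def passwd_entry(ident: NssIdentity) -> str:
    # name:password:UID:GID:GECOS:home:shell
    return (f"{ident.username}:x:{ident.uid}:{ident.gid}:"
            f"{ident.username}:{ident.home}:{ident.shell}\n")


def group_file(ident: NssIdentity) -> str:
    # name:password:GID:members. The jovyan:x:1000: entry lets `groups`
    # resolve the pod securityContext GID 1000 to a name.
    return f"users:x:{ident.gid}:{ident.username}\njovyan:x:1000:\n"


def nss_wrapper_env(passwd_path: str, group_path: str) -> Dict[str, str]:
    return {
        "LD_PRELOAD": "libnss_wrapper.so",  # exact path depends on the image
        "NSS_WRAPPER_PASSWD": passwd_path,
        "NSS_WRAPPER_GROUP": group_path,
        "NB_UMASK": "0002",                 # new files are group-writable
    }
```

With these in place, `whoami` and `id` inside the pod report the real username instead of `jovyan`.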
Deploys quay.io/nebari/volume-nfs backed by a single RWO PVC and re-exports
it as RWX NFS, enabling shared group directories on providers like Hetzner
that only provide ReadWriteOnce storage (hcloud-volumes).

- templates/nfs-server.yaml: NFS Deployment, Service, backend RWO PVC
- templates/shared-pvc.yaml: StorageClass + PV (NFS path) + PVC when
  nfsServer.enabled; falls back to external RWX PVC otherwise
- values.yaml: sharedStorage.nfsServer.{enabled,storageClass,image} fields
k3s worker nodes on minimal OS images (Hetzner) ship without nfs-common,
causing NFS PV mounts to fail with 'bad option'. The DaemonSet uses nsenter
to install nfs-common on the host via apt-get, skipping if already present.
Gated on sharedStorage.nfsServer.installClient (default false).
… try-except

_get_user_groups accessed spawner.user.groups (SQLAlchemy lazy-loaded
relationship) from an async pre_spawn_hook, causing DetachedInstanceError
which silently aborted _setup_shared_storage and _setup_nss_wrapper.

Groups are now read only from auth_state (stored by EnvoyOIDCAuthenticator).
Each step is individually wrapped in try-except so failures are logged and
don't prevent subsequent steps from running.
I1: set c.KubeSpawner.fs_gid=100 explicitly so shared dir file ownership
    is deterministic (GID 100 = users group) rather than relying on z2jh default

I2: add Helm validation in _helpers.tpl that fails at template time if
    sharedStorage.enabled and jupyterhub.custom.shared-storage-enabled diverge

I3: use Path(g).name like classic Nebari so /projects/myproj -> myproj,
    not projects/myproj; deduplicate groups to prevent duplicate mountPaths

I4: add nodeSelector/nodeAffinity support to NFS server Deployment so
    deployers can pin it to worker nodes and avoid slow RWO PVC reattachment

I7: add argocd.argoproj.io/sync-options: Prune=false to StorageClass and
    PersistentVolume to prevent accidental deletion during ArgoCD force sync

C1: add chown 0:100 before chmod 2775 in initialize-shared-mounts init
    container so shared dirs are explicitly owned by GID 100 (users)

C2: use printf instead of echo '...' for NSS file writes to safely handle
    special characters in usernames without shell quoting issues

C3: deduplicate groups in _get_user_groups (Path.name already handles
    most cases; an explicit dedup set is added as belt-and-suspenders)

M1: log exception with exc_info=True in refresh_user JWT parse failure

N3: merge into existing lifecycle_hooks instead of replacing; warn if
    a postStart hook already exists before overwriting

Logging: added comprehensive info/debug/warning logging throughout all
    pre-spawn hook functions for both happy and failure paths
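The group handling from I3 and C3 (take the last path component, drop duplicates, keep only allowed groups) can be sketched as a pure function. This is not the chart's `_get_user_groups`. `normalize_groups` and `group_mount_paths` are hypothetical names. The sketch uses `PurePosixPath` instead of `Path` so it behaves the same on any OS, and it assumes the `sharedStorage.groups` allowlist is matched against the normalized name.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def normalize_groups(raw_groups: Iterable[str],
                     allowlist: Optional[Iterable[str]] = None) -> List[str]:
    """Map Keycloak group paths to short names, deduplicated, order kept."""
    allowed = set(allowlist) if allowlist is not None else None
    seen = set()
    groups: List[str] = []
    for raw in raw_groups:
        name = PurePosixPath(raw).name  # "/projects/myproj" -> "myproj"
        if not name or name in seen:
            continue  # skip empty names (e.g. "/") and duplicates
        if allowed is not None and name not in allowed:
            continue
        seen.add(name)
        groups.append(name)
    return groups


def group_mount_paths(groups: Iterable[str], prefix: str = "/shared") -> List[str]:
    # One mountPath per group; duplicates would make the pod spec invalid.
    return [f"{prefix.rstrip('/')}/{g}" for g in groups]
```

Deduplicating before building mount paths is what prevents two identical `mountPath` entries in the pod spec.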
Exposes a profile selector in JupyterHub matching the classic Nebari experience.
Profiles are defined under jupyterhub.custom.profiles in values.yaml and passed
directly to c.KubeSpawner.profile_list via get_config().

Default profiles:
  - Small: 1 CPU / 2 GB RAM (default)
  - Medium: 4 CPU / 8 GB RAM

kubespawner_override accepts any KubeSpawner trait so GPU profiles, custom
images, and node selectors work without code changes in the future.
When profiles list is empty, no selector is shown (single-instance mode).
Update default profile display_name and description to be more
user-friendly (e.g. "Small Instance" with "Stable environment with
1 CPU / 2 GB RAM" instead of just "Small" / "1 CPU / 2 GB RAM").
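For reference, a profile list of the shape described above might render to the Python value below before being assigned to `c.KubeSpawner.profile_list`. `display_name`, `description`, `default` and `kubespawner_override` are KubeSpawner profile keys. The "Medium Instance" wording, the limit values and the `profile_list_or_empty` helper are illustrative assumptions, not the chart's values.

```python
from typing import Dict, List

# Hypothetical rendering of jupyterhub.custom.profiles.
PROFILES: List[Dict] = [
    {
        "display_name": "Small Instance",
        "description": "Stable environment with 1 CPU / 2 GB RAM",
        "default": True,
        # Any KubeSpawner trait may go here (image, node_selector, GPUs, ...).
        "kubespawner_override": {"cpu_limit": 1.0, "mem_limit": "2G"},
    },
    {
        "display_name": "Medium Instance",
        "description": "Stable environment with 4 CPU / 8 GB RAM",
        "kubespawner_override": {"cpu_limit": 4.0, "mem_limit": "8G"},
    },
]


def profile_list_or_empty(profiles: List[Dict]) -> List[Dict]:
    """Sanity-check the list; an empty list means no selector is shown."""
    if sum(1 for p in profiles if p.get("default")) > 1:
        raise ValueError("at most one profile may set default: True")
    return list(profiles)

# In the hub config this would be:
#   c.KubeSpawner.profile_list = profile_list_or_empty(PROFILES)
```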
fix: add descriptive names to default server profiles
# Conflicts:
#	templates/_helpers.tpl
#	values.yaml
Profiles feature (#31) is out of scope for this PR. Moved to local
branch feat/jupyterlab-profiles for a follow-up PR.
Replaces the inline-bash test workflow with a pytest-based suite that
manages the k3d cluster, helm install, and pod-wait lifecycle.
Conftest exposes a 'cluster' session fixture and a 'hub_url' fixture
that port-forwards proxy-public.

CI runs uvx pytest tests/e2e -v with PYTHONUNBUFFERED=1 so live logs
stream into the workflow output.

Locally:
  uvx pytest tests/e2e -v                       # fresh cluster
  K3D_CLUSTER=k3d-nebari-dev uvx pytest tests/e2e -v   # reuse
Switches the e2e harness to kind (k3d's busybox-on-scratch nodes lack a
package manager, so the chart's nfs-common installer DaemonSet can't
provision NFS client tools).

New tests in tests/e2e/test_shared_storage.py exercise the full PR #30
spawn path against a real cluster:
  - test_user_in_group_can_write: alice-data writes /shared/data/...
  - test_shared_dir_is_group_owned: dir mode is 2775 (setgid)

The DummyAuth shim in tests/e2e/fixtures/test-values.yaml maps the
login username to user+groups (alice-data -> alice in [data]) so we
can fake auth_state without running Keycloak. Everything else
(spawner hook, init container, NSS wrapper, NFS mount) is real.

Chart changes:
  - sharedStorage.nfsServer.mountOptions added (default []). Tests
    pass [nfsvers=3] because kind nodes use overlayfs which fails
    the volume-nfs image's NFSv4 root export. Production unchanged.

Conftest infrastructure:
  - kind cluster fixture with KIND_KEEP=1 reuse
  - hosts-entry workaround so kubelet's host mount.nfs can resolve
    the cluster-internal NFS service FQDN (kind nodes have no
    cluster DNS in their host resolv.conf)
  - structured logging + step counter + per-cycle pod state, events
    from kubectl describe, and node-level kubelet journal lines
  - autouse failure-dump fixture (kubectl get pods/events + hub
    and singleuser logs)
Conftest had grown to 577 lines mixing five concerns. Extracted into
focused modules each with a small interface and large hidden impl:

  _process.py        subprocess + kubectl helpers + step counter
  _hub.py            HubClient (cookie/login/spawn/stop session)
  _pod_observer.py   wait_for_pod_ready + dedup'd pod-state polling
  _cluster.py        kind lifecycle + helm install + NFS hosts workaround

conftest.py shrinks to 218 lines holding only fixtures that compose the
modules. Eliminates duplication of:
  - cookie-jar login flow (was repeated in _login_and_spawn + _stop_server)
  - two parallel subprocess wrappers (_run + _kctl)
  - inline pod-state polling loop in the spawn flow

Tests still pass locally (3 passed in 35s, cluster reused).
prePuller hooks pre-pull singleuser images on every node before helm
install completes. On a single-node test cluster this is pure overhead
(~30s of blocking wait). Disable in test-values.yaml.

kindest/node image pull was the largest variable cost in CI: 9s on a
fast runner, 130s on a slow one. Cache it as a docker tarball keyed on
the kind version so subsequent runs are deterministic and fast.

Expected: total CI time drops from variable 3-5min to ~90-120s.
Previous heuristic used `[ -n "$img" ] && docker save`, which exits 1
when grep finds no image and kills the whole step. Hardcode the
v1.32.2 tag (the node image pinned by kind v0.27.0) and use plain commands so set -e
only triggers on real failures.
GH Actions ubuntu-latest runners come with kindest/node preinstalled
(tagged "<none>"). The actual image fetch on cache miss was only ~12s
because docker just verifies the digest. The cache step was earning ~5s
in the best case and breaking the workflow when docker save couldn't
find the v1.32.2 tag (image is referenced by digest, not tag).

prePuller-disable change is keeping its ~30s saving — sufficient win
without the cache complexity.
9 tests (was 2) covering the per-group /shared/<group> contract end-to-end:

  - dir is root:users 2775 (parametrized over groups)
  - pod is member of users group; NB_UMASK=0002 in env
  - new files inherit gid=100 mode 0664; new subdirs gid=100 mode 02775
    (setgid propagation)
  - multi-group user sees + writes every group dir
  - user does not see groups they don't belong to (mount-time isolation)
  - file written by one user is readable + appendable by a groupmate
    from a separate pod (cross-user collaboration)

Conftest adds PathStat + SpawnedUser.stat()/path_exists() so tests assert
against typed fields (mode/uid/gid) instead of parsing stat strings —
keeps tests short and behavior-focused.
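A typed `PathStat` along these lines turns one `stat` call into fields a test can compare directly. This is a sketch under assumptions: it expects the output of GNU `stat -c '%a %u %g' <path>` (for example run via `kubectl exec` in the user pod), and the `parse` and `setgid` names are illustrative, not necessarily what the conftest uses.

```python
import stat as stat_mod
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStat:
    mode: int  # permission bits including setgid/setuid/sticky, e.g. 0o2775
    uid: int
    gid: int

    @classmethod
    def parse(cls, line: str) -> "PathStat":
        """Parse GNU `stat -c '%a %u %g'` output such as "2775 0 100"."""
        mode_octal, uid, gid = line.split()
        return cls(mode=int(mode_octal, 8), uid=int(uid), gid=int(gid))

    @property
    def setgid(self) -> bool:
        # The setgid bit on a directory makes new entries inherit its group.
        return bool(self.mode & stat_mod.S_ISGID)
```

A test for the shared-directory contract then reads as `assert st == PathStat(0o2775, 0, 100)` rather than a string comparison.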
Singleuser image (multi-GB) is currently pulled by kubelet inside the
kind node on first user spawn, costing ~73s of every CI run. Pull it
once on the runner host, save as tar, cache it (key = image ref so a
values.yaml bump auto-invalidates), and side-load with `kind load
image-archive`. Pre-create the cluster in the workflow so the side-load
happens before any pod is scheduled — the pytest fixture's
ensure_cluster() reuses the existing cluster.

Cache hit: skips the ~90s registry pull entirely; only kind-load (~20s)
remains. Cache miss: pull + save once (~120s), then every subsequent run
benefits.
…er NFS as transitional

Addresses comment on issue #29: the bundled `nfsServer.enabled=true` path
relies on `quay.io/nebari/volume-nfs:0.8-repack`, a manifest-schema repack
of an abandoned upstream image (nebari-dev/nebari-docker-images#230). We
should not be carrying that workaround image as the recommended path for
a greenfield chart.

The chart already supported bringing your own RWX StorageClass; this
change makes that path the documented primary:

  - values.yaml: reframe the sharedStorage block. Recommend an external
    RWX class with provider-specific examples (Longhorn, EFS, Filestore,
    Azure Files, nfs-subdir-external-provisioner). Add a deprecation
    note on the nfsServer.image block linking to issue #29.

  - README: add a "Shared Storage" section with the same matrix and an
    explicit pointer to the issue tracking removal of the in-cluster
    NFS path.

No template changes — the external-RWX path was already rendered when
nfsServer.enabled=false. Verified via `helm template` that setting only
`sharedStorage.storageClass=longhorn` produces a single RWX PVC and no
nfs-server pod.
…ovisioned

Previous commit listed EFS/Filestore/Azure Files as recommended RWX
backends. NIC does not provision those — they are separate cloud-managed
services no one in NIC has wired up. NIC's actual storage reality:

  hetzner   : longhorn (longhorn.Install in pkg/provider/hetzner)
  aws       : longhorn (longhorn.Install in pkg/provider/aws)
  existing  : longhorn (longhorn.Install in pkg/provider/existing)
  gcp       : standard-rwo (no RWX provisioned)
  azure     : managed-csi (no RWX provisioned)
  local     : (no storage layer)

So the accurate recommendation is just longhorn. Updated values.yaml and
README to say so directly. The in-cluster NFS fallback stays — it covers
the providers where NIC has not yet wired up an RWX class — with a
pointer to issue #29 for tracking removal once that lands everywhere.
EnvoyOIDCAuthenticator stores no refresh_token (Envoy keeps only
access_token + id_token in cookies) and the access_token lifetime is
~5 minutes. jhub-apps calls the env-listing callable on every Create
App page render, often well after the token captured at login has
expired, producing
`token-exchange step 2 FAILED: HTTP 400 invalid_request "Invalid token"`
and a silent empty selector.

Mirror 01-spawner.py: when access_token has <30s remaining, re-fetch
auth_state via the hub API (which refresh_user keeps current with
fresh Envoy cookies on browser activity) before exchanging.
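The "<30s remaining" check only needs the JWT's `exp` claim, which can be read without verifying the signature because the result only decides whether to re-fetch and is never trusted for authorization. A minimal sketch with hypothetical names (`jwt_seconds_remaining`, `needs_refetch`); treating an unreadable token as stale is this sketch's assumption.

```python
import base64
import json
import time
from typing import Optional

REFETCH_THRESHOLD_S = 30.0  # re-fetch when less than 30 s remain


def jwt_seconds_remaining(token: str, now: Optional[float] = None) -> float:
    # Decode the payload segment WITHOUT verifying the signature: we only
    # need `exp` to decide whether to re-fetch, never to trust the token.
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return float(payload["exp"]) - current


def needs_refetch(token: Optional[str], now: Optional[float] = None) -> bool:
    if not token:
        return True
    try:
        return jwt_seconds_remaining(token, now) < REFETCH_THRESHOLD_S
    except (IndexError, KeyError, TypeError, ValueError):
        return True  # unreadable token: treat as stale
```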
The temporary nebari-dev/jhub-apps#676 git pin (bearer fall-through +
Starlette 1.0 TemplateResponse fix) plus pyjwt>=2.10 floor were only
needed while Envoy was the OAuth client and could inject an RS256
Bearer that confused jhub-apps. With hub doing its own OAuth, Envoy no
longer injects to /services/japps/* and neither workaround is needed.

- jhub-apps: git@5d86277 -> ==2025.11.1 (conda-forge release)
- pyjwt: >=2.10,<3 -> >=2.9,<2.10 (matches jhub-apps 2025.11.1 constraint)

Lock regenerated via pixi 0.68.1 in a linux/arm64 container.
aktech force-pushed the fix/nebi-envs-stale-token branch from 96d48c1 to 161ab2d May 14, 2026 21:20
aktech added 24 commits May 14, 2026 22:25
…ps 2025.11.1)

Carries the GenericOAuthenticator switch (config/jupyterhub/00-gateway-auth.py)
and jhub-apps 2025.11.1 release from conda-forge.

Digest: sha256:e9b481657f34c16b367d402ca1cce79ac64b177dc9eba48f85f35be363958126
Without explicit scope, GenericOAuthenticator sends no scope= param in
the authorize redirect; KC then issues a token that lacks the openid
scope, and /userinfo returns 403 at token_to_user. Symptom: 500 on
/hub/oauth_callback after the user signs in at KC.

Add a unit test that fails if openid drops out of the scope list.
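The failure mode is easy to see in the authorize redirect itself: the `scope` query parameter must contain `openid` for the token to be accepted at `/userinfo`. The sketch below builds such a redirect and includes a regression test in the spirit of the one this commit adds. `authorize_redirect` and the non-`openid` scopes are hypothetical; in the hub config the list would be the OAuthenticator `scope` trait (for example `c.KeyCloakOAuthenticator.scope`).

```python
from typing import Sequence
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical scope list; the commit only requires that "openid" is present.
DEFAULT_SCOPE = ("openid", "profile", "email")


def authorize_redirect(authorize_url: str, client_id: str, redirect_uri: str,
                       state: str, scope: Sequence[str] = DEFAULT_SCOPE) -> str:
    if "openid" not in scope:
        raise ValueError("scope must include 'openid'; without it /userinfo returns 403")
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scope),  # OAuth 2.0 scopes are space-delimited
        "state": state,
    })
    return f"{authorize_url}?{query}"


def test_scope_includes_openid() -> None:
    url = authorize_redirect("https://kc.example/auth", "hub",
                             "https://hub.example/hub/oauth_callback", "xyz")
    scopes = parse_qs(urlparse(url).query)["scope"][0].split()
    assert "openid" in scopes
```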
Digest: sha256:4be08f31306c4da35ceccc390688e02947f02d7eab5fbd1efddca90af8bd00fb
…Response

Starlette 1.0 reordered TemplateResponse positional args to
(request, name, ...); jhub-apps 2025.11.1 still calls the 2-arg form,
which makes /services/japps/create-app 500 at handle_apps. Pin to the
last 0.x release until jhub-apps ships a fix.
Digest: sha256:8e007b6dc55ffe5f451d016610f0733462de323df5bb2af3235a6f17b22e5ddf
…tch)

Tested end-to-end on hetzner via Playwright headless:
- /hub/oauth_callback returns 302 to /hub/home (no 500)
- /services/japps/create-app renders (starlette<1 cap)
- /services/japps/conda-environments/ returns 200
- After 6-min idle, refresh_token grant fires, token stays fresh
- 3-step KC -> Nebi token exchange succeeds end-to-end
Two UX fixes:

1. auto_login=True
   Hub now 302s /hub/login directly to KC instead of rendering the
   local form with a 'Sign in with OAuth 2.0' link. Single IdP — no
   point making the user click through.

2. KeyCloakLogoutHandler
   KC v18+ rejects /protocol/openid-connect/logout when
   post_logout_redirect_uri is present without id_token_hint. The static
   logout_redirect_url can't include it (per-user), so install a handler
   that reads auth_state.id_token at request time and builds the URL.
   Falls back to no-hint URL if auth_state is missing (legacy session).
Digest: sha256:93f7139b8775b7a22ac4db313583c83cded449ac77302229e1060f27bce3d6c1
Base LogoutHandler.get() short-circuits to authenticator.logout_redirect_url
when auto_login=True, so the prior override of render_logout_page never
fired. Move the per-user URL building into get() itself, with default_handle_logout
+ handle_logout still called so hub's local session state is cleared.
…dler

Authenticator-supplied handlers are appended after jupyterhub's defaults
in init_handlers, so tornado's first-match routing picks the default
LogoutHandler at /logout — our override via get_handlers is a dead
route. Monkey-patch the base LogoutHandler.get instead.
OAuthenticator.get_handlers reads the class-level logout_handler
attribute when registering the /logout route. Swap it to our subclass
(class attr on KeyCloakOAuthenticator) instead of monkey-patching
LogoutHandler.get or duplicating the /logout entry — the latter just
appends a second tuple after oauthenticator's own (r'/logout',
OAuthLogoutHandler), and tornado's first-match keeps picking the base
class.

KeyCloakLogoutHandler subclasses OAuthLogoutHandler and overrides
render_logout_page (not get) so the inherited LogoutHandler.get still
runs default_handle_logout + handle_logout (token revocation, cookie
clear). For that to happen, authenticator.logout_redirect_url is left
empty — otherwise LogoutHandler.get short-circuits when auto_login is
True and never calls render_logout_page.
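The two routing commits above hinge on first-match dispatch. The toy model below is not tornado's implementation; it only reproduces the rule "the first matching route wins". It shows why appending a second `/logout` entry leaves a dead route, while swapping the class attribute that the single `/logout` entry is built from takes effect. `resolve` and `authenticator_routes` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (URL regex, handler name)


def resolve(routes: List[Route], path: str) -> Optional[str]:
    """Return the handler of the FIRST route whose pattern matches path."""
    for pattern, handler in routes:
        if re.fullmatch(pattern, path):
            return handler
    return None


def authenticator_routes(logout_handler: str) -> List[Route]:
    # The /logout entry is built from a class-level attribute, so whatever
    # that attribute names is what ends up registered.
    return [(r"/logout", logout_handler)]


# Appending a second /logout entry: the original entry still matches first.
appended = authenticator_routes("OAuthLogoutHandler") + [(r"/logout", "KeyCloakLogoutHandler")]

# Swapping the attribute: the single registered entry now points at the subclass.
swapped = authenticator_routes("KeyCloakLogoutHandler")
```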
Traitlets' config loader rejects unknown attribute names with a warning
and never sets the value, so c.KeyCloakOAuthenticator._kc_end_session_url
was a no-op — _kc_end_session_url stayed empty on the class default,
making the logout URL relative and causing a redirect loop.
… cleared

LogoutHandler.get sets self._jupyterhub_user = None BEFORE calling
render_logout_page (jupyterhub/handlers/login.py:89), so reading
auth_state from render_logout_page always sees current_user=None.
Move the id_token capture into get() before the cleanup runs.
KubeSpawner's default `pvc_name_template` for a *named* server is
`claim-{username}--{servername}`, but the chart's home volume mount is
hardcoded to `claim-{username}` (so all of a user's servers share a single
RWO PVC, co-located on one node via the pod-affinity rule).

Without an explicit override the names diverge: KubeSpawner creates a fresh
per-server PVC and the pod tries to mount a different per-user PVC. Users
who'd previously launched the default JupyterLab server still had the
per-user PVC sitting around and survived; fresh users (e.g. anyone who
first interacts with the platform via jhub-apps Create App) hit
FailedScheduling: 'persistentvolumeclaim claim-<user> not found' and the
pod sits Pending until the hub spawn-timeout (5 min) fires.

Lock the template to `claim-{username}` so ensure + mount converge.

Test in tests/unit/test_spawner_storage.py.
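The divergence can be reproduced by expanding both templates for a named server. This sketch uses the template strings quoted in the commit message. It skips any username escaping KubeSpawner may apply, and `expand` and `claims_converge` are hypothetical helpers.

```python
DEFAULT_NAMED_SERVER_TEMPLATE = "claim-{username}--{servername}"
HOME_MOUNT_CLAIM_TEMPLATE = "claim-{username}"  # hardcoded in the chart's volume mount


def expand(template: str, username: str, servername: str = "") -> str:
    # str.format ignores unused keyword arguments, so one call fits both templates.
    return template.format(username=username, servername=servername)


def claims_converge(pvc_template: str, username: str, servername: str) -> bool:
    """True when the PVC KubeSpawner ensures is the one the pod mounts."""
    return expand(pvc_template, username, servername) == expand(HOME_MOUNT_CLAIM_TEMPLATE, username)

# The fix, as hub config:
#   c.KubeSpawner.pvc_name_template = "claim-{username}"
```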
…state

The earlier switch to KeyCloakOAuthenticator (GenericOAuthenticator) set
`auth_refresh_age = 240`, expecting JupyterHub to keep auth_state fresh
via its built-in refresh_user. But JupyterHub's Authenticator.refresh_user
is a no-op stub (returns True) and oauthenticator's GenericOAuthenticator
does not override it. So auth_state.refresh_token stays frozen at
OAuth-callback time and expires after KC's SSO idle timeout (~30 min by
default), at which point nebi-envs's 3-step token exchange fails at
step 1 with:

  invalid_grant: Token is not active

and the jhub-apps Create-App "Software Environment" dropdown silently
disappears (env list is empty when the exchange fails).

Implement refresh_user on KeyCloakOAuthenticator: POST grant_type=
refresh_token to KC's token endpoint, persist the rotated tokens back
to auth_state via the {"auth_state": ...} return shape, return False on
invalid_grant to force re-login, and return True (no-op) on transient
HTTP errors.

Tests in tests/unit/test_refresh_user.py cover the four return-shape
contracts: success, invalid_grant, transient error, no-refresh-token.
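The four return-shape contracts can be sketched as one transport-agnostic coroutine. Inside a real `refresh_user(self, user, handler=None)` override the state would come from `await user.get_auth_state()`, and the POST would go to Keycloak's token endpoint. Here the HTTP call is injected as `post_form` so the logic runs without a network. All names are hypothetical. Two choices are assumptions because the commit does not spell them out: the no-refresh-token case returns True, and any non-200 status other than invalid_grant is treated as transient.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple, Union

# Hypothetical transport: POST a form, return (HTTP status, parsed JSON body).
PostForm = Callable[[str, Dict[str, str]], Awaitable[Tuple[int, dict]]]


async def refresh_auth_state(auth_state: Optional[dict], token_url: str,
                             client_id: str, client_secret: str,
                             post_form: PostForm) -> Union[bool, dict]:
    refresh_token = (auth_state or {}).get("refresh_token")
    if not refresh_token:
        return True  # nothing to refresh with; leave the user as-is
    form = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    try:
        status, body = await post_form(token_url, form)
    except OSError:
        return True  # network blip: keep the current auth_state
    if status == 200:
        new_state = dict(auth_state)
        for key in ("access_token", "refresh_token", "id_token"):
            if key in body:
                new_state[key] = body[key]  # persist rotated tokens
        return {"auth_state": new_state}
    if status == 400 and body.get("error") == "invalid_grant":
        return False  # refresh token expired or revoked: force re-login
    return True  # other HTTP errors treated as transient


def fixed_response(status: int, body: dict) -> PostForm:
    """Fake transport for demos/tests that always returns one response."""
    async def _post(url: str, form: Dict[str, str]) -> Tuple[int, dict]:
        return status, body
    return _post
```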
z2jh's hub waits ~10s for each managed service's HTTP port to bind. The
default jhub-apps service_workers is 4 and four uvicorn workers take
~12s to bind on CI runners, so hub crashes with

    Cannot connect to managed service japps at http://hub:10202

restarts, hits the same timeout, restarts, etc. By the time the e2e
fixture's port-forward starts polling /hub/login, hub is still in this
crash-loop. The first urlopen with timeout=15s eventually raises
TimeoutError, which propagates unwrapped through HubClient._request (which only
catches HTTPError) and aborts every test fixture in setup.

Mirror the production overlay (gitops/apps/data-science-pack.yaml in
openteams-ai/nebari-hetzner) which pins service_workers: 1. The hub
boots cleanly within seconds and the e2e suite proceeds.
…h_user

The `if access_token and not refresh_token` branch in 01-spawner.py's
_nebi_pre_spawn_hook and 03-nebi-envs.py's get_nebi_environments was a
fallback for EnvoyOIDCAuthenticator's auth_state, which never carried a
refresh_token (Envoy only stored access_token + id_token in cookies).
With KeyCloakOAuthenticator + the new refresh_user() override, auth_state
always has a rotating refresh_token, so the branch never fires.

Also remove the now-orphan `_fetch_fresh_auth_state` helper it called,
and update two docstrings/comments that still referenced
EnvoyOIDCAuthenticator as the source of the groups claim or the hub's
OAuth client.

Reword values.yaml comment on `forwardAccessToken: false` to drop the
"avoids confusing dual-token paths" framing — there is no
Envoy-injected Bearer in the current architecture.
The KC migration left two related concerns scattered:

  * endpoint URLs (authorize/token/userdata/end_session) derived from
    the issuer were assigned individually to traitlets in configure(),
    via a small `_kc_urls` dict helper.
  * the per-user logout URL had its inputs spread across the free
    function `_build_logout_url(end_session_url=, id_token=,
    post_logout_redirect_uri=)` and TWO stray class attributes
    (`KeyCloakOAuthenticator._kc_end_session_url`,
    `KeyCloakOAuthenticator._kc_post_logout_redirect_uri`) that
    configure() stashed at startup so the logout handler could read
    them at request time.

Replace both with a single `KeyCloakConfig` frozen dataclass that
holds every KC string the chart needs and owns the logout-URL
composition as a method.

  * `KeyCloakConfig.build(issuer=..., post_logout_redirect_uri=...)`
    derives every endpoint URL from the realm issuer.
  * `cfg.build_logout_url(id_token)` composes the end-session URL,
    omitting `id_token_hint` for legacy sessions (KC v18+ rejects
    logout without it when `post_logout_redirect_uri` is set).
  * configure() builds one KeyCloakConfig and stashes it on
    `KeyCloakOAuthenticator.kc_config`; the logout handler reads it
    via `self.authenticator.kc_config`.

Net effect: one cohesive object instead of two stray class attrs +
one free function + one dict helper. The endpoint derivation becomes
trivially testable in isolation.

Tests updated in test_keycloak_authenticator.py:
  * `test_configure_attaches_kc_config_to_authenticator_class`
    replaces the pair of stray-attribute assertions.
  * `test_kc_config_build_logout_url_*` cover the method directly.
  * `test_kc_config_from_issuer_is_pure_and_doesnt_need_configure`
    pins the classmethod's pure-function semantics.
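A dataclass with the shape described above might look like this sketch. The endpoint paths follow Keycloak's standard `/protocol/openid-connect/{auth,token,userinfo,logout}` layout under the realm issuer. The field names mirror the OAuthenticator traits they feed (`authorize_url`, `token_url`, `userdata_url`). The legacy-session fallback is an assumption: the commit only says `id_token_hint` is omitted. Because KC v18+ rejects `post_logout_redirect_uri` without a hint, this sketch drops both parameters in that case.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class KeyCloakConfig:
    issuer: str
    authorize_url: str
    token_url: str
    userdata_url: str
    end_session_url: str
    post_logout_redirect_uri: str

    @classmethod
    def build(cls, *, issuer: str, post_logout_redirect_uri: str) -> "KeyCloakConfig":
        # Pure function of its inputs: no traitlets or hub config needed.
        base = issuer.rstrip("/") + "/protocol/openid-connect"
        return cls(
            issuer=issuer,
            authorize_url=f"{base}/auth",
            token_url=f"{base}/token",
            userdata_url=f"{base}/userinfo",
            end_session_url=f"{base}/logout",
            post_logout_redirect_uri=post_logout_redirect_uri,
        )

    def build_logout_url(self, id_token: Optional[str]) -> str:
        if not id_token:
            # Legacy session without auth_state: bare end-session URL.
            return self.end_session_url
        query = urlencode({
            "id_token_hint": id_token,
            "post_logout_redirect_uri": self.post_logout_redirect_uri,
        })
        return f"{self.end_session_url}?{query}"
```

Because `build` takes only strings, the endpoint derivation can be tested without running `configure()`.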
HANDOFF*.md files are in-flight working notes between agent sessions;
they should never have been checked in. Remove the one that snuck in
during the auth saga and add a .gitignore rule so the next one doesn't.
aktech marked this pull request as ready for review May 15, 2026 22:55
aktech merged commit d187fda into main May 15, 2026
17 checks passed
aktech deleted the fix/nebi-envs-stale-token branch May 15, 2026 22:55