feat(auth): add Dex OIDC authenticator for A2A endpoint #2056

Closed
QuentinBisson wants to merge 8 commits into kagent-dev:main from QuentinBisson:feat/09-a2a-dex-oidc-auth
Conversation

@QuentinBisson

Contributor

What

Adds --auth-mode=dex-oidc to the kagent controller, implementing full Dex JWT validation (signature via JWKS, issuer, audience, expiry) on the A2A endpoint.

Why

The A2A endpoint previously had no real authentication for human callers. The unsecure and trusted-proxy modes both accept identity without verifying it. For human-facing A2A sessions driven by a Dex OIDC token (e.g. the giantswarm-ad connector on glean), the server must validate the token and reject unauthenticated requests.

How

  • DexAuthenticator (go/core/internal/httpserver/auth/dex_authn.go): fetches the OIDC discovery document at {issuer}/.well-known/openid-configuration, registers the JWKS URI in a lestrrat-go/jwx/v2 auto-refresh cache (15 min default, minimum enforced by httprc), and validates each request's Bearer token against it. User identity defaults to the email claim, falling back to sub.
  • UpstreamAuth: forwards the original Authorization: Bearer <dex-token> and X-User-Id headers to agent pods. The Python A2A SDK captures state['headers']; kagent's Python executor writes them to ADK session state under HEADERS_STATE_KEY, making the token available to create_header_provider for MCP calls when allowed_headers includes "authorization".
  • Two new flags: --auth-dex-issuer and --auth-dex-client-id. Both are required when --auth-mode=dex-oidc is set.
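
The per-request checks listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in dex_authn.go: it assumes signature verification against the JWKS has already succeeded, and the `Claims` type and the `validateClaims`, `userID`, and `discoveryURL` helpers are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Claims is a hypothetical, already signature-verified claim set.
type Claims struct {
	Issuer   string
	Audience []string
	Expiry   time.Time
	Email    string
	Subject  string
}

// discoveryURL builds the OIDC discovery document URL for an issuer.
func discoveryURL(issuer string) string {
	return strings.TrimSuffix(issuer, "/") + "/.well-known/openid-configuration"
}

// validateClaims checks issuer, audience, and expiry.
// Signature verification is assumed to have happened before this call.
func validateClaims(c Claims, issuer, clientID string, now time.Time) error {
	if c.Issuer != issuer {
		return fmt.Errorf("unexpected issuer %q", c.Issuer)
	}
	audienceOK := false
	for _, aud := range c.Audience {
		if aud == clientID {
			audienceOK = true
			break
		}
	}
	if !audienceOK {
		return fmt.Errorf("token audience does not include %q", clientID)
	}
	if !now.Before(c.Expiry) {
		return errors.New("token expired")
	}
	return nil
}

// userID prefers the email claim and falls back to sub.
func userID(c Claims) string {
	if c.Email != "" {
		return c.Email
	}
	return c.Subject
}

func main() {
	now := time.Date(2026, 6, 21, 12, 0, 0, 0, time.UTC)
	c := Claims{
		Issuer:   "https://dex.example.com", // placeholder issuer
		Audience: []string{"kagent-a2a"},
		Expiry:   now.Add(time.Hour),
		Subject:  "user-123",
	}
	fmt.Println(discoveryURL("https://dex.example.com/"))
	fmt.Println(validateClaims(c, "https://dex.example.com", "kagent-a2a", now))
	fmt.Println(userID(c))
}
```

The example token carries no email claim, so the resolved identity is the sub value.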

Operator note

The Dex client ID passed via --auth-dex-client-id (e.g. kagent-a2a) must be added to muster's trustedAudiences. Today only dex-k8s-authenticator is listed. On glean the human connector is giantswarm-ad and the existing client is kagent.

Tests

15 table-driven tests in dex_authn_test.go (valid token with groups, sub fallback, missing header, non-Bearer, expired, wrong issuer, wrong audience, garbage, upstream forwarding, discovery failure cases). Existing auth_mode_test.go updated and extended with dex-oidc validation cases.

QuentinBisson and others added 8 commits June 15, 2026 23:04
GET /api/substrate/status returned 500 on clusters without the ate.dev CRDs
installed because listSubstrateCRs listed WorkerPool/ActorTemplate
unconditionally, propagating the REST mapper NoKindMatchError as a server
error.

Gate the CRD listing loop on AteClient != nil. When substrate is not
configured (the common case), there is nothing to list and no CRD calls
are made. When substrate is configured, a missing CRD is a legitimate
misconfiguration and the error is surfaced as before.

Reported in giantswarm/giantswarm#36845.

Signed-off-by: QuentinBisson <quentin@giantswarm.io>
- Merge the two consecutive AteClient != nil blocks into one
- Use noMatchKubeClient in the not-configured test so the test
  would fail if the gate were removed (the error would propagate
  as 500 rather than the guarded nil path)
- Remove extra blank line before stubAteControl

Signed-off-by: QuentinBisson <quentin@giantswarm.io>
… NoMatchError

CRD presence (WorkerPool/ActorTemplate) and gRPC client availability (AteClient)
are independent axes. Gating listSubstrateCRs on AteClient != nil caused two bugs:

- CRDs present but ate-api not deployed: WorkerPools/ActorTemplates silently omitted
  even though the CRs exist, Enabled reported false.
- CRDs absent but AteClient set: List calls reached the API server and 500'd instead
  of degrading gracefully.

Guard each KubeClient.List with meta.IsNoMatchError: on NoMatch return empty + found=false
with no error. controller-runtime's dynamic RESTMapper reloads on NoMatch automatically,
so substrate installed after kagent boots is picked up on the next request without a
restart.

Compute Enabled from observed reality: crdsPresent || AteClient != nil.

Three new cases covered by tests:
- NoCRDs + no gRPC client  -> 200, Enabled:false
- NoCRDs + gRPC client set -> 200, Enabled:true, actors populated
- CRDs present + no gRPC   -> 200, Enabled:true, WorkerPools populated

Signed-off-by: QuentinBisson <quentin@giantswarm.io>
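
The guard described in this commit message might look roughly like the sketch below. It uses only the standard library, so `errNoMatch` stands in for the error that meta.IsNoMatchError detects, and `listFunc`, `listGuarded`, and `enabled` are hypothetical names rather than the handler's real ones.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMatch is a stand-in for the REST mapper error that
// meta.IsNoMatchError would detect in the real handler.
var errNoMatch = errors.New("no matches for kind")

// listFunc is a hypothetical stand-in for one KubeClient.List call.
type listFunc func() ([]string, error)

// listGuarded returns (items, found, err). A NoMatch error is treated as
// "CRD not installed": empty result, found=false, and no error.
func listGuarded(list listFunc) ([]string, bool, error) {
	items, err := list()
	if errors.Is(err, errNoMatch) {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return items, true, nil
}

// enabled mirrors the rule in the commit message:
// crdsPresent || AteClient != nil.
func enabled(crdsPresent, hasAteClient bool) bool {
	return crdsPresent || hasAteClient
}

func main() {
	missing := func() ([]string, error) { return nil, errNoMatch }
	present := func() ([]string, error) { return []string{"pool-a"}, nil }

	_, found, err := listGuarded(missing)
	fmt.Println(found, err, enabled(found, false))

	items, found, err := listGuarded(present)
	fmt.Println(items, found, err, enabled(found, false))
}
```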
Revert 149a2d7. The IsNoMatchError guard is overcomplicated for this
use case. Keep the original approach (gate CRD listing on AteClient
!= nil) and improve it separately with a startup CRD discovery check.

Signed-off-by: QuentinBisson <quentin@giantswarm.io>
Validate incoming Dex JWTs (sig/iss/aud/exp) via auto-refreshed JWKS
cache. Reject unauthenticated A2A requests. Forward the verified Bearer
token and resolved user ID to agent pods so the Python ADK session-state
headers channel carries the human identity into MCP calls.

New --auth-mode=dex-oidc requires --auth-dex-issuer and
--auth-dex-client-id. The client ID must also be added to muster's
trustedAudiences.
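
The flag requirement above could be enforced at startup along these lines. This is a sketch under assumptions: it uses the standard library flag package, which may differ from what the controller actually uses, and `validateAuthFlags` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// validateAuthFlags rejects a dex-oidc configuration that is missing the
// issuer or the client ID. Other auth modes are not checked here.
func validateAuthFlags(mode, issuer, clientID string) error {
	if mode != "dex-oidc" {
		return nil
	}
	var errs []error
	if issuer == "" {
		errs = append(errs, errors.New("--auth-dex-issuer is required when --auth-mode=dex-oidc"))
	}
	if clientID == "" {
		errs = append(errs, errors.New("--auth-dex-client-id is required when --auth-mode=dex-oidc"))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	mode := fs.String("auth-mode", "", "authentication mode")
	issuer := fs.String("auth-dex-issuer", "", "Dex issuer URL")
	clientID := fs.String("auth-dex-client-id", "", "Dex client ID expected as token audience")

	// Example arguments with the client ID deliberately left out.
	args := []string{"--auth-mode=dex-oidc", "--auth-dex-issuer=https://dex.example.com"}
	if err := fs.Parse(args); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if err := validateAuthFlags(*mode, *issuer, *clientID); err != nil {
		fmt.Println("invalid configuration:", err)
	}
}
```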
github-actions bot added the enhancement label Jun 21, 2026
@QuentinBisson

Contributor Author

Wrong approach — kagent already runs behind oauth2-proxy with trusted-proxy mode. The A2A endpoint uses the same path as the UI. No code change needed; the only task is registering the Dex client for the A2A audience.

Labels

enhancement New feature or request

Projects

None yet

2 participants