Design: Declarative Keycloak configuration#154
Conversation
Proposes replacing the imperative realm-setup-job shell script with keycloak-config-cli for declarative Keycloak configuration. Covers config schema, defaults/merge strategy, backup/restore, and operator coexistence.
|
Overall, I like this approach. As discussed, the main advantage is the declarative model compared to what we have today. That said, I have two suggestions:
|
|
Hey @dcmcand, I was looking over this design proposal again, and one question popped into my mind: should the realm configuration really live in a new Keycloak realm config (roles, identity providers, auth flows, users) feels like application-level policy rather than infrastructure. Pulling it into An alternative: NIC deploys the Keycloak Argo app with sensible defaults baked into the templates (same defaults the current shell script creates), and users customize by patching the gitops repo directly (Kustomize overlays, Helm values, or editing the ConfigMap). This would:
The trade-off is losing the "single config.yaml controls everything" convenience, but users who need to customize Keycloak are likely already comfortable working with the gitops repo. What do you think? Was there a specific reason for routing this through |
There was a problem hiding this comment.
Hey @dcmcand, did a deep read of the proposal alongside the current code (pkg/argocd/foundational.go, realm-setup-job.yaml, pkg/argocd/writer.go) and also went digging into how kcc actually works under the hood. Leaving a batch of inline comments — treating these as discussion-starters since the ADR is WIP, not a "fix this" pass.
The two big ones are Comment 1 (operator coexistence, kcc has a simpler mechanism than the prefix model described in the doc, we can lean on import.remote-state.enabled=true instead) and Comment 4 (the proposed NIC-side deep-merge engine probably isn't needed if we use kcc's remote-state for reconciliation). Most other comments build on those.
Several findings end up simplifying the design rather than asking for more spec — that's the spirit here. Open to being wrong on any of these.
|
|
||
| ### Operator Coexistence | ||
|
|
||
| keycloak-config-cli supports "managed resources" with a configurable prefix. Resources created by this system are tagged with a `nebari:` prefix. The operator's dynamically created OAuth2 clients are untagged and left untouched. This is the boundary: |
There was a problem hiding this comment.
I went digging through kcc's docs and source to figure out which env vars back this prefix mechanism, and I think we can lean on something simpler. kcc doesn't actually have a name-prefix policy for managed resources: I checked application.properties, MANAGED.md, and the issue tracker. The closest equivalents are type-level: omit a resource type from the input entirely, or set IMPORT_MANAGED_<type>=no-delete.
There's a cleaner pattern that IMO gives us better operator-coexistence guarantees with less coupling:
- Don't include
clients:in the input config at all. Per adorsys/keycloak-config-cli#1306, kcc treats "key missing" as "leave this resource type alone" (different fromclients: []which means "delete all"). Operator owns clients; NIC owns realm-level config. IMPORT_REMOTESTATE_ENABLED=true(default in 6.x): kcc tracks resources it itself created in a Keycloak realm attribute. Anything the operator creates at runtime is invisible to kcc's delete logic, even ifclients:were in the input. Stored asde.adorsys.keycloak.config.state-*inside Keycloak's DB, no extra PVC.IMPORT_MANAGED_CLIENT=no-delete: belt-and-braces. Even if a future change accidentally listed clients in the input, kcc still wouldn't delete unlisted ones.
Three independent layers protecting operator-created clients, and we don't have to retrofit a nebari: prefix onto existing clients (which would be a migration concern in itself, since today's operator-created clients don't have one).
Worth rewriting this section around those primitives? Happy to draft something.
There was a problem hiding this comment.
You were right to dig into this - the prefix mechanism doesn't exist in kcc, and your three-layer alternative is meaningfully better. Doc rewrite (just pushed) leans on exactly what you proposed: omit clients: from the input entirely, rely on import.remote-state.enabled=true (the v6.x default), set import.managed.client=no-delete for belt-and-braces. That also kills the migration concern about retrofitting prefixes onto existing operator-created clients.
Verified against kcc v6.5.0's application.properties and MANAGED.md before adopting. The doc cites both in the rewrite.
|
|
||
| ### Secrets Handling | ||
|
|
||
| Secrets never go in `config.yaml`. The config references environment variable names using keycloak-config-cli's `${env:VAR}` substitution syntax. These references are written literally into the ConfigMap - no substitution happens in NIC. |
There was a problem hiding this comment.
Two things to flag here.
Syntax: kcc 6.x uses $(env:VAR) with parens, not ${env:VAR} with braces: the prefix changed at v4.0 to avoid colliding with Keycloak's own builtin variables like ${role_uma_authorization}. Default-value form $(env:VAR:fallback) works too. Worth fixing in the schema example below so users don't copy-paste a syntax that kcc won't substitute.
Substitution is also off by default (import.var-substitution.enabled=false), so we'll need IMPORT_VARSUBSTITUTION_ENABLED=true explicitly in the Job manifest. With IMPORT_VARSUBSTITUTION_UNDEFINEDISERROR=true (default) we get fail-fast on missing env vars — which is the behavior described here.
Bigger thing: secret lifecycle. Reading this alongside pkg/argocd/foundational.go:222-292, I think we want to be explicit about which pipeline owns which secrets. Today NIC creates four Secrets directly (keycloak-admin-credentials, nebari-realm-admin-credentials, etc.) with no-op-if-exists semantics that let operators rotate in-cluster without nic deploy overwriting them.
I think both pipelines should coexist, scoped clearly:
- NIC-generated secrets (admin password, realm admin, DB) → keep current pipeline. Not user-supplied; rotation is by re-generation.
- User-supplied via
$(env:VAR)(IdP client secrets, custom credentials) → new pipeline, separate Secret name, overwrite-on-deploy (since.envis the source of truth for these).
Could the doc spec the new Secret's name/namespace and confirm nebari-realm-admin-credentials keeps its current handling? Reframing my earlier thread comment
There was a problem hiding this comment.
Both points addressed in the rewrite. Every ${env:VAR} is now $(env:VAR) and the doc calls out that import.var-substitution.enabled=true has to be set explicitly on the Job (kcc's default is off).
For the secret pipeline: split as you suggested into NIC-generated (admin, DB, realm-admin - existing pipeline at foundational.go:222-292 unchanged with its no-op-if-exists semantics) and user-supplied (new Secret keycloak-config-user-secrets, overwrite-on-deploy because .env is the source of truth). Both Secrets mount to the kcc Job. nebari-realm-admin-credentials keeps its current handling exactly.
| enabled: true | ||
| config: | ||
| clientId: "${env:GITHUB_CLIENT_ID}" | ||
| clientSecret: "${env:GITHUB_CLIENT_SECRET}" |
There was a problem hiding this comment.
Quick follow-up to the syntax note above: once we settle on $(env:...), this example block needs the parens too:
clientSecret: "$(env:GITHUB_CLIENT_SECRET)"
# ...
value: "$(env:REALM_ADMIN_PASSWORD)"Default-value form is also useful for IdP configs where missing env vars should fall back to disabled-by-default rather than failing the deploy:
clientId: "$(env:GITHUB_CLIENT_ID:disabled)"Optional: depends on whether we want missing IdP secrets to fail loudly or be silently disabled. IMO fail-loud for the realm admin password and other load-bearing values, fall-back-to-disabled for optional IdPs. Worth a paragraph on the philosophy.
There was a problem hiding this comment.
Fixed throughout the examples. On the fail-loud vs fall-back-to-disabled question: the rewrite takes a stronger stance than just philosophy. NIC's pre-deploy scanner will block deploys where a sensitive field (identityProviders[].config.clientSecret, etc.) references an env var that resolves to empty or unset. The reasoning: a silent fallback for IdP secrets means GitHub login silently breaks on next deploy, which is a much worse failure mode than "deploy errored, fix your env." For non-secret fields where a missing-equals-disabled flow makes sense, kcc's own substitution handles it.
|
|
||
| - **Scalars**: config.yaml value replaces default | ||
| - **Lists** (users, roles, identityProviders): merged by key field (`username`, `name`, `alias`). Matching key overrides that entry, new keys are appended | ||
| - **Removal**: set `enabled: false` on a default entry |
There was a problem hiding this comment.
I think this section can simplify a lot once we lean on kcc's remote-state behavior. Reading it now, NIC takes on:
- A deep-merge engine over a deeply nested upstream schema (Keycloak realm export)
- List-merge-by-key rules that we'd need to spec for ~10 list-shaped fields beyond the three named here:
clientScopes,authenticationFlows[].authenticationExecutions[], recursivegroups,requiredActions, etc. - A "removal" semantic via
enabled: false(which in Keycloak actually means disable, not delete: the alias/username still occupies its namespace)
If we set IMPORT_REMOTESTATE_ENABLED=true (default) and let kcc do the reconciliation, NIC's job shrinks to:
- Render baked-in defaults (current realm-setup-job's resources) as a base config
- Render the user's
keycloak:section over the base: shallow merge by top-level key, kcc handles the deep diff downstream - Write the result as a ConfigMap
What this gives us:
- First run: kcc creates everything declared.
- Subsequent runs: kcc only touches resources it itself created (tracked in
de.adorsys.keycloak.config.state-*realm attribute). Direct-in-Keycloak admin edits to unmanaged resources are preserved: which IMO matches the intent of "don't wipe data important to the user state of the cluster." - User-facing: users only declare changes, not the whole realm. A user adding a GitHub IdP just declares the IdP; defaults stay defaulted.
The "removal via enabled: false" thing goes away entirely (kcc manages deletions through its own state), and we don't have to spec merge keys for every nested list shape.
The trade-off: kcc reconciles back to input on next run for resources it does manage. That's by design and matches what users expect from a declarative tool, but worth being explicit in the doc — "edits to NIC-managed resources via Keycloak UI will be reverted on next sync; edits to admin-created resources outside the input are preserved."
Is there a reason we wanted NIC to own merge logic that I'm missing? My read is it adds surface without buying anything kcc doesn't already give us.
There was a problem hiding this comment.
This was the highest-leverage finding in the review. The doc now has no NIC-side merging at all. keycloak.yaml is a separate file in kcc's native input format, written through to a cluster Secret verbatim. kcc's remote-state model does the reconciliation. The "removal via enabled: false" semantic is gone (it was also wrong in Keycloak's actual model: enabled: false disables, doesn't delete).
Also added the explicit caveat you suggested: "edits to NIC-managed resources via Keycloak UI will revert on next sync; edits to admin-created resources outside the input are preserved."
|
|
||
| # Authentication flows, client scopes, etc. | ||
| authenticationFlows: [] | ||
| clientScopes: [] |
There was a problem hiding this comment.
Reading this as illustrative rather than the actual default set, but flagging for the implementation pass: pkg/argocd/templates/manifests/keycloak/realm-setup-job.yaml:83-108 creates a groups client scope with an oidc-group-membership-mapper and registers it as a realm default-default. That scope is what puts groups claims into tokens, which is what makes NebariApplication.allowedGroups work in the operator.
Two safety nets are already in our favor:
- kcc's
IMPORT_MANAGED_CLIENT_SCOPE=no-deletedefault (set this way precisely because Keycloak auto-creates default scopes that kcc can't reliably distinguish from user-created ones) IMPORT_REMOTESTATE_ENABLED=true(kcc only deletes scopes it itself created)
So "groups scope silently dropped from new defaults" is probably safe-by-omission, but only by accident. Worth walking the current shell script line-by-line during implementation and confirming each resource is represented in the new defaults, and adding an integration test that asserts post-migration token claims match today's setup. Catches it explicitly rather than relying on safety nets.
There was a problem hiding this comment.
Important catch, and worse than the comment suggests. When I checked the realm-setup-job on this branch before the rewrite, the groups scope had been silently dropped (file was 83 lines vs main's 110, no oidc-group-membership-mapper). The recent merge from main brought back the file, so this branch is probably fine again, but it was a real regression for a moment.
The rewrite makes the groups client scope an explicit baked-in default and adds a migration integration test that asserts a token issued for a user in a group contains the groups claim. That contract is what NebariApplication.allowedGroups depends on, so a regression here would silently break operator-driven auth across the platform.
|
|
||
| ## Phasing | ||
|
|
||
| **Phase 1**: Config merge + Job replacement - replace the shell script with keycloak-config-cli, add `keycloak` section to config.yaml, implement merge logic |
There was a problem hiding this comment.
Wanted to float a finer-grained split for Phase 1 to keep future doors open:
- 1a replace the shell script with kcc, using only baked-in defaults rendered into a static ConfigMap. No user
keycloak:section yet, no$(env:*)scanning, no merge logic. Answers the narrow question: "does kcc replace the shell script cleanly, including the groups scope and operator-coexistence?" - 1b add user overrides via the
keycloak:section. With remote-state doing the heavy lifting (Comment 4), this is a much smaller surface than originally proposed. - 1c add
$(env:*)scanning and the user-secrets pipeline.
Each step is independently testable, and 1a alone is forward-compatible with both the current config.yaml direction and a hypothetical "skip 1b, do gitops overlays instead" alternative. Bundling all three commits to the user-facing schema in a way that's harder to walk back if we change our minds.
Practical case for 1a-first: it isolates the "kcc replaces kcadm" risk from the "is our schema right?" question. If 1a uncovers something weird about kcc managed-resource policies (which I don't think it will, given how well-documented the remote-state model is, but still), we discover that before any user-visible schema work.
Doesn't have to be three PRs: could be three commits in one PR with clear boundaries. The point is the order of validation, not the literal git topology.
There was a problem hiding this comment.
Adopted as written. Phase 1a is kcc replaces kcadm with baked-in defaults only, 1b adds the user keycloak.yaml parsing path, 1c adds env substitution and the user-secrets Secret. Phase 1a will resolve the kcc remote-state behavior question in isolation before any user-facing schema work lands.
| clientSecret: "${env:GITHUB_CLIENT_SECRET}" | ||
| ``` | ||
|
|
||
| ## Backup & Restore |
There was a problem hiding this comment.
Restating my earlier comment in my thread now that the rest of the design is more concrete: I think this whole section should be a follow-up doc.
The reasons get clearer the more I read:
- Different code paths (CLI commands vs render-time pipeline)
- Different operational concerns:
nic backup keycloakneeds cluster access to readkeycloak-admin-credentials, then port-forward or exec to reach the Keycloak service. That's a new pattern not used elsewhere in NIC today. - Doesn't load-bear for Phase 1
- The "restore is full replace, not selective merge" semantic conflicts with the rest of the proposal (which, after Comment 4's simplification, is more like "additive overlays on defaults"). After a restore, the user's
config.yamlno longer reflects reality.
Splitting this out also lets us consider backup-via-Velero-PVC-snapshot vs. backup-via-realm-export-API as separate options, without entangling them with the realm-config pipeline.
There was a problem hiding this comment.
Moved to a follow-up doc. The reasoning you gave (different code path, different operational pattern, the "full replace" semantic conflicts with the remote-state model) all stand. The rewrite has a short stub explaining the split.
Worth noting: with the file-based model, the manual backup story becomes "kc.sh export and sanitize the result" rather than a custom CLI. The Phase 2 work is mostly about automating that sanitization, not building a parallel snapshot system.
|
|
||
| ## Open Questions | ||
|
|
||
| 1. **keycloak-config-cli version pinning strategy** - keycloak-config-cli has strict version coupling with Keycloak (6.x targets Keycloak 25.x/26.x). How do we want to manage this? Pin in the Job manifest and document the compatibility mapping? |
There was a problem hiding this comment.
For the immediate question (current Keycloak 24.0): kcc v6.5.0 is the current stable, and 24.0.5 is in their CI matrix at .github/workflows/ci.yaml. The verified-compatible image is quay.io/adorsys/keycloak-config-cli:6.5.0-24.0.5. v6.5.0 also landed several idempotency fixes worth having (release notes).
For the longer-term version pinning question, I think there's a deeper one underneath — user-facing schema compatibility, not just runtime. If our keycloak: section mirrors kcc's input which mirrors Keycloak's realm export, a Keycloak major-version upgrade can rename/add/remove fields, and existing user config.yaml files break.
Three stances we could take:
- A — Pin Keycloak aggressively. Treat upgrades as opt-in events with migration tooling.
- B — NIC-owned subset schema. Translate to upstream at render time (the "no translation layer" claim in the doc would have to go).
- C — Accept upstream as-is. Document that user configs may break across Keycloak upgrades.
Currently the doc implicitly picks C. Worth being explicit. (For "don't lock future paths" — A and C are reversible, B locks NIC into maintaining a translator forever.)
There was a problem hiding this comment.
Took stance C explicitly. The rewrite says user keycloak.yaml files may need updates across Keycloak major-version upgrades and that we'll document the path in release notes for each Keycloak bump. C is consistent with the rest of the doc's "no NIC-side schema, pass-through to kcc" philosophy, and unlike B it doesn't lock NIC into a translator forever.
Pinning the verified-compatible image (quay.io/adorsys/keycloak-config-cli:6.5.0-24.0.5 for our current Keycloak 24.0) is now an explicit dependency in the doc.
|
|
||
| 1. **keycloak-config-cli version pinning strategy** - keycloak-config-cli has strict version coupling with Keycloak (6.x targets Keycloak 25.x/26.x). How do we want to manage this? Pin in the Job manifest and document the compatibility mapping? | ||
|
|
||
| 2. **Config validation** - Should `nic validate` check the merged Keycloak config against a schema before deploy, or is runtime failure from the Job acceptable initially? |
There was a problem hiding this comment.
With Comment 4's simplification (kcc owns reconciliation, NIC does shallow render-time merge), validation gets cheaper to spec. Phase 1a needs no validation because there's no user input. Phase 1b just needs:
- Top-level shape (the
keycloak:block parses as expected) $(env:*)references resolve (we already have the env, just check before deploy)- Optional: basic field-name validation against a JSON schema generated from kcc's input format
I'd lean toward shipping at least the env-var resolution check at 1b time — failure mode "Job fails in ArgoCD with cryptic logs" is a much worse UX than "nic validate errors with missing env var: GITHUB_CLIENT_SECRET."
Punting "to later" usually means it doesn't land. If we agree on a minimal scope now we won't have to revisit.
There was a problem hiding this comment.
Promoted from open question to Phase 1b deliverable. nic validate will run the literal-secret scanner, reject unresolved env refs in sensitive fields, and verify schema shape. Same logic runs in CI on PRs against keycloak.yaml. Agree with the framing: the "cryptic PostSync failure hours after merge" UX is the worst possible outcome and cheap to prevent.
|
|
||
| **Phase 2**: Backup & restore - `nic backup keycloak`, `nic restore keycloak`, optional CronJob | ||
|
|
||
| Phase 2 depends on Phase 1 being deployed and validated. |
There was a problem hiding this comment.
Suggesting an "Alternatives Considered" section. The doc reads as if config.yaml ownership is the only path, which may end up being right, but worth capturing explicitly because the trade-offs aren't obvious from the body of the proposal.
The main alternative to name: NIC ships a default kcc input as a static ConfigMap, users customize via the gitops repo directly (Kustomize overlays, raw ConfigMap edits, Helm values). NIC stays scoped to infrastructure; the keycloak-config-cli schema lives in the gitops repo, where it's already user-editable.
What we'd gain: NIC doesn't take ownership of a deeply nested, version-coupled application-level schema. The "schema coupling" question (Open Q #1) becomes less urgent because users own their overlay format. Validation surface area shrinks.
The trade-off is losing a single pane of glass via config.yaml. Users who want to customize Keycloak now have to learn Kustomize/gitops patterns, not just edit one YAML file.
I don't think we have to pick now, but capturing the alternative in the doc means the trade-off is on record, and Phase 1a (Comment 6) is forward-compatible with either outcome. We can defer the bigger decision until we have more user feedback on which path they prefer.
There was a problem hiding this comment.
Added the section. Captured three alternatives: (A) the original embedded-in-config.yaml deep-merge design, (B) your gitops-overlay proposal, and (C) a Sealed-Secret-based variant for higher-trust environments. Each has rationale for rejection.
On B specifically: I think you were pointing at the right concern (don't couple NIC to a deeply nested upstream schema), but the answer that fell out was a separate keycloak.yaml file alongside config.yaml, applied via cluster Secrets rather than the gitops repo. NIC stays scoped to "read file, validate, plumb into cluster Secret" with no schema knowledge. The gitops repo never sees realm content. So we ended up taking the spirit of B without losing reproducibility-from-nic deploy-alone.
Major restructure addressing review feedback on PR #154: - Move from "ConfigMap in gitops repo" to "in-cluster Secrets, gitops repo holds only the Job manifest" so secrets never land in the gitops repo even as placeholders. - Replace the embedded keycloak: config.yaml section with a separate keycloak.yaml file in kcc's native input format, pass-through with no NIC-side merging. - Drop the (non-existent) managed-resource-prefix mechanism in favor of kcc's three-layer model: omit clients:, remote-state.enabled=true, managed.client=no-delete. - Fix kcc 6.x variable substitution syntax: $(env:VAR) not ${env:VAR}, and require import.var-substitution.enabled=true. - Spec the literal-secret scanner with field list, --allow-literal-secrets and --allow-credential-hashes escape hatches. - Add the groups client scope explicitly to the baked-in defaults plus migration test contract. - Move backup/restore to a follow-up doc. - Split Phase 1 into 1a/1b/1c (kcc replaces kcadm, then user file, then env substitution). - Add Alternatives Considered, CI Integration, Blue/Green sections. - Take stance C (accept upstream as-is) on Keycloak version coupling.
|
Pushed a rewrite ( |
|
@viniciusdc - replying to your March 23 thread comment: both addressed in the rewrite ( |
|
@viniciusdc - replying to your April 10 thread comment: real question, and it pushed the design in a better direction. The rewrite acknowledges that The compromise we landed on: Why not your full proposal (defaults shipped to gitops repo, customize via overlays): the source of truth ends up split between the infra repo and the gitops repo. We've decided to treat the gitops repo as untrusted for realm content at all (encrypted or otherwise), which makes "users customize via gitops" architecturally awkward. The file-based approach gives you the same "NIC stays scoped" property without that split. |
…write Aligns Phase 1a with the agreed design in PR #154's rewrite (commit 669458a on design/declarative-keycloak-config). Previous structure put the kcc realm input in a ConfigMap rendered into the gitops repo. The rewrite's "Why no ConfigMap in the gitops repo" section explicitly rejects that pattern: - The gitops repo is treated as untrusted. Placeholder-only contracts ("file in repo only contains $(env:...), never literals") create a footgun where one operator mistake leaks a secret into git history. - Realm content beyond raw secrets is still sensitive (IdP names, redirect URI patterns, credential hashes from kc.sh export). New structure: - pkg/argocd/keycloak_defaults.yaml (new) — kcc realm input in native format, embedded into the NIC binary via //go:embed. Same content as the previous gitops ConfigMap data field, no ConfigMap wrapping. - pkg/argocd/keycloak_defaults.go (new) — RenderKeycloakDefaults() helper renders the template with TemplateData (Domain substitutes into the argocd redirect URI). - pkg/argocd/foundational.go — InstallFoundationalServices now renders the embedded defaults with cfg.Domain and writes the result to a new in-cluster Secret keycloak-config-import (keycloak namespace). createKeycloakImportSecret added as a sibling of the existing Secret creators; uses createSecret's no-op-if-exists semantics so re-deploys preserve any value already present in the cluster. - templates/manifests/keycloak/realm-setup-job.yaml — volume changed from a ConfigMap reference (keycloak-realm-config, deleted) to a Secret reference (keycloak-config-import). Everything else stays: same kcc image, same env-var sourcing, same hook annotations. - templates/manifests/keycloak/realm-config-cm.yaml (deleted) — the gitops repo no longer carries realm content. Only the Job manifest remains there. Phase 1a scope is unchanged. Phase 1b will introduce a user-supplied keycloak.yaml file path that overrides these embedded defaults; the Secret-vs-ConfigMap structural decision now makes that path direct. Test: - keycloak_defaults_test.go (new) asserts the load-bearing fields of the rendered kcc input — realm settings, scope lists, groups, argocd client, the oidc-group-membership-mapper, the full defaultDefaultClientScopes list, and that no unresolved Go-template markers leak into the output. - writer_test.go TestKeycloakRealmConfigCM removed (it tested the deleted ConfigMap template).
|
closed as accepted |
Phase 1a of the declarative Keycloak configuration design proposed in #154. Replaces the imperative shell script in realm-setup-job.yaml with a declarative YAML applied by keycloak-config-cli (kcc) at PostSync. Changes: - Add `realm-config-cm.yaml`: ConfigMap carrying the kcc-format YAML for the nebari realm. Parity with the previous shell setup, including the `groups` client scope + group-membership mapper (load-bearing for operator-managed app authorization), the `argocd` OIDC client (PR #234), the `argocd-admins`/`argocd-viewers` groups, and the admin user's `argocd-admins` membership. - Rewrite `realm-setup-job.yaml`: pulls `quay.io/adorsys/keycloak-config-cli:6.5.0-26.5.5`, mounts the ConfigMap, and applies it against Keycloak. Secrets are mounted from the existing `nebari-realm-admin-credentials` and `argocd-oidc-client-secret` Secrets and substituted into the realm YAML by kcc at apply time via `$(env:VAR)` placeholders. Operator coexistence: - `IMPORT_REMOTESTATE_ENABLED=true` (kcc default since v6.x): kcc tracks resources it created in a Keycloak realm attribute. Re-runs only reconcile what kcc itself owns; operator-managed OAuth2 clients created at runtime via NebariApplication CRDs are invisible to kcc's delete logic. - `IMPORT_MANAGED_CLIENT=no-delete`: belt-and-braces. Even if a future input change accidentally removes the argocd client from `clients:`, kcc will not delete unlisted clients. The strict "omit `clients:` entirely" pattern from my #154 review was too restrictive: NIC owns the `argocd` client at bootstrap, so it has to be in the input. The two-layer protection above is the practical shape of the coexistence story. Phase 1a scope (per #154 review): only replaces the existing realm setup. No `keycloak:` section in `config.yaml`, no NIC-side env-var scanning, no merge engine. Defaults are static. Future phases add the user-facing schema (1b) and the user-secrets pipeline (1c). Migration: on existing deployments, kcc detects existing resources by their natural keys (realm name, role name, username, scope name) and updates them in place, adding them to remote state. First run on an already-configured cluster is a no-op for existing managed values; the pre-generated argocd OIDC client secret (same value on both sides) is re-applied verbatim. Refs #154.
…write Aligns Phase 1a with the agreed design in PR #154's rewrite (commit 669458a on design/declarative-keycloak-config). Previous structure put the kcc realm input in a ConfigMap rendered into the gitops repo. The rewrite's "Why no ConfigMap in the gitops repo" section explicitly rejects that pattern: - The gitops repo is treated as untrusted. Placeholder-only contracts ("file in repo only contains $(env:...), never literals") create a footgun where one operator mistake leaks a secret into git history. - Realm content beyond raw secrets is still sensitive (IdP names, redirect URI patterns, credential hashes from kc.sh export). New structure: - pkg/argocd/keycloak_defaults.yaml (new) — kcc realm input in native format, embedded into the NIC binary via //go:embed. Same content as the previous gitops ConfigMap data field, no ConfigMap wrapping. - pkg/argocd/keycloak_defaults.go (new) — RenderKeycloakDefaults() helper renders the template with TemplateData (Domain substitutes into the argocd redirect URI). - pkg/argocd/foundational.go — InstallFoundationalServices now renders the embedded defaults with cfg.Domain and writes the result to a new in-cluster Secret keycloak-config-import (keycloak namespace). createKeycloakImportSecret added as a sibling of the existing Secret creators; uses createSecret's no-op-if-exists semantics so re-deploys preserve any value already present in the cluster. - templates/manifests/keycloak/realm-setup-job.yaml — volume changed from a ConfigMap reference (keycloak-realm-config, deleted) to a Secret reference (keycloak-config-import). Everything else stays: same kcc image, same env-var sourcing, same hook annotations. - templates/manifests/keycloak/realm-config-cm.yaml (deleted) — the gitops repo no longer carries realm content. Only the Job manifest remains there. Phase 1a scope is unchanged. Phase 1b will introduce a user-supplied keycloak.yaml file path that overrides these embedded defaults; the Secret-vs-ConfigMap structural decision now makes that path direct. Test: - keycloak_defaults_test.go (new) asserts the load-bearing fields of the rendered kcc input — realm settings, scope lists, groups, argocd client, the oidc-group-membership-mapper, the full defaultDefaultClientScopes list, and that no unresolved Go-template markers leak into the output. - writer_test.go TestKeycloakRealmConfigCM removed (it tested the deleted ConfigMap template).
- Extract labelPartOf/labelManagedBy/labelManagedByValue constants so the import-secret labels stop tripping goconst (4th occurrence of the label keys). Fixes the lint failure blocking CI. - Instrument RenderKeycloakDefaults with an OTel span and thread ctx through from InstallFoundationalServices, per the pkg/ instrumentation convention. - Render with missingkey=error so a stray template field fails loudly instead of emitting "<no value>" into the realm config; add a guard assertion. - Expand TestRenderKeycloakDefaults to cover the security-relevant scalars (sslRequired, registrationAllowed, bruteForceProtected), the realm roles, the multivalued groups-mapper flag, the argocd client flags, and webOrigins. - Drop the dead doc path from keycloak_defaults.yaml; point at #154.
Summary
Design doc proposing a declarative Keycloak configuration system to replace the current imperative
realm-setup-job.yamlshell script.config.yamlSee
docs/superpowers/specs/2026-03-12-declarative-keycloak-config-design.mdfor the full design.Looking for feedback on the approach, config schema, and open questions at the bottom of the doc.