Skip to content

Commit 4d48ae7

Browse files
authored
[SSO] Add design doc for group to role mapping for OIDC (#35899)
Proposes JWT-based group-to-role sync for self-managed. On connection, Materialize reads group claims from the JWT and grants/revokes role memberships accordingly. Track these using a dedicated sentinel grantor (MZ_JWT_SYNC_ROLE_ID) to distinguish sync-managed from manually-managed grants.
1 parent 09f8729 commit 4d48ae7

1 file changed

Lines changed: 235 additions & 0 deletions

File tree

Lines changed: 235 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,235 @@
1+
# Group-to-Role Mapping for Self-Managed
2+
3+
## Associated:
4+
1. [SSO For Self Managed](https://www.notion.so/materialize/SSO-for-Self-Managed-2a613f48d37b806f9cb2d06914454d15)
5+
2. [SCIM](https://www.notion.so/materialize/SCIM-33513f48d37b8086be2ad9ec2a9ad554)
6+
7+
## The Problem
8+
9+
Self-managed Materialize customers using OIDC SSO have no way to manage database permissions from their identity provider. Today, an admin must:
10+
11+
1. Configure OIDC so users can log in (already supported)
12+
2. Manually run `GRANT` statements in Materialize for every user
13+
14+
When a team member joins, leaves, or changes roles, the admin has to update permissions in both the IdP *and* Materialize separately. There's no connection between "Alice is in the analytics team in Okta" and "Alice can read the orders table in Materialize."
15+
16+
## Success Criteria
17+
18+
- Users' database role memberships are dynamically updated based on their IdP group memberships
19+
- Self-managed OIDC is the primary target
20+
- When a user's groups change in the IdP, their Materialize permissions update on next connection
21+
- Permissions manually set for a user are left unchanged (independent of sync)
22+
- Admins can configure whether sync failures block login (fail-open by default, fail-closed for strict compliance)
23+
- The design must not lock into any single identity provider (e.g., Frontegg); SCIM for self-managed is a near-term priority
24+
25+
## Out of Scope
26+
27+
- Auto-creating database roles from IdP groups
28+
- Syncing role *privileges* (GRANTs on objects) from the IdP, only role *membership*
29+
- Real-time push-based sync (SCIM webhook to Materialize), we sync on connection. This is standard practice (CockroachDB, PostgreSQL LDAP) and aligns with customer expectations. A limitation is that the console and internal tables only reflect state as of the user's last connection.
30+
- Revoking the user's role itself when removed from IdP (existing token-expiry behavior handles this)
31+
- Cloud / Frontegg support (future work, but the design intentionally avoids Frontegg-specific coupling to support self-managed SCIM as a near-term follow-up)
32+
33+
### Future Goals
34+
35+
- Admin controls the mapping between external group names and database roles (name remapping via regex transforms, e.g., `materialize_(.*)` -> `$1`)
36+
- Cloud / Frontegg group sync
37+
- Mid-session sync on token refresh
38+
- SCIM-based group provisioning for self-managed (complementary to JWT-based sync)
39+
40+
## Solution Proposal
41+
42+
Follow the CockroachDB pattern: extract groups from JWT claims, match group names directly to database role names (case-insensitive), sync memberships at session startup.
43+
44+
### User Experience
45+
46+
#### Admin Setup
47+
48+
**Step 1: Configure IdP to include groups in the JWT.**
49+
In their IdP (Okta, Azure AD, etc.), the admin configures the OIDC application to include a `groups` claim in the access token.
50+
51+
Example JWT payload after configuration:
52+
```json
53+
{
54+
"sub": "alice@example.com",
55+
"iss": "https://dev-123456.okta.com/oauth2/default",
56+
"groups": ["analytics", "platform_eng"],
57+
"exp": 1234567890
58+
}
59+
```
60+
61+
**Step 2: Create matching roles in Materialize with privileges.**
62+
```sql
63+
CREATE ROLE analytics;
64+
GRANT SELECT ON TABLE orders, customers TO analytics;
65+
GRANT USAGE ON SCHEMA production TO analytics;
66+
67+
CREATE ROLE platform_eng;
68+
GRANT ALL ON SCHEMA infrastructure TO platform_eng;
69+
```
70+
71+
The role names must match the IdP group names (case-insensitive).
72+
73+
**Step 3: Enable group sync.**
74+
```sql
75+
ALTER SYSTEM SET oidc_group_role_sync_enabled = true;
76+
-- Optional: change the claim name if your IdP uses something other than "groups"
77+
-- ALTER SYSTEM SET oidc_group_claim = 'groups';
78+
-- Optional: require group sync to succeed for login (fail-closed mode for strict compliance)
79+
-- ALTER SYSTEM SET oidc_group_role_sync_strict = true;
80+
```
81+
#### End User Experience
82+
83+
Alice logs in via `psql` with her JWT. On connection:
84+
- Her user role `alice` is auto-provisioned if it doesn't exist (existing behavior)
85+
- The sync reads `groups: ["analytics", "platform_eng"]` from her JWT
86+
- Materialize internally runs `GRANT analytics TO alice` and `GRANT platform_eng TO alice`
87+
- Alice can now query tables she has access to through those roles
88+
89+
Later, Alice moves to a different team. Her IdP admin removes her from `analytics` and adds her to `data_eng`. On her next login:
90+
- The sync sees her JWT now has `groups: ["data_eng", "platform_eng"]`
91+
- Materialize revokes `analytics` from Alice and grants `data_eng`
92+
- Her permissions update automatically
93+
94+
If an admin also manually ran `GRANT reporting TO alice`, that grant is unaffected by sync: manual grants are never touched.
95+
96+
### Where Groups Come From
97+
98+
Groups come from a configurable JWT claim (default: `"groups"`). The claim value can be a JSON array of strings or a single string.
99+
100+
### Configuration
101+
102+
New system variables (set via `ALTER SYSTEM SET`, stored in `SystemVars`):
103+
104+
```sql
105+
ALTER SYSTEM SET oidc_group_role_sync_enabled = true;
106+
ALTER SYSTEM SET oidc_group_claim = 'groups';
107+
ALTER SYSTEM SET oidc_group_role_sync_strict = false; -- default: fail-open
108+
```
109+
110+
| Variable | Default | Description |
111+
|---|---|---|
112+
| `oidc_group_role_sync_enabled` | `false` | Feature gate for group-to-role sync |
113+
| `oidc_group_claim` | `'groups'` | JWT claim name containing group memberships |
114+
| `oidc_group_role_sync_strict` | `false` | When `true`, login is rejected if sync fails (fail-closed). When `false`, sync errors are logged but login proceeds (fail-open). |
115+
116+
Group names from the JWT are matched directly to Materialize role names (case-insensitive). Roles must be pre-created in Materialize; group sync does not auto-create roles (only auto-creates users). Pre-created roles serve as the allowlist for IdP groups. Roles prefixed with `mz_` or `pg_` are always excluded from sync to prevent privilege escalation to system roles.
117+
118+
### Sync Logic
119+
120+
The sync happens in `handle_startup_inner()`, after the user role is auto-provisioned but before the session is fully established.
121+
122+
```
123+
1. Extract groups from JWT claims (via configured claim name)
124+
2. Normalize: lowercase, deduplicate, sort
125+
3. Look up each group name as a role name in the catalog (skip non-existent with warning). Reject any group that maps to a role prefixed with `mz_` or `pg_` (log warning).
126+
4. Get user's current RoleMembership.map from catalog
127+
5. Partition current memberships:
128+
- sync_granted: entries where grantor == MZ_JWT_SYNC_ROLE_ID (sentinel)
129+
- manual_granted: entries where grantor != sentinel (leave untouched)
130+
6. Diff against sync_granted only:
131+
- To grant: target roles NOT in sync_granted AND NOT in manual_granted
132+
- To revoke: sync_granted roles NOT in target roles
133+
7. Execute GRANTs (with sentinel grantor) and REVOKEs via catalog_transact
134+
```
135+
136+
Note: if a role is already manually granted, we don't also sync-grant it. The manual grant takes precedence and won't be revoked by sync.
137+
138+
### How Manual Grants Are Preserved (Grantor Field)
139+
140+
We need to distinguish "roles granted by JWT sync" from "roles granted manually by SQL". We only revoke roles that were originally granted by the sync mechanism.
141+
142+
The **grantor** field already exists and is persisted in `RoleMembership`. Today it records who ran the `GRANT` statement. We use a **sentinel grantor** to mark sync-granted memberships:
143+
144+
- JWT group sync grants a role → grantor = `MZ_JWT_SYNC_ROLE_ID` (a new dedicated sentinel role)
145+
- Human runs `GRANT role TO user` → grantor = their own role ID
146+
147+
**Note**: `RoleMembership.map` is `BTreeMap<RoleId, RoleId>` each role membership stores exactly one grantor. A role cannot simultaneously have both a manual grant and a sync grant; only one grantor is recorded. This means "manual wins" semantics must be enforced by the sync logic's ordering, not by the data model. Enforced by: sync checking the current grantor for each role membership before acting; if a role is already granted manually, the sync skips it entirely (to not overwrite the grantor); if a role is granted with the sentinel grantor (sync) and an admin later manually re-grants it, the manual `GRANT` overwrites the grantor to the admin's role ID.
148+
149+
If the admin later `REVOKE`s that manual grant, the role membership is removed entirely. On next login, the sync sees the role is absent and would re-grants it with the sentinel grantor.
150+
151+
### Security: Shadowed Permissions
152+
153+
Manually-granted permissions that coexist with sync-managed memberships create a risk of **shadowed permissions**: a user may retain access through a manual grant even after their IdP group membership is revoked. This is a known trade-off of the "manual grants are never touched" design.
154+
155+
To mitigate this:
156+
- The `oidc_group_role_sync_strict` mode (when enabled) rejects login if sync fails, preventing stale permissions from persisting silently.
157+
- Admins should audit manual grants periodically. The sentinel grantor (`MZ_JWT_SYNC_ROLE_ID`) makes it possible to distinguish sync-managed from manual memberships via `mz_role_members`.
158+
159+
### Edge Cases
160+
161+
The default behavior is fail-open: send a NOTICE to the client and skip on misconfiguration, allowing login to proceed. When `oidc_group_role_sync_strict = true`, sync failures reject the login instead.
162+
163+
**Group maps to non-existent role**: Send a NOTICE to the client (e.g., `NOTICE: group "foo" has no matching Materialize role, skipping`), emit a server log warning, and skip. No audit log entry for unmatched groups — the audit log only records actual GRANT/REVOKE operations.
164+
165+
**Missing groups claim vs empty groups claim**: These are different.
166+
- `groups: []` (explicit empty) means revoke all sync-granted roles, keep manual grants. The user proceeds with whatever manual grants and default privileges they have (same as any user without synced roles).
167+
- No `groups` claim at all means skip sync entirely, preserve current state. This prevents IdP misconfiguration from stripping all roles.
168+
169+
**Circular membership**: Pre-check for cycles before building ops. Skip with warning.
170+
171+
**Reserved/system roles**: Pre-filter target roles to skip any role prefixed with `mz_` or `pg_` with a warning. This prevents IdP groups from escalating to system-level privileges.
172+
173+
**All groups map to non-existent roles**: Login proceeds. User still has manual grants.
174+
175+
**Case sensitivity**: Normalize group names to lowercase for matching against catalog role names.
176+
177+
Note: Roles prefixed with `mz_` or `pg_` are reserved system roles and must not be synced from IdP groups (denied).
178+
179+
### Observability
180+
181+
For MVP, sync activity is surfaced through:
182+
183+
- **`mz_audit_log`**: All GRANTs and REVOKEs from sync are logged via `Op::GrantRole`/`Op::RevokeRole`. The audit log entries should indicate the source as JWT group sync (e.g., by recording the grantor as `MZ_JWT_SYNC_ROLE_ID` in the event details) so admins can distinguish sync-initiated changes from manual ones.
184+
- **`mz_role_members`**: The `grantor` column distinguishes sync-managed memberships (grantor = `MZ_JWT_SYNC_ROLE_ID`) from manual grants, allowing admins to query for sync state.
185+
- **Server logs**: Warnings for skipped groups, unmatched groups, cycles, reserved system role attempts (`mz_*`, `pg_*`).
186+
- **Client NOTICEs**: Unmatched groups, reserved role attempts, and sync errors are sent as NOTICEs to the connecting client. Since users may belong to many IdP groups unrelated to Materialize, verbose notice output can be controlled by a session variable (deferred to post-MVP; MVP always sends notices).
187+
188+
A dedicated system table or `SHOW EXTERNAL_GROUPS` command is deferred.
189+
190+
## Minimal Viable Prototype
191+
192+
### Work Items
193+
194+
1. Add `MZ_JWT_SYNC_ROLE_ID` sentinel role (catalog migration)
195+
2. Add `groups()` method to `OidcClaims`
196+
3. Add `groups: Option<Vec<String>>` to `ExternalUserMetadata`
197+
4. Add system variables: `oidc_group_role_sync_enabled`, `oidc_group_claim`, `oidc_group_role_sync_strict`
198+
5. Implement sync logic in `handle_startup_inner` (including `mz_`/`pg_` prefix filtering)
199+
6. Audit log entries for sync GRANTs/REVOKEs with source attribution (automatic via existing Op path, grantor recorded in event details)
200+
7. Tests: unit tests for group extraction/normalization, reserved role filtering, strict/fail-open modes, end-to-end tests
201+
202+
## Alternatives
203+
204+
### CockroachDB (Prior Art)
205+
206+
CockroachDB (v25.4+) implements this via [JWT authorization with group claims](https://www.cockroachlabs.com/docs/stable/jwt-authorization):
207+
208+
1. **Groups extraction**: Reads `groups` claim from JWT (configurable claim name). Falls back to querying IdP's userinfo endpoint.
209+
2. **Normalization**: Groups are lowercased, deduplicated, sorted.
210+
3. **Role sync**: Each group name is matched to a database role name. User is GRANTed matching roles and REVOKEd roles that no longer match.
211+
4. **Empty groups = login rejected**: If groups resolve to empty, login fails.
212+
5. **Configuration**:
213+
- `server.jwt_authentication.group_claim`, which JWT field has groups (default: `"groups"`)
214+
- `server.jwt_authentication.authorization.enabled`, feature gate
215+
6. **Roles must pre-exist**: No auto-creation of roles from group names.
216+
217+
Reference PR: [cockroachdb/cockroach#147318](https://github.com/cockroachdb/cockroach/pull/147318)
218+
219+
The design follows CockroachDB's design with small differences: we do not reject login on empty groups (we allow it if there are manual grants), and we use the grantor field to distinguish sync-managed from manually-managed role memberships.
220+
221+
### Kubernetes Custom Resource Definitions
222+
223+
Alternatively: model group-to-role mappings as Kubernetes CRDs, letting admins declare mappings declaratively in their cluster manifests. A controller would watch these resources and reconcile role memberships in Materialize.
224+
225+
This was rejected because it would require building and maintaining a new API server (or operator) alongside environmentd. The JWT already carries the group information at connection time, so it would simply add latency and complexity without clear benefit over reading the claims directly.
226+
227+
## Open Questions
228+
229+
1. **Sync on the connection hot path**: `handle_startup_inner` runs on every connection. The sync does a `catalog_transact`, which takes a write lock on the catalog. **Mitigation**: skip the catalog transaction entirely if the user's groups haven't changed since the last sync (compare the sorted group list from the JWT against the current sync-granted memberships). This should make the common case (reconnect with same groups) a no-op read.
230+
2. **Unmatched group observability**: Unmatched groups are surfaced as client NOTICEs and server log warnings. The `mz_audit_log` only records actual GRANT/REVOKE operations (not skipped groups), since users may belong to many IdP groups unrelated to Materialize and logging each one would be noisy. A dedicated system table or session-variable-controlled verbosity for notices can be added post-MVP.
231+
3. **Edge case behaviour**: Are these the right choices? (See Security: Shadowed Permissions section for the `strict` mode trade-off.)
232+
4. **Frontegg group claims in JWT**: Can Frontegg be configured to include group membership as a claim in the JWT access token it issues? **Action**: Before implementation, someone from Cloud should verify that Frontegg JWTs include group membership for the authenticated user, and whether upstream IdP group memberships are also passed through.
233+
5. **Two authentication paths for Cloud support**: The Frontegg authenticator uses app passwords. Users authenticate with a client ID and secret key, which are exchanged with Frontegg's API for a JWT (`exchange_app_password()`). This means group sync for Cloud would need its own implementation: extract groups from the JWT returned by the app password exchange, and push updated group memberships through another (or existing channel) on each token refresh. This is a separate code path from OIDC.
234+
6. **SCIM for self-managed**: SCIM support for self-managed deployments is important and should be a near-term follow-up. The current JWT-based design is intentionally provider-agnostic (no Frontegg-specific coupling) to ensure we can layer SCIM on top without rearchitecting. Key consideration: SCIM would enable push-based provisioning (vs. pull-on-connect), complementing this design rather than replacing it.
235+
7. **Group name transformation**: Enterprise IdPs often use prefixed group names (e.g., `materialize_platform_eng`). A regex-based transform (e.g., `oidc_group_name_transform = 'materialize_(.*)'` -> `$1`) would be valuable. Deferred to post-MVP but the sync logic should be structured to allow inserting a transform step easily.

0 commit comments

Comments
 (0)