Commit a5de004

stefanonardo and claude committed
rbac: scope capi-controllers permissions to least-privilege
Replace wildcard RBAC rules with enumerated resources and verbs for the capi-controllers ServiceAccount. Validated with audit2rbac and e2e tests on an AWS cluster with MachineAPIMigration enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 5656e20 commit a5de004

7 files changed

Lines changed: 457 additions & 35 deletions

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
---
name: rbac-review
description: "Use when the user wants to audit, update, or regenerate RBAC permissions for a controller or service account. Covers: adding new permissions after code changes, auditing a service account for least privilege, periodic RBAC reviews, or regenerating RBAC manifests after adding new CRD types or controllers. Trigger on any mention of \"rbac\" in the context of updating or reviewing permissions."
disable-model-invocation: false
---

Audit and regenerate RBAC permissions for a ServiceAccount. Requires an active cluster connection (`oc whoami` must succeed).

The target ServiceAccount is specified via `$ARGUMENTS` (e.g., `/rbac-review openshift-cluster-api:capi-controllers`). If not provided, ask the user.
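
A minimal sketch of splitting the argument into its two parts, assuming it always arrives in `<namespace>:<name>` form as in the example above. The helper name `parse_sa` and the variables `SA_NAMESPACE` and `SA_NAME` are hypothetical and are not part of the skill itself.

```bash
# Hypothetical helper: split "<namespace>:<name>" into two shell variables.
# Returns non-zero (and sets nothing) when the argument has no colon or an
# empty side, so the caller can fall back to asking the user.
parse_sa() {
  case "$1" in
    ?*:?*) ;;
    *) echo "expected <namespace>:<name>, got '$1'" >&2; return 1 ;;
  esac
  SA_NAMESPACE=${1%%:*}   # everything before the first colon
  SA_NAME=${1#*:}         # everything after the first colon
}

# Example:
#   parse_sa "openshift-cluster-api:capi-controllers"
#   echo "$SA_NAMESPACE / $SA_NAME"
```

The two variables can then be substituted wherever later commands show `<namespace>` and `<name>`.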

## Prerequisites

Verify before starting:
```bash
oc whoami
command -v audit2rbac || echo "audit2rbac not found; build it from source (see below)"
```

If `oc whoami` fails, stop and ask the user to connect to a cluster first.

If `audit2rbac` is not found, build it from source (`go install` fails because of `replace` directives):
```bash
# Clone into /tmp and build the binary into ~/go/bin (assumed to be on PATH).
cd /tmp && git clone https://github.com/liggitt/audit2rbac.git audit2rbac-build \
  && cd audit2rbac-build && go build -o ~/go/bin/audit2rbac ./cmd/audit2rbac
```

## Steps 1 & 2: Run in parallel

Step 1 (e2e + audit2rbac) and Step 2 (static analysis) are independent. Start e2e first (it takes ~20 minutes), then do static analysis while it runs. Merge results in Step 3.

### Step 1: Run e2e tests and collect audit2rbac baseline

The cluster must be running with RBAC permissive enough that controllers don't crash on 403s; otherwise some code paths are never exercised.

**Restart the target Deployment before running e2e** so that startup-only operations (pod self-read, initial lease create, ClusterOperator create) appear in the audit window. Without the restart, those operations show up as "Code only" in the gap report.

```bash
oc rollout restart deployment/<name> -n <namespace>
oc rollout status deployment/<name> -n <namespace> --timeout=120s
make e2e 2>&1 | tee /tmp/e2e-baseline.log
```

Collect and clean audit logs. `oc adm node-logs` prefixes each line with the node hostname, which breaks audit2rbac's JSON parser, so strip it with `sed`:

```bash
oc adm node-logs --role=master --path=kube-apiserver/audit.log \
  | sed 's/^[^ ]* //' > /tmp/audit-clean.log
audit2rbac --filename=/tmp/audit-clean.log \
  --serviceaccount=<namespace>:<name> | tee /tmp/audit2rbac-output.yaml
```
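
To make the `sed` step concrete, the sketch below applies the same expression to a made-up log line. The sample hostname and event are hypothetical, and the exact prefix format is assumed to be "node name, one space, JSON event" as described above. The second function is an optional guarded variant, not part of the original command: it leaves a line untouched if it already starts with `{`, which avoids corrupting input that has been cleaned once already.

```bash
# Same expression as in the command above: drop everything up to and
# including the first space on each line.
strip_node_prefix() {
  sed 's/^[^ ]* //'
}

# Guarded variant (an assumption, not from the original): the prefix may not
# contain '{', so lines that already begin with JSON are passed through as-is.
strip_node_prefix_guarded() {
  sed 's/^[^ {]* //'
}

# Hypothetical sample line.
sample='master-0.example.internal {"kind":"Event","verb":"get"}'
printf '%s\n' "$sample" | strip_node_prefix
# prints: {"kind":"Event","verb":"get"}
```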

### Step 2: Static analysis of controller code

Find all Deployments that use the target ServiceAccount (search `manifests/` for `serviceAccountName`). From each Deployment, identify its containers and binaries, then find all controllers each binary registers. For each controller, trace resource access:

- `client.Get/List/Create/Update/Patch/Delete`: determines verbs. Note: controller-runtime's default client serves `Get` from an informer cache, so any resource accessed via `Get` also needs `list` and `watch`
- `Status().Patch/Update`: requires a `/status` subresource rule
- `builder.For/Watches/Owns` in `SetupWithManager()`: requires `get, list, watch`
- `Recorder.Event/Eventf`: requires `events` create/patch
- `ownerReference` with `blockOwnerDeletion: true`: requires `update` on the owner's `/finalizers` subresource (invisible to audit2rbac)
- `ValidatingAdmissionPolicyBinding` with `spec.paramRef`: the API server requires `list` permission on the `spec.paramKind` resource type in the `spec.paramRef.namespace` when creating or updating the binding. This is invisible to both audit2rbac and naive code tracing. Check what `paramKind` each VAP uses and add `list` for those resource types.
- Leader election leases (`leases` in `coordination.k8s.io`) and feature gate informers (`featuregates` in `config.openshift.io`)
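
As a starting point for the tracing described in the list above, a rough `grep` pass can surface candidate call sites. This is only a sketch: the function name `find_rbac_call_sites` is hypothetical, the patterns are deliberately broad and will also match unrelated methods named `Get`, `Update`, and so on, and `--include` assumes GNU or BSD `grep`. It does not detect `blockOwnerDeletion` or VAP `paramRef` requirements, which still need manual review.

```bash
# Hypothetical helper: list Go call sites that usually imply an RBAC rule.
# Usage: find_rbac_call_sites <source-dir>
find_rbac_call_sites() {
  src_dir=${1:-.}
  grep -rnE --include='*.go' \
    -e '\.(Get|List|Create|Update|Patch|Delete)\(' \
    -e 'Status\(\)\.(Update|Patch)\(' \
    -e '\.(For|Owns|Watches)\(' \
    -e '\.Eventf?\(' \
    "$src_dir"
}

# Example:
#   find_rbac_call_sites pkg/ | less
```

Each hit is printed as `file:line:source`, so the output can be walked controller by controller while filling in the "Code evidence" column of the gap report.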

For vendored libraries (boxcutter, controller-runtime, etc.), trust audit2rbac over static analysis for verb accuracy. These libraries may use `update` (PUT) internally even when the calling code only shows `patch` calls.

Map each resource to the namespace where it's accessed. Check `pkg/util/platform.go` for the full list of platform-specific infra types.

## Step 3: Gap report

Wait for **both** Step 1 and Step 2 to complete before presenting the gap report. Present it once, not incrementally.

Compare audit2rbac output against static analysis. Produce a table with columns: API Group, Resource, Verbs, Scope, Code evidence, Status.

Status values:
- **Confirmed**: in both audit2rbac and code
- **Code only**: in code but not audit2rbac. Must be justified (different platform, first-install path, `blockOwnerDeletion`, VAP paramRef authorization, etc.)
- **Audit only**: in audit2rbac but not code. Investigate.
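
The three status values amount to a set comparison, which can be sketched with `sort` and `comm`. The input format here is an assumption made for illustration: each file holds one `apigroup/resource verb` entry per line, one file derived from the audit2rbac output and one from static analysis. The function name `classify_gaps` is hypothetical.

```bash
# Hypothetical helper: classify permissions into the three status values.
# Usage: classify_gaps <audit-entries-file> <code-entries-file>
classify_gaps() {
  gap_audit=$(mktemp)
  gap_code=$(mktemp)
  # comm needs both inputs sorted with the same collation.
  LC_ALL=C sort -u "$1" > "$gap_audit"
  LC_ALL=C sort -u "$2" > "$gap_code"
  LC_ALL=C comm -12 "$gap_audit" "$gap_code" | sed 's/^/Confirmed: /'   # in both
  LC_ALL=C comm -13 "$gap_audit" "$gap_code" | sed 's/^/Code only: /'   # only in code
  LC_ALL=C comm -23 "$gap_audit" "$gap_code" | sed 's/^/Audit only: /'  # only in audit
  rm -f "$gap_audit" "$gap_code"
}
```

This only produces the Status column; the Scope and Code evidence columns, and the justification for every "Code only" row, still have to come from the manual analysis.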

Present the gap report and wait for user approval before proceeding.

## Step 4: Build new RBAC

Find the existing RBAC manifests for the target ServiceAccount (search `manifests/` for ClusterRole/Role and ClusterRoleBinding/RoleBinding referencing the SA). Update them following the scoping principle: ClusterRole for cluster-scoped resources, namespace-scoped Roles for everything else.

## Step 5: Verify by running e2e tests

Use `oc replace` (not `oc apply`) to deploy the updated RBAC manifests. `oc apply` on CVO-managed resources does not fully replace rules because the `last-applied-configuration` annotation is missing, so old wildcard rules survive alongside the new enumerated rules and give false confidence that the tighter rules are sufficient.

```bash
oc replace -f <roles-manifest>
oc replace -f <bindings-manifest>
```

For new Roles/RoleBindings that don't exist yet on the cluster, use `oc create` first or `oc replace --force` (which deletes and recreates).

Then restart the Deployments that use the SA, check controller logs for `forbidden`/`403` errors, and run `make e2e`.
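
The log check can be scripted along these lines. This is a sketch under assumptions: the function name `scan_forbidden` is hypothetical, and the pattern simply looks for the word `forbidden` or a standalone `403`, so it may need tuning to the controllers' actual log format.

```bash
# Hypothetical helper: read controller logs on stdin, print any lines that
# look like authorization failures, and return non-zero if at least one was
# found. A "403" embedded in a longer number (e.g. 14030) is ignored.
scan_forbidden() {
  if grep -iE 'forbidden|(^|[^0-9])403([^0-9]|$)'; then
    return 1
  fi
  return 0
}

# Example:
#   oc logs deployment/<name> -n <namespace> --all-containers | scan_forbidden
```

Individual permissions can also be spot-checked without waiting for a failure by impersonating the ServiceAccount, for example `oc auth can-i list secrets -n kube-system --as=system:serviceaccount:<namespace>:<name>`.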

## Step 6: Update docs/rbac.md

Update `docs/rbac.md` to reflect the new RBAC structure: which Roles/ClusterRoles exist, their scopes, and what they cover.

docs/rbac.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# RBAC Management

## Structure

The `capi-controllers` ServiceAccount (used by both the `capi-controllers` and `machine-api-migration` containers) has permissions split across scopes:

| Manifest | Kind | Scope | Purpose |
|----------|------|-------|---------|
| `03_rbac_roles.yaml` | ClusterRole `openshift-capi-controllers` | cluster-wide | Cluster-scoped resources only: infrastructures, clusteroperators, featuregates, clusterversions, nodes, CRDs |
| `03_rbac_roles.yaml` | Role `capi-controllers` | `openshift-cluster-api` | CAPI core + infra provider resources, secrets, events, leases |
| `03_rbac_roles.yaml` | Role `capi-controllers` | `openshift-machine-api` | MAPI machines, machinesets, controlplanemachinesets, secrets, events |
| `03_rbac_roles.yaml` | Role `capi-controllers-kube-system` | `kube-system` | Secrets (vSphere credentials) |
| `03_rbac_roles.yaml` | Role `cluster-capi-operator-pull-secret` | `openshift-config` | Pull-secret read |

The principle: each permission lives in the narrowest scope where it's used. Cluster-scoped Kubernetes objects (nodes, CRDs, clusteroperators) must be in the ClusterRole. Namespaced resources go into a Role in the specific namespace where they're accessed.

## Updating RBAC

Use the `/rbac-review` skill to audit and regenerate RBAC permissions. It requires a live cluster and uses audit2rbac + static code analysis to derive least-privilege rules.
