This document provides operational guidance for running TokenSmith-based authorization in OpenCHAMI services.
It is non-normative. The normative behavior/contract is:
Policy loading mechanics are described in:
- A baseline embedded Casbin model + policy ships in TokenSmith.
- If you do not configure a policy directory, the baseline policy is the effective policy.
- Policy is loaded at process start; no hot reload in v1.
Use filesystem policy fragments when you need to:
- extend the baseline RBAC (e.g., add additional objects/actions),
- add temporary allowances for staged rollout or incident response,
- override or deny permissions by removing or avoiding grants.
Mount a directory into each service (e.g., via Kubernetes ConfigMap/Secret/volume), and point the service at it via:
- TOKENSMITH_POLICY_DIR (preferred)
- AUTHZ_POLICY_DIR
TokenSmith loads *.csv fragments in lexical order by filename.
Recommended convention:
- 00-baseline.csv (do not use; baseline is embedded)
- 10-org.csv
- 20-site.csv
- 90-emergency.csv
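The directory lookup and lexical ordering described above can be sketched as follows. This is a minimal illustration, not TokenSmith's loader: the helper names `resolvePolicyDir` and `orderFragments` are hypothetical, and the precedence of TOKENSMITH_POLICY_DIR when both variables are set is an assumption drawn from it being marked "preferred".

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// resolvePolicyDir returns the configured policy directory, or "" when none
// is set. Assumption: TOKENSMITH_POLICY_DIR wins when both variables are set.
func resolvePolicyDir(getenv func(string) string) string {
	if dir := getenv("TOKENSMITH_POLICY_DIR"); dir != "" {
		return dir
	}
	return getenv("AUTHZ_POLICY_DIR")
}

// orderFragments keeps only names ending in ".csv" and sorts them lexically,
// which is why numeric prefixes such as 10-, 20-, 90- control load order.
func orderFragments(names []string) []string {
	var out []string
	for _, n := range names {
		if strings.HasSuffix(n, ".csv") {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	dir := resolvePolicyDir(os.Getenv)
	if dir == "" {
		fmt.Println("no policy directory configured; baseline policy only")
		return
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read policy dir:", err)
		os.Exit(1)
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	// Print the fragments in the order they would be considered.
	for _, n := range orderFragments(names) {
		fmt.Println(filepath.Join(dir, n))
	}
}
```

Running this inside a pod is a quick way to see which files would be picked up and in what order before restarting the service.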
apiVersion: v1
kind: ConfigMap
metadata:
  name: openchami-authz-policy
  labels:
    app.kubernetes.io/name: openchami-authz-policy
data:
  10-site.csv: |
    # Example: grant viewer read of a custom object
    p, role:viewer, custom:status, read
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-service
spec:
  template:
    spec:
      containers:
        - name: metadata-service
          env:
            - name: TOKENSMITH_POLICY_DIR
              value: /etc/tokensmith/authz
          volumeMounts:
            - name: authz-policy
              mountPath: /etc/tokensmith/authz
              readOnly: true
      volumes:
        - name: authz-policy
          configMap:
            name: openchami-authz-policy

Recommended staged rollout (per service):
- off
  - Authorization disabled.
  - Use this while wiring middleware and validating authn/principal extraction.
- shadow
  - Authorization evaluated but not enforced.
  - Monitor for shadow denials and fix principals/policy gaps.
  - Keep this enabled long enough to cover expected operational use cases.
- enforce
  - Denied/indeterminate/error decisions block with HTTP 403.
  - Ensure you have an incident rollback plan (switch back to shadow/off).
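The three modes above can be summarized as a small decision function. This is a sketch of the described behavior only, not the TokenSmith middleware: the `outcome` type and `decide` function are hypothetical, `allowed == false` stands for any denied, indeterminate, or error decision, and failing closed on an unrecognized mode is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// outcome describes what a middleware would do with one request.
type outcome struct {
	Evaluated    bool // whether the policy was consulted at all
	ShadowDenial bool // denial that is recorded but not enforced
	Status       int  // HTTP status when the request is blocked, 0 when it proceeds
}

// decide maps a mode and a policy decision to an outcome.
func decide(mode string, allowed bool) outcome {
	switch mode {
	case "off":
		// Authorization disabled: nothing is evaluated, nothing is blocked.
		return outcome{}
	case "shadow":
		// Evaluated but never blocking; denials are only observed.
		return outcome{Evaluated: true, ShadowDenial: !allowed}
	case "enforce":
		if allowed {
			return outcome{Evaluated: true}
		}
		return outcome{Evaluated: true, Status: http.StatusForbidden}
	default:
		// Assumption: an unrecognized mode fails closed.
		return outcome{Status: http.StatusForbidden}
	}
}

func main() {
	for _, mode := range []string{"off", "shadow", "enforce"} {
		fmt.Printf("%-8s denied request -> %+v\n", mode, decide(mode, false))
	}
}
```

The rollback plan mentioned above corresponds to moving a service from the "enforce" branch back to "shadow" or "off", which stops requests from being blocked without changing policy.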
TokenSmith computes a deterministic policy hash referred to as policy_version.
You should validate policy_version when:
- rolling out new policy fragments,
- troubleshooting unexpected access decisions,
- verifying that all replicas are running the same policy.
Where to find policy_version:
- Service startup logs during policy load.
- AuthZ decision logs/metrics emitted by the middleware.
- The 403 response body returned by the AuthZ middleware in enforce mode.
If different pods show different policy_version values, verify that the same fragments are mounted everywhere and that pods were restarted.
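To illustrate why identical fragments should yield an identical policy_version, here is one way a deterministic hash over a set of fragments could be computed. This is an assumption-laden sketch, not TokenSmith's actual algorithm: the function name `policyHash`, the length-prefixed encoding, and the choice to hash only fragments (and not the embedded baseline) are all hypothetical. The only detail taken from this document is that the version is presented as a SHA-256 value.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// policyHash returns a hex SHA-256 over fragment names and contents.
// Names are sorted first so the result does not depend on map iteration
// order or on the order in which files were read.
func policyHash(fragments map[string]string) string {
	names := make([]string, 0, len(fragments))
	for n := range fragments {
		names = append(names, n)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, n := range names {
		// Length-prefix each field so that field boundaries are unambiguous.
		fmt.Fprintf(h, "%d:%s%d:%s", len(n), n, len(fragments[n]), fragments[n])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := map[string]string{
		"10-org.csv":  "p, role:viewer, custom:status, read\n",
		"20-site.csv": "p, role:operator, boot:parameters, update\n",
	}
	b := map[string]string{
		"10-org.csv":  "p, role:viewer, custom:status, read\n",
		"20-site.csv": "p, role:operator, boot:parameters, delete\n", // one action differs
	}
	fmt.Println("same input, same hash:", policyHash(a) == policyHash(a))
	fmt.Println("changed fragment, same hash:", policyHash(a) == policyHash(b))
}
```

Under any scheme of this kind, a single differing byte in one mounted fragment is enough to change the value, which is why a mismatch between pods points to different mounted content or a pod that has not restarted.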
For services using pkg/authz/chi, expose a diagnostics endpoint so operators can quickly confirm mode and effective policy source/version.
Suggested route:
GET /authz/diagnostics
Suggested wiring:
import (
	"net/http"

	authzchi "github.com/openchami/tokensmith/pkg/authz/chi"
)

func registerDiagnostics(mux *http.ServeMux, mode string, policyVersion string, source authzchi.PolicySource) {
	mux.Handle("/authz/diagnostics", authzchi.DiagnosticsHandler(mode, policyVersion, source))
}

At startup, log mode + policy details once:

authzchi.LogStartupDiagnostics(mode, authorizer.PolicyVersion(), authzchi.PolicySourceBaselineFragments)

Expected response shape:

{
  "mode": "enforce",
  "policy_version": "<sha256>",
  "policy_source": "baseline+fragments"
}

Use this endpoint during rollouts to detect mismatched pods quickly.
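A rollout check built on this response shape might look like the sketch below. It only decodes response bodies and compares versions; fetching each pod's /authz/diagnostics body is left out. The `diagnostics` struct mirrors the three fields in the expected response shape, while the helper names, pod names, and version strings are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// diagnostics mirrors the response shape shown in this document.
type diagnostics struct {
	Mode          string `json:"mode"`
	PolicyVersion string `json:"policy_version"`
	PolicySource  string `json:"policy_source"`
}

// parseDiagnostics decodes one diagnostics response body.
func parseDiagnostics(body []byte) (diagnostics, error) {
	var d diagnostics
	err := json.Unmarshal(body, &d)
	return d, err
}

// mismatchedPods returns, in sorted order, the pods whose policy_version
// differs from the version the rollout is expected to converge on.
func mismatchedPods(byPod map[string]diagnostics, want string) []string {
	var bad []string
	for pod, d := range byPod {
		if d.PolicyVersion != want {
			bad = append(bad, pod)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Hypothetical bodies, as if collected from three replicas.
	bodies := map[string]string{
		"pod-a": `{"mode":"shadow","policy_version":"v-new","policy_source":"baseline+fragments"}`,
		"pod-b": `{"mode":"shadow","policy_version":"v-old","policy_source":"baseline+fragments"}`,
		"pod-c": `{"mode":"shadow","policy_version":"v-new","policy_source":"baseline+fragments"}`,
	}
	byPod := make(map[string]diagnostics)
	for pod, body := range bodies {
		d, err := parseDiagnostics([]byte(body))
		if err != nil {
			fmt.Println("decode failed for", pod, ":", err)
			continue
		}
		byPod[pod] = d
	}
	fmt.Println("pods on an unexpected policy_version:", mismatchedPods(byPod, "v-new"))
}
```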
Use this sequence for every policy or mode change:
- Deploy with mode shadow.
- Verify each pod returns the same policy_version from diagnostics.
- Confirm shadow denials align with expected unmapped/denied paths.
- Fix principal mapping or policy fragments until shadow denials are understood.
- Switch mode to enforce.
- Re-check diagnostics and startup logs after rollout.
If any pod reports a different policy_version, stop rollout and verify:
- mounted policy directory content,
- env var (TOKENSMITH_POLICY_DIR or AUTHZ_POLICY_DIR),
- pod restart completion.
For general TokenSmith troubleshooting (token exchange, OIDC, local minting), see:
This section covers authorization-specific issues.
Most common causes:
- Service not restarted (no hot reload).
- Fragment not mounted at the expected path.
- Wrong env var set (TOKENSMITH_POLICY_DIR vs AUTHZ_POLICY_DIR).
- Filename does not match *.csv or has unexpected ordering.
Checklist:
- Confirm the principal identity:
  - user principals need sub and roles/groups.
  - service principals should map to role service.
- Confirm the object/action mapping used by the service matches the policy.
- Compare policy_version in the denial body to what you expect.
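When working through the checklist above, it can help to replay the denied request against the grants you believe are in effect. The sketch below is a deliberately simplified stand-in for RBAC evaluation, not Casbin and not the embedded TokenSmith model: it assumes a rule matches on an exact role, object, and action, with `*` acting as a wildcard, and the `rule` type and `allowed` function are hypothetical. The real matcher may behave differently.

```go
package main

import "fmt"

// rule is one "p, <role>, <object>, <action>" grant.
type rule struct{ Sub, Obj, Act string }

// match treats "*" in a policy field as a wildcard (an assumption of this sketch).
func match(pattern, value string) bool {
	return pattern == "*" || pattern == value
}

// allowed reports whether any role held by the principal has a matching grant.
// With no matching grant the result is a deny.
func allowed(rules []rule, roles []string, obj, act string) bool {
	for _, r := range rules {
		for _, role := range roles {
			if r.Sub == role && match(r.Obj, obj) && match(r.Act, act) {
				return true
			}
		}
	}
	return false
}

func main() {
	rules := []rule{
		{"role:admin", "*", "*"},
		{"role:operator", "boot:parameters", "update"},
		{"role:viewer", "metadata:nodes", "read"},
	}
	// A viewer reading nodes is granted.
	fmt.Println(allowed(rules, []string{"role:viewer"}, "metadata:nodes", "read")) // true
	// An operator has no delete grant, so the request is denied.
	fmt.Println(allowed(rules, []string{"role:operator"}, "boot:parameters", "delete")) // false
	// A principal with no roles mapped is denied everything.
	fmt.Println(allowed(rules, nil, "metadata:nodes", "read")) // false
}
```

The last case is the one most often behind an unexpected 403: the policy contains the grant, but principal extraction did not map the caller into the role that holds it.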
This is expected: shadow mode does not block.
Use shadow denials to:
- identify missing role/group mappings,
- identify missing policy grants for legitimate workflows,
- estimate impact before switching to enforce.
Checklist:
- Confirm runtime config sets the intended mode (off, shadow, enforce).
- Confirm service startup logs contain the same mode and policy_version as diagnostics.
- Confirm policy source matches deployment intent:
  - baseline-only if no policy dir is mounted,
  - baseline+fragments when policy fragments are configured.
The baseline policy already includes core RBAC. These examples show typical additional snippets you might deploy.
# Admin is typically already granted full access by baseline.
p, role:admin, *, *

# Example: allow operator to update boot parameters
p, role:operator, boot:parameters, update
# Example: do NOT grant delete
# (absence of a delete rule results in deny)

p, role:viewer, metadata:nodes, read
p, role:viewer, boot:configs, read

How service identities are expressed depends on your AuthN/principal extraction.
A common pattern is to map a service client id (or azp) into the service role.
# Map a specific service principal into role:service
# (exact subject string depends on your service principal mapping)
# Example subject style used in the contract examples: "service:boot-service".
# If you use grouping policies, you may also use Casbin's g() relationships.
#
# Example using g() to link a service principal to the role:
# g, service:boot-service, role:service
#
# Then grant the role permissions:
p, role:service, metadata:nodes, read

If you're still stuck after following the above, use this checklist:
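- Verify token validity
  # Decode the token payload to see its claims. JWT segments are base64url-encoded
  # without padding, so translate the alphabet first; if base64 reports invalid
  # input, append "=" until the segment length is a multiple of 4.
  echo "<JWT>" | cut -d. -f2 | tr '_-' '/+' | base64 -d | jq .
  # Confirm: sub, aud, auth_level, auth_methods, auth_events are present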
- Verify principal extraction
  # Enable debug logging in your service
  export LOG_LEVEL=debug
  # Look for logs showing the extracted principal ID, type, roles
- Verify policy parsing
  # Check startup logs for policy_version and any parsing errors
- Test policy matching directly
  # Use Casbin's own tools to test matchers (if you have access to the model/policy files)
  # Example: Does the policy matcher correctly match your path?
- Confirm mode is active
  # Call diagnostics endpoint or check env vars
  echo $AUTHZ_POLICY_MODE  # should be "enforce" or "shadow", not "off"
- Check for path normalization issues
  # Verify the router receives the same path as the policy matcher evaluates
  # Log both the raw HTTP path and the normalized object in your handler
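For the token check in the checklist above, a small Go helper avoids the padding and alphabet pitfalls of decoding JWT segments in a shell. This is a debugging sketch only: it decodes the payload and does not verify the signature, so it says nothing about whether the token is trustworthy. The function names are hypothetical; the claim names are the ones listed in the checklist.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strings"
)

// decodeClaims returns the JSON claims from a JWT payload segment.
// It does NOT verify the signature.
func decodeClaims(token string) (map[string]any, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("token does not have three dot-separated segments")
	}
	// JWT segments are base64url without padding; tolerate padded input too.
	payload, err := base64.RawURLEncoding.DecodeString(strings.TrimRight(parts[1], "="))
	if err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("parse payload: %w", err)
	}
	return claims, nil
}

// missingClaims lists the required claim names absent from claims.
func missingClaims(claims map[string]any, required ...string) []string {
	var missing []string
	for _, name := range required {
		if _, ok := claims[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: claims <JWT>")
		os.Exit(2)
	}
	claims, err := decodeClaims(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("claims:", claims)
	fmt.Println("missing:", missingClaims(claims, "sub", "aud", "auth_level", "auth_methods", "auth_events"))
}
```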
If none of these help, file an issue with:
- Your policy model (Casbin *.conf file)
- Your policy CSV snippets
- The principal identity (anonymized)
- The request path and HTTP method
- The policy_version from the denial response
See also: docs/troubleshooting.md