0012: Auditability for Authorization Changes
############################################

Status
******

**Draft**

Context
*******

The existing architecture (see `ADR 0005`_) introduced ``ExtendedCasbinRule``, which adds
``created_at``, ``updated_at``, and a ``metadata`` JSON field to the ``CasbinRule`` table.
This is not an audit trail: there is no actor, no operation type, and no mechanism for
downstream consumers to react to changes.

Operators and developers need answers that the current system cannot provide:

- Who assigned this role, and when?
- Who removed a user's access, and was it intentional?
- Why was a permission check denied?

A spike (OEPM-Spike: RBAC AuthZ Auditability) examined how peer systems approach this.
Auditability decomposes into three dimensions:

1. **Attribution**: who changed access? (role assignments, removals)
2. **Explainability**: why was access granted or denied? (policy evaluation at check time)
3. **Usage**: who used access? (resource access events, business operations)

`SpiceDB`_ and `OpenFGA`_ track the full authorization graph as a versioned changelog,
enabling historical reconstruction. Keycloak uses event listeners on administrative actions.
openedx-authz sits between these: a mutable policy store with no built-in audit layer.
(See `OEPM-Spike\: RBAC AuthZ Auditability`_ for the peer system analysis.)

The pycasbin ecosystem has no audit plugin. Two transitive dependencies cover what is needed:
``django-crum`` (via ``edx-django-utils``) for actor capture, and ``django-simple-history``
(via ``edx-organizations``) for point-in-time state reconstruction.

Decision
********

Three independent mechanisms, each answering a different question:

- ``OpenedxPublicSignal``: something happened, react now
- ``RoleAssignmentAudit``: what happened, in what order, performed by whom
- ``django-simple-history`` on ``ExtendedCasbinRule``: what was the full state at time T
  (future work)

See the `OEPM-Spike\: RBAC AuthZ Auditability`_ for the architecture diagram of the three
flows.

1. Attribution: Role Lifecycle Events and Audit Table
=====================================================

Emit an ``OpenedxPublicSignal`` from ``openedx_authz.api.roles`` after every successful role
assignment or removal, via ``transaction.on_commit``. A synchronous Django signal receiver
writes the event to ``RoleAssignmentAudit`` in the same process.
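
The ordering guarantee can be modeled without Django. The sketch below is framework-free and hypothetical throughout. ``FakeTransaction`` stands in for the ``atomic``/``on_commit`` behavior of ``django.db.transaction``, and a plain list stands in for the receivers connected to the ``OpenedxPublicSignal``. ``assign_role`` and ``write_audit_row`` are illustrative names, not the openedx-authz API.

.. code:: python

    from typing import Callable


    class FakeTransaction:
        """Toy stand-in for a Django atomic block: on_commit callbacks run only on commit."""

        def __init__(self) -> None:
            self._callbacks: list[Callable[[], None]] = []

        def on_commit(self, func: Callable[[], None]) -> None:
            self._callbacks.append(func)

        def commit(self) -> None:
            callbacks, self._callbacks = self._callbacks, []
            for func in callbacks:
                func()

        def rollback(self) -> None:
            self._callbacks.clear()  # deferred events are dropped with the transaction


    audit_rows: list[dict] = []  # stand-in for the RoleAssignmentAudit table


    def write_audit_row(payload: dict) -> None:
        """Hypothetical synchronous receiver, running in the emitting process."""
        audit_rows.append(dict(payload))


    RECEIVERS = [write_audit_row]  # stand-in for receivers connected to the signal


    def assign_role(txn: FakeTransaction, subject: str, role: str, scope: str) -> None:
        """Hypothetical API call: write the policy, then defer the event until commit."""
        # ... the Casbin policy write would happen here, inside the transaction ...
        payload = {"operation": "created", "subject": subject, "role": role,
                   "scope": scope, "actor_id": None}

        def emit() -> None:
            for receiver in RECEIVERS:
                receiver(payload)

        txn.on_commit(emit)


    committed = FakeTransaction()
    assign_role(committed, "user^alice", "role^instructor", "lib^lib:Org1:lib1")
    rows_before_commit = len(audit_rows)  # nothing has been emitted yet
    committed.commit()

    failed = FakeTransaction()
    assign_role(failed, "user^bob", "role^instructor", "lib^lib:Org1:lib1")
    failed.rollback()  # bob's assignment never happened, so no event fires

Deferring emission to commit means consumers never observe an assignment that was later rolled back.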

The handler is enabled by default. Operators with Aspects or a SIEM can disable it via a
Django setting to avoid the redundant write. If the handler fails, the Casbin write and the
event are unaffected.
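
The sketch below models one way to get that isolation, similar in spirit to Django's ``Signal.send_robust()``. That method catches receiver exceptions and returns them instead of raising. ``AUDIT_HANDLER_ENABLED`` is a hypothetical name for the Django setting, which this ADR does not name. ``dispatch``, ``audit_handler``, and ``notify_handler`` are illustrative.

.. code:: python

    import logging
    from typing import Callable

    log = logging.getLogger("authz.audit")

    AUDIT_HANDLER_ENABLED = True  # hypothetical setting name; False skips the audit write

    notified: list[dict] = []


    def audit_handler(payload: dict) -> None:
        """Hypothetical audit receiver; here the write always fails."""
        if not AUDIT_HANDLER_ENABLED:
            return  # operator opted out, e.g. because Aspects already stores events
        raise RuntimeError("audit table unavailable")  # simulated database error


    def notify_handler(payload: dict) -> None:
        """Another consumer of the same event, such as a plugin."""
        notified.append(payload)


    def dispatch(payload: dict, receivers: list[Callable[[dict], None]]) -> list[tuple]:
        """Call every receiver; failures are logged and collected, never re-raised."""
        results = []
        for receiver in receivers:
            try:
                results.append((receiver, receiver(payload)))
            except Exception as exc:  # same idea as Signal.send_robust()
                log.warning("receiver %s failed: %s", receiver.__name__, exc)
                results.append((receiver, exc))
        return results


    event = {"operation": "deleted", "subject": "user^alice", "role": "role^instructor",
             "scope": "lib^lib:Org1:lib1", "actor_id": 42}
    # By now the policy change is committed; nothing below can undo it.
    results = dispatch(event, [audit_handler, notify_handler])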

Event payload
-------------

.. code:: python

    # Example payload for a role assignment; values are illustrative.
    {
        "operation": "created",  # or "deleted"
        "subject": "user^alice",  # namespaced subject key
        "role": "role^instructor",  # namespaced role key
        "scope": "course-v1^course-v1:Org+Course+Run",  # namespaced scope key
        "actor_id": 42,  # database ID of the caller (int), or None for the system actor
    }

The actor is resolved from ``crum.get_current_user()`` (provided by ``django-crum``) at API
call time. Callers do not need to pass ``actor_id=`` explicitly.
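
django-crum's middleware keeps the current request in thread-local state, which lets the API find the actor without a parameter. The stdlib sketch below models that mechanism. ``set_current_user``, ``get_current_user``, and ``resolve_actor_id`` are hypothetical stand-ins; the real lookup would be ``crum.get_current_user()``. A ``SimpleNamespace`` plays the role of a Django ``User``.

.. code:: python

    import threading
    from types import SimpleNamespace
    from typing import Optional

    _state = threading.local()


    def set_current_user(user) -> None:
        """Hypothetical: what request middleware does when a request starts."""
        _state.user = user


    def get_current_user():
        """Stand-in for crum.get_current_user(); None outside a request."""
        return getattr(_state, "user", None)


    def resolve_actor_id() -> Optional[int]:
        """Return the caller's database ID, or None for the system actor."""
        user = get_current_user()
        if user is None or not getattr(user, "is_authenticated", False):
            return None  # management command, background task, or anonymous user
        return user.id


    # Outside any request, e.g. in a management command.
    system_actor = resolve_actor_id()

    # Inside a request made by an authenticated user with database ID 42.
    set_current_user(SimpleNamespace(id=42, is_authenticated=True))
    request_actor = resolve_actor_id()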

Audit table
-----------

``RoleAssignmentAudit`` mirrors the event payload. It is registered in the Django admin and
is filterable by user, role, scope, ``actor_id``, and timestamp.

Subject, role, and scope are stored as plain namespaced key strings (e.g. ``user^alice``,
``role^instructor``, ``lib^lib:Org1:lib1``). There are no FK references to live ``Subject``,
``Scope``, or Casbin tables. Audit records survive the deletion of the underlying objects by
design: the value of an audit log depends on its unconditional durability.
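
The record shape can be sketched framework-free. This sketch uses a frozen dataclass in place of the Django model. The field names follow the event payload, and ``created_at`` is an assumed timestamp column. Every reference is a plain string or integer, so nothing in the record can cascade or dangle when live objects are deleted.

.. code:: python

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Optional


    @dataclass(frozen=True)
    class RoleAssignmentAuditRecord:
        """Illustrative shape of a RoleAssignmentAudit row (not the real model)."""
        operation: str           # "created" or "deleted"
        subject: str             # e.g. "user^alice": a plain string, not an FK
        role: str                # e.g. "role^instructor"
        scope: str               # e.g. "lib^lib:Org1:lib1"
        actor_id: Optional[int]  # plain integer, not an FK to User; None = system
        created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

        @classmethod
        def from_event(cls, payload: dict) -> "RoleAssignmentAuditRecord":
            """Copy payload values verbatim; no lookups against live tables."""
            return cls(
                operation=payload["operation"],
                subject=payload["subject"],
                role=payload["role"],
                scope=payload["scope"],
                actor_id=payload["actor_id"],
            )


    record = RoleAssignmentAuditRecord.from_event({
        "operation": "created",
        "subject": "user^alice",
        "role": "role^instructor",
        "scope": "lib^lib:Org1:lib1",
        "actor_id": None,
    })

Because the record only copies values, later deleting ``user^alice`` or the library leaves this row untouched.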

Because there are no FK references, the namespace prefix embedded in each string is the only
available signal for categorizing records by type. Admin filters (e.g. "content library",
"course") rely on ``scope__startswith`` lookups against that prefix rather than relational
joins.
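
The prefix matching behind those admin filters can be sketched in plain Python. In Django, the same test would be a ``scope__startswith`` filter on the queryset. The ``SCOPE_PREFIXES`` mapping and its labels are assumptions, inferred from the example keys in this ADR (``lib^``, ``course-v1^``).

.. code:: python

    SCOPE_PREFIXES = {
        "content library": "lib^",  # assumed from keys like "lib^lib:Org1:lib1"
        "course": "course-v1^",     # assumed from "course-v1^course-v1:Org+Course+Run"
    }


    def filter_by_scope_type(rows: list[dict], label: str) -> list[dict]:
        """Plain-Python equivalent of queryset.filter(scope__startswith=prefix)."""
        prefix = SCOPE_PREFIXES[label]
        return [row for row in rows if row["scope"].startswith(prefix)]


    rows = [
        {"subject": "user^alice", "scope": "lib^lib:Org1:lib1"},
        {"subject": "user^bob", "scope": "course-v1^course-v1:Org+Course+Run"},
        {"subject": "user^carol", "scope": "lib^lib:Org2:lib7"},
    ]
    libraries = filter_by_scope_type(rows, "content library")

Including the ``^`` separator in each prefix prevents ``lib^`` from matching keys in another namespace that merely begins with ``lib``.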

Developer extensibility
-----------------------

Plugin authors register handlers on the ``OpenedxPublicSignal`` to react to role lifecycle
events (notifications, cache updates, analytics). Developers without an event bus can consume
the underlying Django signal directly. If an event bus is configured to produce this event,
events are forwarded to Aspects or external systems without additional code.
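
A plugin receiver only needs the payload fields. The sketch below assumes a hypothetical per-user permission cache keyed by subject, and shows a receiver that drops cached decisions on any role change. In a real plugin the function would be connected to the ``OpenedxPublicSignal``, which is a Django ``Signal`` subclass. The ``on_role_change`` decorator and ``fire`` helper here are toy stand-ins for that registration and dispatch.

.. code:: python

    from collections import defaultdict

    # Hypothetical cache: subject key -> {(action, scope): decision}
    permission_cache: dict[str, dict] = defaultdict(dict)

    ROLE_CHANGE_RECEIVERS = []  # toy stand-in for signal.connect(...)


    def on_role_change(func):
        """Toy decorator that registers func for role lifecycle events."""
        ROLE_CHANGE_RECEIVERS.append(func)
        return func


    @on_role_change
    def invalidate_permission_cache(payload: dict) -> None:
        """Forget every cached decision for the subject whose roles changed."""
        permission_cache.pop(payload["subject"], None)


    def fire(payload: dict) -> None:
        for func in ROLE_CHANGE_RECEIVERS:
            func(payload)


    permission_cache["user^alice"][("view_library", "lib^lib:Org1:lib1")] = True
    permission_cache["user^bob"][("view_library", "lib^lib:Org1:lib1")] = False

    fire({"operation": "deleted", "subject": "user^alice", "role": "role^instructor",
          "scope": "lib^lib:Org1:lib1", "actor_id": 42})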

2. Explainability: Real-Time Decision Context
=============================================

Expose ``enforce_ex()`` through the public Python API. It returns ``(result, explain_rule)``:
the boolean decision and the matched policy rule. Callers get the exact rule that allowed or
denied the request; when no rule matches, the request is denied and the returned rule is
empty.
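
The ``(result, explain_rule)`` contract can be illustrated with a toy enforcer. This is not pycasbin. It is a deliberately small matcher that assumes two kinds of rules: grouping rules mapping a subject to roles per scope, and policy rules of the form ``(role, action, scope pattern)``. The action names and the trailing ``*`` wildcard are hypothetical. The matcher returns the matched rule alongside the decision, or an empty list when nothing matched.

.. code:: python

    # Hypothetical grouping rules: (subject, scope) -> roles held in that scope.
    GROUPINGS = {("user^alice", "lib^lib:Org1:lib1"): {"role^instructor"}}

    # Hypothetical policy rules: (role, action, scope pattern); "*" is a suffix wildcard.
    POLICIES = [
        ("role^instructor", "view_library", "lib^*"),
        ("role^instructor", "edit_library", "lib^lib:Org1:lib1"),
    ]


    def _scope_matches(pattern: str, scope: str) -> bool:
        if pattern.endswith("*"):
            return scope.startswith(pattern[:-1])
        return pattern == scope


    def enforce_ex(subject: str, action: str, scope: str) -> tuple[bool, list[str]]:
        """Return (decision, matched rule); the rule is [] when nothing matched."""
        roles = GROUPINGS.get((subject, scope), set())
        for rule in POLICIES:
            role, rule_action, scope_pattern = rule
            if role in roles and rule_action == action and _scope_matches(scope_pattern, scope):
                return True, list(rule)
        return False, []


    allowed, why = enforce_ex("user^alice", "edit_library", "lib^lib:Org1:lib1")
    denied, why_not = enforce_ex("user^bob", "view_library", "lib^lib:Org1:lib1")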

Enforcement events are opt-in via ``AUTHZ_ENFORCEMENT_EVENTS_ENABLED``. When enabled, each
check fires an ``OpenedxPublicSignal`` forwarded to plugin consumers or an event bus. No audit
table is written: the volume makes per-check storage impractical.

Historical explainability ("why did this user have access last Tuesday?") is deferred. Two
options exist, both requiring a breaking change to ``is_user_allowed`` to accept
``as_of``:

- **Option A (event replay):** Replay ``created``/``deleted`` events from
  ``RoleAssignmentAudit`` up to T. No extra infrastructure is needed; the data is already
  there once attribution is implemented. The `Auth0 FGA Logging API`_ follows the same
  pattern: its logging API is an event store that clients replay to answer historical
  questions.
- **Option B (snapshots):** Add ``HistoricalRecords()`` to ``ExtendedCasbinRule`` and use
  ``as_of(T)`` for the full rule state, including policy definitions. History collection must
  start before the target timestamp.
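
Option A can be sketched as a fold over the audit log. It applies ``created`` and ``deleted`` events in timestamp order up to T, keeping the surviving ``(subject, role, scope)`` triples. This minimal sketch assumes each row carries a timezone-aware ``created_at``. It reconstructs role assignments only, not policy definitions, which is the gap Option B covers.

.. code:: python

    from datetime import datetime, timezone


    def assignments_as_of(rows: list[dict], as_of: datetime) -> set[tuple[str, str, str]]:
        """Replay created/deleted events up to and including as_of."""
        state: set[tuple[str, str, str]] = set()
        # sorted() is stable, so events sharing a timestamp keep their log order;
        # a real query would add the primary key as a tie-breaker.
        for row in sorted(rows, key=lambda r: r["created_at"]):
            if row["created_at"] > as_of:
                break  # rows are sorted, so everything after this is later
            key = (row["subject"], row["role"], row["scope"])
            if row["operation"] == "created":
                state.add(key)
            elif row["operation"] == "deleted":
                state.discard(key)
        return state


    def ts(day: int) -> datetime:
        return datetime(2025, 1, day, tzinfo=timezone.utc)


    events = [
        {"created_at": ts(1), "operation": "created", "subject": "user^alice",
         "role": "role^instructor", "scope": "lib^lib:Org1:lib1"},
        {"created_at": ts(5), "operation": "deleted", "subject": "user^alice",
         "role": "role^instructor", "scope": "lib^lib:Org1:lib1"},
    ]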

``authz.policy`` is loaded into the DB and covered by Option B. ``model.conf`` is not
persisted in the DB, so no snapshot covers it. A ``model_hash`` field on
``ExtendedCasbinRule`` would let historical queries detect whether the model changed.
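
One way to compute the proposed ``model_hash`` is a SHA-256 digest of the model text after normalizing insignificant differences. The normalization rules here are assumptions, not something this ADR specifies: drop blank lines and ``#`` comment lines, and trim surrounding whitespace. The model text below is hypothetical and abbreviated.

.. code:: python

    import hashlib


    def model_hash(model_text: str) -> str:
        """SHA-256 of the model with blank lines, comments, and edge whitespace removed."""
        lines = []
        for raw in model_text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # formatting-only edits should not change the hash
            lines.append(line)
        return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


    # Hypothetical, abbreviated model text.
    MODEL_V1 = """
    [request_definition]
    r = sub, act, obj
    # a comment
    [policy_definition]
    p = sub, act, obj
    """
    MODEL_V1_REFORMATTED = "[request_definition]\n  r = sub, act, obj\n\n[policy_definition]\np = sub, act, obj\n"
    MODEL_V2 = MODEL_V1.replace("p = sub, act, obj", "p = sub, act, obj, eft")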

Consequences
************

#. **Operators get a filterable role assignment history in Django admin.** No external
   tooling is required.

#. **Developers get a stable** ``OpenedxPublicSignal`` **extension point.** It is the first
   formally defined event in openedx-authz. Callers of ``openedx_authz.api.roles`` need no
   signature changes.

#. **Events are best-effort.** If the audit write fails, the Casbin policy is still durable.
   Consumers requiring guaranteed delivery must implement their own retry logic.

#. ``actor_id`` **is nullable.** Non-request contexts (management commands, background tasks)
   record ``None``, logged as a system operation. ``actor_id`` is stored as a plain integer
   (the database ID of the caller) rather than a FK to ``User``. This avoids a dependency
   on the ``User`` table and keeps audit records fully independent from live data. Deleting
   or retiring a user does not remove or alter existing records, although the stored ID may
   no longer resolve to a live account.

#. **Audit records are independent from live authorization state.** Deleting a subject,
   scope, or role does not remove its audit history. Records may reference identifiers that
   no longer exist.

#. ``RoleAssignmentAudit`` **introduces a new migration.** No existing table is modified.

#. **The** ``OpenedxPublicSignal`` **schema is a public API surface.** Field additions are
   backward-compatible; removals and renames are breaking changes.

#. ``RoleAssignmentAudit`` **is not tamper-proof.** Compliance-grade immutability is a
   later-phase concern.

#. **No new dependencies are introduced.** ``django-crum`` and ``django-simple-history`` are
   already transitive dependencies.

#. **Usage auditing belongs at the application layer** (Open edX tracking events, Aspects),
   not in the authorization library.

#. **Developers can retrieve the matched policy rule at check time** for "why was this
   denied?" debugging. The explanation is point-in-time only; historical explainability is
   deferred.

#. **Enforcement events are opt-in by design.** Enabling them without an external consumer
   produces events that are emitted and discarded.

Alternatives Considered
***********************

``django-simple-history`` on ``ExtendedCasbinRule`` as the attribution audit trail
===================================================================================

Rejected for two reasons:

- ``save_policy`` (`casbin-django-orm-adapter adapter.py`_) rewrites rules with
  ``QuerySet.delete()`` and ``bulk_create``. ``bulk_create`` sends no ``post_save`` signal,
  so recreated rows never reach the history table. ``QuerySet.delete()`` does send delete
  signals, but only as part of the wholesale rewrite. History snapshots therefore reflect
  when the table was written, not when a role was assigned.
- ``ExtendedCasbinRule`` fields (``ptype``, ``v0``--``v5``) are semi-opaque and require an
  interpretation layer. ``RoleAssignmentAudit`` translates at write time.
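
The interpretation step can be sketched as a mapping from a raw Casbin row to named fields. The column layout assumed here is ``ptype == "g"`` with the subject in ``v0``, the role in ``v1``, and the scope in ``v2``. That is the usual Casbin RBAC-with-domains convention, but it is an assumption about openedx-authz, not something this ADR states.

.. code:: python

    from typing import Optional


    def interpret_casbin_row(row: dict) -> Optional[dict]:
        """Map a raw rule row (ptype, v0..v5) to named audit fields.

        Assumed layout: "g" rows are role assignments as (subject, role, scope).
        Returns None for rows that are not role assignments, such as "p" rows.
        """
        if row.get("ptype") != "g":
            return None
        return {"subject": row["v0"], "role": row["v1"], "scope": row["v2"]}


    raw = {"ptype": "g", "v0": "user^alice", "v1": "role^instructor",
           "v2": "lib^lib:Org1:lib1", "v3": "", "v4": "", "v5": ""}
    policy_row = {"ptype": "p", "v0": "role^instructor", "v1": "view_library",
                  "v2": "lib^*", "v3": "allow", "v4": "", "v5": ""}

Translating once, when the event is recorded, means readers of the audit table never need to know the column layout.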

``django-simple-history`` remains the right tool for Option B (point-in-time state
reconstruction), where it is a snapshot mechanism, not an operation log.

References
**********

- `ADR 0002`_
- `ADR 0004`_
- `ADR 0005`_
- `Auth0 FGA Logging API`_
- `openedx-events documentation`_
- `django-simple-history documentation`_
- `django-crum documentation`_
- `OEPM-Spike\: RBAC AuthZ Auditability`_

.. _ADR 0002: https://github.com/openedx/openedx-authz/blob/main/docs/decisions/0002-authorization-model-foundation.rst
.. _ADR 0004: https://github.com/openedx/openedx-authz/blob/main/docs/decisions/0004-technology-selection.rst
.. _ADR 0005: https://github.com/openedx/openedx-authz/blob/main/docs/decisions/0005-architecture-and-data-modeling.rst
.. _Auth0 FGA Logging API: https://auth0.com/blog/auth0-fga-logging-api-a-complete-audit-trail-for-authorization/
.. _SpiceDB: https://github.com/authzed/spicedb
.. _OpenFGA: https://openfga.dev/
.. _openedx-events documentation: https://docs.openedx.org/projects/openedx-events/en/latest/
.. _django-simple-history documentation: https://django-simple-history.readthedocs.io/
.. _django-crum documentation: https://pypi.org/project/django-crum/
.. _casbin-django-orm-adapter adapter.py: https://github.com/officialpycasbin/django-orm-adapter/blob/main/casbin_adapter/adapter.py
.. _OEPM-Spike\: RBAC AuthZ Auditability: https://openedx.atlassian.net/wiki/spaces/OEPM/pages/6045859842/Spike+-+RBAC+AuthZ+-+Auditability