| 1 | +Standardize Error Responses |
| 2 | +============================ |
| 3 | + |
| 4 | +:Status: Accepted |
| 5 | +:Date: 2026-03-31 |
| 6 | +:Deciders: API Working Group |
| 7 | +:Technical Story: Open edX REST API Standards – Error response interoperability |
| 8 | + |
| 9 | +Context |
| 10 | +------- |
| 11 | + |
| 12 | +Open edX APIs currently return errors in multiple incompatible shapes (e.g., ``{"error": ...}``, |
| 13 | +``{"detail": ...}``, nested field errors, and even HTTP 200 responses containing ``"success": false``). This |
| 14 | +inconsistency makes it difficult for external clients and AI systems to reliably detect and map error |
| 15 | +states across services. |
| 16 | + |
| 17 | +Objectives |
| 18 | +---------- |
| 19 | + |
| 20 | +We want error responses that: |
| 21 | + |
| 22 | +* Use **correct HTTP status codes** (4xx/5xx) for failures, and avoid masking errors behind HTTP 200. |
| 23 | +* Provide a **single, predictable JSON shape** so clients can implement one parsing path across services. |
| 24 | +* Include **machine-readable identifiers** (e.g. a URI for the error class) so tools and integrations can |
| 25 | + classify failures without scraping free-form text. |
| 26 | +* Carry a **short human-readable summary** plus a **specific explanation** for this request when helpful. |
| 27 | +* Tie errors to the **request** when useful (e.g. request path or URL) for support and logging. |
| 28 | +* Represent **validation failures** in a consistent way (e.g. field/path to messages) instead of ad-hoc nesting. |
| 29 | +* Are **documented and enforced** in DRF (central exception handling + schema generation). |
| 30 | + |
| 31 | +Decision |
| 32 | +-------- |
| 33 | + |
| 34 | +We will standardize all Open edX REST APIs to return errors using a **structured JSON error object** for |
| 35 | +non-2xx responses that meets the objectives above. |
| 36 | + |
| 37 | +Implementation requirements: |
| 38 | + |
| 39 | +* Use appropriate HTTP status codes (4xx/5xx). Avoid returning HTTP 200 for error conditions. |
| 40 | +* Return a consistent payload with these core fields: |
| 41 | + |
| 42 | + * ``type`` (URI identifying the problem type) |
| 43 | + * ``title`` (short, developer/operator-facing summary of the error class; not intended for display to end users) |
| 44 | + * ``status`` (HTTP status code) |
| 45 | + * ``detail`` (stable, developer-facing explanation specific to this occurrence; safe for log |
| 46 | + aggregators and APM tools — see *Note on RFC 9457 deviation* below) |
| 47 | + * ``instance`` (the path of the request that produced this error, e.g. ``/api/courses/v1/``; see |
| 48 | + the *Note on instance* section below) |
| 49 | + * ``user_message`` *(optional)* — a human-readable, translatable string intended for |
| 50 | + display in MFEs or end-user UIs. MFE clients should prefer mapping the ``type`` URI to a |
| 51 | + locally-translated string; use ``user_message`` when the server must supply context that cannot |
| 52 | + be expressed by ``type`` alone. |
| 53 | + |
| 54 | +* For validation errors, include a predictable extension member ``errors``: a dict mapping each |
| 55 | + invalid field/path to a list of error message strings. This maps directly onto DRF's native |
| 56 | + ``ValidationError.detail`` dict, so the central exception handler can populate it without |
| 57 | + per-view changes. Example:: |
| 58 | + |
| 59 | + "errors": { |
| 60 | +   "course_id": ["This field is required."], |
| 61 | +   "display_name": ["Ensure this field has no more than 255 characters."] |
| 62 | + } |
| 63 | + |
| 64 | +* Define a small catalog of common ``type`` URIs for shared errors. Initial entries: |
| 65 | + |
| 66 | + .. list-table:: |
| 67 | +    :header-rows: 1 |
| 68 | +    :widths: 50 10 40 |
| 69 | + |
| 70 | +    * - URI |
| 71 | +      - Status |
| 72 | +      - When to use |
| 73 | +    * - ``https://docs.openedx.org/errors/not-found`` |
| 74 | +      - 404 |
| 75 | +      - Resource does not exist |
| 76 | +    * - ``https://docs.openedx.org/errors/authz`` |
| 77 | +      - 403 |
| 78 | +      - Authenticated but not authorized |
| 79 | +    * - ``https://docs.openedx.org/errors/authn`` |
| 80 | +      - 401 |
| 81 | +      - Not authenticated |
| 82 | +    * - ``https://docs.openedx.org/errors/validation`` |
| 83 | +      - 400 |
| 84 | +      - Request body / query-param validation failure |
| 85 | +    * - ``https://docs.openedx.org/errors/rate-limited`` |
| 86 | +      - 429 |
| 87 | +      - Rate limit exceeded |
| 88 | +    * - ``https://docs.openedx.org/errors/internal`` |
| 89 | +      - 500 |
| 90 | +      - Unexpected server error |
| 91 | + |
| 92 | + App-specific types may extend this catalog; they must still be absolute URIs. |
| 93 | + |
| 94 | + While many catalog entries map 1-to-1 with an HTTP status code, ``type`` provides |
| 95 | + sub-category granularity that HTTP status alone cannot express (e.g. ``authn`` vs |
| 96 | + ``authz`` vs ``validation`` vs ``not-found`` are all 4xx but represent distinct failure |
| 97 | + classes). App-specific ``type`` extensions add even finer-grained identifiers (e.g. |
| 98 | + ``https://docs.openedx.org/errors/enrollment/already-enrolled``). The ``status`` field is |
| 99 | + a convenience duplicate for clients that triage responses by status code without |
| 100 | + inspecting the body further. |
| 101 | + |
| 102 | + These URIs serve as **opaque, stable identifiers** first. They *should* eventually resolve to |
| 103 | + human-readable documentation pages on ``docs.openedx.org`` describing the error class, its |
| 104 | + causes, and remediation steps — but dereference-ability is not a requirement for the initial |
| 105 | + rollout. Clients must treat ``type`` as an opaque string and never rely on HTTP-fetching it at |
| 106 | + runtime. |
| 107 | +* Error responses must respect the content type signalled by the request. The platform must not |
| 108 | + produce HTML error pages when the request used JSON (i.e. when ``Content-Type: application/json`` |
| 109 | + or ``Accept: application/json`` was sent). The platform-level DRF exception handler must catch |
| 110 | + exceptions that would otherwise produce Django's default HTML error page and return a JSON body |
| 111 | + in the standardized format instead. Endpoints not using DRF's ``APIView`` must be identified and |
| 112 | + wrapped accordingly. |
| 113 | +* For **5xx / unhandled exceptions** in **production** (``DEBUG=False``), the handler must return |
| 114 | + a **generic error body**: stack traces, internal exception messages, and sensitive system |
| 115 | + details must not appear in the response. Only the ``https://docs.openedx.org/errors/internal`` |
| 116 | + ``type``, a fixed ``"Internal Server Error"`` title, the ``500`` status, and a generic ``detail`` |
| 117 | + are safe to return. Detailed diagnostics belong in server-side logs and APM tooling, not in API responses. |
| 118 | + |
| 119 | + In **development** (``DEBUG=True``), the handler MAY include additional diagnostic information |
| 120 | + (e.g. the exception class and message) in an extension field (e.g. ``debug_detail``) to ease |
| 121 | + local debugging. Stack traces should still be written to the server log regardless of mode. |
| 122 | +* Preserve **CORS headers** on error responses. When the exception handler short-circuits the |
| 123 | + normal response cycle, ``Access-Control-*`` headers set by ``django-cors-headers`` can be |
| 124 | + dropped, causing browsers to surface a misleading CORS error rather than the actual error |
| 125 | + body. The platform-level exception handler must ensure CORS headers are not stripped from |
| 126 | + error responses. |
| 127 | +* Ensure the schema is **documented in drf-spectacular** by registering the standardized error |
| 128 | + shape as a reusable component (``#/components/schemas/ErrorResponse``), so all API endpoint |
| 129 | + docs automatically reference it for 4xx/5xx response types. |
| 130 | + |
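The production and development rules for 5xx bodies described above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the platform implementation: ``build_internal_error_body`` and its ``debug`` parameter (standing in for Django's ``settings.DEBUG``) are hypothetical names, and ``debug_detail`` is the example extension field named in the requirement.

```python
import logging

log = logging.getLogger(__name__)

INTERNAL_TYPE = "https://docs.openedx.org/errors/internal"


def build_internal_error_body(exc, debug, instance=None):
    """Build the generic 5xx body. `debug` is a hypothetical stand-in for settings.DEBUG."""
    # Diagnostics always go to the server log, regardless of mode.
    log.error("Unhandled API exception", exc_info=exc)
    body = {
        "type": INTERNAL_TYPE,
        "title": "Internal Server Error",
        "status": 500,
        "detail": "An unexpected error occurred. Please try again later.",
    }
    if instance:
        body["instance"] = instance
    if debug:
        # Development only: expose the exception class and message.
        body["debug_detail"] = f"{type(exc).__name__}: {exc}"
    return body
```

Keeping the body construction separate from the DRF handler makes the "nothing leaks in production" rule easy to unit test without a running Django project.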
| 131 | +Note on RFC 9457 deviation |
| 132 | +~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 133 | + |
| 134 | +`RFC 9457 <https://www.rfc-editor.org/rfc/rfc9457>`_ (Problem Details for HTTP APIs) defines |
| 135 | +``detail`` as a "human-readable explanation" intended for the client/end-user. This ADR |
| 136 | +intentionally deviates from that definition: we use ``detail`` for a **stable, developer-facing, |
| 137 | +English-language** string that is safe to forward to APM systems and log aggregators. User-facing |
| 138 | +copy is carried in the separate ``user_message`` field instead. This separation keeps localizable, |
| 139 | +UI-bound strings out of the machine-readable layer while still providing a meaningful explanation |
| 140 | +for developers and on-call engineers. |
| 141 | + |
| 142 | +Note on ``instance`` |
| 143 | +~~~~~~~~~~~~~~~~~~~~ |
| 144 | + |
| 145 | +The ``instance`` field in this ADR is the **path of the request that produced the error** (e.g. |
| 146 | +``request.path``, yielding ``/api/courses/v1/``). A path-only value is preferred over a full |
| 147 | +absolute URL (``request.build_absolute_uri()``) because it is useful for correlation and support |
| 148 | +without embedding the server hostname or protocol, which can vary across environments. RFC 9457 |
| 149 | +permits ``instance`` to be either relative or absolute and does not require it to be |
| 150 | +dereferenceable; using the request path is a valid application of the field. |
| 151 | + |
| 152 | +Relevance in edx-platform |
| 153 | +------------------------- |
| 154 | + |
| 155 | +Current error shapes in the codebase are inconsistent: |
| 156 | + |
| 157 | +* **DeveloperErrorViewMixin** (``openedx/core/lib/api/view_utils.py``) returns |
| 158 | + ``{"developer_message": "...", "error_code": "..."}`` and for validation |
| 159 | + ``{"developer_message": "...", "field_errors": {field: {"developer_message": "..."}}}``. |
| 160 | +* **Instructor API** (``lms/djangoapps/instructor/views/api.py``) uses |
| 161 | + ``JsonResponse({"error": msg}, 400)``. |
| 162 | +* **Registration** (``openedx/core/djangoapps/user_authn/views/register.py``) returns |
| 163 | + HTTP 200 with ``success: true/false`` and ``error_code`` for some failures. |
| 164 | +* **ORA Staff Grader** (``lms/djangoapps/ora_staff_grader/errors.py``) uses a custom |
| 165 | + ``ErrorSerializer`` with an ``error`` field. |
| 166 | +* **Enrollment API** (``openedx/core/djangoapps/enrollments/``) returns |
| 167 | + ``{"message": "..."}`` or ``{"message": "...", "localizedMessage": "..."}`` for errors. |
| 168 | + |
| 169 | +Code example (target shape) |
| 170 | +--------------------------- |
| 171 | + |
| 172 | +**Example structured error response (4xx):** |
| 173 | + |
| 174 | +.. code-block:: json |
| 175 | +
|
| 176 | + { |
| 177 | +   "type": "https://docs.openedx.org/errors/validation", |
| 178 | +   "title": "Validation Error", |
| 179 | +   "status": 400, |
| 180 | +   "detail": "The request body failed validation.", |
| 181 | +   "user_message": "Some required fields are missing or invalid.", |
| 182 | +   "instance": "/api/courses/v1/", |
| 183 | +   "errors": { |
| 184 | +     "course_id": ["This field is required."], |
| 185 | +     "display_name": ["Ensure this field has no more than 255 characters."] |
| 186 | +   } |
| 187 | + } |
| 188 | +
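A client consuming this shape might implement its single parsing path roughly as follows. This is a sketch only: ``classify_error`` and ``KNOWN_TYPES`` are hypothetical names, and the fallback to the HTTP status for legacy or non-JSON bodies is an assumption about how a client could behave during migration. The ``type`` value is compared as an opaque string and is never fetched.

```python
import json

# Catalog URIs from this ADR, mapped to short client-side labels (labels are illustrative).
KNOWN_TYPES = {
    "https://docs.openedx.org/errors/not-found": "not-found",
    "https://docs.openedx.org/errors/authz": "authz",
    "https://docs.openedx.org/errors/authn": "authn",
    "https://docs.openedx.org/errors/validation": "validation",
    "https://docs.openedx.org/errors/rate-limited": "rate-limited",
    "https://docs.openedx.org/errors/internal": "internal",
}


def classify_error(status_code, body_text):
    """Return (kind, field_errors) for a non-2xx response body."""
    try:
        body = json.loads(body_text)
    except ValueError:
        body = None
    if not isinstance(body, dict):
        # Non-JSON or unexpected payload: fall back to the HTTP status alone.
        return (f"http-{status_code}", {})
    type_uri = body.get("type")
    kind = KNOWN_TYPES.get(type_uri) if isinstance(type_uri, str) else None
    if kind is None:
        # Unknown or missing type (e.g. a legacy shape): triage by status code.
        kind = f"http-{status_code}"
    errors = body.get("errors")
    return (kind, errors if isinstance(errors, dict) else {})
```

With this approach an unknown app-specific ``type`` degrades to status-based handling instead of failing, which fits the expectation that the catalog will grow over time.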
|
| 189 | +**Attaching a** ``user_message`` **to an exception:** |
| 190 | + |
| 191 | +Because ``user_message`` is detected via ``hasattr``, it can be set on any ``APIException`` |
| 192 | +instance before raising — no subclass required: |
| 193 | + |
| 194 | +.. code-block:: python |
| 195 | +
|
| 196 | + from django.utils.translation import gettext_lazy as _ |
| 197 | + from rest_framework.exceptions import APIException |
| 198 | +
|
| 199 | + exc = APIException("Enrollment limit reached for course-v1:edX+DemoX+Demo_Course.") |
| 200 | + exc.user_message = _("This course is currently full. Please try again later.") |
| 201 | + raise exc |
| 202 | +
|
| 203 | +The central exception handler's ``hasattr(exc, 'user_message')`` check picks this up |
| 204 | +automatically, requiring no per-view changes. |
| 205 | + |
| 206 | +**Example DRF exception handler emitting the standard shape:** |
| 207 | + |
| 208 | +.. code-block:: python |
| 209 | +
|
| 210 | + # Central exception handler (e.g. in openedx/core/lib/api/exceptions.py) |
| 211 | + def standardized_error_exception_handler(exc, context): |
| 212 | +     from rest_framework.exceptions import ValidationError |
| 213 | +     from rest_framework.response import Response |
| 214 | +     from rest_framework.views import exception_handler |
| 215 | +     response = exception_handler(exc, context) |
| 216 | +     if response is None: |
| 217 | +         # Unhandled exception (e.g. IntegrityError): return a generic body, no exception details. |
| 218 | +         return Response( |
| 219 | +             { |
| 220 | +                 "type": "https://docs.openedx.org/errors/internal", |
| 221 | +                 "title": "Internal Server Error", |
| 222 | +                 "status": 500, |
| 223 | +                 "detail": "An unexpected error occurred. Please try again later.", |
| 224 | +             }, |
| 225 | +             status=500, |
| 226 | +         ) |
| 227 | +     request = context.get("request") |
| 228 | +     body = { |
| 229 | +         "type": f"https://docs.openedx.org/errors/{_error_type(exc)}", |
| 230 | +         "title": _error_title(exc), |
| 231 | +         "status": response.status_code, |
| 232 | +         "detail": _flatten_detail(response.data), |
| 233 | +     } |
| 234 | +     if request: |
| 235 | +         body["instance"] = request.path |
| 236 | +     if hasattr(exc, "user_message") and exc.user_message: |
| 237 | +         body["user_message"] = exc.user_message |
| 238 | +     if isinstance(exc, ValidationError) and hasattr(exc, "detail"): |
| 239 | +         body["errors"] = _normalize_validation_errors(exc.detail) |
| 240 | +     response.data = body |
| 241 | +     return response |
| 242 | +
|
| 243 | +Consequences |
| 244 | +------------ |
| 245 | + |
| 246 | +Positive |
| 247 | +~~~~~~~~ |
| 248 | + |
| 249 | +* Clients can implement a single error-handling path across services. |
| 250 | +* AI agents and external integrations can programmatically detect and classify error states. |
| 251 | +* Removes "hidden failures" caused by HTTP 200 + ``success: false`` patterns. |
| 252 | + |
| 253 | +Negative / Trade-offs |
| 254 | +~~~~~~~~~~~~~~~~~~~~~ |
| 255 | + |
| 256 | +* Requires refactoring of existing endpoints and tests that currently depend on ad-hoc error shapes. |
| 257 | +* Some clients may need a migration period if they parse legacy error formats. |
| 258 | + |
| 259 | +Alternatives Considered |
| 260 | +----------------------- |
| 261 | + |
| 262 | +* **Keep per-app formats**: rejected due to interoperability and client complexity. |
| 263 | +* **Use DRF defaults only**: rejected because DRF defaults still vary across validation/auth exceptions |
| 264 | + unless centrally handled and documented. |
| 265 | +* **Use a third-party library** (`drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_): a well-maintained |
| 266 | + library that implements RFC 9457-style responses for DRF. Considered but not adopted |
| 267 | + because: (a) it would add a new dependency to platform core, (b) we need custom behavior for CORS |
| 268 | + header preservation and the non-``APIView`` 500 path that would require overriding most of the |
| 269 | + library anyway, and (c) the contract defined here is lightweight enough to implement directly in |
| 270 | + the platform exception handler without a library. |
| 271 | + |
| 272 | +Rollout Plan |
| 273 | +------------ |
| 274 | + |
| 275 | +Error response format changes are considered backwards-compatible: well-behaved clients should |
| 276 | +handle unexpected JSON fields gracefully (robustness principle). The default migration path is |
| 277 | +therefore **in-place** — update the exception handler and, where needed, individual views without |
| 278 | +bumping the URL version. Teams with clients that are tightly coupled to a legacy error shape MAY |
| 279 | +version their endpoint following ADR-0037 (API Versioning Strategy) and maintain both shapes |
| 280 | +during a deprecation window. |
| 281 | + |
| 282 | +1. Introduce a shared DRF exception handler (platform-level) that emits the standardized error shape, |
| 283 | + including catching unhandled exceptions that would otherwise produce Django's HTML 500 page. |
| 284 | +2. Verify CORS headers (``Access-Control-*``) are preserved on all error responses; update the |
| 285 | + exception handler if ``django-cors-headers`` does not run before it. |
| 286 | +3. Update existing endpoint unit tests to assert the standardized error shape. Contract tests |
| 287 | + across services are optional but encouraged for endpoints consumed by external clients. |
| 288 | +4. Audit and fix endpoints that still return HTML errors on 500 (e.g. non-``APIView`` entry points). |
| 289 | +5. Migrate apps module-by-module; keep a short deprecation window for legacy shapes where feasible. |
| 290 | +6. Update API documentation to specify the standard error schema. |
| 291 | + |
| 292 | +References |
| 293 | +---------- |
| 294 | + |
| 295 | +* Open edX REST API Standards: "Inconsistent Error Response Structure" and alignment with structured, |
| 296 | + interoperable error payloads across services. |
| 297 | +* `RFC 9457 – Problem Details for HTTP APIs <https://www.rfc-editor.org/rfc/rfc9457>`_ |
| 298 | +* `drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_ |