
Commit 73ae76c

docs: add ADR-0029 standardized error responses decision (#38246)
1 parent 151d827 commit 73ae76c

1 file changed

Lines changed: 298 additions & 0 deletions

@@ -0,0 +1,298 @@
Standardize Error Responses
============================

:Status: Accepted
:Date: 2026-03-31
:Deciders: API Working Group
:Technical Story: Open edX REST API Standards – Error response interoperability

Context
-------

Open edX APIs currently return errors in multiple incompatible shapes (e.g., ``{"error": ...}``,
``{"detail": ...}``, nested field errors, and even HTTP 200 responses containing ``"success": false``). This
inconsistency makes it difficult for external clients and AI systems to reliably detect and map error
states across services.

Objectives
----------

We want error responses that:

* Use **correct HTTP status codes** (4xx/5xx) for failures, and avoid masking errors behind HTTP 200.
* Provide a **single, predictable JSON shape** so clients can implement one parsing path across services.
* Include **machine-readable identifiers** (e.g. a URI for the error class) so tools and integrations can
  classify failures without scraping free-form text.
* Carry a **short human-readable summary** plus a **specific explanation** for this request when helpful.
* Tie errors to the **request** when useful (e.g. request path or URL) for support and logging.
* Represent **validation failures** in a consistent way (e.g. field/path to messages) instead of ad-hoc nesting.
* Are **documented and enforced** in DRF (central exception handling + schema generation).

Decision
--------

We will standardize all Open edX REST APIs to return errors using a **structured JSON error object** for
non-2xx responses that meets the objectives above.

Implementation requirements:

* Use appropriate HTTP status codes (4xx/5xx). Avoid returning HTTP 200 for error conditions.
* Return a consistent payload with these core fields:

  * ``type`` (URI identifying the problem type)
  * ``title`` (short, developer/operator-facing summary of the error class; not intended for display to end users)
  * ``status`` (HTTP status code)
  * ``detail`` (stable, developer-facing explanation specific to this occurrence; safe for log
    aggregators and APM tools; see *Note on RFC 9457 deviation* below)
  * ``instance`` (the URI of the request that produced this error, e.g. the request path; see
    the "Note on ``instance``" section below)
  * ``user_message`` *(optional)*: a human-readable, translatable string intended for
    display in MFEs or end-user UIs. MFE clients should prefer mapping the ``type`` URI to a
    locally-translated string; use ``user_message`` when the server must supply context that cannot
    be expressed by ``type`` alone.

* For validation errors, include a predictable extension member ``errors``: a dict mapping each
  invalid field/path to a list of error message strings. This maps directly onto DRF's native
  ``ValidationError.detail`` dict, so the central exception handler can populate it without
  per-view changes. Example::

    "errors": {
      "course_id": ["This field is required."],
      "display_name": ["Ensure this field has no more than 255 characters."]
    }
64+
* Define a small catalog of common ``type`` URIs for shared errors. Initial entries:
65+
66+
.. list-table::
67+
:header-rows: 1
68+
:widths: 50 10 40
69+
70+
* - URI
71+
- Status
72+
- When to use
73+
* - ``https://docs.openedx.org/errors/not-found``
74+
- 404
75+
- Resource does not exist
76+
* - ``https://docs.openedx.org/errors/authz``
77+
- 403
78+
- Authenticated but not authorized
79+
* - ``https://docs.openedx.org/errors/authn``
80+
- 401
81+
- Not authenticated
82+
* - ``https://docs.openedx.org/errors/validation``
83+
- 400
84+
- Request body / query-param validation failure
85+
* - ``https://docs.openedx.org/errors/rate-limited``
86+
- 429
87+
- Rate limit exceeded
88+
* - ``https://docs.openedx.org/errors/internal``
89+
- 500
90+
- Unexpected server error
91+
92+
App-specific types may extend this catalog; they must still be absolute URIs.
93+
94+
While many catalog entries map 1-to-1 with an HTTP status code, ``type`` provides
95+
sub-category granularity that HTTP status alone cannot express (e.g. ``authn`` vs
96+
``authz`` vs ``validation`` vs ``not-found`` are all 4xx but represent distinct failure
97+
classes). App-specific ``type`` extensions add even finer-grained identifiers (e.g.
98+
``https://docs.openedx.org/errors/enrollment/already-enrolled``). The ``status`` field is
99+
a convenience duplicate for clients that triage responses by status code without
100+
inspecting the body further.
101+
102+
These URIs serve as **opaque, stable identifiers** first. They *should* eventually resolve to
103+
human-readable documentation pages on ``docs.openedx.org`` describing the error class, its
104+
causes, and remediation steps — but dereference-ability is not a requirement for the initial
105+
rollout. Clients must treat ``type`` as an opaque string and never rely on HTTP-fetching it at
106+
runtime.
* Error responses must respect the content type signalled by the request. The platform must not
  produce HTML error pages when the request used JSON (i.e. when ``Content-Type: application/json``
  or ``Accept: application/json`` was sent). The platform-level DRF exception handler must catch
  exceptions that would otherwise produce Django's default HTML error page and return a JSON body
  in the standardized format instead. Endpoints not using DRF's ``APIView`` must be identified and
  wrapped accordingly.
* For **5xx / unhandled exceptions** in **production** (``DEBUG=False``), the handler must return
  a **generic error body**: the response must not include stack traces, internal exception
  messages, or sensitive system details. Only the ``https://docs.openedx.org/errors/internal``
  ``type`` and a fixed ``"Internal Server Error"`` title are safe to return. Detailed diagnostics
  belong in server-side logs and APM tooling, not in API responses.

  In **development** (``DEBUG=True``), the handler MAY include additional diagnostic information
  (e.g. the exception class and message) in an extension field (e.g. ``debug_detail``) to ease
  local debugging. Stack traces should still be written to the server log regardless of mode.
* Preserve **CORS headers** on error responses. When the exception handler short-circuits the
  normal response cycle, ``Access-Control-*`` headers set by ``django-cors-headers`` can be
  dropped, causing browsers to surface a misleading CORS error rather than the actual error
  body. The platform-level exception handler must ensure CORS headers are not stripped from
  error responses.
* Ensure the schema is **documented in drf-spectacular** by registering the standardized error
  shape as a reusable component (``#/components/schemas/ErrorResponse``), so all API endpoint
  docs automatically reference it for 4xx/5xx response types.
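
The payload rules above can be sketched in plain Python. This is an illustration under stated
assumptions, not the platform implementation: ``CATALOG``, ``build_error_body`` and
``internal_error_body`` are hypothetical names, and every title other than ``"Validation Error"``
and ``"Internal Server Error"`` is a placeholder that this ADR does not define.

.. code-block:: python

    ERROR_BASE = "https://docs.openedx.org/errors/"

    # slug -> (HTTP status, title). Slugs and statuses come from the catalog table;
    # titles other than "Validation Error" and "Internal Server Error" are placeholders.
    CATALOG = {
        "not-found": (404, "Not Found"),
        "authz": (403, "Permission Denied"),
        "authn": (401, "Not Authenticated"),
        "validation": (400, "Validation Error"),
        "rate-limited": (429, "Rate Limited"),
        "internal": (500, "Internal Server Error"),
    }


    def build_error_body(slug, detail, instance=None, user_message=None, errors=None):
        """Assemble the standardized payload for one catalog entry (hypothetical helper)."""
        status, title = CATALOG[slug]
        body = {"type": ERROR_BASE + slug, "title": title, "status": status, "detail": detail}
        if instance is not None:
            body["instance"] = instance
        if user_message:
            # str() forces lazily translated strings into plain text for serialization.
            body["user_message"] = str(user_message)
        if errors:
            body["errors"] = errors
        return body


    def internal_error_body(exc, debug=False):
        """Generic 5xx body; diagnostics are attached only when ``debug`` is true."""
        body = build_error_body("internal", "An unexpected error occurred. Please try again later.")
        if debug:
            body["debug_detail"] = f"{type(exc).__name__}: {exc}"
        return body

With ``debug`` left at its default, ``internal_error_body`` returns only the four fixed fields, so
an exception message never reaches the client. A real handler would presumably derive the flag from
``settings.DEBUG``.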

Note on RFC 9457 deviation
~~~~~~~~~~~~~~~~~~~~~~~~~~

`RFC 9457 <https://www.rfc-editor.org/rfc/rfc9457>`_ (Problem Details for HTTP APIs) defines
``detail`` as a "human-readable explanation specific to this occurrence of the problem", aimed at
helping the client correct the problem. This ADR intentionally deviates from that definition: we
use ``detail`` for a **stable, developer-facing, English-language** string that is safe to forward
to APM systems and log aggregators. User-facing copy is carried in the separate ``user_message``
field instead. This separation keeps localizable, UI-bound strings out of the machine-readable
layer while still providing a meaningful explanation for developers and on-call engineers.
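
On the consuming side, this split suggests a lookup order for end-user copy: a local translation
keyed by ``type`` first, then ``user_message``, then a generic fallback, with ``detail`` reserved
for logs. The sketch below is written in Python for consistency with the other examples (an MFE
would do the same in JavaScript); ``LOCAL_MESSAGES``, ``FALLBACK_MESSAGE``, ``display_message`` and
``log_line`` are hypothetical names.

.. code-block:: python

    # Hypothetical client-side table: type URI -> locally translated string.
    LOCAL_MESSAGES = {
        "https://docs.openedx.org/errors/not-found": "We could not find that page.",
    }
    FALLBACK_MESSAGE = "Something went wrong. Please try again."


    def display_message(error_body, local_messages=LOCAL_MESSAGES):
        """Pick end-user copy: local translation by type, then user_message, then a fallback."""
        local = local_messages.get(error_body.get("type"))
        if local:
            return local
        return error_body.get("user_message") or FALLBACK_MESSAGE


    def log_line(error_body):
        """Developer-facing line for logs or APM, built only from the stable fields."""
        return f'{error_body.get("status")} {error_body.get("type")}: {error_body.get("detail")}'

Note that ``display_message`` never reads ``detail``, which keeps the developer-facing string out
of the UI.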

Note on ``instance``
~~~~~~~~~~~~~~~~~~~~

The ``instance`` field in this ADR is the **path of the request that produced the error** (e.g.
``request.path``, yielding ``/api/courses/v1/``). A path-only value is preferred over a full
absolute URL (``request.build_absolute_uri()``) because it is useful for correlation and support
without embedding the server hostname or protocol, which can vary across environments. RFC 9457
permits ``instance`` to be either relative or absolute and does not require it to be
dereferenceable; using the request path is a valid application of the field.

Relevance in edx-platform
-------------------------

Current error shapes in the codebase are inconsistent:

* **DeveloperErrorViewMixin** (``openedx/core/lib/api/view_utils.py``) returns
  ``{"developer_message": "...", "error_code": "..."}`` and for validation
  ``{"developer_message": "...", "field_errors": {field: {"developer_message": "..."}}}``.
* **Instructor API** (``lms/djangoapps/instructor/views/api.py``) uses
  ``JsonResponse({"error": msg}, 400)``.
* **Registration** (``openedx/core/djangoapps/user_authn/views/register.py``) returns
  HTTP 200 with ``success: true/false`` and ``error_code`` for some failures.
* **ORA Staff Grader** (``lms/djangoapps/ora_staff_grader/errors.py``) uses a custom
  ``ErrorSerializer`` with an ``error`` field.
* **Enrollment API** (``openedx/core/djangoapps/enrollments/``) returns
  ``{"message": "..."}`` or ``{"message": "...", "localizedMessage": "..."}`` for errors.
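
During a migration period, a client that still talks to unmigrated endpoints could fold these
legacy shapes into the standard field names. The following is a rough sketch only:
``normalize_legacy_error`` is a hypothetical helper, the key precedence is a guess based on the
shapes listed above, and reporting a masked HTTP 200 failure as 400 is an assumption (the correct
code depends on the endpoint).

.. code-block:: python

    def normalize_legacy_error(status, body):
        """Map a legacy error body onto standard field names; return None for real successes."""
        hidden_failure = status == 200 and body.get("success") is False
        if status < 400 and not hidden_failure:
            return None
        # First non-empty legacy message key wins; the order here is an assumption.
        detail = (
            body.get("developer_message")
            or body.get("error")
            or body.get("message")
            or body.get("detail")
            or body.get("error_code")
            or ""
        )
        normalized = {
            # Assumption: a masked failure is reported as 400.
            "status": 400 if hidden_failure else status,
            "detail": str(detail),
        }
        if body.get("localizedMessage"):
            normalized["user_message"] = body["localizedMessage"]
        field_errors = body.get("field_errors")
        if field_errors:
            # DeveloperErrorViewMixin nests one developer_message per field.
            normalized["errors"] = {
                field: [info["developer_message"]] for field, info in field_errors.items()
            }
        return normalized

The adapter deliberately does not guess a ``type`` URI, since none of the legacy shapes carries
enough information to choose one reliably.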

Code example (target shape)
---------------------------

**Example structured error response (4xx):**

.. code-block:: json

    {
      "type": "https://docs.openedx.org/errors/validation",
      "title": "Validation Error",
      "status": 400,
      "detail": "The request body failed validation.",
      "user_message": "Some required fields are missing or invalid.",
      "instance": "/api/courses/v1/",
      "errors": {
        "course_id": ["This field is required."],
        "display_name": ["Ensure this field has no more than 255 characters."]
      }
    }

**Attaching a** ``user_message`` **to an exception:**

Because ``user_message`` is detected via ``hasattr``, it can be set on any ``APIException``
instance before raising; no subclass is required:

.. code-block:: python

    from django.utils.translation import gettext_lazy as _
    from rest_framework.exceptions import APIException

    exc = APIException("Enrollment limit reached for course-v1:edX+DemoX+Demo_Course.")
    exc.user_message = _("This course is currently full. Please try again later.")
    raise exc

The central exception handler's ``hasattr(exc, 'user_message')`` check picks this up
automatically, requiring no per-view changes.

**Example DRF exception handler emitting the standard shape:**

.. code-block:: python

    # Central exception handler (e.g. in openedx/core/lib/api/exceptions.py)
    from rest_framework.exceptions import ValidationError
    from rest_framework.response import Response
    from rest_framework.views import exception_handler

    # _error_type, _error_title, _flatten_detail and _normalize_validation_errors
    # are module-level helpers that are not shown in this example.


    def standardized_error_exception_handler(exc, context):
        response = exception_handler(exc, context)
        if response is None:
            # DRF returned None: unhandled exception (e.g. IntegrityError, unexpected 5xx).
            # Always return a generic body; never include stack traces or exception details.
            return Response(
                {
                    "type": "https://docs.openedx.org/errors/internal",
                    "title": "Internal Server Error",
                    "status": 500,
                    "detail": "An unexpected error occurred. Please try again later.",
                },
                status=500,
            )
        request = context.get("request")
        body = {
            "type": f"https://docs.openedx.org/errors/{_error_type(exc)}",
            "title": _error_title(exc),
            "status": response.status_code,
            "detail": _flatten_detail(response.data),
        }
        if request:
            body["instance"] = request.path
        if hasattr(exc, "user_message") and exc.user_message:
            body["user_message"] = exc.user_message
        if isinstance(exc, ValidationError):
            # Field-level messages go into the ``errors`` extension member.
            body["errors"] = _normalize_validation_errors(exc.detail)
        response.data = body
        response["Content-Type"] = "application/json"
        return response
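
The handler above calls helpers that this ADR does not define. As one possible reading, the two
data-shaping helpers could look like the sketch below. It is pure Python and duck-typed, so it
does not import DRF; the dotted-path convention for nested errors and the ``non_field_errors``
key (DRF's default non-field key) are assumptions, as is joining messages into a single ``detail``
string. ``_error_type`` and ``_error_title`` are left out because they depend on how the catalog
is stored.

.. code-block:: python

    def _normalize_validation_errors(detail, prefix=""):
        """Flatten a DRF-style ``ValidationError.detail`` into {path: [messages]}."""
        errors = {}
        if isinstance(detail, dict):
            for key, value in detail.items():
                path = f"{prefix}.{key}" if prefix else str(key)
                for sub_path, messages in _normalize_validation_errors(value, path).items():
                    errors.setdefault(sub_path, []).extend(messages)
        elif isinstance(detail, (list, tuple)):
            for index, item in enumerate(detail):
                if isinstance(item, (dict, list, tuple)):
                    # Nested serializers and many=True produce nested containers.
                    item_path = f"{prefix}.{index}" if prefix else str(index)
                    for sub_path, messages in _normalize_validation_errors(item, item_path).items():
                        errors.setdefault(sub_path, []).extend(messages)
                else:
                    errors.setdefault(prefix or "non_field_errors", []).append(str(item))
        else:
            errors.setdefault(prefix or "non_field_errors", []).append(str(detail))
        return errors


    def _flatten_detail(data):
        """Collapse DRF response data into one developer-facing string."""
        if isinstance(data, dict) and set(data) == {"detail"}:
            return str(data["detail"])
        flat = _normalize_validation_errors(data)
        return "; ".join(f"{path}: {' '.join(messages)}" for path, messages in flat.items())

For example, ``{"items": [{"name": ["required"]}]}`` would be reported under the single key
``items.0.name``, which keeps the ``errors`` member one level deep regardless of serializer nesting.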

Consequences
------------

Positive
~~~~~~~~

* Clients can implement a single error-handling path across services.
* AI agents and external integrations can programmatically detect and classify error states.
* Removes "hidden failures" caused by HTTP 200 + ``success: false`` patterns.

Negative / Trade-offs
~~~~~~~~~~~~~~~~~~~~~

* Requires refactoring of existing endpoints and tests that currently depend on ad-hoc error shapes.
* Some clients may need a migration period if they parse legacy error formats.

Alternatives Considered
-----------------------

* **Keep per-app formats**: rejected due to interoperability and client complexity.
* **Use DRF defaults only**: rejected because DRF defaults still vary across validation/auth exceptions
  unless centrally handled and documented.
* `drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_: a well-maintained
  third-party library that implements RFC 9457-style responses for DRF. Considered but not adopted
  because: (a) it would add a new dependency to platform core, (b) we need custom behavior for CORS
  header preservation and the non-``APIView`` 500 path that would require overriding most of the
  library anyway, and (c) the contract defined here is lightweight enough to implement directly in
  the platform exception handler without a library.

Rollout Plan
------------

Error response format changes are considered backwards-compatible: well-behaved clients should
handle unexpected JSON fields gracefully (robustness principle). The default migration path is
therefore **in-place**: update the exception handler and, where needed, individual views without
bumping the URL version. Teams with clients that are tightly coupled to a legacy error shape MAY
version their endpoint following ADR-0037 (API Versioning Strategy) and maintain both shapes
during a deprecation window.

1. Introduce a shared DRF exception handler (platform-level) that emits the standardized error shape,
   including catching unhandled exceptions that would otherwise produce Django's HTML 500 page.
2. Verify CORS headers (``Access-Control-*``) are preserved on all error responses; update the
   exception handler if ``django-cors-headers`` does not run before it.
3. Update existing endpoint unit tests to assert the standardized error shape. Contract tests
   across services are optional but encouraged for endpoints consumed by external clients.
4. Audit and fix endpoints that still return HTML errors on 500 (e.g. non-``APIView`` entry points).
5. Migrate apps module-by-module; keep a short deprecation window for legacy shapes where feasible.
6. Update API documentation to specify the standard error schema.
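
For step 3, endpoint tests could share one assertion helper instead of re-checking each field by
hand. The sketch below is an assumption about what such a helper might check;
``assert_standard_error`` and ``REQUIRED_FIELDS`` are hypothetical names, and ``instance`` is
treated as optional because the example handler omits it when no request is available.

.. code-block:: python

    from urllib.parse import urlsplit

    REQUIRED_FIELDS = {"type": str, "title": str, "status": int, "detail": str}


    def assert_standard_error(body, expected_status):
        """Assert that a decoded JSON body follows the standardized error shape."""
        for name, expected_type in REQUIRED_FIELDS.items():
            assert name in body, f"missing field: {name}"
            assert isinstance(body[name], expected_type), f"{name} has the wrong type"
        assert body["status"] == expected_status, "status must mirror the HTTP status code"
        assert urlsplit(body["type"]).scheme, "type must be an absolute URI"
        if "instance" in body:
            assert body["instance"].startswith("/"), "instance should be a request path"
        if "errors" in body:
            for path, messages in body["errors"].items():
                assert isinstance(messages, list), f"errors[{path!r}] must be a list"
                assert all(isinstance(m, str) for m in messages), f"errors[{path!r}] must hold strings"

In a DRF test this might be called as ``assert_standard_error(response.json(), response.status_code)``
after asserting the expected status code on the response itself.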

References
----------

* Open edX REST API Standards: "Inconsistent Error Response Structure" and alignment with structured,
  interoperable error payloads across services.
* `RFC 9457 – Problem Details for HTTP APIs <https://www.rfc-editor.org/rfc/rfc9457>`_
* `drf-standardized-errors <https://github.com/ghazi-git/drf-standardized-errors>`_
