Skip to content

Handle network failures in portal and remote channel endpoints#14813

Open
rtibbles wants to merge 1 commit into
learningequality:developfrom
rtibbles:handle-network-failures
Open

Handle network failures in portal and remote channel endpoints#14813
rtibbles wants to merge 1 commit into
learningequality:developfrom
rtibbles:handle-network-failures

Conversation

@rtibbles
Copy link
Copy Markdown
Member

@rtibbles rtibbles commented Jun 6, 2026

Summary

  • The views that proxy to KDP and Studio (portal register/validate_token, remote channel list/retrieve) crashed with 500s on network-level failures: NetworkLocationResponseFailure.response can be None, timeouts and connection failures were uncaught, and a non-404 Studio error fell off the end of an except block (200 with null body on list, spurious 404 on retrieve)
  • Network-level failures now consistently return 503 {"status": "offline"}
  • The portal endpoints no longer reflect upstream bodies wholesale: error responses are reduced to the error constants the frontend matches on, and validate_token's success response to the project name the frontend uses; non-JSON bodies (e.g. CDN error pages) are treated as network-level failures

References

Reviewer guidance

  • To exercise the real paths: the facility register flow (invalid token, valid token, already registered) against live KDP, and a Studio channel import with the network disconnected — the latter should now report offline rather than erroring
  • Behaviour change to weigh: a non-404 Studio error in RemoteChannelViewSet previously gave 200 with a null body (list) or a spurious 404 (retrieve); both now give 503 offline — worth checking the device plugin's channel import flow handles that
  • Any portal error other than the two whitelisted constants now reaches the client as an empty list at the upstream status code

AI usage

Written with Claude Code, test-first. I reviewed the diff and directed two revisions — first dropping reflection of raw upstream bytes, then reducing the reflection to the explicit whitelist of error constants — and verified the affected suites pass.

Fixes a family of crashes in the views that proxy to the Kolibri Data
Portal and Studio, surfaced while removing real network access from the
test suite:

- NetworkLocationResponseFailure.response is None when the request
  failed without an HTTP response, so accessing response.status_code or
  response.json() on it raised AttributeError (500) in the portal
  register and validate_token actions, the remote channel lookup, and
  the v2 community library lookup.
- A non-404 upstream error in the remote channel lookup fell off the
  end of the except block, so list returned 200 with a null body and
  retrieve returned a spurious 404.
- NetworkLocationResponseTimeout was uncaught in the remote channel and
  validate_token actions, and connection failures were uncaught in
  register, all returning 500.
- A non-JSON error body (e.g. a Cloudflare HTML error page) crashed
  register, and validate_token reflected the raw bytes back to the
  client.

Network-level failures now consistently return 503 with an "offline"
status, and 404 from the content server still maps to Http404. The
portal endpoints no longer reflect upstream response bodies wholesale:
error bodies are reduced to the recognized error constants the frontend
matches on (with ALREADY_REGISTERED_FOR_COMMUNITY now mirrored in
error_constants.py), the validate_token success response is reduced to
the project name the frontend uses, and a missing or non-JSON body is
treated as another network-level failure: 503 offline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added DEV: backend Python, databases, networking, filesystem... SIZE: medium labels Jun 6, 2026
@rtibbles rtibbles marked this pull request as ready for review June 6, 2026 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DEV: backend Python, databases, networking, filesystem... SIZE: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant