Use JSON string instead of dict when passing ARC across process boundary (API → Celery) #201

Description

@Zalfsten

Background

When the ARC ingestion pipeline dispatches a sync task to Celery, the ARC is currently passed as a Python dict. Celery's JSON serializer serializes the dict in transit and the worker receives it as a dict again, so the worker must call json.dumps before it can invoke ARC.from_rocrate_json_string. The payload is therefore serialized twice.

The API client already uses the correct pattern: ARC is passed as a serialized JSON string, not a dict.

Proposed Change

Pass the ARC as a JSON string at the point of Celery dispatch (API side). The worker receives the string and can call ARC.from_rocrate_json_string directly: one explicit serialization step, with no intermediate json.dumps.

Affected code:

  • middleware/api/src/middleware/api/business_logic/arc_manager.py — serialize to JSON string before dispatching
  • Celery task handler (worker side) — remove any intermediate json.dumps call
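
A minimal sketch of the two patterns, for comparison. It does not use Celery or the ARC library: celery_json_roundtrip is a hypothetical stand-in that mimics a JSON task serializer with the standard json module, and parse_rocrate is a hypothetical stand-in for ARC.from_rocrate_json_string. The task function names and the sample payload are also assumptions, not taken from the code base.

```python
import json


def celery_json_roundtrip(args: list) -> list:
    """Hypothetical stand-in for a JSON task serializer:
    encode on the API side, decode on the worker side."""
    return json.loads(json.dumps(args))


def parse_rocrate(rocrate_json: str) -> dict:
    """Hypothetical stand-in for ARC.from_rocrate_json_string."""
    return json.loads(rocrate_json)


def worker_task_dict(arc: dict) -> dict:
    # Current pattern: the worker receives a dict and has to
    # re-serialize it before the string-based parser can be used.
    return parse_rocrate(json.dumps(arc))


def worker_task_str(rocrate_json: str) -> dict:
    # Proposed pattern: the worker receives the JSON string and
    # hands it to the parser directly.
    return parse_rocrate(rocrate_json)


# Illustrative payload only; a real RO-Crate document is much larger.
arc_dict = {"@id": "./", "@type": "Dataset", "name": "demo-arc"}

# Current: dispatch the dict.
(received_dict,) = celery_json_roundtrip([arc_dict])
result_current = worker_task_dict(received_dict)

# Proposed: serialize once on the API side, dispatch the string.
(received_str,) = celery_json_roundtrip([json.dumps(arc_dict)])
result_proposed = worker_task_str(received_str)
```

Both paths produce the same parsed result in this sketch; the difference is that the proposed path has a single json.dumps call, and it sits on the API side.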

Pros of JSON string over dict

  • One explicit serialization step instead of two
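  • Celery treats the string as an opaque value, so the worker receives exactly the JSON text the API produced; the ARC content is not decoded and re-encoded in transit, which rules out incidental changes such as number formatting or key order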
  • Directly compatible with ARC.from_rocrate_json_string on the worker — no intermediate step
  • Easier to log and debug (the payload is already a valid JSON document)
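
What "opaque value" means can be illustrated with the standard json module alone. The snippet below is a sketch that assumes the transport behaves like a plain json.dumps / json.loads round trip; the sample document is made up. The parsed values are the same on both paths here; only the JSON text differs when the content travels as a dict.

```python
import json

# Made-up JSON text with number spellings that do not survive a re-encode.
original = '{"b": 1.50, "a": 1e2}'

# Dict path: parse, send as dict (encode + decode), then re-serialize on the worker.
as_dict = json.loads(original)
via_dict = json.dumps(json.loads(json.dumps(as_dict)))

# String path: send the text itself (encode + decode of a single string value).
via_string = json.loads(json.dumps(original))

string_unchanged = via_string == original
dict_text_changed = via_dict != original
```

In this sketch the string arrives character for character as it was sent, while the dict path yields a re-encoded text ("1.50" becomes "1.5", "1e2" becomes "100.0").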

Related

  • middleware/api/spec/arc-manager/design.md — Key Decision 7 (currently documents dict; update once implemented)
  • spec/principles.md — "ARC objects must not cross process boundaries via pickle"

Labels: enhancement
