Review API model definitions and dependencies (v1/v2/v3) #203

@Zalfsten

Summary

A review of the API model classes across all versions (v1, v2, v3) is needed.
Several concerns have been identified that require deliberate design decisions
before they can be resolved. This issue documents the current state and serves
as the starting point for that review.

Concerns

1. arc field typed as dict (untyped free-form JSON)

In v2.CreateOrUpdateArcRequest and v3.CreateArcRequest the ARC payload is
typed as arc: dict; v3.SubmitHarvestArcRequest does the same, and v1 uses
arcs: list[dict]. No Pydantic model validates the RO-Crate structure. The only
semantic check (presence of identifier) happens deep inside the business
logic. This makes the API contract opaque to clients and tools: the OpenAPI
schema shows an object with no properties.

2. rdi absent from v3.ArcResponse

The business logic layer (ArcOperationResult) carries both client_id and
rdi in its return value. The v3 HTTP response (ArcResponse) exposes
client_id but not rdi. For a standalone POST /v3/arcs the client
already knows the rdi (it sent it), so omitting it may be intentional.
Whether this is the desired behaviour for the harvest-context endpoint
(POST /v3/harvests/{harvest_id}/arcs) — where the client does not supply
rdi explicitly — should be confirmed.

3. Two classes named ArcResponse

common.ArcResponse is an internal business-logic (BL) result model with the
fields id, status and timestamp. v3.ArcResponse is the HTTP response model
with arc_id, status, metadata and events. The name collision invites
confusion when reading business logic code that receives an
ArcOperationResult containing a common.ArcResponse and hands it to the
v3.ArcResponse constructor.
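
A rename could be staged with a temporary alias so that existing imports keep working. The sketch below uses a dataclass as a stand-in for the shared model; the name ArcRecord is one candidate, and the lowercase enum values are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ArcStatus(str, Enum):
    # Member names are from the issue; the string values are assumed.
    CREATED = "created"
    UPDATED = "updated"


@dataclass(frozen=True)
class ArcRecord:
    """Stand-in for the renamed common.ArcResponse (BL result, not HTTP)."""

    id: str
    status: ArcStatus
    timestamp: str


# Temporary alias for the old name; remove once all call sites are migrated.
ArcResponse = ArcRecord
```

After the alias is removed, only the v3 HTTP model would still be called ArcResponse.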

4. Two status enums in one HTTP response

v3.ArcResponse contains both:

  • status: ArcStatus (CREATED / UPDATED) — operation outcome
  • metadata.status: ArcLifecycleStatus (ACTIVE / PROCESSING / …) — storage state

This is valid but must be documented clearly so API consumers understand the
distinction.
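
The distinction can be carried into the OpenAPI schema through field descriptions. This is a sketch assuming Pydantic v2: the models are reduced to the fields listed in this issue, the enum string values are assumptions, and the description wording is only a suggestion.

```python
from enum import Enum

from pydantic import BaseModel, Field


class ArcStatus(str, Enum):
    # String values are assumed; only the member names appear in the issue.
    CREATED = "created"
    UPDATED = "updated"


class ArcLifecycleStatus(str, Enum):
    ACTIVE = "active"
    PROCESSING = "processing"
    # Further members are elided in the issue and are not guessed here.


class ArcMetadata(BaseModel):
    arc_hash: str
    status: ArcLifecycleStatus = Field(
        description="Storage lifecycle state of the ARC, independent of this request."
    )
    first_seen: str
    last_seen: str


class ArcResponse(BaseModel):
    arc_id: str
    status: ArcStatus = Field(
        description="Outcome of this request: whether the ARC was created or updated."
    )
    metadata: ArcMetadata
```

Both descriptions then show up next to the respective status property in the schema returned by ArcResponse.model_json_schema().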

5. v1 still in active use

POST /v1/arcs returns a task_id for async polling. v2 still answers with a
task ticket (202) but accepts a single ARC per request, and v3 responds
synchronously (200). The deprecation path and timeline for v1 should be made
explicit.
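
One way to make a timeline visible to clients is to attach the Deprecation (RFC 9745) and Sunset (RFC 8594) response headers to v1 responses. The helper below is a hypothetical, standard-library-only sketch; the dates in the usage example are placeholders, not a proposed schedule.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(deprecated_since: datetime, sunset: datetime) -> dict[str, str]:
    """Build deprecation headers from two timezone-aware datetimes."""
    return {
        # RFC 9745: structured-field date, "@" followed by Unix seconds.
        "Deprecation": f"@{int(deprecated_since.timestamp())}",
        # RFC 8594: HTTP-date in GMT.
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
    }


# Placeholder dates for illustration only.
headers = deprecation_headers(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```

Whether the headers are set in a middleware or per route is a separate design choice that this sketch leaves open.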


Model Dependency Diagram

Common / shared types

classDiagram
    direction LR
    class ApiResponse {
        +client_id: str | None
        +message: str
    }
    class ArcResponseCommon {
        <<common>>
        +id: str
        +status: ArcStatus
        +timestamp: str
    }
    class ArcOperationResult {
        +rdi: str
        +arc: ArcResponseCommon
    }
    ApiResponse <|-- ArcOperationResult

V1 models

classDiagram
    direction LR
    class CreateOrUpdateArcsRequest {
        +rdi: str
        +arcs: list[dict]
    }
    class CreateOrUpdateArcsResponse {
        +rdi: str | None
        +arcs: list[ArcResponseCommon]
        +task_id: str | None
        +status: str | None
    }
    class ArcTaskTicket {
        +rdi: str
        +task_id: str
    }
    class GetTaskStatusResponse {
        +task_id: str
        +status: str
        +result: CreateOrUpdateArcsResponse | None
        +error: str | None
    }
    ApiResponse <|-- CreateOrUpdateArcsResponse
    ApiResponse <|-- ArcTaskTicket
    GetTaskStatusResponse --> CreateOrUpdateArcsResponse

V2 models

classDiagram
    direction LR
    class CreateOrUpdateArcRequest {
        +rdi: str
        +arc: dict
    }
    class CreateOrUpdateArcResponse {
        +task_id: str
        +status: TaskStatus
    }
    class GetTaskStatusResponseV2 {
        +status: TaskStatus
        +result: ArcOperationResult | None
    }
    ApiResponse <|-- CreateOrUpdateArcResponse
    ApiResponse <|-- GetTaskStatusResponseV2
    GetTaskStatusResponseV2 --> ArcOperationResult

V3 models

classDiagram
    direction LR
    class CreateArcRequest {
        +rdi: str
        +arc: dict
    }
    class SubmitHarvestArcRequest {
        +arc: dict
    }
    class ArcMetadata {
        +arc_hash: str
        +status: ArcLifecycleStatus
        +first_seen: str
        +last_seen: str
    }
    class ArcEventSummary {
        +timestamp: str
        +type: str
        +message: str
    }
    class ArcResponseV3 {
        <<HTTP response>>
        +arc_id: str
        +status: ArcStatus
        +metadata: ArcMetadata
        +events: list[ArcEventSummary]
    }
    ApiResponse <|-- ArcResponseV3
    ArcResponseV3 --> ArcMetadata
    ArcResponseV3 --> ArcEventSummary

Endpoint-to-Model Map

V1 (deprecated)

| Endpoint | Request | Response |
| --- | --- | --- |
| POST /v1/arcs | CreateOrUpdateArcsRequest | CreateOrUpdateArcsResponse (202) |
| GET /v1/arcs/tasks/{task_id} | | GetTaskStatusResponse |

V2

| Endpoint | Request | Response |
| --- | --- | --- |
| POST /v2/arcs | CreateOrUpdateArcRequest | CreateOrUpdateArcResponse (202) |
| GET /v2/arcs/tasks/{task_id} | | GetTaskStatusResponse (contains ArcOperationResult) |

V3

| Endpoint | Request | Response | Notes |
| --- | --- | --- | --- |
| POST /v3/arcs | CreateArcRequest | ArcResponse (200, sync) | rdi in request, not in response |
| POST /v3/harvests/{harvest_id}/arcs | SubmitHarvestArcRequest | ArcResponse (200, sync) | rdi resolved from harvest, not in response |

Suggested Actions

  • Decide whether rdi should be included in v3.ArcResponse (especially for the harvest-upload endpoint where the client does not supply it).
  • Evaluate introducing a typed Pydantic model for the RO-Crate payload instead of arc: dict, or at minimum add an arc: dict[str, Any] annotation with an explicit note in the OpenAPI description.
  • Rename common.ArcResponse to ArcRecord or ArcResult to eliminate the name collision with v3.ArcResponse.
  • Add OpenAPI description fields to both status fields in v3.ArcResponse explaining ArcStatus vs ArcLifecycleStatus.
  • Document (or implement) the v1 deprecation timeline.
