A review of the API model classes across all versions (v1, v2, v3) is needed.
Several concerns have been identified that require deliberate design decisions
before they can be resolved. This issue documents the current state and serves
as the starting point for that review.
Concerns
1. arc field typed as dict (untyped free-form JSON)
In both v2.CreateOrUpdateArcRequest and v3.CreateArcRequest the ARC
payload is typed as arc: dict. There is no Pydantic model that validates the
RO-Crate structure. The only semantic check (presence of identifier) happens
deep inside the business logic. This makes the API contract opaque to clients
and tools (OpenAPI schema shows object with no properties).
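A minimal sketch of what validation at the API boundary could look like. It uses only the standard library so that it runs as-is; in the service the same shape would presumably be expressed as a Pydantic model. The names RoCratePayload and parse_arc are hypothetical, and placing identifier on the root dataset entity ("@id": "./") of the RO-Crate @graph is an assumption that is not confirmed by the current code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RoCratePayload:
    """Hypothetical typed view of the `arc` field (all names are assumptions)."""
    context: Any                     # value of "@context"
    graph: list[dict[str, Any]]      # entities listed under "@graph"
    identifier: str                  # identifier found on the root dataset


def parse_arc(raw: dict[str, Any]) -> RoCratePayload:
    """Check the minimal RO-Crate shape before the business logic runs."""
    if "@context" not in raw:
        raise ValueError("arc: missing '@context'")
    graph = raw.get("@graph")
    if not isinstance(graph, list):
        raise ValueError("arc: '@graph' must be a list of entities")
    # Assumption: the root dataset is the entity whose "@id" is "./".
    root = next(
        (e for e in graph if isinstance(e, dict) and e.get("@id") == "./"),
        None,
    )
    if root is None:
        raise ValueError("arc: no root dataset entity ('@id': './')")
    identifier = root.get("identifier")
    if not isinstance(identifier, str) or not identifier:
        raise ValueError("arc: root dataset has no 'identifier'")
    return RoCratePayload(context=raw["@context"], graph=graph, identifier=identifier)
```

Rejecting a malformed payload at this point would surface the error before the business logic is reached. A Pydantic equivalent would additionally give the OpenAPI schema named properties instead of a bare object.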
2. rdi absent from v3.ArcResponse
The business logic layer (ArcOperationResult) carries both client_id and rdi in its return value. The v3 HTTP response (ArcResponse) exposes client_id but not rdi. For a standalone POST /v3/arcs the client
already knows the rdi (it sent it), so omitting it may be intentional.
Whether this is also the desired behaviour for the harvest-context endpoint
(POST /v3/harvests/{harvest_id}/arcs), where the client does not supply rdi explicitly, should be confirmed.
3. Two classes named ArcResponse
common.ArcResponse is an internal BL result model with fields id, status, timestamp. v3.ArcResponse is the HTTP response model with arc_id, status, metadata, events. The name collision is a potential
source of confusion when reading business logic code that receives an ArcOperationResult containing a common.ArcResponse and hands it to the v3.ArcResponse constructor.
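The hand-off described above, from the business logic result to the HTTP model, can be sketched as follows. This is an illustration under assumptions, not the current implementation: ArcRecord is one of the candidate new names for common.ArcResponse, the optional rdi field on the v3 response does not exist today, the field mapping (arc.id to arc_id, arc.status to status) is inferred from the field names, and metadata and events are left out to keep the sketch short. Plain dataclasses stand in for the Pydantic models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArcRecord:
    """Internal BL result; stands in for common.ArcResponse under a distinct name."""
    id: str
    status: str      # ArcStatus in the real model; a plain str keeps the sketch short
    timestamp: str


@dataclass
class ArcOperationResult:
    client_id: Optional[str]
    message: str
    rdi: str
    arc: ArcRecord


@dataclass
class ArcResponseV3:
    """HTTP response model (metadata and events omitted from this sketch)."""
    client_id: Optional[str]
    message: str
    arc_id: str
    status: str
    rdi: Optional[str] = None   # hypothetical field, not part of v3.ArcResponse today


def to_v3_response(result: ArcOperationResult, include_rdi: bool) -> ArcResponseV3:
    """Map the BL result onto the HTTP model; the rdi is copied only on request."""
    return ArcResponseV3(
        client_id=result.client_id,
        message=result.message,
        arc_id=result.arc.id,
        status=result.arc.status,
        rdi=result.rdi if include_rdi else None,
    )
```

With distinct names, the direction of the conversion is visible from the signature alone. The include_rdi flag is only a way to show both options side by side; the harvest endpoint could pass True if the decision is to return the resolved rdi.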
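4. Two status enums in one HTTP response
v3.ArcResponse contains both:
status: ArcStatus (CREATED / UPDATED): operation outcome
metadata.status: ArcLifecycleStatus (ACTIVE / PROCESSING / …): storage state
This is valid but must be documented clearly so API consumers understand the
distinction.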
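One way to keep the two meanings apart is to attach a description to each status field next to its type. The sketch below uses standard-library dataclasses and field metadata so that it runs without dependencies; with Pydantic the same text would go into Field(description=...), which is carried into the generated schema. The description wording, the string values of the enum members, and the helper field_descriptions are assumptions, and only the lifecycle members named in this issue are listed.

```python
from dataclasses import dataclass, field, fields
from enum import Enum


class ArcStatus(str, Enum):
    """Outcome of the request that produced the response."""
    CREATED = "CREATED"
    UPDATED = "UPDATED"


class ArcLifecycleStatus(str, Enum):
    """Storage state of the ARC; further members exist and are omitted here."""
    ACTIVE = "ACTIVE"
    PROCESSING = "PROCESSING"


@dataclass
class ArcMetadata:
    status: ArcLifecycleStatus = field(
        metadata={"description": "Storage state of the ARC, independent of this request."}
    )


@dataclass
class ArcResponse:
    status: ArcStatus = field(
        metadata={"description": "Operation outcome: whether this request created or updated the ARC."}
    )
    metadata: ArcMetadata = field(
        metadata={"description": "Stored ARC metadata, including its lifecycle status."}
    )


def field_descriptions(model) -> dict[str, str]:
    """Collect the per-field descriptions declared on a dataclass."""
    return {f.name: f.metadata.get("description", "") for f in fields(model)}
```

Calling field_descriptions(ArcResponse) returns one entry per field, which makes it easy to check in a test that neither status field is left undocumented.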
5. v1 still in active use
POST /v1/arcs returns a task_id for async polling; v2 and v3 have
progressively removed async in favour of synchronous responses. The deprecation
path and timeline should be made explicit.
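Once a timeline exists, one possible way to make it visible to v1 clients is the Sunset response header (RFC 8594), which carries an HTTP-date. The sketch below only builds the header value; the date is a placeholder and not a decision, and the function name v1_deprecation_headers is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Placeholder only: the actual v1 shutdown date is still to be decided.
V1_SUNSET = datetime(2030, 1, 1, tzinfo=timezone.utc)


def v1_deprecation_headers(sunset: datetime = V1_SUNSET) -> dict[str, str]:
    """Headers a v1 endpoint could attach to each response."""
    # format_datetime(..., usegmt=True) renders an HTTP-date ending in "GMT";
    # it requires a timezone-aware datetime in UTC.
    return {"Sunset": format_datetime(sunset, usegmt=True)}
```

How the headers are attached depends on the web framework in use, which this issue does not state.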
Model Dependency Diagram
Common / shared types
classDiagram
direction LR
class ApiResponse {
+client_id: str | None
+message: str
}
class ArcResponseCommon {
<<common>>
+id: str
+status: ArcStatus
+timestamp: str
}
class ArcOperationResult {
+rdi: str
+arc: ArcResponseCommon
}
ApiResponse <|-- ArcOperationResult
V1 models
classDiagram
direction LR
class CreateOrUpdateArcsRequest {
+rdi: str
+arcs: list[dict]
}
class CreateOrUpdateArcsResponse {
+rdi: str | None
+arcs: list[ArcResponseCommon]
+task_id: str | None
+status: str | None
}
class ArcTaskTicket {
+rdi: str
+task_id: str
}
class GetTaskStatusResponse {
+task_id: str
+status: str
+result: CreateOrUpdateArcsResponse | None
+error: str | None
}
ApiResponse <|-- CreateOrUpdateArcsResponse
ApiResponse <|-- ArcTaskTicket
GetTaskStatusResponse --> CreateOrUpdateArcsResponse
V2 models
classDiagram
direction LR
class CreateOrUpdateArcRequest {
+rdi: str
+arc: dict
}
class CreateOrUpdateArcResponse {
+task_id: str
+status: TaskStatus
}
class GetTaskStatusResponseV2 {
+status: TaskStatus
+result: ArcOperationResult | None
}
ApiResponse <|-- CreateOrUpdateArcResponse
ApiResponse <|-- GetTaskStatusResponseV2
GetTaskStatusResponseV2 --> ArcOperationResult
V3 models
classDiagram
direction LR
class CreateArcRequest {
+rdi: str
+arc: dict
}
class SubmitHarvestArcRequest {
+arc: dict
}
class ArcMetadata {
+arc_hash: str
+status: ArcLifecycleStatus
+first_seen: str
+last_seen: str
}
class ArcEventSummary {
+timestamp: str
+type: str
+message: str
}
class ArcResponseV3 {
<<HTTP response>>
+arc_id: str
+status: ArcStatus
+metadata: ArcMetadata
+events: list[ArcEventSummary]
}
ApiResponse <|-- ArcResponseV3
ArcResponseV3 --> ArcMetadata
ArcResponseV3 --> ArcEventSummary
Endpoint-to-Model Map
V1 (deprecated)
| Endpoint | Request model | Response model |
| --- | --- | --- |
| POST /v1/arcs | CreateOrUpdateArcsRequest | CreateOrUpdateArcsResponse (202) |
| GET /v1/arcs/tasks/{task_id} | | GetTaskStatusResponse |
V2
| Endpoint | Request model | Response model |
| --- | --- | --- |
| POST /v2/arcs | CreateOrUpdateArcRequest | CreateOrUpdateArcResponse (202) |
| GET /v2/arcs/tasks/{task_id} | | GetTaskStatusResponse (contains ArcOperationResult) |
V3
| Endpoint | Request model | Response model | Notes |
| --- | --- | --- | --- |
| POST /v3/arcs | CreateArcRequest | ArcResponse (200, sync) | rdi in request, not in response |
| POST /v3/harvests/{id}/arcs | SubmitHarvestArcRequest | ArcResponse (200, sync) | rdi resolved from harvest, not in response |
Suggested Actions
Decide whether rdi should be included in v3.ArcResponse (especially for the harvest-upload endpoint where the client does not supply it).
Evaluate introducing a typed Pydantic model for the RO-Crate payload instead of arc: dict, or at minimum add an arc: dict[str, Any] annotation with an explicit note in the OpenAPI description.
Rename common.ArcResponse to ArcRecord or ArcResult to eliminate the name collision with v3.ArcResponse.
Add OpenAPI description fields to both status fields in v3.ArcResponse explaining ArcStatus vs ArcLifecycleStatus.
Document (or implement) the v1 deprecation timeline.