Update API interface type CreateArcRequest #226

@Zalfsten

This change will break the API

Currently, this type looks like this:

class CreateArcRequest(BaseModel):
    rdi: Annotated[str, Field(description="Research Data Infrastructure identifier")]
    arc: Annotated[dict, Field(description="ARC definition in RO-Crate JSON format")]

It is proposed to change the arc type from dict to str.

We're currently using dict as the arc type because we cannot use ARCtrl.ARC here: pydantic BaseModels can only consist of other BaseModels or basic types. The dict type is the closest pydantic-compatible approximation to ARC that we can achieve. With this approach, pydantic can at least check whether the incoming value can be parsed into a dict.
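To illustrate what that check covers, here is a minimal sketch assuming pydantic v2 (the `model_validate_json` call). The helper `is_accepted` and the sample payloads are made up for illustration and are not part of the API.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError


class CreateArcRequest(BaseModel):
    rdi: Annotated[str, Field(description="Research Data Infrastructure identifier")]
    arc: Annotated[dict, Field(description="ARC definition in RO-Crate JSON format")]


def is_accepted(body: str) -> bool:
    """Hypothetical helper: True if pydantic accepts the request body."""
    try:
        CreateArcRequest.model_validate_json(body)
        return True
    except ValidationError:
        return False


# Any JSON object passes, whether or not it is a valid RO-Crate.
ok = is_accepted('{"rdi": "test", "arc": {"@id": "./"}}')
# A non-object value for arc is rejected.
rejected = is_accepted('{"rdi": "test", "arc": "not an object"}')
```

Note that the check is shallow: pydantic only verifies that arc is a JSON object, not that it is a meaningful ARC.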

But this choice makes it cumbersome and expensive to create an object of type CreateArcRequest, because we need the ARC as a dict. Typically this requires the conversion chain arc->str->dict. Then we can construct the CreateArcRequest, which is immediately converted back to a JSON string.

arc_dict = json.loads(arc.ToROCrateJsonString())
request = CreateArcRequest(rdi="test", arc=arc_dict)
body = request.model_dump_json()

If arc was a str, this would remove the JSON parsing step:

arc_json = arc.ToROCrateJsonString()
request = CreateArcRequest(rdi="test", arc=arc_json)
body = request.model_dump_json()
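The two variants also produce different request bodies, which is why the change breaks the API. The sketch below uses the standard library's `json.dumps` as a stand-in for `model_dump_json()` (exact whitespace may differ from pydantic's output), and `arc_json` is a made-up placeholder rather than a real RO-Crate.

```python
import json

# Hypothetical placeholder for the output of arc.ToROCrateJsonString()
arc_json = '{"@id": "./", "@type": "Dataset"}'

# arc as dict: the ARC is embedded as a nested JSON object.
body_with_dict = json.dumps({"rdi": "test", "arc": json.loads(arc_json)})

# arc as str: the ARC is embedded as one escaped JSON string.
body_with_str = json.dumps({"rdi": "test", "arc": arc_json})

# What a receiver sees after parsing the request body once.
received_dict = json.loads(body_with_dict)["arc"]  # already a dict
received_str = json.loads(body_with_str)["arc"]    # still the raw JSON string
```

With the str variant, the receiver gets the ARC JSON back unchanged and can hand it directly to whatever parser it prefers.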

This could also simplify things on the server side. Currently the ARC JSON string is in fact parsed twice: first by pydantic/FastAPI, which converts it into a dict, then by ARCtrl, which converts the dict into an ARC. There are several server-side approaches for this:

  1. Just change dict into str, thus skipping any automatic pydantic validation of the field, and parse the string manually into an ARC.
  2. Don't use str, but use ARCtrl.ARC itself as the type for the arc field. This requires a pydantic field_validator and field_serializer:
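import json
from typing import Any

from pydantic import BaseModel, ConfigDict, field_serializer, field_validator


class ArcModel(BaseModel):
    # ARCtrl.ARC is not a pydantic-native type, so arbitrary types must be allowed.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    rdi: str
    arc: ARCtrl.ARC

    @field_validator("arc", mode="before")
    @classmethod
    def parse_arc(cls, v: Any):
        # Accept an existing ARC object or parse one from an RO-Crate JSON string.
        if isinstance(v, ARCtrl.ARC):
            return v
        if isinstance(v, str):
            return ARCtrl.ARC.FromROCrateJsonString(v)
        raise TypeError(f"Unsupported type for arc: {type(v)}")

    @field_serializer("arc")
    def serialize_arc(self, arc: ARCtrl.ARC):
        # Same RO-Crate serializer as in the client example above.
        return json.loads(arc.ToROCrateJsonString())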

This has the downside that we always need to deal with ARCtrl.ARC objects. The development of SQL-to-ARC showed that in practice we might have to deal with the JSON string right away. So this would again introduce an ARC->JSON->ARC conversion. I therefore object to this solution.
  3. It might also be possible to combine the str and ARCtrl.ARC approaches in the model by introducing a Union: ARCtrl.ARC | str. This should be investigated further.
