
Enhance Atum Reader: FlowReader getting checkpoints by CheckpointProperties as filter criteria #435

@lsulak

Background

Currently, the GET operation on a Flow's checkpoints supports only two modes:

  • no filter = get all checkpoints, paginated, optionally ordered by checkpoint creation time, descending
  • checkpoint name filter = same as above, but checkpoints whose name does not match are excluded

We recently introduced a new concept called Checkpoint Properties, which can also serve as a filter criterion. This can be implemented in FlowReader.scala as a new function similar to getCheckpointsOfNamePage. The function queryCheckpoints must then be extended to support the new filter, and the DB function flows.get_flow_checkpoints must accept and apply it (join on the runs.checkpoint_properties table and keep only the checkpoints whose key & value match the input).
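The join-and-filter step described above can be sketched in memory to pin down the intended semantics. This is only an illustration: the case classes, the object name and the function name are hypothetical and are not the actual Atum models or the real DB function. It also assumes that when several properties are supplied, a checkpoint must match all of them (AND semantics), which the issue does not specify.

```scala
import java.time.Instant

// Hypothetical, simplified models for illustration; not the actual Atum classes.
final case class Checkpoint(id: Long, name: String, createdAt: Instant)
final case class CheckpointProperty(checkpointId: Long, key: String, value: String)

object CheckpointPropertyFilter {

  /** Keeps only checkpoints that carry every requested key/value pair.
    * AND semantics across properties is an assumption, not something the issue states.
    * Results are ordered by creation time, newest first.
    */
  def filterByProperties(
      checkpoints: Seq[Checkpoint],
      properties: Seq[CheckpointProperty],
      wanted: Map[String, String]
  ): Seq[Checkpoint] = {
    // Emulates the join: group the property rows by the checkpoint they belong to.
    val propsByCheckpoint: Map[Long, Map[String, String]] =
      properties.groupBy(_.checkpointId).map { case (id, rows) =>
        id -> rows.map(p => p.key -> p.value).toMap
      }

    checkpoints
      .filter { cp =>
        val actual = propsByCheckpoint.getOrElse(cp.id, Map.empty[String, String])
        // An empty `wanted` map matches everything, i.e. the existing "no filter" case.
        wanted.forall { case (k, v) => actual.get(k).contains(v) }
      }
      .sortBy(_.createdAt.toEpochMilli)(Ordering[Long].reverse)
  }
}
```

With this shape, passing an empty map behaves like the existing unfiltered listing, so the property filter could stay optional on every layer.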

Feature

Checkpoints basically identify a specific run of a measurement. That can be leveraged: a checkpoint may correspond to a specific run of a single ETL pipeline, or it may be just one of many checkpoints of the overall pipeline. Our users can record metadata (a job run ID, for instance) in Checkpoint Properties and then retrieve only the checkpoints carrying that metadata.

The change will require:

  • a new public function in FlowReader.scala, plus the corresponding DB-side changes,
  • support for this on the server side, in the FlowController.getFlowCheckpoints function (see FlowController.scala as the starting point),
  • eventually, queryCheckpoints and the DB function flows.get_flow_checkpoints must be changed as well (see Background).
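One possible shape for the reader side of this list is a single shared query path with optional filters, exposed through thin public functions. The sketch below is a self-contained in-memory stand-in, not the real FlowReader: `CheckpointSummary`, `Page`, `InMemoryFlowReader` and all method names and signatures are hypothetical, and the real implementation would call the server rather than filter a local collection.

```scala
// Hypothetical types for illustration only; the real Atum models and signatures differ.
final case class CheckpointSummary(name: String, properties: Map[String, String])
final case class Page[A](items: Seq[A], hasMore: Boolean)

final class InMemoryFlowReader(stored: Seq[CheckpointSummary]) {

  // Shared query path: both optional filters are applied before pagination,
  // so every public function pages over an already filtered result.
  private def query(
      name: Option[String],
      props: Map[String, String],
      limit: Int,
      offset: Int
  ): Page[CheckpointSummary] = {
    val matching = stored.filter { cp =>
      name.forall(_ == cp.name) &&
      props.forall { case (k, v) => cp.properties.get(k).contains(v) }
    }
    Page(matching.slice(offset, offset + limit), hasMore = offset + limit < matching.size)
  }

  // Existing behaviours, modelled loosely: no filter, and filter by name.
  def checkpointsPage(limit: Int, offset: Int): Page[CheckpointSummary] =
    query(None, Map.empty, limit, offset)

  def checkpointsOfNamePage(name: String, limit: Int, offset: Int): Page[CheckpointSummary] =
    query(Some(name), Map.empty, limit, offset)

  // The new entry point this issue asks for (name is a placeholder).
  def checkpointsOfPropertiesPage(
      props: Map[String, String],
      limit: Int,
      offset: Int
  ): Page[CheckpointSummary] =
    query(None, props, limit, offset)
}
```

Filtering before pagination matters here: if the filter were applied after a page was cut, pages could come back short or empty even though matching checkpoints exist further on.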

Business Value

Consumers using Atum Reader can be certain that the checkpoints retrieved are only the ones associated with the checkpoint properties they supplied, which makes the 'listing' functionality more precise.

Metadata

  • Assignees: no one assigned
  • Labels: enhancement (New feature or request)
  • Project status: 🆕 To groom
  • Milestone: no milestone
  • Relationships: none yet
  • Development: no branches or pull requests