Background
Currently, the GET operation on a Flow's checkpoints supports only two modes:
- no filter: returns all checkpoints, paginated, optionally ordered by checkpoint creation time, descending
- checkpoint name filter: same as above, but returns only the checkpoints whose name matches
We recently introduced a new concept called Checkpoint Properties, and it should be possible to filter checkpoints by them. This can be implemented as a new function in FlowReader.scala, similar to getCheckpointsOfNamePage. The function queryCheckpoints must also be extended to support the filter, and the DB function flows.get_flow_checkpoints must accept and apply it (join on the runs.checkpoint_properties table and filter by matching key and value against the input).
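The intended filtering semantics can be sketched in memory as below. This is only an illustration, not the actual Atum code: the Checkpoint case class, the checkpointsOfPropertiesPage name, and its parameters are hypothetical, and the sketch assumes that when several properties are supplied, all of them must match. The real filter would run in flows.get_flow_checkpoints via the join on runs.checkpoint_properties.

```scala
// Hypothetical, simplified model of a checkpoint; the real data lives in the DB.
final case class Checkpoint(
  name: String,
  createdAtEpochMs: Long,
  properties: Map[String, String]
)

object CheckpointPropertyFilterSketch {

  // Keeps only checkpoints that contain every requested key with exactly the
  // requested value (assumption: multiple properties are combined with AND),
  // orders them by creation time descending, then applies offset/limit paging.
  def checkpointsOfPropertiesPage(
    all: Seq[Checkpoint],
    wanted: Map[String, String],
    limit: Int,
    offset: Int
  ): Seq[Checkpoint] =
    all
      .filter(cp => wanted.forall { case (k, v) => cp.properties.get(k).contains(v) })
      .sortBy(cp => -cp.createdAtEpochMs)
      .slice(offset, offset + limit)
}
```

With an empty wanted map every checkpoint passes the filter, so the sketch degrades to the existing "no filter" behaviour.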
Feature
A checkpoint identifies a specific run of a measurement. Depending on the setup, it may correspond to one run of a single ETL pipeline, or it may be just one of many checkpoints within a larger pipeline. Users can record metadata (a job run ID, for instance) in Checkpoint Properties and then retrieve only the checkpoints that carry that metadata.
The change will require:
- a new public function in FlowReader.scala, plus the DB-side change,
- server-side support in the FlowController.getFlowCheckpoints function (see FlowController.scala as the starting point),
- eventually, the function queryCheckpoints must be changed as well.
Business Value
Consumers using Atum Reader can be certain that the checkpoints retrieved are exactly those associated with the checkpoint properties they supplied, which makes the 'listing' functionality more precise.