Motivation
At the 20 May workshop, the question came up of how to track data as it is generated and passed between stages of the same pipeline.
In other words, if features that are generated on the fly need traceable lineage (for example, as a directed acyclic graph), then a standalone representation of the pipeline, embedded in an experiment tracking tool like mlflow, makes the pipeline easier to understand.
hamilton
One commonly adopted solution is hamilton, which expresses data and feature transformations as plain Python functions, with each function's dependencies declared through its parameter names.
This official blog post from the developers covers the mlflow integration use case: https://blog.dagworks.io/p/tracking-pipelines-with-mlflow-and
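To make the idea concrete, here is a rough stdlib-only sketch of the pattern: each feature is a function, its name is the node, and its parameter names are the upstream dependencies. This is not hamilton's API, only an illustration of the concept; the feature functions, the `build_dag` and `compute` helpers, and the input values are all hypothetical names I made up for the example.

```python
import inspect

# Hypothetical feature functions. The function name is the node name,
# and the parameter names are the nodes (or raw inputs) it depends on.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups

def spend_is_high(spend_per_signup: float, threshold: float) -> bool:
    return spend_per_signup > threshold

FEATURES = {fn.__name__: fn for fn in (spend_per_signup, spend_is_high)}

def build_dag(features):
    """Map each feature name to the list of names it depends on."""
    return {
        name: list(inspect.signature(fn).parameters)
        for name, fn in features.items()
    }

def compute(target, inputs, features, cache=None):
    """Resolve `target` by recursively computing its dependencies."""
    cache = dict(inputs) if cache is None else cache
    if target not in cache:
        fn = features[target]  # KeyError if neither an input nor a feature
        args = {
            dep: compute(dep, inputs, features, cache)
            for dep in inspect.signature(fn).parameters
        }
        cache[target] = fn(**args)
    return cache[target]

dag = build_dag(FEATURES)
result = compute(
    "spend_is_high",
    {"spend": 100.0, "signups": 20, "threshold": 4.0},
    FEATURES,
)
```

The `dag` dictionary is the standalone lineage representation: it could be stored alongside a run, for example as a JSON artifact in the experiment tracker, so that each logged feature can be traced back to its inputs.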
addition to docs
If such a tool seems worthwhile (that is, if it addresses a common problem RSEs deal with), I can create a brief toy example and send a PR adding it to the model_tracking docs, or perhaps to a new data_tracking section (which doesn't exist yet).
Thank you all!