Introduce Hamilton + MLflow integration for data-transform-heavy tracking in the model_tracking module #7

Description

@Nikronic

Motivation

At the 20 May workshop, a discussion came up about tracking data at the point where it is generated and as it is passed between stages of the same pipeline.

In other words, if features that are generated on the fly need traceable lineage (for example, as a directed acyclic graph), then a standalone representation of that pipeline, embedded in an experiment tracking tool such as MLflow, makes the pipeline easier to understand.

Hamilton

One commonly adopted solution is Hamilton, which expresses data feature transformations in a clean manner.

This official blog post from the developers covers the MLflow integration use case: https://blog.dagworks.io/p/tracking-pipelines-with-mlflow-and
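
To make the idea concrete, here is a minimal standard-library sketch of the convention Hamilton is built around: each function name is a node, and each parameter name is an upstream dependency. This is not Hamilton's API. The feature functions, `build_dag`, and `execute` are all hypothetical names made up for illustration, and the real library does considerably more.

```python
import inspect

# Hypothetical feature functions: the function name is the output node,
# and each parameter name refers to a raw input or another node.
def spend_per_signup(spend, signups):
    return [s / n for s, n in zip(spend, signups)]

def avg_spend_per_signup(spend_per_signup):
    return sum(spend_per_signup) / len(spend_per_signup)

FUNCTIONS = {f.__name__: f for f in (spend_per_signup, avg_spend_per_signup)}

def build_dag(functions):
    """Map each node name to the list of names it depends on."""
    return {
        name: list(inspect.signature(fn).parameters)
        for name, fn in functions.items()
    }

def execute(target, inputs, functions, cache=None):
    """Compute `target` by recursively resolving its dependencies."""
    cache = dict(inputs) if cache is None else cache
    if target not in cache:
        fn = functions[target]
        kwargs = {
            param: execute(param, inputs, functions, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[target] = fn(**kwargs)
    return cache[target]

dag = build_dag(FUNCTIONS)
result = execute(
    "avg_spend_per_signup",
    {"spend": [10.0, 20.0], "signups": [2, 4]},
    FUNCTIONS,
)
```

The `dag` mapping is the kind of lineage information that could be stored alongside an experiment run, for instance serialized to JSON and attached as an artifact. How that is wired up with Hamilton and MLflow in practice is described in the blog post linked in this issue.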

addition to docs

If such a tool seems worthwhile (that is, if it addresses a common problem RSEs deal with), I can create a brief toy example and send a PR adding it to the model_tracking docs, or perhaps to a new data_tracking section (which doesn't exist yet).

Thank you all!
