MLflow for Darts implementation#3022
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
Hey @daidahao, adding this draft PR in the meantime so you and @dennisbader can have a look at what I have currently regarding the integration. There are still some decisions I am not too thrilled about and decisions to be made about the overall direction, but I'm happy to talk more about it during the meeting. Thanks for being so active for the library, really nice to be working together :) |
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
Co-authored-by: Zhihao Dai <zhihao.dai@eng.ox.ac.uk>
|
@mizeller @jakubchlapek @dennisbader Greetings! I've addressed most of the comments here, except for a few discussion points. I've left a Sincere apologies for suggesting post-fitting metrics in the first place! I didn't realise the complexity involved. My suggestion is to skip post-fitting metrics for now or settle for compromises such as non-terminated active runs (at the risk of cross-logging). Other than that, I am truly proud of what we have achieved here and will hand this over to @mizeller for runups and more great work. |
|
@mizeller @jakubchlapek @dennisbader Just thought of an easy way to track Since Logging metrics then becomes a lot easier. When patching What do you think? |
|
Hi @daidahao, it's been a minute haha, Michel will be continuing on this, but I've found some time to checkout the changes so far. Thanks a lot for the review and your updates, they look good and make sense to me :). Regarding the post-fitting metrics, I agree that the issue is more complicated than originally envisioned. I think the with mlflow.start_run(run_name="whatever"):
# metrics should log here nicelycould bring nice value and be useful. Important that we document the risk of crosslogging for multiple active runs though. Another issue that I'd like to focus on first to have in the merged PR would be the support for if apply_retrain:
# fit a new instance of the model
model = model.untrained_model()
model._fit_wrapper(
series=train_series_tf,
past_covariates=past_covariates_tf,
future_covariates=future_covariates_tf,
sample_weight=sample_weight_tf,
val_series=val_series_tf,
**fit_kwargs,
)This leads to the covariates not saving to |
|
Thank you for the response here. For metric logging, I agree with your suggestion on leaving it as is at some risks of cross-logging models as long as those risks are clearly documented. In the long term, I would prefer a more robust solution using either a
For the |
|
@daidahao I planned to work on the If you're available before, feel free:) I unfortunately didnt have any capacity the last few weeks - sorry about that! |
|
Thanks everyone for all the work and the recent pushes to this PR 🚀 |
|
Off the top of my head, the status on the MLFlow PR:
series = AirPassengersDataset().load().astype(np.float32)
series_multiple = [series, series / 3.]
series_multivariate = series.stack(series / 3.)
series_multiple_multivariate = [series.stack(series / 3.), series.stack(series / 10.)]
model.backtest(
series=series,
historical_forecasts=hfc,
last_points_only=False,
metric=[darts_metrics.mape, darts_metrics.rmse, darts_metrics.ape, darts_metrics.mase, darts_metrics.mase],
metric_kwargs=[{}, {}, {}, {"m": 1}, {"m": 2}],
reduction=None,
)
Also, I believe the TODO regarding metrics/kwargs was implemented in the most recent commit by @jakubchlapek - very cool! :) |
jakubchlapek
left a comment
There was a problem hiding this comment.
Hey, looks nice @mizeller, just a few comments on the historical forecasts. The hfcs solution is nice.
| if metric is None: | ||
| try: | ||
| sig = inspect.signature(original) | ||
| bound = sig.bind(self, *args, **kwargs) | ||
| bound.apply_defaults() | ||
| metric = bound.arguments.get("metric") | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
I'd say we can remove this, I don't believe anyone will pass in metrics positionally and it adds unnecessary complexity to the code (default mape will then still be covered by else branch)
| # 2-D and higher: skip to keep MVP simple | ||
|
|
There was a problem hiding this comment.
i'd prefer to include this in the PR if possible :)
| if isinstance(metric, (list | tuple)) and isinstance(result, list): | ||
| # multiple metrics → result is list[scalar_or_array], one per metric | ||
| for name, r in zip(names, result): | ||
| _log(f"backtest_{name}", r) | ||
| elif ( | ||
| isinstance(metric, list | tuple) | ||
| and result_arr is not None | ||
| and result_arr.ndim == 1 | ||
| and len(result_arr) == len(names) | ||
| ): | ||
| # multiple metrics with scalar reduction returned as a 1-D ndarray | ||
| # (e.g. np.mean/median/percentile) — log each as a separate scalar | ||
| for name, r in zip(names, result_arr): | ||
| autologging_client.log_metrics( | ||
| run_id=run_id, metrics={f"backtest_{name}": float(r)} | ||
| ) | ||
| elif result_arr is not None and result_arr.ndim == 2: | ||
| # (N_windows, N_metrics) ndarray — multi-metric + reduction=None | ||
| for col_i, name in enumerate(names[: result_arr.shape[1]]): | ||
| for step, val in enumerate(result_arr[:, col_i]): | ||
| autologging_client.log_metrics( | ||
| run_id=run_id, | ||
| metrics={f"backtest_{name}": float(val)}, | ||
| step=step, | ||
| ) | ||
| elif isinstance(result, list): | ||
| # single metric, multiple series → result is list[scalar_or_array] | ||
| for s_i, r in enumerate(result): | ||
| _log(f"backtest_{names[0]}_{s_i}", r) | ||
| else: | ||
| _log(f"backtest_{names[0]}", result) |
There was a problem hiding this comment.
ideally we would like to also support multivariate series where we can log per component if no reduction (e.g. maybe [backtest_MAE_x, backtest_MAE_y]). I worry that this approach can then get a bit complex with all the branches. Maybe we can think about normalizing the result to a dataframe first which could simplify logging? Let me know what you think here
Checklist before merging this PR:
Addresses #2092 .
Summary
Provides a custom MLflow flavor for Darts on Darts' side. Supports autologging, logging, saving and loading of the models.
This PR focuses on the base MLflow integration, leaving serving of the models to be discussed in the future.
Included an example quickstart for the integration, however consider all of this a draft :)
Find example code in the .ipynb, however also providing a code snippet here as a quick reproducible example: