## Overview
When enabled, the integration provides:

- **Historical feature retrieval**: `get_historical_features()` tags the active MLflow run with feature refs, feature views, entity count, and retrieval duration
- **Online feature retrieval**: `get_online_features()` tags the active MLflow run with the same metadata
- **Entity DataFrame archival**: optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging**: tags each run with where it ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging**: optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution**: maps any MLflow model URI back to its Feast feature service
- **Training reproducibility**: reconstructs the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage**: `FeastMlflowClient.load_model()` links prediction runs back to their training runs
- **Feast MLflow Client**: a thin wrapper that removes the need to `import mlflow` directly in user code
## Installation
MLflow is an optional dependency. Install it with:
```bash
pip install 'feast[mlflow]'
```
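
Because MLflow is optional, code that should keep working without it can check for the package before turning on anything MLflow-specific. The following is a minimal sketch that uses only the standard library. The helper name `mlflow_available` is hypothetical and is not part of Feast.

```python
import importlib.util


def mlflow_available() -> bool:
    """Return True if the optional `mlflow` package can be imported."""
    # find_spec returns None when no importable top-level module named "mlflow" exists.
    return importlib.util.find_spec("mlflow") is not None


if __name__ == "__main__":
    print("mlflow installed:", mlflow_available())
```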
## Configuration
  auto_log: true
  auto_log_entity_df: true
  entity_df_max_rows: 100000
  log_execution_context: true
  log_operations: false
  ops_experiment_suffix: "-feast-ops"
```
### Configuration options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | bool | `false` | Enable or disable the MLflow integration |
| `tracking_uri` | string | *(none)* | MLflow tracking server URI. Falls back to the `MLFLOW_TRACKING_URI` environment variable, then to MLflow's default (`./mlruns`). |
| `auto_log` | bool | `true` | Automatically log feature metadata on every retrieval |
| `auto_log_entity_df` | bool | `false` | Save the entity DataFrame as an MLflow artifact (`entity_df.parquet`) |
| `entity_df_max_rows` | int | `100000` | Maximum entity DataFrame rows to save as an artifact; larger DataFrames are skipped to avoid OOM errors and slow uploads |
| `log_execution_context` | bool | `true` | Tag runs with their execution context (pipeline, workbench, feature_server, or standalone) |
| `log_operations` | bool | `false` | Log `feast apply` and `feast materialize` to a separate MLflow experiment |
| `ops_experiment_suffix` | string | `"-feast-ops"` | Suffix appended to the project name to form the operations experiment name |
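
The defaults and the `tracking_uri` precedence in the table above can be summarized in code. This is a minimal sketch, not Feast's implementation. `MlflowOptions`, `resolve_tracking_uri`, and `ops_experiment_name` are hypothetical names. The sketch also assumes that the operations experiment is named as the project name followed by `ops_experiment_suffix`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# MLflow's fallback when no tracking URI is configured anywhere, as stated in the table.
MLFLOW_DEFAULT_TRACKING_URI = "./mlruns"


@dataclass
class MlflowOptions:
    """Hypothetical mirror of the `mlflow:` options, using the documented defaults."""

    enabled: bool = False
    tracking_uri: Optional[str] = None
    auto_log: bool = True
    auto_log_entity_df: bool = False
    entity_df_max_rows: int = 100_000
    log_execution_context: bool = True
    log_operations: bool = False
    ops_experiment_suffix: str = "-feast-ops"

    def resolve_tracking_uri(self, environ: Optional[Mapping[str, str]] = None) -> str:
        """Explicit config wins, then MLFLOW_TRACKING_URI, then MLflow's default."""
        env = os.environ if environ is None else environ
        if self.tracking_uri:
            return self.tracking_uri
        # An empty MLFLOW_TRACKING_URI is treated the same as "not set".
        return env.get("MLFLOW_TRACKING_URI") or MLFLOW_DEFAULT_TRACKING_URI

    def ops_experiment_name(self, project: str) -> str:
        """Assumed naming: project name followed by the configured suffix."""
        return f"{project}{self.ops_experiment_suffix}"
```

Passing `environ` explicitly keeps the precedence logic testable without mutating the real process environment. For example, `MlflowOptions().ops_experiment_name("fraud")` gives `"fraud-feast-ops"`.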
## What gets logged
### Tags on retrieval runs
When `auto_log: true`, each `get_historical_features()` or `get_online_features()` call records the following tags on the active MLflow run:
| Tag | Example | Description |
|-----|---------|-------------|
| `feast.entity_count` | `200` | Number of entities in the request |
| `feast.feature_count` | `5` | Number of features retrieved |
| `feast.namespace` | `user-project` | Namespace the run executed in (set for pipeline and workbench contexts) |
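
The sketch below shows roughly how the two count tags relate to a request. It derives `feast.entity_count` and `feast.feature_count` from an entity count and a list of Feast-style `feature_view:feature` references. It also returns the distinct feature view names, because the key under which the integration records them is not assumed here. The helper `build_retrieval_tags` is hypothetical, and the real integration may compute these values differently.

```python
from typing import Dict, List, Sequence, Tuple


def build_retrieval_tags(
    feature_refs: Sequence[str], entity_count: int
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: derive the count tags and the feature view names.

    Feature refs are assumed to use Feast's "<feature_view>:<feature>" form.
    """
    # MLflow stores tag values as strings, so the counts are stringified here.
    tags = {
        "feast.entity_count": str(entity_count),
        "feast.feature_count": str(len(feature_refs)),
    }
    # Distinct feature view names, kept in first-seen order.
    views: List[str] = []
    for ref in feature_refs:
        view = ref.split(":", 1)[0]
        if view not in views:
            views.append(view)
    return tags, views
```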
### Metrics
| Metric | Example | Description |
### Artifacts
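When `auto_log_entity_df: true`, the entity DataFrame is saved as `entity_df.parquet` in the run's artifacts, provided its row count does not exceed `entity_df_max_rows`. This enables exact reproduction of the training data.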
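
The conditions for archival reduce to a small predicate. The following is a hedged sketch: `should_archive_entity_df` is a hypothetical helper, and it assumes the row limit is inclusive, so a DataFrame with exactly `entity_df_max_rows` rows is still saved.

```python
def should_archive_entity_df(
    row_count: int,
    auto_log_entity_df: bool,
    entity_df_max_rows: int = 100_000,
) -> bool:
    """Hypothetical check: would entity_df.parquet be saved for this run?"""
    if not auto_log_entity_df:
        # Archival is opt-in (auto_log_entity_df defaults to false).
        return False
    # Assumed inclusive limit: oversized DataFrames are skipped, not truncated.
    return row_count <= entity_df_max_rows
```

For a 250,000-row entity DataFrame with the default limit, this returns `False`, so the artifact would not be written unless `entity_df_max_rows` is raised.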
### Operation logs (when `log_operations: true`)
`feast apply` and `feast materialize` create runs in the `{project}-feast-ops` experiment (the suffix is set by `ops_experiment_suffix`):