diff --git a/docs/reference/mlflow.md b/docs/reference/mlflow.md
Lines changed: 95 additions & 25 deletions
diff --git a/sdk/python/feast/feature_store.py b/sdk/python/feast/feature_store.py
Lines changed: 88 additions & 0 deletions
diff --git a/sdk/python/feast/mlflow_integration/__init__.py b/sdk/python/feast/mlflow_integration/__init__.py
Lines changed: 6 additions & 0 deletions
@@ -4,23 +4,24 @@ This module provides **native integration** between Feast and [MLflow](https://m
 
 ## Overview
 
-When enabled, the integration logs to the active MLflow run during:
-
-- **Historical feature retrieval** — `get_historical_features()` tags the run with feature refs, feature views, entity count, and retrieval duration
-- **Online feature retrieval** — `get_online_features()` tags the run with the same metadata
-- **Entity DataFrame archival** — optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
-
-The integration also provides utilities for:
-
-- **Model → Feature Service resolution** — map any MLflow model URI back to its Feast feature service
-- **Training reproducibility** — reconstruct the exact entity DataFrame from a past MLflow run
+When enabled, the integration provides:
+
+- **Historical feature retrieval** -- `get_historical_features()` tags the run with feature refs, feature views, entity count, and retrieval duration
+- **Online feature retrieval** -- `get_online_features()` tags the run with the same metadata
+- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
+- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
+- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
+- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
+- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
+- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
+- **Feast MLflow Client** -- a thin wrapper that eliminates direct `import mlflow` in user code
 
 ## Installation
 
 MLflow is an optional dependency. Install it with:
 
 ```bash
-pip install mlflow
+pip install feast[mlflow]
 ```
 
 ## Configuration
@@ -41,23 +42,29 @@ mlflow:
   auto_log: true
   auto_log_entity_df: true
   entity_df_max_rows: 100000
+  log_execution_context: true
+  log_operations: false
+  ops_experiment_suffix: "-feast-ops"
 ```
 
 ### Configuration options
 
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
 | `enabled` | bool | `false` | Enable or disable the MLflow integration |
-| `tracking_uri` | string | *(none)* | MLflow tracking server URI. When not set, the `MLFLOW_TRACKING_URI` environment variable is used. If neither is set, MLflow falls back to its own default (`./mlruns`). |
+| `tracking_uri` | string | *(none)* | MLflow tracking server URI. If unset, falls back to the `MLFLOW_TRACKING_URI` environment variable, then to MLflow's own default (`./mlruns`). |
 | `auto_log` | bool | `true` | Automatically log feature metadata on every retrieval |
 | `auto_log_entity_df` | bool | `false` | Save the entity DataFrame as an MLflow artifact (`entity_df.parquet`) |
-| `entity_df_max_rows` | int | `100000` | Maximum entity DataFrame rows to save as an artifact. DataFrames exceeding this limit are skipped to avoid OOM and slow uploads. |
+| `entity_df_max_rows` | int | `100000` | Maximum entity DataFrame rows to save as an artifact. Larger DataFrames are skipped. |
+| `log_execution_context` | bool | `true` | Tag runs with execution context (pipeline, workbench, feature_server, standalone) |
+| `log_operations` | bool | `false` | Log `feast apply` and `feast materialize` to a separate MLflow experiment |
+| `ops_experiment_suffix` | string | `"-feast-ops"` | Suffix for the operations experiment name |
 
 ## What gets logged
 
-When `auto_log: true`, each `get_historical_features` or `get_online_features` call records the following on the active MLflow run:
+### Tags on retrieval runs
 
-### Tags
+When `auto_log: true`, each `get_historical_features` or `get_online_features` call records:
 
 | Tag | Example | Description |
 |-----|---------|-------------|
@@ -69,6 +76,18 @@ When `auto_log: true`, each `get_historical_features` or `get_online_features` c
 | `feast.entity_count` | `200` | Number of entities in the request |
 | `feast.feature_count` | `5` | Number of features retrieved |
 
+### Execution context tags
+
+When `log_execution_context: true`:
+
+| Tag | When set | Example |
+|-----|----------|---------|
+| `feast.execution_context` | Always | `pipeline` / `workbench` / `feature_server` / `standalone` |
+| `feast.kfp_run_id` | Pipeline (KFP) | `abc-123-def` |
+| `feast.kfp_pipeline` | Pipeline (KFP) | `fraud-training-pipeline` |
+| `feast.workbench` | RHOAI workbench | `my-jupyter-notebook` |
+| `feast.namespace` | Pipeline or workbench | `user-project` |
+
 ### Metrics
 
 | Metric | Example | Description |
@@ -77,7 +96,17 @@ When `auto_log: true`, each `get_historical_features` or `get_online_features` c
 
 ### Artifacts
 
-When `auto_log_entity_df: true`, the entity DataFrame is saved as `entity_df.parquet` in the run's artifacts (if the row count is within `entity_df_max_rows`), enabling exact reproduction of training data.
+When `auto_log_entity_df: true`, the entity DataFrame is saved as `entity_df.parquet`.
+
+### Operation logs (when `log_operations: true`)
+
+`feast apply` and `feast materialize` create runs in a separate experiment named `{project}{ops_experiment_suffix}` (by default `{project}-feast-ops`):
+
+| Tag | Example |
+|-----|---------|
+| `feast.operation` | `apply` / `materialize` / `materialize_incremental` |
+| `feast.feature_views_changed` | `driver_hourly_stats` (apply only) |
+| `feast.materialize.feature_views` | `driver_hourly_stats` (materialize only) |
 
 ## Usage
 
@@ -97,37 +126,78 @@ with mlflow.start_run(run_name="my_training"):
         entity_df=entity_df,
     ).to_df()
 
-    # Feature metadata is already logged to this run — no extra code needed
     model = train(training_df)
     mlflow.sklearn.log_model(model, "model")
 ```
 
-### Resolve a model back to its feature service
+### FeastMlflowClient (zero mlflow imports)
+
+The `FeastMlflowClient` wraps MLflow so user code never needs `import mlflow`:
+
+```python
+from feast import FeatureStore
 
-Given an MLflow model URI, determine which Feast feature service was used during training:
+store = FeatureStore(".")
+client = store.get_mlflow_client()
+
+# Training
+with client.start_run(run_name="v1_training"):
+    df = store.get_historical_features(
+        features=store.get_feature_service("driver_activity_v1"),
+        entity_df=entity_df,
+    ).to_df()
+
+    model = LogisticRegression().fit(X, y)  # X, y: features and labels built from df; LogisticRegression from sklearn.linear_model
+    client.log_params({"model_type": "logistic_regression"})
+    client.log_metrics({"f1": 0.85})
+    client.log_model(model, "model")
+    train_run_id = client.active_run_id
+
+client.register_model(f"runs:/{train_run_id}/model", "driver_model")
+
+# Prediction (auto-links to training run)
+with client.start_run(run_name="prediction"):
+    model = client.load_model("models:/driver_model/1")
+    # This run is now tagged with feast.training_run_id pointing to train_run_id
+    online_features = store.get_online_features(...).to_dict()
+    predictions = model.predict(...)
+```
+
+### FeastMlflowClient API
+
+| Method | Description |
+|--------|-------------|
+| `store.get_mlflow_client()` | Create a client from the FeatureStore |
+| `client.start_run(run_name, tags)` | Context manager, auto-tags `feast.project` |
+| `client.log_params(params)` | Log parameters |
+| `client.log_metrics(metrics, step)` | Log metrics |
+| `client.log_metric(key, value, step)` | Log a single metric |
+| `client.log_model(model, path, flavor)` | Log model + auto-attach `required_features.json` |
+| `client.load_model(model_uri)` | Load model + auto-tag prediction run with training lineage |
+| `client.register_model(model_uri, name)` | Register + auto-tag version with `feast.feature_service` |
+| `client.resolve_features(model_uri)` | Resolve model URI to Feast feature service name |
+| `client.get_training_entity_df(run_id)` | Recover entity DataFrame from a past run |
+| `client.mlflow` | Escape hatch: raw mlflow module |
+| `client.active_run_id` | Current active run ID |
+
+### Resolve a model back to its feature service
 
 ```python
 from feast.mlflow_integration import resolve_feature_service_from_model_uri
 
 fs_name = resolve_feature_service_from_model_uri("models:/my_model/1")
-# Returns "driver_activity_v1" — resolved from the training run's tags
 ```
 
 Resolution order:
 1. Model version tag `feast.feature_service` (explicit override)
 2. Training run tag `feast.feature_service` (set by auto-log)
 
-If neither tag is found, a `FeastMlflowModelResolutionError` is raised with guidance on how to set the tag.
-
 ### Reproduce training from a past run
 
-Retrieve the exact entity DataFrame that was used in a previous training run:
-
 ```python
 from feast.mlflow_integration import get_entity_df_from_mlflow_run
 
 entity_df = get_entity_df_from_mlflow_run(run_id="abc123")
-# Returns the entity DataFrame saved during the original run
 
 training_df = store.get_historical_features(
     features=store.get_feature_service("driver_activity_v1"),
 
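The resolution order documented in the hunk above (model version tag first, then the training run tag) can be sketched without MLflow. This is a minimal illustration, not the code in `feast.mlflow_integration.model_resolver`: the function `resolve_feature_service`, its plain-dict arguments, and the local `ResolutionError` class are hypothetical stand-ins. Only the tag name `feast.feature_service` and the precedence come from the docs; the error case follows the sentence this diff removes and the still-exported `FeastMlflowModelResolutionError`.

```python
from typing import Mapping, Optional

FEATURE_SERVICE_TAG = "feast.feature_service"  # tag name taken from the docs


class ResolutionError(LookupError):
    """Hypothetical stand-in for FeastMlflowModelResolutionError."""


def resolve_feature_service(
    model_version_tags: Mapping[str, str],
    training_run_tags: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the feature service name, preferring the model version tag."""
    # 1. An explicit tag on the model version overrides everything else.
    name = model_version_tags.get(FEATURE_SERVICE_TAG)
    if name:
        return name
    # 2. Otherwise fall back to the tag written on the training run.
    if training_run_tags:
        name = training_run_tags.get(FEATURE_SERVICE_TAG)
        if name:
            return name
    # 3. Neither tag is present: fail loudly with a hint for the user.
    raise ResolutionError(
        f"No '{FEATURE_SERVICE_TAG}' tag found on the model version or its "
        "training run; set the tag on the model version to resolve it."
    )
```

Passing the tags as dicts keeps the precedence logic testable on its own; a real resolver would first have to fetch both tag sets from the MLflow tracking server.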
@@ -226,6 +226,16 @@ def __init__(
 
         self._init_mlflow_tracking()
 
+    def get_mlflow_client(self):
+        """Return a :class:`~feast.mlflow_integration.client.FeastMlflowClient`.
+
+        The client wraps MLflow so that ``import mlflow`` is never needed
+        in user code.  Configuration is inherited from ``feature_store.yaml``.
+        """
+        from feast.mlflow_integration.client import FeastMlflowClient
+
+        return FeastMlflowClient(self)
+
     def _init_mlflow_tracking(self):
         """Configure the global MLflow tracking URI and experiment from feature_store.yaml.
 
@@ -1187,6 +1197,24 @@ def _apply_diffs(
         # Emit OpenLineage events for applied objects
         self._emit_openlineage_apply_diffs(registry_diff)
 
+        # Emit MLflow events for applied objects (Phase 7)
+        self._mlflow_log_apply_diffs(registry_diff)
+
+    def _mlflow_log_apply_diffs(self, registry_diff: RegistryDiff):
+        """Log apply operation to MLflow ops experiment."""
+        try:
+            mlflow_cfg = self.config.mlflow
+            if mlflow_cfg is None or not mlflow_cfg.enabled or not mlflow_cfg.log_operations:
+                return
+            objects: List[Any] = []
+            for feast_object_diff in registry_diff.feast_object_diffs:
+                if feast_object_diff.new_feast_object is not None:
+                    objects.append(feast_object_diff.new_feast_object)
+            if objects:
+                self._mlflow_log_apply(objects)
+        except Exception as e:
+            _logger.debug("MLflow apply logging failed: %s", e)
+
     def _emit_openlineage_apply_diffs(self, registry_diff: RegistryDiff):
         """Emit OpenLineage events for objects applied via diffs."""
         if self.openlineage_emitter is None:
@@ -1482,6 +1510,26 @@ def apply(
         # Emit OpenLineage events for applied objects
         self._emit_openlineage_apply(objects)
 
+        # Emit MLflow events for applied objects (Phase 7)
+        self._mlflow_log_apply(objects)
+
+    def _mlflow_log_apply(self, objects: List[Any]):
+        """Log applied objects to MLflow ops experiment."""
+        try:
+            mlflow_cfg = self.config.mlflow
+            if mlflow_cfg is None or not mlflow_cfg.enabled or not mlflow_cfg.log_operations:
+                return
+            from feast.mlflow_integration.logger import log_apply_to_mlflow
+
+            log_apply_to_mlflow(
+                changed_objects=objects,
+                project=self.project,
+                tracking_uri=mlflow_cfg.get_tracking_uri(),
+                ops_experiment_suffix=mlflow_cfg.ops_experiment_suffix,
+            )
+        except Exception as e:
+            _logger.debug("MLflow apply logging failed: %s", e)
+
     def _emit_openlineage_apply(self, objects: List[Any]):
         """Emit OpenLineage events for applied objects."""
         if self.openlineage_emitter is None:
@@ -2062,6 +2110,12 @@ def tqdm_builder(length):
             self._emit_openlineage_materialize_complete(
                 ol_run_id, feature_views_to_materialize
             )
+
+            # Emit MLflow event for materialization (Phase 7)
+            _mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
+            self._mlflow_log_materialize(
+                feature_views_to_materialize, None, end_date, _mat_duration, incremental=True,
+            )
         except Exception as e:
             # Emit OpenLineage FAIL event
             self._emit_openlineage_materialize_fail(ol_run_id, str(e))
@@ -2190,11 +2244,45 @@ def tqdm_builder(length):
             self._emit_openlineage_materialize_complete(
                 ol_run_id, feature_views_to_materialize
             )
+
+            # Emit MLflow event for materialization (Phase 7)
+            self._mlflow_log_materialize(
+                feature_views_to_materialize, start_date, end_date, 0, incremental=False,
+            )
         except Exception as e:
             # Emit OpenLineage FAIL event
             self._emit_openlineage_materialize_fail(ol_run_id, str(e))
             raise
 
+    def _mlflow_log_materialize(
+        self,
+        feature_views: List[Any],
+        start_date: Optional[datetime],
+        end_date: datetime,
+        duration_seconds: float,
+        incremental: bool = False,
+    ):
+        """Log materialization to MLflow ops experiment."""
+        try:
+            mlflow_cfg = self.config.mlflow
+            if mlflow_cfg is None or not mlflow_cfg.enabled or not mlflow_cfg.log_operations:
+                return
+            from feast.mlflow_integration.logger import log_materialize_to_mlflow
+
+            fv_names = [getattr(fv, "name", str(fv)) for fv in feature_views]
+            log_materialize_to_mlflow(
+                feature_view_names=fv_names,
+                start_date=start_date,
+                end_date=end_date,
+                duration_seconds=duration_seconds,
+                project=self.project,
+                tracking_uri=mlflow_cfg.get_tracking_uri(),
+                incremental=incremental,
+                ops_experiment_suffix=mlflow_cfg.ops_experiment_suffix,
+            )
+        except Exception as e:
+            _logger.debug("MLflow materialize logging failed: %s", e)
+
     def _emit_openlineage_materialize_start(
         self,
         feature_views: List[Any],
 
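The `_mlflow_log_*` helpers in this file share one shape: return early unless both `enabled` and `log_operations` are set, and swallow any exception so that logging can never fail an apply or materialize. The materialize call sites pass a duration of `0`, or one derived from `'_retrieval_start' in dir()`; taking an explicit `time.monotonic()` reading before the work starts would give a measured value instead. The sketch below shows both ideas in isolation, under stated assumptions: `OpsConfig`, `log_operation_best_effort`, and `timed_materialize` are hypothetical names, `sink` is an injected callable standing in for `log_materialize_to_mlflow`, and the experiment name is assumed to be the project name followed by the configured suffix, as the docs imply.

```python
import logging
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

_logger = logging.getLogger(__name__)


@dataclass
class OpsConfig:
    """Hypothetical stand-in for the `mlflow` block of feature_store.yaml."""

    enabled: bool = False
    log_operations: bool = False
    ops_experiment_suffix: str = "-feast-ops"


def log_operation_best_effort(
    cfg: Optional[OpsConfig],
    project: str,
    payload: Dict[str, Any],
    sink: Callable[..., None],
) -> bool:
    """Call `sink` only when ops logging is on; never raise. True if logged."""
    try:
        if cfg is None or not cfg.enabled or not cfg.log_operations:
            return False
        # Assumed naming scheme: project name followed by the suffix.
        sink(experiment=f"{project}{cfg.ops_experiment_suffix}", **payload)
        return True
    except Exception as e:  # logging must not break the operation itself
        _logger.debug("MLflow ops logging failed: %s", e)
        return False


def timed_materialize(
    work: Callable[[], None],
    cfg: Optional[OpsConfig],
    project: str,
    sink: Callable[..., None],
) -> float:
    """Run `work`, then log its measured wall-clock duration in seconds."""
    start = time.monotonic()  # explicit start time, no scope introspection
    work()
    duration = time.monotonic() - start
    log_operation_best_effort(
        cfg,
        project,
        {"operation": "materialize", "duration_seconds": duration},
        sink,
    )
    return duration
```

Injecting `sink` keeps the guard and the timing testable without an MLflow server. In the patch itself the equivalent change would be to record a start time at the top of each materialize method and pass the difference in place of `0`.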
@@ -26,13 +26,16 @@
       from a previous MLflow run's artifacts.
 """
 
+from feast.mlflow_integration.client import FeastMlflowClient
 from feast.mlflow_integration.config import MlflowConfig
 from feast.mlflow_integration.entity_df_builder import (
     FeastMlflowEntityDfError,
     get_entity_df_from_mlflow_run,
 )
 from feast.mlflow_integration.logger import (
+    log_apply_to_mlflow,
     log_feature_retrieval_to_mlflow,
+    log_materialize_to_mlflow,
     log_training_dataset_to_mlflow,
 )
 from feast.mlflow_integration.model_resolver import (
@@ -41,9 +44,12 @@
 )
 
 __all__ = [
+    "FeastMlflowClient",
     "MlflowConfig",
     "log_feature_retrieval_to_mlflow",
     "log_training_dataset_to_mlflow",
+    "log_apply_to_mlflow",
+    "log_materialize_to_mlflow",
     "resolve_feature_service_from_model_uri",
     "FeastMlflowModelResolutionError",
     "get_entity_df_from_mlflow_run",