feat: Feast-MLflow Integration #6235

Open
Vperiodt wants to merge 16 commits into feast-dev:master from Vperiodt:feast-mlflow

Conversation

@Vperiodt
Contributor

@Vperiodt Vperiodt commented Apr 8, 2026

What this PR does / why we need it:

final_mlflow_demo.mp4
  • Auto-logging: feature retrieval metadata is tagged on the active MLflow run (feast.feature_refs, feast.feature_views, feast.feature_service, feast.entity_count, etc.)
  • Entity DataFrame archival: optionally saves the training entity DataFrame as an MLflow artifact (entity_df.parquet) for full reproducibility
  • Model-to-feature-service resolution: resolve_feature_service_from_model_uri() maps any MLflow model URI back to its Feast feature service, enabling serving pipelines to auto-discover which features a model needs
  • Entity DataFrame reconstruction: get_entity_df_from_mlflow_run() rebuilds the exact entity DataFrame from a past run's artifacts, enabling training reproducibility
  • Configuration: controlled entirely via feature_store.yaml under a new mlflow: block
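As a rough sketch of that configuration surface, the mlflow: block might look like the fragment below. Only enabled and tracking_uri appear elsewhere in this thread; the overall shape is an assumption, not the merged schema.

```yaml
# Hypothetical feature_store.yaml fragment. Only `enabled` and
# `tracking_uri` are mentioned elsewhere in this PR; treat the
# layout as illustrative.
project: my_project
registry: data/registry.db
provider: local
mlflow:
  enabled: true
  tracking_uri: http://mlflow.internal:5000  # MLFLOW_TRACKING_URI can stand in when omitted
```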

Which issue(s) this PR fixes:

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc



github-advanced-security[bot]

This comment was marked as resolved.

@Vperiodt Vperiodt marked this pull request as ready for review April 9, 2026 12:06
@Vperiodt Vperiodt requested a review from a team as a code owner April 9, 2026 12:06
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@Vperiodt Vperiodt changed the title Feast-MLflow Integration feat: Feast-MLflow Integration Apr 15, 2026
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.

View 19 additional findings in Devin Review.


Comment thread sdk/python/feast/feature_store.py Outdated
with tempfile.TemporaryDirectory() as tmp_dir:
path = os.path.join(tmp_dir, "entity_df.parquet")
entity_df.to_parquet(path, index=False)
mlflow.log_artifact(path)
Contributor

@devin-ai-integration devin-ai-integration Bot Apr 15, 2026


🟡 Mixing global mlflow.log_artifact() with explicit-URI client causes artifact/metadata to target different servers

In _auto_log_entity_df_info, tags and params are logged via a locally-created MlflowClient(tracking_uri=tracking_uri) (lines 304, 309-318), but the entity DataFrame artifact is uploaded via the global mlflow.log_artifact(path) (line 327). The global function uses the tracking URI set by mlflow.set_tracking_uri(), not the explicit tracking_uri from the config. If the global tracking URI is changed by another library in the process, or if _init_mlflow_tracking failed silently (caught by except Exception at sdk/python/feast/feature_store.py:249), the artifact would be uploaded to a different server than where the tags/params were logged, splitting metadata across two servers.
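The failure mode generalizes beyond MLflow. A self-contained toy (none of these names are real MLflow APIs) shows how a module-level global and an explicit client can silently target different servers:

```python
# Toy model of the split-target bug: tags go through an explicit
# client pinned to one URI, while the artifact goes through a
# module-level global that anyone in the process can re-point.
_global_server = "http://server-a"  # analogous to mlflow.set_tracking_uri() state

def set_global_server(uri):
    global _global_server
    _global_server = uri

class ExplicitClient:
    """Analogous to MlflowClient(tracking_uri=...): URI fixed at creation."""
    def __init__(self, uri):
        self.uri = uri
    def log_tag(self, run_id, key, value):
        return (self.uri, "tag", run_id, key, value)

def log_artifact_global(run_id, path):
    # Analogous to mlflow.log_artifact(): always reads the global URI.
    return (_global_server, "artifact", run_id, path)

client = ExplicitClient("http://server-a")
set_global_server("http://server-b")  # another library re-points the global

tag = client.log_tag("run1", "feast.feature_service", "fs")
artifact = log_artifact_global("run1", "entity_df.parquet")
# tag targets server-a, artifact targets server-b: the run's metadata
# and its artifact now live on different servers.
```

Routing both calls through the same explicit client removes the dependency on process-wide state.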


Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 2 new potential issues.

View 20 additional findings in Devin Review.


Comment thread sdk/python/feast/ui_server.py Outdated
try:
import mlflow

tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
Contributor


🔴 UI server ignores MLFLOW_TRACKING_URI env var, falls back to hardcoded localhost

In ui_server.py, both the /api/mlflow-runs and /api/mlflow-feature-models endpoints resolve the MLflow tracking URI using mlflow_cfg.tracking_uri or "http://127.0.0.1:5000". This reads the raw tracking_uri field from the config and falls back to a hardcoded localhost URL, completely bypassing the MLFLOW_TRACKING_URI environment variable. In contrast, the rest of the codebase (feature_store.py:243, feature_store.py:325, feature_store.py:1696, feature_store.py:2885) correctly calls mlflow_cfg.get_tracking_uri() which checks the env var via sdk/python/feast/mlflow_integration/config.py:19-29. When a user sets MLFLOW_TRACKING_URI without setting tracking_uri in YAML (which is a very common deployment pattern, and documented in the PR's own docs at docs/reference/mlflow.md:51), the UI endpoints will incorrectly connect to http://127.0.0.1:5000 instead of the env-var-specified server.

Suggested change
tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
tracking_uri = mlflow_cfg.get_tracking_uri() or "http://127.0.0.1:5000"
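The precedence being requested (environment variable, then the YAML field, then the hardcoded default) can be sketched in plain Python. This get_tracking_uri is a stand-in for the method in mlflow_integration/config.py, not its actual implementation:

```python
import os

def get_tracking_uri(config_uri=None, default="http://127.0.0.1:5000"):
    # MLFLOW_TRACKING_URI wins, then the YAML field, then the default.
    return os.environ.get("MLFLOW_TRACKING_URI") or config_uri or default

os.environ.pop("MLFLOW_TRACKING_URI", None)
print(get_tracking_uri())                     # http://127.0.0.1:5000
print(get_tracking_uri("http://yaml:5000"))   # http://yaml:5000

os.environ["MLFLOW_TRACKING_URI"] = "http://env:5000"
print(get_tracking_uri("http://yaml:5000"))   # http://env:5000
```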

Comment thread sdk/python/feast/ui_server.py Outdated
try:
import mlflow

tracking_uri = mlflow_cfg.tracking_uri or "http://127.0.0.1:5000"
Contributor

@devin-ai-integration devin-ai-integration Bot Apr 16, 2026


🔴 Second UI endpoint also ignores MLFLOW_TRACKING_URI env var

Same issue as in the /api/mlflow-runs endpoint: the /api/mlflow-feature-models endpoint at sdk/python/feast/ui_server.py:234 uses mlflow_cfg.tracking_uri or "http://127.0.0.1:5000" instead of mlflow_cfg.get_tracking_uri(), causing the MLFLOW_TRACKING_URI environment variable to be ignored.


Vperiodt added 12 commits April 20, 2026 14:51
Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 3 new potential issues.

View 22 additional findings in Devin Review.


Comment thread sdk/python/feast/feature_store.py Outdated
)

# Emit MLflow event for materialization (Phase 7)
_mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
Contributor


🟡 Undefined _retrieval_start in materialize_incremental makes the logged duration always 0

In materialize_incremental, the code at line 2115 references _retrieval_start, which is never assigned in that method. The guard '_retrieval_start' in dir() inspects the current local scope (dir() with no arguments lists local names), so the condition is always False and _mat_duration silently falls back to 0, a meaningless value that then gets logged as if it were a real duration. If a real duration is wanted, the method should record time.monotonic() at the start of the operation and subtract it explicitly, rather than probing dir().

Suggested change
_mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
_mat_duration = 0
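A minimal repro of why the guard never fires, plus the explicit-timestamp pattern that avoids it:

```python
import time

def materialize_like():
    # dir() with no arguments lists names in the *current local scope*;
    # _retrieval_start is never assigned here, so the condition is
    # always False and the expression falls back to 0.
    _mat_duration = time.monotonic() - _retrieval_start if '_retrieval_start' in dir() else 0
    return _mat_duration

print(materialize_like())  # 0 -- never a real duration

def materialize_timed():
    # The robust pattern: record the start time explicitly.
    _retrieval_start = time.monotonic()
    # ... materialization work would happen here ...
    return time.monotonic() - _retrieval_start

print(materialize_timed() >= 0)  # True
```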

Comment on lines +185 to +189
if tracking_uri:
mlflow.set_tracking_uri(tracking_uri)

experiment_name = f"{project}{ops_experiment_suffix}"
mlflow.set_experiment(experiment_name)
Contributor


🔴 log_apply_to_mlflow and log_materialize_to_mlflow mutate global MLflow tracking URI and experiment, corrupting concurrent user runs

Both log_apply_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:185-189) and log_materialize_to_mlflow (sdk/python/feast/mlflow_integration/logger.py:251-255) call mlflow.set_tracking_uri(tracking_uri) and mlflow.set_experiment(experiment_name) globally. These mutate process-wide global state. If a user has an active MLflow run in a different experiment (e.g. during feast apply inside a training script), these calls redirect subsequent MLflow operations to the ops experiment. While the functions try to restore mlflow.set_experiment(project) afterward (line 224, 274), the tracking URI is never restored. In the exception path, if mlflow.start_run() raises (line 202/258), the experiment remains set to the ops experiment. This corrupts any concurrent or subsequent user operations in the same process.

Prompt for agents
The functions log_apply_to_mlflow and log_materialize_to_mlflow in sdk/python/feast/mlflow_integration/logger.py mutate process-wide global state by calling mlflow.set_tracking_uri() and mlflow.set_experiment(). This corrupts any concurrent user MLflow runs in the same process.

The fix should save and restore the original tracking URI and experiment before/after the ops logging, or preferably use the MlflowClient API exclusively (which already takes tracking_uri as a parameter) instead of the global mlflow module-level functions. The MlflowClient.create_run() method allows creating runs in specific experiments without mutating global state. This pattern is already used in log_feature_retrieval_to_mlflow which correctly uses client = mlflow.MlflowClient(tracking_uri=tracking_uri) without touching global state.

Comment on lines +497 to +506
def mlflow(self):
"""Get the MLflow configuration."""
if not self._mlflow:
if isinstance(self.mlflow_config, Dict):
from feast.mlflow_integration.config import MlflowConfig

self._mlflow = MlflowConfig(**self.mlflow_config)
elif self.mlflow_config:
self._mlflow = self.mlflow_config
return self._mlflow
Contributor


🟡 mlflow property uses a truthiness guard where an `is None` check would be clearer

The mlflow property in RepoConfig (sdk/python/feast/repo_config.py:497-506) uses if not self._mlflow: as its caching guard. For standard usage this works: a valid MlflowConfig instance is truthy even with enabled=False (Pydantic models are truthy), so it is constructed once, and when mlflow_config is None (the default) the property simply keeps returning None. The guard only misbehaves for a falsy-but-valid config object (unlikely, but possible with custom subclasses), where it would re-create the config on every access. This mirrors the existing openlineage property, so it is consistent rather than an outright bug, but if self._mlflow is None: would express the caching intent unambiguously.
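The difference between the two guards is easy to demonstrate with a falsy-but-valid cached object (toy classes, not Feast code):

```python
class FalsyConfig(dict):
    """dict subclass: an empty instance is falsy but still a valid object."""
    pass

class RepoIsNone:
    def __init__(self):
        self._cfg = None
        self.builds = 0
    @property
    def cfg(self):
        if self._cfg is None:  # explicit guard: built exactly once
            self.builds += 1
            self._cfg = FalsyConfig()
        return self._cfg

class RepoTruthy:
    def __init__(self):
        self._cfg = None
        self.builds = 0
    @property
    def cfg(self):
        if not self._cfg:  # truthiness guard: rebuilds on a falsy cache
            self.builds += 1
            self._cfg = FalsyConfig()
        return self._cfg

r = RepoIsNone()
r.cfg; r.cfg; r.cfg
print(r.builds)  # 1

t = RepoTruthy()
t.cfg; t.cfg; t.cfg
print(t.builds)  # 3 -- the empty (falsy) config is rebuilt on every access
```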


Comment thread docs/reference/mlflow.md Outdated
- **Online feature retrieval** -- `get_online_features()` tags the run with the same metadata
- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
Member


not sure this is actually useful

Contributor Author

@Vperiodt Vperiodt Apr 20, 2026


Actually, this is not the updated version; it was originally meant for backtracking from an MLflow run to the workbench experiment, but it has since been omitted.

Comment thread docs/reference/mlflow.md Outdated
- **Entity DataFrame archival** -- optionally saves the training entity DataFrame as an MLflow artifact for full reproducibility
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
Member


nice

Comment thread docs/reference/mlflow.md Outdated
- **Execution context tagging** -- tags runs with where they ran (workbench, KFP pipeline, feature server, or standalone)
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
Member


👏

Comment thread docs/reference/mlflow.md Outdated
- **Operation logging** -- optionally logs `feast apply` and `feast materialize` to a separate MLflow experiment
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
Member


very nice

Comment thread docs/reference/mlflow.md Outdated
- **Model-to-Feature resolution** -- map any MLflow model URI back to its Feast feature service
- **Training reproducibility** -- reconstruct the exact entity DataFrame from a past MLflow run
- **Training-to-prediction linkage** -- `FeastMlflowClient.load_model()` links prediction runs back to their training runs
- **Feast MLflow Client** -- a thin wrapper that eliminates direct `import mlflow` in user code
Member


makes sense

_consecutive_failures = 0


def log_feature_retrieval_to_mlflow(
Member


i guess if we have this enabled why do we need to even have the wrapper? can't we full handle this for the user?

Member


I guess my question is, can't we just hide all of the mlflow client usage to the user?

Contributor Author


The auto-logging in logger.py is fully handled for users who already use mlflow.start_run() or client.start_run(). The FeastMlflowClient wrapper is for users who want to avoid import mlflow entirely and get additional Feast-specific behavior on top, e.g. logging a model, registering it, or loading it for inference. In short, FeastMlflowClient is for users who want the extra lineage features without touching MLflow directly.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 1 new potential issue.

View 12 additional findings in Devin Review.


Comment on lines +268 to +273
fs_refs = frozenset(
f"{p.name}:{f.name}"
for p in fs.feature_view_projections
for f in p.features
)
index[fs_refs] = fs.name
Contributor


🟡 _rebuild_fs_name_index uses p.name instead of p.name_to_use(), causing mismatched feature ref formats

_rebuild_fs_name_index at sdk/python/feast/feature_store.py:269 builds the feature-service-to-refs index using f"{p.name}:{f.name}", but the canonical way to build feature refs throughout the codebase is via projection.name_to_use() (see sdk/python/feast/utils.py:1164). name_to_use() returns self.name_alias or self.name and appends @v{version_tag} when a version tag is set (sdk/python/feast/feature_view_projection.py:52-56). This means the index will contain refs using the raw name while other parts of the system generate refs using aliases or version-tagged names, so any feature service with aliased or versioned projections will never be matched by _resolve_feature_service_name. Currently the impact is limited because _resolve_feature_service_name is only called with user-provided raw string refs (where the user typically uses canonical names), but the index is objectively incorrect for feature services that use aliases or version tags.

Suggested change
fs_refs = frozenset(
f"{p.name}:{f.name}"
for p in fs.feature_view_projections
for f in p.features
)
index[fs_refs] = fs.name
for fs in self.registry.list_feature_services(self.project, allow_cache=True):
fs_refs = frozenset(
f"{p.name_to_use()}:{f.name}"
for p in fs.feature_view_projections
for f in p.features
)
index[fs_refs] = fs.name
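In isolation the frozenset index behaves like this (toy data, not Feast objects); the bug is purely about which naming rule produces the strings:

```python
# Frozenset-keyed index: map an order-insensitive set of feature refs
# back to a feature service name.
services = {
    "driver_service": [("driver_stats", "conv_rate"), ("driver_stats", "acc_rate")],
}

index = {}
for name, pairs in services.items():
    refs = frozenset(f"{view}:{feat}" for view, feat in pairs)
    index[refs] = name

# Lookup is order-insensitive because frozensets compare by contents.
query = frozenset(["driver_stats:acc_rate", "driver_stats:conv_rate"])
print(index.get(query))  # driver_service

# But string keys are exact: a ref built from an alias misses, which is
# why the index must use the same naming rule (name_to_use) as the rest
# of the codebase when it generates refs.
aliased = frozenset(["stats_alias:conv_rate", "stats_alias:acc_rate"])
print(index.get(aliased))  # None
```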
