docs: Deprecate Great Expectations DQM in favor of built-in Feature Quality Monitoring (#6548)
* docs: Deprecate Great Expectations DQM in favor of built-in Feature Quality Monitoring
The GE-based validation is superseded by Feast's native DQM system which
provides richer metrics, serving log support, no extra dependencies, and
a built-in UI dashboard. Adds deprecation notice, migration guide, and
comparison table to help users transition.
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: Deprecate GE references across all docs, update to new DQM system
- SUMMARY.md: Mark DQM and GE tutorial entries as [Deprecated]
- tutorials/validating-historical-features.md: Add deprecation banner
- roadmap.md: Strike GE entry, add new Feature Quality Monitoring
- README.md: Replace GE limitation with new built-in monitoring capability
- ADR-0011: Add Superseded section pointing to new system
- reference/codebase-structure.md: Note GE profiler deprecated
- getting-started/concepts/dataset.md: Update DQM context
- README.md (root): Auto-synced by template hook
Blog posts (feast-0-18) left untouched as historical records.
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: Replace GE in website blog, update offline store docs, cross-link operational metrics
- infra/website/docs/blog/feast-mlflow-kubeflow.md: Replace full GE code
section with native Feature Quality Monitoring (config + CLI examples)
- docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md: Update
pull_all_from_table_or_query description to reference new monitoring system
- docs/how-to-guides/feature-monitoring.md: Add "Related: Operational and
SOX Metrics" section cross-linking to Prometheus metrics, audit logging,
and OpenTelemetry docs
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
docs/README.md (1 addition, 1 deletion)
@@ -71,7 +71,7 @@ Feast helps ML platform/MLOps teams with DevOps experience productionize real-ti
 * **batch feature engineering**: Feast supports on-demand and streaming transformations. Feast is also investing in supporting batch transformations.
 * **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
 * **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast_extractor.py).
-* **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
+* **data quality / drift detection**: Feast now includes built-in [Feature Quality Monitoring](how-to-guides/feature-monitoring.md) that computes statistical metrics (null rates, distributions, percentiles), detects drift across batch data and serving logs, and provides a monitoring UI dashboard. The older Great Expectations integration is deprecated.
docs/adr/ADR-0011-data-quality-monitoring.md (14 additions, 1 deletion)
@@ -83,8 +83,21 @@ If validation fails, a `ValidationFailed` exception is raised with details for a
 - Dependency on Great Expectations adds to the install footprint (optional via `feast[ge]`).
 - Automatic profiling capabilities are limited; manual expectation crafting is recommended.
 
+## Superseded
+
+This ADR documents the original GE-based approach which is now **deprecated**. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system (introduced in 2025), which provides:
+
+- Automatic metric computation (null rates, percentiles, histograms) with no external dependencies
+- Monitoring across batch data and serving logs
+- CLI (`feast monitor run`) and REST API for automation
+- Built-in UI monitoring dashboard
+- Support for all offline store backends via SQL push-down
+
+The GE-based integration may be removed in a future release.
+
 ## References
 
 - Original RFC: Feast RFC-027: Data Quality Monitoring
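
For readers unfamiliar with the metrics that the Superseded section of this ADR lists (null rates, percentiles, histograms), the following is a minimal standard-library sketch of what each one measures. It is an illustration only and not Feast's implementation; the function names `null_rate`, `percentiles`, and `histogram` are hypothetical and do not correspond to any Feast API.

```python
from statistics import quantiles
from typing import Dict, List, Optional, Sequence


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def percentiles(
    values: Sequence[Optional[float]], points: Sequence[int] = (50, 95, 99)
) -> Dict[int, float]:
    """Selected percentiles (each in 1..99) of the non-null values."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two non-null values")
    # n=100 yields 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(present, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


def histogram(values: Sequence[Optional[float]], bins: int = 4) -> List[int]:
    """Counts of non-null values in `bins` equal-width buckets."""
    present = [v for v in values if v is not None]
    if not present:
        return [0] * bins
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in present:
        # The maximum sits on the right edge, so clamp it into the last bucket.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


# Hypothetical feature column with two missing values.
sample = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]
print(null_rate(sample), percentiles(sample, points=(50,)), histogram(sample))
```

A production system would typically compute these inside the offline store (the ADR mentions SQL push-down) rather than pulling rows into Python; the sketch only clarifies the definitions.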
docs/getting-started/concepts/dataset.md (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
 # \[Alpha] Saved dataset
 
-Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept.
+Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. Data Quality Monitoring was the original motivation for creating the dataset concept. Note that the Great Expectations-based validation that used saved datasets is now deprecated in favor of Feast's built-in [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system, which does not require saved datasets.
 
 Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md).
docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md (1 addition, 1 deletion)
@@ -51,7 +51,7 @@ To fully implement the interface for the offline store, you will need to impleme
 * `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method. This method pull data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
 * `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
 * (optional) `offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push api can be found [here](../docs/reference/data-sources/push.md). This method only needs implementation if you want to support the push api in your offline store.
-* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is only used for **SavedDatasets** as part of data quality monitoring validation.
+* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is used for **SavedDatasets** and as a fallback compute path for the [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system (backends without native SQL push-down).
 * (optional) `write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a defined source defined by `LoggingSource` and `LoggingConfig`. This method is only used internally for **SavedDatasets**.
docs/how-to-guides/feature-monitoring.md (8 additions, 0 deletions)
@@ -462,3 +462,11 @@ The monitoring page is always accessible in the sidebar. To see actual data:
 
 2. Run `feast apply` — this computes baseline metrics automatically
 3. Schedule `feast monitor run` (or click "Compute Metrics" in the UI) to generate daily/weekly/monthly metrics
+
+## Related: Operational and SOX Metrics
+
+Feature Quality Monitoring focuses on **data-level** metrics (distributions, null rates, drift). Feast also provides **operational metrics** for infrastructure observability:
+
+- **Prometheus metrics** (`feast_offline_store_*`, `feast_online_store_*`) — latency, throughput, and error rates for offline/online store operations. See [Python Feature Server — Metrics](../reference/feature-servers/python-feature-server.md).
+- **SOX audit logging** (`feast.audit`) — structured audit events for compliance tracking of feature store operations.
+- **OpenTelemetry integration** — distributed tracing for feature serving requests. See [OpenTelemetry Integration](../getting-started/components/open-telemetry.md).
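
To make "drift" in the data-level metrics above concrete, here is a minimal sketch of one common drift score, the Population Stability Index (PSI), computed between a baseline sample and a current sample. The diff does not say which drift statistic Feast uses, so PSI is an assumption chosen purely for illustration, and `psi` is a hypothetical helper, not a Feast API.

```python
import math
from typing import List, Sequence


def psi(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index of `current` relative to `baseline`.

    Bucket edges are derived from the baseline range; values outside that
    range are clamped into the first or last bucket.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant baselines

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[max(0, min(idx, bins - 1))] += 1
        # Floor each share at eps so empty buckets do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Hypothetical data: the current window is the baseline shifted upward.
base = [float(i) for i in range(100)]
shifted = [v + 50.0 for v in base]
print(psi(base, base), psi(base, shifted))
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift; these thresholds are conventions, not guarantees, and are not taken from the Feast documentation.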
docs/reference/codebase-structure.md (1 addition, 1 deletion)
@@ -28,7 +28,7 @@ The majority of Feast logic lives in these Python files:
 
 There are also several important submodules:
 * `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
-* `dqm/` covers data quality monitoring, such as the dataset profiler.
+* `dqm/` covers data quality monitoring. The legacy Great Expectations profiler (`profilers/ge_profiler`) is deprecated; see [`monitoring/`](../../sdk/python/feast/monitoring/) for the current built-in monitoring system.
 * `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`).
 * `embedded_go/` covers the Go feature server.
 * `ui/` contains the embedded Web UI, to be launched on the `feast ui` command.
**Deprecated:** The Great Expectations-based validation described on this page is deprecated and will be removed in a future release. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system, which provides richer metrics (histograms, percentiles, drift detection), works across batch data and serving logs, requires no external dependencies, and includes a built-in UI dashboard.
 
-Its goal is to address several complex data problems, namely:
-* Data consistency - new training datasets can be significantly different from previous datasets. This might require a change in model architecture.
-* Issues/bugs in the upstream pipeline - bugs in upstream pipelines can cause invalid values to overwrite existing valid values in an online store.
-* Training/serving skew - distribution shift could significantly decrease the performance of the model.
+Please migrate to the new monitoring system. See the [Feature Quality Monitoring guide](../how-to-guides/feature-monitoring.md) for setup instructions.
+{% endhint %}
 
-> To monitor data quality, we check that the characteristics of the tested dataset (aka the tested dataset's profile) are "equivalent" to the characteristics of the reference dataset.
-> How exactly profile equivalency should be measured is up to the user.
+## Legacy: Great Expectations Integration
+
+The following documents the deprecated Great Expectations-based validation that was previously the only DQM option in Feast. This integration relied on `pip install 'feast[ge]'` and only supported validation during historical retrieval.
+
+---
 
 ### Overview
 
-The validation process consists of the following steps:
-1. User prepares reference dataset (currently only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
-2. User defines profiler function, which should produce profile by given dataset (currently only profilers based on [Great Expectations](https://docs.greatexpectations.io) are allowed).
-3. Validation of tested dataset is performed with reference dataset and profiler provided as parameters.
+The legacy validation process consists of the following steps:
+1. User prepares reference dataset (only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
+2. User defines a profiler function that produces a profile using [Great Expectations](https://docs.greatexpectations.io).
+3. Validation of the tested dataset is performed with the reference dataset and profiler provided as parameters.
 
-### Preparations
-Feast with Great Expectations support can be installed via
+### Installation
 ```shell
 pip install 'feast[ge]'
 ```
 
 ### Dataset profile
-Currently, Feast supports only [Great Expectation's](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
-as dataset's profile. Hence, the user needs to define a function (profiler) that would receive a dataset and return an [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite).
 
-Great Expectations supports automatic profiling as well as manually specifying expectations:
+This integration uses [Great Expectation's](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
+as the dataset profile format. The user defines a profiler function that receives a dataset and returns an ExpectationSuite.
+
 ```python
 from great_expectations.dataset import Dataset
 from great_expectations.core.expectation_suite import ExpectationSuite
 
 from feast.dqm.profilers.ge_profiler import ge_profiler
During retrieval of historical features, `validation_reference` can be passed as a parameter to methods `.to_df(validation_reference=...)` or `.to_arrow(validation_reference=...)` of RetrievalJob.
-If parameter is provided Feast will run validation once dataset is materialized. In case if validation successful materialized dataset is returned.
-Otherwise, `feast.dqm.errors.ValidationFailed` exception would be raised. It will consist of all details for expectations that didn't pass.
+If validation is successful, the materialized dataset is returned. Otherwise, `feast.dqm.errors.ValidationFailed` exception is raised with details for expectations that didn't pass.
 
 ```python
 from feast import FeatureStore
@@ -75,3 +58,32 @@ job.to_df(
         .as_reference(profiler=manual_profiler)
 )
 ```
+
+---
+
+## Migration Guide
+
+The new [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system replaces this integration with:
+
+| Capability | GE-based (deprecated) | New DQM |
+|---|---|---|
+| Scope | Historical retrieval only | Batch data + serving logs |
+| Dependencies | `feast[ge]` extra required | No extra dependencies |