
Commit 58016bc

jyejare and cursoragent authored
docs: Deprecate Great Expectations DQM in favor of built-in Feature Quality Monitoring (#6548)
* docs: Deprecate Great Expectations DQM in favor of built-in Feature Quality Monitoring

  The GE-based validation is superseded by Feast's native DQM system which provides richer metrics, serving log support, no extra dependencies, and a built-in UI dashboard. Adds deprecation notice, migration guide, and comparison table to help users transition.

  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: Deprecate GE references across all docs, update to new DQM system

  - SUMMARY.md: Mark DQM and GE tutorial entries as [Deprecated]
  - tutorials/validating-historical-features.md: Add deprecation banner
  - roadmap.md: Strike GE entry, add new Feature Quality Monitoring
  - README.md: Replace GE limitation with new built-in monitoring capability
  - ADR-0011: Add Superseded section pointing to new system
  - reference/codebase-structure.md: Note GE profiler deprecated
  - getting-started/concepts/dataset.md: Update DQM context
  - README.md (root): Auto-synced by template hook

  Blog posts (feast-0-18) left untouched as historical records.

  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: Replace GE in website blog, update offline store docs, cross-link operational metrics

  - infra/website/docs/blog/feast-mlflow-kubeflow.md: Replace full GE code section with native Feature Quality Monitoring (config + CLI examples)
  - docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md: Update pull_all_from_table_or_query description to reference new monitoring system
  - docs/how-to-guides/feature-monitoring.md: Add "Related: Operational and SOX Metrics" section cross-linking to Prometheus metrics, audit logging, and OpenTelemetry docs

  Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent eb042f0 commit 58016bc

12 files changed

Lines changed: 101 additions & 76 deletions


README.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -254,7 +254,8 @@ The list below contains the functionality that contributors are planning to deve
254254
* [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
255255
* [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
256256
* **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
257-
* [x] Data profiling and validation (Great Expectations)
257+
* [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
258+
* [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
258259
* **Feature Discovery and Governance**
259260
* [x] Python SDK for browsing feature registry
260261
* [x] CLI for browsing feature registry

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ Feast helps ML platform/MLOps teams with DevOps experience productionize real-ti
7171
* **batch feature engineering**: Feast supports on-demand and streaming transformations. Feast is also investing in supporting batch transformations.
7272
* **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
7373
* **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
74-
* **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
74+
* **data quality / drift detection**: Feast now includes built-in [Feature Quality Monitoring](how-to-guides/feature-monitoring.md) that computes statistical metrics (null rates, distributions, percentiles), detects drift across batch data and serving logs, and provides a monitoring UI dashboard. The older Great Expectations integration is deprecated.
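
Drift detection between a baseline and a later window can be illustrated with a small sketch. The source does not say which drift statistic Feast computes, so the Population Stability Index (PSI) below is an assumption chosen only because it is a common choice; the function name and the `eps` floor are hypothetical and not part of Feast's API.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline_counts: Sequence[int],
    current_counts: Sequence[int],
    eps: float = 1e-6,
) -> float:
    """PSI between two histograms that share the same bin edges.

    Illustrative only: this is not Feast's implementation.
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("histograms must have the same number of bins")
    baseline_total = sum(baseline_counts)
    current_total = sum(current_counts)
    if baseline_total == 0 or current_total == 0:
        raise ValueError("histograms must not be empty")

    psi = 0.0
    for b, c in zip(baseline_counts, current_counts):
        # Floor each proportion at eps so empty bins do not produce log(0).
        p = max(b / baseline_total, eps)
        q = max(c / current_total, eps)
        psi += (q - p) * math.log(q / p)
    return psi
```

Identical distributions give a PSI of zero, and the value grows as the current window moves away from the baseline. Any alerting threshold is a policy choice that the source does not specify.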
7575

7676
## Example use cases
7777

docs/SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -57,7 +57,7 @@
5757
* [Fraud detection on GCP](tutorials/tutorials-overview/fraud-detection.md)
5858
* [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
5959
* [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
60-
* [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
60+
* [\[Deprecated\] Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
6161
* [Building streaming features](tutorials/building-streaming-features.md)
6262
* [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md)
6363
* [RAG Fine Tuning with Feast and Milvus](../examples/rag-retriever/README.md)
@@ -205,7 +205,7 @@
205205
* [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
206206
* [\[Alpha\] Static Artifacts Loading](reference/alpha-static-artifacts.md)
207207
* [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
208-
* [\[Alpha\] Data quality monitoring](reference/dqm.md)
208+
* [\[Deprecated\] Data quality monitoring (Great Expectations)](reference/dqm.md)
209209
* [\[Alpha\] Streaming feature computation with Denormalized](reference/denormalized.md)
210210
* [\[Alpha\] Feature View Versioning](reference/alpha-feature-view-versioning.md)
211211
* [OpenLineage Integration](reference/openlineage.md)

docs/adr/ADR-0011-data-quality-monitoring.md

Lines changed: 14 additions & 1 deletion
@@ -83,8 +83,21 @@ If validation fails, a `ValidationFailed` exception is raised with details for a
8383
- Dependency on Great Expectations adds to the install footprint (optional via `feast[ge]`).
8484
- Automatic profiling capabilities are limited; manual expectation crafting is recommended.
8585

86+
## Superseded
87+
88+
This ADR documents the original GE-based approach, which is now **deprecated**. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system (introduced in 2025), which provides:
89+
90+
- Automatic metric computation (null rates, percentiles, histograms) with no external dependencies
91+
- Monitoring across batch data and serving logs
92+
- CLI (`feast monitor run`) and REST API for automation
93+
- Built-in UI monitoring dashboard
94+
- Support for all offline store backends via SQL push-down
95+
96+
The GE-based integration may be removed in a future release.
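
To make the metric names above concrete, here is a minimal pure-Python sketch of a null rate, a percentile, and an equal-width histogram for a single numeric feature column. It is an illustration under stated assumptions, not Feast's implementation: the function names are hypothetical, and per the list above the real system pushes this work down to the offline store as SQL where it can.

```python
import math
from typing import List, Optional, Sequence


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def percentile(values: Sequence[Optional[float]], q: float) -> float:
    """Linear-interpolation percentile of the non-null values, q in [0, 100]."""
    data = sorted(v for v in values if v is not None)
    if not data:
        raise ValueError("no non-null values")
    rank = (len(data) - 1) * q / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return data[lower] + (data[upper] - data[lower]) * (rank - lower)


def histogram(
    values: Sequence[Optional[float]], bins: int, lo: float, hi: float
) -> List[int]:
    """Equal-width bin counts over [lo, hi]; nulls and out-of-range values are skipped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if v is None or not (lo <= v <= hi):
            continue
        # The top edge belongs to the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts
```

Computing the same three numbers for a baseline window and for a later window gives the pair of profiles that drift checks compare.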
97+
8698
## References
8799

88100
- Original RFC: Feast RFC-027: Data Quality Monitoring
89101
- Implementation: `sdk/python/feast/dqm/`, `sdk/python/feast/saved_dataset.py`
90-
- Documentation: [Data Quality Monitoring](../reference/dqm.md)
102+
- Documentation: [Data Quality Monitoring (deprecated)](../reference/dqm.md)
103+
- **New system:** [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md)

docs/getting-started/concepts/dataset.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
# \[Alpha] Saved dataset
22

3-
Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept.
3+
Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. Data Quality Monitoring was the original motivation for creating the dataset concept. Note that the Great Expectations-based validation that used saved datasets is now deprecated in favor of Feast's built-in [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system, which does not require saved datasets.
44

55
Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md).
66

docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ To fully implement the interface for the offline store, you will need to impleme
5151
* `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method). This method pulls data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
5252
* `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
5353
* (optional) `offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push api can be found [here](../docs/reference/data-sources/push.md). This method only needs implementation if you want to support the push api in your offline store.
54-
* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is only used for **SavedDatasets** as part of data quality monitoring validation.
54+
* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is used for **SavedDatasets** and as a fallback compute path for the [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system (backends without native SQL push-down).
5555
* (optional) `write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a source defined by `LoggingSource` and `LoggingConfig`. This method is only used internally for **SavedDatasets**.
5656

5757
{% code title="feast_custom_offline_store/file.py" %}

docs/how-to-guides/feature-monitoring.md

Lines changed: 8 additions & 0 deletions
@@ -462,3 +462,11 @@ The monitoring page is always accessible in the sidebar. To see actual data:
462462

463463
2. Run `feast apply` — this computes baseline metrics automatically
464464
3. Schedule `feast monitor run` (or click "Compute Metrics" in the UI) to generate daily/weekly/monthly metrics
465+
466+
## Related: Operational and SOX Metrics
467+
468+
Feature Quality Monitoring focuses on **data-level** metrics (distributions, null rates, drift). Feast also provides **operational metrics** for infrastructure observability:
469+
470+
- **Prometheus metrics** (`feast_offline_store_*`, `feast_online_store_*`) — latency, throughput, and error rates for offline/online store operations. See [Python Feature Server — Metrics](../reference/feature-servers/python-feature-server.md).
471+
- **SOX audit logging** (`feast.audit`) — structured audit events for compliance tracking of feature store operations.
472+
- **OpenTelemetry integration** — distributed tracing for feature serving requests. See [OpenTelemetry Integration](../getting-started/components/open-telemetry.md).

docs/reference/codebase-structure.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ The majority of Feast logic lives in these Python files:
2828

2929
There are also several important submodules:
3030
* `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
31-
* `dqm/` covers data quality monitoring, such as the dataset profiler.
31+
* `dqm/` covers data quality monitoring. The legacy Great Expectations profiler (`profilers/ge_profiler`) is deprecated; see [`monitoring/`](../../sdk/python/feast/monitoring/) for the current built-in monitoring system.
3232
* `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`).
3333
* `embedded_go/` covers the Go feature server.
3434
* `ui/` contains the embedded Web UI, to be launched on the `feast ui` command.

docs/reference/dqm.md

Lines changed: 48 additions & 36 deletions
@@ -1,67 +1,50 @@
11
# Data Quality Monitoring
22

3-
Data Quality Monitoring (DQM) is a Feast module aimed to help users to validate their data with the user-curated set of rules.
4-
Validation could be applied during:
5-
* Historical retrieval (training dataset generation)
6-
* [planned] Writing features into an online store
7-
* [planned] Reading features from an online store
3+
{% hint style="warning" %}
4+
**Deprecated:** The Great Expectations-based validation described on this page is deprecated and will be removed in a future release. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system, which provides richer metrics (histograms, percentiles, drift detection), works across batch data and serving logs, requires no external dependencies, and includes a built-in UI dashboard.
85

9-
Its goal is to address several complex data problems, namely:
10-
* Data consistency - new training datasets can be significantly different from previous datasets. This might require a change in model architecture.
11-
* Issues/bugs in the upstream pipeline - bugs in upstream pipelines can cause invalid values to overwrite existing valid values in an online store.
12-
* Training/serving skew - distribution shift could significantly decrease the performance of the model.
6+
Please migrate to the new monitoring system. See the [Feature Quality Monitoring guide](../how-to-guides/feature-monitoring.md) for setup instructions.
7+
{% endhint %}
138

14-
> To monitor data quality, we check that the characteristics of the tested dataset (aka the tested dataset's profile) are "equivalent" to the characteristics of the reference dataset.
15-
> How exactly profile equivalency should be measured is up to the user.
9+
## Legacy: Great Expectations Integration
10+
11+
The following documents the deprecated Great Expectations-based validation that was previously the only DQM option in Feast. This integration relied on `pip install 'feast[ge]'` and only supported validation during historical retrieval.
12+
13+
---
1614

1715
### Overview
1816

19-
The validation process consists of the following steps:
20-
1. User prepares reference dataset (currently only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
21-
2. User defines profiler function, which should produce profile by given dataset (currently only profilers based on [Great Expectations](https://docs.greatexpectations.io) are allowed).
22-
3. Validation of tested dataset is performed with reference dataset and profiler provided as parameters.
17+
The legacy validation process consists of the following steps:
18+
1. The user prepares a reference dataset (only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
19+
2. The user defines a profiler function that produces a profile using [Great Expectations](https://docs.greatexpectations.io).
20+
3. Validation of the tested dataset is performed with the reference dataset and profiler provided as parameters.
2321

24-
### Preparations
25-
Feast with Great Expectations support can be installed via
22+
### Installation
2623
```shell
2724
pip install 'feast[ge]'
2825
```
2926

3027
### Dataset profile
31-
Currently, Feast supports only [Great Expectation's](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
32-
as dataset's profile. Hence, the user needs to define a function (profiler) that would receive a dataset and return an [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite).
3328

34-
Great Expectations supports automatic profiling as well as manually specifying expectations:
29+
This integration uses [Great Expectations'](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
30+
as the dataset profile format. The user defines a profiler function that receives a dataset and returns an ExpectationSuite.
31+
3532
```python
3633
from great_expectations.dataset import Dataset
3734
from great_expectations.core.expectation_suite import ExpectationSuite
3835

3936
from feast.dqm.profilers.ge_profiler import ge_profiler
4037

41-
@ge_profiler
42-
def automatic_profiler(dataset: Dataset) -> ExpectationSuite:
43-
    from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler
44-
45-
    return UserConfigurableProfiler(
46-
        profile_dataset=dataset,
47-
        ignored_columns=['conv_rate'],
48-
        value_set_threshold='few'
49-
    ).build_suite()
50-
```
51-
However, from our experience capabilities of automatic profiler are quite limited. So we would recommend crafting your own expectations:
52-
```python
5338
@ge_profiler
5439
def manual_profiler(dataset: Dataset) -> ExpectationSuite:
5540
    dataset.expect_column_max_to_be_between("column", 1, 2)
5641
    return dataset.get_expectation_suite()
5742
```
5843

59-
60-
6144
### Validating Training Dataset
45+
6246
During retrieval of historical features, `validation_reference` can be passed as a parameter to methods `.to_df(validation_reference=...)` or `.to_arrow(validation_reference=...)` of RetrievalJob.
63-
If parameter is provided Feast will run validation once dataset is materialized. In case if validation successful materialized dataset is returned.
64-
Otherwise, `feast.dqm.errors.ValidationFailed` exception would be raised. It will consist of all details for expectations that didn't pass.
47+
If validation is successful, the materialized dataset is returned. Otherwise, a `feast.dqm.errors.ValidationFailed` exception is raised with details of the expectations that didn't pass.
6548

6649
```python
6750
from feast import FeatureStore
@@ -75,3 +58,32 @@ job.to_df(
7558
.as_reference(profiler=manual_profiler)
7659
)
7760
```
61+
62+
---
63+
64+
## Migration Guide
65+
66+
The new [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system replaces this integration with:
67+
68+
| Capability | GE-based (deprecated) | New DQM |
69+
|---|---|---|
70+
| Scope | Historical retrieval only | Batch data + serving logs |
71+
| Dependencies | `feast[ge]` extra required | No extra dependencies |
72+
| Metrics | User-defined expectations | Automatic: null rates, percentiles, histograms, drift |
73+
| UI | None | Built-in monitoring dashboard |
74+
| Automation | Manual profiler code | `feast monitor run` CLI + REST API |
75+
| Backends | Limited | All offline store backends |
76+
77+
To migrate:
78+
79+
1. Enable DQM in `feature_store.yaml`:
80+
```yaml
81+
data_quality_monitoring:
82+
  auto_baseline: true
83+
```
84+
85+
2. Run `feast apply` to compute baseline metrics automatically.
86+
87+
3. Schedule `feast monitor run` for ongoing monitoring.
88+
89+
4. Remove the `feast[ge]` dependency from your requirements.
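
Step 4 can be scripted for projects with many requirements files. The following is a minimal sketch, assuming plain pip-style requirement lines; `drop_ge_extra` is a hypothetical helper and not part of Feast, and the version pin in the usage example is a placeholder.

```python
import re
from typing import List

# Matches requirement lines such as "feast[ge]" or "feast[ge,redis]>=0.1".
_FEAST_WITH_EXTRAS = re.compile(r"^(\s*feast)\[([^\]]*)\](.*)$", re.IGNORECASE)


def drop_ge_extra(lines: List[str]) -> List[str]:
    """Return requirement lines with the `ge` extra removed from feast entries."""
    cleaned = []
    for line in lines:
        match = _FEAST_WITH_EXTRAS.match(line)
        if match is None:
            # Not a feast requirement with extras: keep the line untouched.
            cleaned.append(line)
            continue
        name, extras, rest = match.groups()
        kept = [
            extra.strip()
            for extra in extras.split(",")
            if extra.strip() and extra.strip().lower() != "ge"
        ]
        bracket = f"[{','.join(kept)}]" if kept else ""
        cleaned.append(f"{name}{bracket}{rest}")
    return cleaned


if __name__ == "__main__":
    # Placeholder input; real files would be read from disk.
    for requirement in drop_ge_extra(["feast[ge,redis]>=0.1", "pandas"]):
        print(requirement)
```

The helper only rewrites text; reinstalling the environment afterwards is still needed to actually drop Great Expectations.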

docs/roadmap.md

Lines changed: 2 additions & 1 deletion
@@ -89,7 +89,8 @@ The list below contains the functionality that contributors are planning to deve
8989
* [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
9090
* [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
9191
* **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
92-
* [x] Data profiling and validation (Great Expectations)
92+
* [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
93+
* [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
9394
* **Feature Discovery and Governance**
9495
* [x] Python SDK for browsing feature registry
9596
* [x] CLI for browsing feature registry
