docs: Deprecate Great Expectations DQM in favor of built-in Feature Quality Monitoring (#6548)

jyejare · cursoragent · web-flow · commit 58016bcc52bc · 2026-06-22T15:46:03.000+05:30
* docs: Deprecate Great Expectations DQM in favor of built-in Feature Quality Monitoring

The GE-based validation is superseded by Feast's native DQM system, which
provides richer metrics, serving log support, no extra dependencies, and
a built-in UI dashboard. Adds deprecation notice, migration guide, and
comparison table to help users transition.

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: Deprecate GE references across all docs, update to new DQM system

- SUMMARY.md: Mark DQM and GE tutorial entries as [Deprecated]
- tutorials/validating-historical-features.md: Add deprecation banner
- roadmap.md: Strike GE entry, add new Feature Quality Monitoring
- README.md: Replace GE limitation with new built-in monitoring capability
- ADR-0011: Add Superseded section pointing to new system
- reference/codebase-structure.md: Note GE profiler deprecated
- getting-started/concepts/dataset.md: Update DQM context
- README.md (root): Auto-synced by template hook

Blog posts (feast-0-18) left untouched as historical records.

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* docs: Replace GE in website blog, update offline store docs, cross-link operational metrics

- infra/website/docs/blog/feast-mlflow-kubeflow.md: Replace full GE code
  section with native Feature Quality Monitoring (config + CLI examples)
- docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md: Update
  pull_all_from_table_or_query description to reference new monitoring system
- docs/how-to-guides/feature-monitoring.md: Add "Related: Operational and
  SOX Metrics" section cross-linking to Prometheus metrics, audit logging,
  and OpenTelemetry docs

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
diff --git a/README.md b/README.md
@@ -254,7 +254,8 @@ The list below contains the functionality that contributors are planning to deve
   * [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
   * [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
 * **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
-  * [x] Data profiling and validation (Great Expectations)
+  * [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
+  * [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
 * **Feature Discovery and Governance**
   * [x] Python SDK for browsing feature registry
   * [x] CLI for browsing feature registry
diff --git a/docs/README.md b/docs/README.md
@@ -71,7 +71,7 @@ Feast helps ML platform/MLOps teams with DevOps experience productionize real-ti
 * **batch feature engineering**: Feast supports on-demand and streaming transformations. Feast is also investing in supporting batch transformations. 
 * **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
 * **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py). 
-* **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
+* **data quality / drift detection**: Feast now includes built-in [Feature Quality Monitoring](how-to-guides/feature-monitoring.md) that computes statistical metrics (null rates, distributions, percentiles), detects drift across batch data and serving logs, and provides a monitoring UI dashboard. The older Great Expectations integration is deprecated.
 
 ## Example use cases
 
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
@@ -57,7 +57,7 @@
   * [Fraud detection on GCP](tutorials/tutorials-overview/fraud-detection.md)
   * [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
   * [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
-* [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
+* [\[Deprecated\] Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
 * [Building streaming features](tutorials/building-streaming-features.md)
 * [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md)
 * [RAG Fine Tuning with Feast and Milvus](../examples/rag-retriever/README.md)
@@ -205,7 +205,7 @@
 * [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
 * [\[Alpha\] Static Artifacts Loading](reference/alpha-static-artifacts.md)
 * [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
-* [\[Alpha\] Data quality monitoring](reference/dqm.md)
+* [\[Deprecated\] Data quality monitoring (Great Expectations)](reference/dqm.md)
 * [\[Alpha\] Streaming feature computation with Denormalized](reference/denormalized.md)
 * [\[Alpha\] Feature View Versioning](reference/alpha-feature-view-versioning.md)
 * [OpenLineage Integration](reference/openlineage.md)
diff --git a/docs/adr/ADR-0011-data-quality-monitoring.md b/docs/adr/ADR-0011-data-quality-monitoring.md
@@ -83,8 +83,21 @@ If validation fails, a `ValidationFailed` exception is raised with details for a
 - Dependency on Great Expectations adds to the install footprint (optional via `feast[ge]`).
 - Automatic profiling capabilities are limited; manual expectation crafting is recommended.
 
+## Superseded
+
+This ADR documents the original GE-based approach which is now **deprecated**. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system (introduced in 2025), which provides:
+
+- Automatic metric computation (null rates, percentiles, histograms) with no external dependencies
+- Monitoring across batch data and serving logs
+- CLI (`feast monitor run`) and REST API for automation
+- Built-in UI monitoring dashboard
+- Support for all offline store backends via SQL push-down
+
+The GE-based integration may be removed in a future release.
+
 ## References
 
 - Original RFC: Feast RFC-027: Data Quality Monitoring 
 - Implementation: `sdk/python/feast/dqm/`, `sdk/python/feast/saved_dataset.py`
-- Documentation: [Data Quality Monitoring](../reference/dqm.md)
+- Documentation: [Data Quality Monitoring (deprecated)](../reference/dqm.md)
+- **New system:** [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md)
diff --git a/docs/getting-started/concepts/dataset.md b/docs/getting-started/concepts/dataset.md
@@ -1,6 +1,6 @@
 # \[Alpha] Saved dataset
 
-Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept.
+Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. Data Quality Monitoring was the original motivation for creating the dataset concept. Note that the Great Expectations-based validation that used saved datasets is now deprecated in favor of Feast's built-in [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system, which does not require saved datasets.
 
 Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md).
 
diff --git a/docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md b/docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md
@@ -51,7 +51,7 @@ To fully implement the interface for the offline store, you will need to impleme
 * `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method. This method pull data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
 * `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
 * (optional) `offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push api can be found [here](../docs/reference/data-sources/push.md). This method only needs implementation if you want to support the push api in your offline store.
-* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is only used for **SavedDatasets** as part of data quality monitoring validation.
+* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is used for **SavedDatasets** and as a fallback compute path for the [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system (backends without native SQL push-down).
 * (optional) `write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a defined source defined by `LoggingSource` and `LoggingConfig`. This method is only used internally for **SavedDatasets**.
 
 {% code title="feast_custom_offline_store/file.py" %}
diff --git a/docs/how-to-guides/feature-monitoring.md b/docs/how-to-guides/feature-monitoring.md
@@ -462,3 +462,11 @@ The monitoring page is always accessible in the sidebar. To see actual data:
 
 2. Run `feast apply` — this computes baseline metrics automatically
 3. Schedule `feast monitor run` (or click "Compute Metrics" in the UI) to generate daily/weekly/monthly metrics
+
+## Related: Operational and SOX Metrics
+
+Feature Quality Monitoring focuses on **data-level** metrics (distributions, null rates, drift). Feast also provides **operational metrics** for infrastructure observability:
+
+- **Prometheus metrics** (`feast_offline_store_*`, `feast_online_store_*`) — latency, throughput, and error rates for offline/online store operations. See [Python Feature Server — Metrics](../reference/feature-servers/python-feature-server.md).
+- **SOX audit logging** (`feast.audit`) — structured audit events for compliance tracking of feature store operations.
+- **OpenTelemetry integration** — distributed tracing for feature serving requests. See [OpenTelemetry Integration](../getting-started/components/open-telemetry.md).
diff --git a/docs/reference/codebase-structure.md b/docs/reference/codebase-structure.md
@@ -28,7 +28,7 @@ The majority of Feast logic lives in these Python files:
 
 There are also several important submodules:
 * `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
-* `dqm/` covers data quality monitoring, such as the dataset profiler.
+* `dqm/` covers data quality monitoring. The legacy Great Expectations profiler (`profilers/ge_profiler`) is deprecated; see [`monitoring/`](../../sdk/python/feast/monitoring/) for the current built-in monitoring system.
 * `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`).
 * `embedded_go/` covers the Go feature server.
 * `ui/` contains the embedded Web UI, to be launched on the `feast ui` command.
diff --git a/docs/reference/dqm.md b/docs/reference/dqm.md
@@ -1,67 +1,50 @@
 # Data Quality Monitoring
 
-Data Quality Monitoring (DQM) is a Feast module aimed to help users to validate their data with the user-curated set of rules.
-Validation could be applied during:
-* Historical retrieval (training dataset generation)
-* [planned] Writing features into an online store
-* [planned] Reading features from an online store
+{% hint style="warning" %}
+**Deprecated:** The Great Expectations-based validation described on this page is deprecated and will be removed in a future release. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system, which provides richer metrics (histograms, percentiles, drift detection), works across batch data and serving logs, requires no external dependencies, and includes a built-in UI dashboard.
 
-Its goal is to address several complex data problems, namely:
-* Data consistency - new training datasets can be significantly different from previous datasets. This might require a change in model architecture.
-* Issues/bugs in the upstream pipeline - bugs in upstream pipelines can cause invalid values to overwrite existing valid values in an online store.
-* Training/serving skew - distribution shift could significantly decrease the performance of the model.
+Please migrate to the new monitoring system. See the [Feature Quality Monitoring guide](../how-to-guides/feature-monitoring.md) for setup instructions.
+{% endhint %}
 
-> To monitor data quality, we check that the characteristics of the tested dataset (aka the tested dataset's profile) are "equivalent" to the characteristics of the reference dataset.
-> How exactly profile equivalency should be measured is up to the user. 
+## Legacy: Great Expectations Integration
+
+The following documents the deprecated Great Expectations-based validation that was previously the only DQM option in Feast. This integration relied on `pip install 'feast[ge]'` and only supported validation during historical retrieval.
+
+---
 
 ### Overview
 
-The validation process consists of the following steps:
-1. User prepares reference dataset (currently only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
-2. User defines profiler function, which should produce profile by given dataset (currently only profilers based on [Great Expectations](https://docs.greatexpectations.io) are allowed).
-3. Validation of tested dataset is performed with reference dataset and profiler provided as parameters.
+The legacy validation process consists of the following steps:
+1. User prepares reference dataset (only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
+2. User defines a profiler function that produces a profile using [Great Expectations](https://docs.greatexpectations.io).
+3. Validation of the tested dataset is performed with the reference dataset and profiler provided as parameters.
 
-### Preparations
-Feast with Great Expectations support can be installed via
+### Installation
 ```shell
 pip install 'feast[ge]'
 ```
 
 ### Dataset profile
-Currently, Feast supports only [Great Expectation's](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
-as dataset's profile. Hence, the user needs to define a function (profiler) that would receive a dataset and return an [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite).
 
-Great Expectations supports automatic profiling as well as manually specifying expectations:
+This integration uses [Great Expectation's](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
+as the dataset profile format. The user defines a profiler function that receives a dataset and returns an ExpectationSuite.
+
 ```python
 from great_expectations.dataset import Dataset
 from great_expectations.core.expectation_suite import ExpectationSuite
 
 from feast.dqm.profilers.ge_profiler import ge_profiler
 
-@ge_profiler
-def automatic_profiler(dataset: Dataset) -> ExpectationSuite:
-    from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler
-
-    return UserConfigurableProfiler(
-        profile_dataset=dataset,
-        ignored_columns=['conv_rate'],
-        value_set_threshold='few'
-    ).build_suite()
-```
-However, from our experience capabilities of automatic profiler are quite limited. So we would recommend crafting your own expectations:
-```python
 @ge_profiler
 def manual_profiler(dataset: Dataset) -> ExpectationSuite:
     dataset.expect_column_max_to_be_between("column", 1, 2)
     return dataset.get_expectation_suite()
 ```
 
-
-
 ### Validating Training Dataset
+
 During retrieval of historical features, `validation_reference` can be passed as a parameter to methods `.to_df(validation_reference=...)` or `.to_arrow(validation_reference=...)` of RetrievalJob.
-If parameter is provided Feast will run validation once dataset is materialized. In case if validation successful materialized dataset is returned.
-Otherwise, `feast.dqm.errors.ValidationFailed` exception would be raised. It will consist of all details for expectations that didn't pass.
+If validation is successful, the materialized dataset is returned. Otherwise, `feast.dqm.errors.ValidationFailed` exception is raised with details for expectations that didn't pass.
 
 ```python
 from feast import FeatureStore
@@ -75,3 +58,32 @@ job.to_df(
         .as_reference(profiler=manual_profiler)
 )
 ```
+
+---
+
+## Migration Guide
+
+The new [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system replaces this integration with:
+
+| Capability | GE-based (deprecated) | New DQM |
+|---|---|---|
+| Scope | Historical retrieval only | Batch data + serving logs |
+| Dependencies | `feast[ge]` extra required | No extra dependencies |
+| Metrics | User-defined expectations | Automatic: null rates, percentiles, histograms, drift |
+| UI | None | Built-in monitoring dashboard |
+| Automation | Manual profiler code | `feast monitor run` CLI + REST API |
+| Backends | Limited | All offline store backends |
+
+To migrate:
+
+1. Enable DQM in `feature_store.yaml`:
+   ```yaml
+   data_quality_monitoring:
+     auto_baseline: true
+   ```
+
+2. Run `feast apply` to compute baseline metrics automatically.
+
+3. Schedule `feast monitor run` for ongoing monitoring.
+
+4. Remove the `feast[ge]` dependency from your requirements.
diff --git a/docs/roadmap.md b/docs/roadmap.md
@@ -89,7 +89,8 @@ The list below contains the functionality that contributors are planning to deve
   * [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
   * [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
 * **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
-  * [x] Data profiling and validation (Great Expectations)
+  * [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
+  * [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
 * **Feature Discovery and Governance**
   * [x] Python SDK for browsing feature registry
   * [x] CLI for browsing feature registry
diff --git a/docs/tutorials/validating-historical-features.md b/docs/tutorials/validating-historical-features.md
@@ -1,5 +1,9 @@
 # Validating historical features with Great Expectations
 
+{% hint style="warning" %}
+**Deprecated:** This tutorial demonstrates the legacy Great Expectations-based validation which is deprecated. For new projects, use Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system which provides automatic metrics computation, drift detection, and a monitoring UI — with no external dependencies required. See also the [Monitoring Quickstart notebook](../../examples/monitoring/monitoring-quickstart.ipynb).
+{% endhint %}
+
 In this tutorial, we will use the public dataset of Chicago taxi trips to present data validation capabilities of Feast.
 - The original dataset is stored in BigQuery and consists of raw data for each taxi trip (one row per trip) since 2013.
 - We will generate several training datasets (aka historical features in Feast) for different periods and evaluate expectations made on one dataset against another.
diff --git a/infra/website/docs/blog/feast-mlflow-kubeflow.md b/infra/website/docs/blog/feast-mlflow-kubeflow.md