Commit e70ba39

Merge branch 'master' into feat/aerospike-online-store
2 parents 2c37c0f + 58016bc commit e70ba39

32 files changed

Lines changed: 513 additions & 151 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -255,7 +255,8 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
 * [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
 * **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
-* [x] Data profiling and validation (Great Expectations)
+* [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
+* [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
 * **Feature Discovery and Governance**
 * [x] Python SDK for browsing feature registry
 * [x] CLI for browsing feature registry
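
The roadmap entry above credits Feature Quality Monitoring with drift detection but does not say which statistic it uses. As a rough illustration only, the sketch below scores drift between a reference sample and a current sample with the population stability index (PSI). PSI is a common choice swapped in here, not taken from Feast; the bin edges, the `1e-6` smoothing term, and the function names are assumptions.

```python
import bisect
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each [edges[i], edges[i+1]) bin.

    The last bin also includes its right edge; values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # bisect_right locates the bin whose left edge is <= v.
        idx = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def psi(reference: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index: sum over bins of (cur - ref) * ln(cur / ref).

    eps keeps empty bins from producing log(0) or a division by zero.
    """
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(ref, cur))


edges = [0.0, 1.0, 2.0, 3.0, 4.0]
reference = [0.5, 1.5, 2.5, 3.5]
print(psi(reference, reference, edges))             # identical samples score 0.0
print(psi(reference, [2.5, 3.5, 3.5, 3.5], edges))  # mass shifted to high bins scores well above 0
```

Thresholds for flagging a PSI value as drift vary by team and data set and are not specified in this commit.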

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ Feast helps ML platform/MLOps teams with DevOps experience productionize real-ti
 * **batch feature engineering**: Feast supports on-demand and streaming transformations. Feast is also investing in supporting batch transformations.
 * **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
 * **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
-* **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
+* **data quality / drift detection**: Feast now includes built-in [Feature Quality Monitoring](how-to-guides/feature-monitoring.md) that computes statistical metrics (null rates, distributions, percentiles), detects drift across batch data and serving logs, and provides a monitoring UI dashboard. The older Great Expectations integration is deprecated.
 
 ## Example use cases
 
docs/SUMMARY.md

Lines changed: 2 additions & 2 deletions
@@ -57,7 +57,7 @@
 * [Fraud detection on GCP](tutorials/tutorials-overview/fraud-detection.md)
 * [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
 * [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
-* [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
+* [\[Deprecated\] Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
 * [Building streaming features](tutorials/building-streaming-features.md)
 * [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md)
 * [RAG Fine Tuning with Feast and Milvus](../examples/rag-retriever/README.md)
@@ -206,7 +206,7 @@
 * [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
 * [\[Alpha\] Static Artifacts Loading](reference/alpha-static-artifacts.md)
 * [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
-* [\[Alpha\] Data quality monitoring](reference/dqm.md)
+* [\[Deprecated\] Data quality monitoring (Great Expectations)](reference/dqm.md)
 * [\[Alpha\] Streaming feature computation with Denormalized](reference/denormalized.md)
 * [\[Alpha\] Feature View Versioning](reference/alpha-feature-view-versioning.md)
 * [OpenLineage Integration](reference/openlineage.md)

docs/adr/ADR-0011-data-quality-monitoring.md

Lines changed: 14 additions & 1 deletion
@@ -83,8 +83,21 @@ If validation fails, a `ValidationFailed` exception is raised with details for a
 - Dependency on Great Expectations adds to the install footprint (optional via `feast[ge]`).
 - Automatic profiling capabilities are limited; manual expectation crafting is recommended.
 
+## Superseded
+
+This ADR documents the original GE-based approach which is now **deprecated**. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system (introduced in 2025), which provides:
+
+- Automatic metric computation (null rates, percentiles, histograms) with no external dependencies
+- Monitoring across batch data and serving logs
+- CLI (`feast monitor run`) and REST API for automation
+- Built-in UI monitoring dashboard
+- Support for all offline store backends via SQL push-down
+
+The GE-based integration may be removed in a future release.
+
 ## References
 
 - Original RFC: Feast RFC-027: Data Quality Monitoring
 - Implementation: `sdk/python/feast/dqm/`, `sdk/python/feast/saved_dataset.py`
-- Documentation: [Data Quality Monitoring](../reference/dqm.md)
+- Documentation: [Data Quality Monitoring (deprecated)](../reference/dqm.md)
+- **New system:** [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md)
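
The superseding text lists null rates, percentiles, and histograms as automatically computed metrics. The ADR does not define how Feast computes them, so the following is only a plain-Python sketch of those three metric kinds for one numeric column. Linear-interpolation percentiles, equal-width bins, and the `column_metrics` name are assumptions, not Feast's implementation.

```python
import math
from typing import Dict, List, Optional, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Percentile q (0-100) of already-sorted values, using linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def column_metrics(values: Sequence[Optional[float]], bins: int = 4) -> Dict[str, object]:
    """Null rate, p50/p95 and an equal-width histogram for one numeric column."""
    present = sorted(v for v in values if v is not None)
    metrics: Dict[str, object] = {
        "null_rate": 1 - len(present) / len(values) if values else 0.0
    }
    if not present:
        return metrics
    metrics["p50"] = percentile(present, 50)
    metrics["p95"] = percentile(present, 95)
    lo, hi = present[0], present[-1]
    width = (hi - lo) / bins or 1.0  # all-equal values: avoid dividing by zero
    counts: List[int] = [0] * bins
    for v in present:
        # The maximum lands exactly on the right edge; fold it into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    metrics["histogram"] = counts
    return metrics


sample = column_metrics([1.0, None, 2.0, 3.0, None, 4.0, 5.0, 6.0, 7.0, 8.0])
print(sample)
```

A production system would typically push these aggregations down to the offline store rather than pulling rows into Python, which is the point of the "SQL push-down" bullet; the sketch only pins down what each metric means.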

docs/getting-started/architecture/model-inference.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ of model inference):
 
 *Note: online features can be sourced from batch, streaming, or request data sources.*
 
-These three approaches have different tradeoffs but, in general, have significant implementation differences.
+These four approaches have different tradeoffs but, in general, have significant implementation differences.
 
 ## 1. Online Model Inference with Online Features
 Online model inference with online features is a powerful approach to serving data-driven machine learning applications.
@@ -78,7 +78,7 @@ if features.to_dict().get('user_data:model_predictions') is None:
 model_predictions = model_server.predict(features)
 store.write_to_online_store(feature_view_name="user_data", df=pd.DataFrame(model_predictions))
 ```
-Note that in this case a seperate call to `write_to_online_store` is required when the underlying data changes and
+Note that in this case a separate call to `write_to_online_store` is required when the underlying data changes and
 predictions change along with it.
 
 ```python
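
The model-inference hunk above describes caching model predictions in the online store and rewriting them with `write_to_online_store` when the underlying data changes. The sketch below models that flow in plain Python so it runs without a Feast repo. `InMemoryOnlineStore`, the `user_1` key, the `trips_today` feature, and `predict` are hypothetical stand-ins; in a real deployment the read and write would go through Feast's `get_online_features` and `write_to_online_store` instead.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class InMemoryOnlineStore:
    """Hypothetical stand-in for an online store: entity key -> latest values."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def read(self, key: str) -> Row:
        return dict(self.rows.get(key, {}))

    def write(self, key: str, values: Row) -> None:
        self.rows.setdefault(key, {}).update(values)


def get_prediction(store: InMemoryOnlineStore, key: str,
                   predict: Callable[[Row], float]) -> float:
    """Serve a cached prediction, computing and writing it only on a miss."""
    row = store.read(key)
    if row.get("model_predictions") is None:
        prediction = predict(row)
        store.write(key, {"model_predictions": prediction})
        return prediction
    return row["model_predictions"]


def on_features_changed(store: InMemoryOnlineStore, key: str, new_values: Row,
                        predict: Callable[[Row], float]) -> None:
    """New feature values invalidate the cached prediction, so rewrite both."""
    store.write(key, new_values)
    store.write(key, {"model_predictions": predict(store.read(key))})


calls: List[Row] = []


def predict(row: Row) -> float:
    calls.append(row)  # record each model invocation
    return 2.0 * row.get("trips_today", 0.0)


store = InMemoryOnlineStore()
store.write("user_1", {"trips_today": 3.0})
first = get_prediction(store, "user_1", predict)   # miss: the model runs
second = get_prediction(store, "user_1", predict)  # hit: served from the store
on_features_changed(store, "user_1", {"trips_today": 5.0}, predict)
refreshed = get_prediction(store, "user_1", predict)
print(first, second, refreshed, len(calls))  # 6.0 6.0 10.0 2
```

Without the `on_features_changed` step, `get_prediction` would keep returning 6.0 after `trips_today` changed, which is the staleness the hunk's note about a separate `write_to_online_store` call is guarding against.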

docs/getting-started/components/compute-engine.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ functions (UDFs).
 
 A materialization task abstracts over specific technologies or frameworks that are used to materialize data. It allows
 users to use a pure local serialized approach (which is the default LocalComputeEngine), or delegates the
-materialization to seperate components (e.g. AWS Lambda, as implemented by the the LambdaComputeEngine).
+materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaComputeEngine).
 
 If the built-in engines are not sufficient, you can create your own custom materialization engine. Please
 see [this guide](../../how-to-guides/customizing-feast/creating-a-custom-compute-engine.md) for more details.

docs/getting-started/concepts/dataset.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # \[Alpha] Saved dataset
 
-Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept.
+Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. Data Quality Monitoring was the original motivation for creating the dataset concept. Note that the Great Expectations-based validation that used saved datasets is now deprecated in favor of Feast's built-in [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system, which does not require saved datasets.
 
 Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md).
 
docs/getting-started/concepts/feast-types.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Feast's type system is built on top of [protobuf](https://github.com/protocolbuf
 Feast supports the following categories of data types:
 
 - **Primitive types**: numerical values (`Int32`, `Int64`, `Float32`, `Float64`), `String`, `Bytes`, `Bool`, and `UnixTimestamp`.
+- **Zoned timestamp type**: `ZonedTimestamp` stores a timezone-aware datetime as both the UTC instant and its originating zone, so the original wall-clock zone round-trips losslessly. This differs from `UnixTimestamp`, which is always decoded as UTC and discards the source zone. Use `ZonedTimestamp` when local time-of-day or the offset/zone itself is meaningful. It must be explicitly declared in schema (it is not inferred by any backend), and is not supported as an entity key.
 - **Domain-specific primitives**: `PdfBytes` (PDF binary data for RAG/document pipelines) and `ImageBytes` (image binary data for multimodal pipelines). These are semantic aliases over `Bytes` and must be explicitly declared in schema — no backend infers them.
 - **UUID types**: `Uuid` and `TimeUuid` for universally unique identifiers. Stored as strings at the proto level but deserialized to `uuid.UUID` objects in Python.
 - **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`, `Array(Uuid)`.
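
The new `ZonedTimestamp` bullet contrasts keeping the originating zone with `UnixTimestamp`'s decode-as-UTC behavior. The sketch below illustrates that difference using only Python's standard `datetime` module. The `encode_zoned`/`decode_zoned`/`decode_utc_only` helpers and the (instant, zone) pair are hypothetical and do not reflect Feast's actual proto encoding; a fixed UTC-5 offset stands in for a named zone so the example needs no tz database.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Tuple


def encode_zoned(ts: datetime) -> Tuple[datetime, tzinfo]:
    """Hypothetical encoding: keep the UTC instant and the originating zone."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return ts.astimezone(timezone.utc), ts.tzinfo


def decode_zoned(instant: datetime, zone: tzinfo) -> datetime:
    """Rebuild the original wall-clock reading from the stored instant and zone."""
    return instant.astimezone(zone)


def decode_utc_only(instant: datetime) -> datetime:
    """UnixTimestamp-style decode: same instant, always presented in UTC."""
    return instant.astimezone(timezone.utc)


# Fixed UTC-5 offset standing in for a named zone such as America/New_York in winter.
est = timezone(timedelta(hours=-5), "EST")
local = datetime(2024, 1, 15, 9, 30, tzinfo=est)

instant, zone = encode_zoned(local)
print(decode_zoned(instant, zone).isoformat())  # 2024-01-15T09:30:00-05:00
print(decode_utc_only(instant).isoformat())     # 2024-01-15T14:30:00+00:00
```

Both decodes denote the same instant and compare equal; only the zone-preserving one still tells a consumer that the event happened at 9:30 local time.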

docs/getting-started/concepts/feature-view.md

Lines changed: 2 additions & 2 deletions
@@ -91,7 +91,7 @@ If the `schema` parameter is not specified in the creation of the feature view,
 
 "Entity aliases" can be specified to join `entity_dataframe` columns that do not match the column names in the source table of a FeatureView.
 
-This could be used if a user has no control over these column names or if there are multiple entities are a subclass of a more general entity. For example, "spammer" and "reporter" could be aliases of a "user" entity, and "origin" and "destination" could be aliases of a "location" entity as shown below.
+This could be used if a user has no control over these column names or if multiple entities are subclasses of a more general entity. For example, "spammer" and "reporter" could be aliases of a "user" entity, and "origin" and "destination" could be aliases of a "location" entity as shown below.
 
 It is suggested that you dynamically specify the new FeatureView name using `.with_name` and `join_key_map` override using `.with_join_key_map` instead of needing to register each new copy.
 
@@ -322,4 +322,4 @@ def driver_hourly_stats_stream(df: DataFrame):
 )
 ```
 
-See [here](https://github.com/feast-dev/streaming-tutorial) for a example of how to use stream feature views to register your own streaming data pipelines in Feast.
+See [here](https://github.com/feast-dev/streaming-tutorial) for an example of how to use stream feature views to register your own streaming data pipelines in Feast.
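
The entity-alias paragraph in the `feature-view.md` hunk describes reusing one feature view for several aliased columns via `.with_name` and `.with_join_key_map`. To show what an alias mapping does at join time, here is a dependency-free sketch. `join_with_alias`, the `user_features` table, the `spammer`/`reporter` columns, and the `<view>__<feature>` column naming are hypothetical, this is not how Feast performs the join internally, and the map direction (view join key to entity column) is assumed to match `with_join_key_map`.

```python
from typing import Dict, List

# Hypothetical feature table, keyed by values of the view's join key ("user_id").
user_features: Dict[str, Dict[str, float]] = {
    "u1": {"report_count": 3.0},
    "u2": {"report_count": 0.0},
}


def join_with_alias(
    entity_rows: List[Dict[str, object]],
    features: Dict[str, Dict[str, float]],
    view_name: str,
    join_key_map: Dict[str, str],
    feature_names: List[str],
) -> List[Dict[str, object]]:
    """Attach features to each row, reading the join key from an aliased column.

    join_key_map maps the view's join key to the entity column holding its
    values, e.g. {"user_id": "spammer"}. Only single-key views are handled.
    """
    [alias_column] = join_key_map.values()
    joined = []
    for row in entity_rows:
        out = dict(row)
        values = features.get(str(row[alias_column]), {})
        for name in feature_names:
            # Prefix with the renamed view so two aliases of one view don't collide;
            # entities missing from the table get None.
            out[f"{view_name}__{name}"] = values.get(name)
        joined.append(out)
    return joined


reports: List[Dict[str, object]] = [{"spammer": "u1", "reporter": "u2"}]
rows = join_with_alias(reports, user_features, "spammer_stats",
                       {"user_id": "spammer"}, ["report_count"])
rows = join_with_alias(rows, user_features, "reporter_stats",
                       {"user_id": "reporter"}, ["report_count"])
print(rows)
```

Because each aliased join reads the same `user_features` table under a different view name, one registered view can serve both the spammer and reporter columns, which is the motivation the paragraph gives for renaming instead of registering copies.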

docs/getting-started/quickstart.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 Feast (Feature Store) is an open-source feature store designed to facilitate the management and serving of machine learning features in a way that supports both batch and real-time applications.
 
-* *For Data Scientists*: Feast is a a tool where you can easily define, store, and retrieve your features for both model development and model deployment. By using Feast, you can focus on what you do best: build features that power your AI/ML models and maximize the value of your data.
+* *For Data Scientists*: Feast is a tool where you can easily define, store, and retrieve your features for both model development and model deployment. By using Feast, you can focus on what you do best: build features that power your AI/ML models and maximize the value of your data.
 
 * *For MLOps Engineers*: Feast is a library that allows you to connect your existing infrastructure (e.g., online database, application server, microservice, analytical database, and orchestration tooling) that enables your Data Scientists to ship features for their models to production using a friendly SDK without having to be concerned with software engineering challenges that occur from serving real-time production systems. By using Feast, you can focus on maintaining a resilient system, instead of implementing features for Data Scientists.
 