module_3/README.md: 42 additions & 21 deletions
@@ -1,6 +1,6 @@
 <h1>Module 3: Orchestrated batch transformations using dbt + Airflow with Feast (Snowflake)</h1>

-> **Note:** This module is still WIP, and does not have a public data set to use
+> **Note:** This module is still WIP, and does not have a public data set to use. There is a smaller dataset visible in `data/`

 This is a very similar module to module 1. The key difference is now we'll be using a data warehouse (Snowflake) in combination with dbt + Airflow to ensure that batch features are regularly generated.

@@ -21,7 +21,7 @@ This is a very similar module to module 1. The key difference is now we'll be us
 - [Workshop](#workshop)
 - [Step 1: Install Feast](#step-1-install-feast)
 - [Step 2: Inspect the `feature_store.yaml`](#step-2-inspect-the-feature_storeyaml)
Created feature view aggregate_transactions_features
+Created feature view credit_scores_features
+Created feature service model_v1
+Created feature service model_v2
+
+Deploying infrastructure for aggregate_transactions_features
+Deploying infrastructure for credit_scores_features
 ```
 ## Step 6: Set up orchestration
 ### Step 6a: Setting up Airflow to work with dbt
@@ -217,11 +235,11 @@ airflow dags backfill \
 feature_dag
 ```

-## Step 7: Run `get_historical_features` and `get_online_features`
-Run [Jupyter notebook](feature_repo/module_3.ipynb)
+## Step 7: Retrieve features + test stream ingestion
+### Overview
+Feast exposes a `get_historical_features` method to generate training data or run batch scoring, and a `get_online_features` method to power model serving.
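The two retrieval methods described in the overview can be sketched as follows. This is a minimal sketch, not the notebook's code: the `model_v2` feature service name comes from the `feast apply` output earlier in this diff, while the `user_id` join key, the helper names, and the repo path are hypothetical and should be replaced with whatever the feature repo actually defines. Feast and pandas are imported inside the functions so the helpers can be read and tested without either installed.

```python
from datetime import datetime, timezone

# Feature service name taken from the `feast apply` output in this README.
FEATURE_SERVICE_NAME = "model_v2"
# Assumption: hypothetical entity join key; use the key your entities define.
ENTITY_KEY = "user_id"


def build_entity_rows(entity_ids):
    """Build one dict per entity, the shape `get_online_features` expects."""
    return [{ENTITY_KEY: entity_id} for entity_id in entity_ids]


def build_entity_records(entity_ids, as_of=None):
    """Build entity dataframe rows: the entity key plus an `event_timestamp`."""
    as_of = as_of or datetime.now(timezone.utc)
    return [{ENTITY_KEY: entity_id, "event_timestamp": as_of} for entity_id in entity_ids]


def fetch_training_data(repo_path, entity_ids, as_of=None):
    """Point-in-time join of features onto entities (training / batch scoring)."""
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    service = store.get_feature_service(FEATURE_SERVICE_NAME)
    entity_df = pd.DataFrame(build_entity_records(entity_ids, as_of))
    return store.get_historical_features(entity_df=entity_df, features=service).to_df()


def fetch_serving_features(repo_path, entity_ids):
    """Read the latest materialized feature values from the online store."""
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    service = store.get_feature_service(FEATURE_SERVICE_NAME)
    response = store.get_online_features(
        features=service, entity_rows=build_entity_rows(entity_ids)
    )
    return response.to_dict()
```

Calling `fetch_training_data(".", [1001, 1002])` from inside the feature repo would return a pandas DataFrame, and `fetch_serving_features(".", [1001])` a plain dict, assuming the feature views have already been materialized by the Airflow DAG.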

-## Step 8: Streaming
-There are two broad approaches with streaming
+To achieve fresher features, one might consider using streaming compute. There are two broad approaches with streaming:
 1. **[Simple, semi-fresh features]** Use data warehouse / data lake specific streaming ingest of raw data.
    - This means that Feast only needs to know about a "batch feature" because the assumption is those batch features are sufficiently fresh.
    - **BUT** there are limits to how fresh your features are. You won't be able to get to minute level freshness.
@@ -230,6 +248,9 @@ There are two broad approaches with streaming

 Feast will help enforce a consistent schema across batch + streaming features as they land in the online store.
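One way to picture the schema consistency mentioned above is a client-side check before streamed records are pushed to the online store. The sketch below is an illustration under assumptions, not this module's code: the column names, the `missing_columns` and `push_stream_records` helpers, and the push source name passed in are all hypothetical. It relies on `FeatureStore.push`, which requires a push source to be registered in the feature repo.

```python
# Assumption: hypothetical schema shared by the batch source and the stream.
EXPECTED_COLUMNS = {"user_id", "event_timestamp", "credit_score"}


def missing_columns(record, expected=EXPECTED_COLUMNS):
    """Return the expected columns that a streamed record lacks, sorted by name."""
    return sorted(expected - set(record))


def push_stream_records(repo_path, push_source_name, records):
    """Validate records against the expected schema, then push them via Feast."""
    for record in records:
        missing = missing_columns(record)
        if missing:
            raise ValueError(f"record is missing columns: {missing}")

    # Imported late so the validation above works without Feast installed.
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    # `push_source_name` must match a push source defined in the feature repo.
    store.push(push_source_name, pd.DataFrame(records))
```

Rejecting malformed records before the push keeps the failure close to the producer; Feast's own checks against the registered feature view still apply afterwards.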

+### Time to run code!
+Now, run the [Jupyter notebook](feature_repo/module_3.ipynb)
+
 # Conclusion
 By the end of this module, you will have learned how to build a full feature platform, with orchestrated batch transformations (using dbt + Airflow) and orchestrated materialization (with Feast + Airflow).