feast-dev
diff --git a/module_3/README.md b/module_3/README.md
Lines changed: 42 additions & 21 deletions
diff --git a/module_3/data/credit_scores.parquet b/module_3/data/credit_scores.parquet
306 KB
diff --git a/module_3/data/transactions.parquet b/module_3/data/transactions.parquet
149 KB
diff --git a/module_3/docker-compose.yml b/module_3/docker-compose.yml
Lines changed: 32 additions & 0 deletions
diff --git a/module_3/feature_repo/data_sources.py b/module_3/feature_repo/data_sources.py
Lines changed: 6 additions & 8 deletions
diff --git a/module_3/feature_repo/features.py b/module_3/feature_repo/features.py
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 <h1>Module 3: Orchestrated batch transformations using dbt + Airflow with Feast (Snowflake)</h1>
 
-> **Note:** This module is still WIP, and does not have a public data set to use
+> **Note:** This module is still WIP, and does not have a public data set to use. A smaller dataset is available in `data/`.
 
 This is a very similar module to module 1. The key difference is now we'll be using a data warehouse (Snowflake) in combination with dbt + Airflow to ensure that batch features are regularly generated. 
 
@@ -21,7 +21,7 @@ This is a very similar module to module 1. The key difference is now we'll be us
 - [Workshop](#workshop)
   - [Step 1: Install Feast](#step-1-install-feast)
   - [Step 2: Inspect the `feature_store.yaml`](#step-2-inspect-the-feature_storeyaml)
-  - [Step 3: Spin up services (Redis + Feast SQL Registry + Feast services)](#step-3-spin-up-services-redis--feast-sql-registry--feast-services)
+  - [Step 3: Spin up services (Kafka + Redis + Feast SQL Registry + Feast services)](#step-3-spin-up-services-kafka--redis--feast-sql-registry--feast-services)
   - [Step 4: Set up dbt models for batch transformations](#step-4-set-up-dbt-models-for-batch-transformations)
   - [Step 5: Run `feast apply`](#step-5-run-feast-apply)
   - [Step 6: Set up orchestration](#step-6-set-up-orchestration)
@@ -30,8 +30,9 @@ This is a very similar module to module 1. The key difference is now we'll be us
     - [6c: Enable the DAG](#6c-enable-the-dag)
       - [Q: What if different feature views have different freshness requirements?](#q-what-if-different-feature-views-have-different-freshness-requirements)
     - [Step 6d (optional): Run a backfill](#step-6d-optional-run-a-backfill)
-  - [Step 7: Run `get_historical_features` and `get_online_features`](#step-7-run-get_historical_features-and-get_online_features)
-  - [Step 8: Streaming](#step-8-streaming)
+  - [Step 7: Retrieve features + test stream ingestion](#step-7-retrieve-features--test-stream-ingestion)
+    - [Overview](#overview)
+    - [Time to run code!](#time-to-run-code)
 - [Conclusion](#conclusion)
   - [Limitations](#limitations)
   - [Why Feast?](#why-feast)
@@ -70,21 +71,25 @@ offline_store:
 entity_key_serialization_version: 2
 ```
 
-##  Step 3: Spin up services (Redis + Feast SQL Registry + Feast services)
+##  Step 3: Spin up services (Kafka + Redis + Feast SQL Registry + Feast services)
 
 We use Docker Compose to spin up the services we need.
 - This deploys an instance of Redis, Postgres for a registry, a Feast feature server + push server.
+- This also uses `transactions.parquet` to generate streaming feature values to ingest into the online store with dummy timestamps
 
 Start up the Docker daemon and then use Docker Compose to spin up the services as described above:
 - You may need to run `sudo docker-compose up` if you run into a Docker permission denied error
 ```console
 $ docker-compose up
 
 Creating network "module_3_default" with the default driver
-Creating redis    ... done
-Creating registry ... done
+Creating registry  ... done
+Creating zookeeper ... done
+Creating redis     ... done
+Creating broker    ... done
+Creating tx_kafka_events ... done
 Creating feast_feature_server ... done
-Attaching to redis, registry, feast_feature_server
+Attaching to zookeeper, redis, registry, broker, tx_kafka_events, feast_feature_server
 ...
 ```
 
@@ -106,15 +111,28 @@ This will create the initial tables we need for Feast
 In this example, we're using a test database in Snowflake. 
 
 To get started, go ahead and register the feature repository
-```bash
-# Note: first you need to export environment variables matching the above variables:
-# export SNOWFLAKE_DEPLOYMENT_URL="[YOUR DEPLOYMENT]
-# export SNOWFLAKE_USER="[YOUR USER]
-# export SNOWFLAKE_PASSWORD="[YOUR PASSWORD]
-# export SNOWFLAKE_ROLE="[YOUR ROLE]
-# export SNOWFLAKE_WAREHOUSE="[YOUR WAREHOUSE]
-# export SNOWFLAKE_DATABASE="[YOUR DATABASE]
-cd feature_repo; feast apply
+```console
+<!-- 
+Note: first you need to export environment variables 
+matching the above variables:
+
+export SNOWFLAKE_DEPLOYMENT_URL="[YOUR DEPLOYMENT]"
+export SNOWFLAKE_USER="[YOUR USER]"
+export SNOWFLAKE_PASSWORD="[YOUR PASSWORD]"
+export SNOWFLAKE_ROLE="[YOUR ROLE]"
+export SNOWFLAKE_WAREHOUSE="[YOUR WAREHOUSE]"
+export SNOWFLAKE_DATABASE="[YOUR DATABASE]"
+-->
+$ cd feature_repo; feast apply
+
+Created entity user
+Created feature view aggregate_transactions_features
+Created feature view credit_scores_features
+Created feature service model_v1
+Created feature service model_v2
+
+Deploying infrastructure for aggregate_transactions_features
+Deploying infrastructure for credit_scores_features
 ```
 ## Step 6: Set up orchestration
 ### Step 6a: Setting up Airflow to work with dbt
@@ -217,11 +235,11 @@ airflow dags backfill \
     feature_dag
 ```
 
-## Step 7: Run `get_historical_features` and `get_online_features`
-Run [Jupyter notebook](feature_repo/module_3.ipynb)
+## Step 7: Retrieve features + test stream ingestion
+### Overview
+Feast exposes a `get_historical_features` method to generate training data or run batch scoring, and a `get_online_features` method to power model serving.
 
-## Step 8: Streaming
-There are two broad approaches with streaming
+To achieve fresher features, one might consider using streaming compute. There are two broad approaches with streaming:
 1. **[Simple, semi-fresh features]** Use data warehouse / data lake specific streaming ingest of raw data.
    - This means that Feast only needs to know about a "batch feature" because the assumption is those batch features are sufficiently fresh.
    - **BUT** there are limits to how fresh your features are. You won't be able to get to minute level freshness.
@@ -230,6 +248,9 @@ There are two broad approaches with streaming
 
 Feast will help enforce a consistent schema across batch + streaming features as they land in the online store. 
 
+### Time to run code!
+Now, run the [Jupyter notebook](feature_repo/module_3.ipynb)
+
 # Conclusion
 By the end of this module, you will have learned how to build a full feature platform, with orchestrated batch transformations (using dbt + Airflow) and orchestrated materialization (with Feast + Airflow).
 
 
@@ -1,6 +1,38 @@
 ---
 version: '3'
 services:
+  zookeeper:
+    image: confluentinc/cp-zookeeper:7.0.1
+    container_name: zookeeper
+    environment:
+      ZOOKEEPER_CLIENT_PORT: 2181
+      ZOOKEEPER_TICK_TIME: 2000
+
+  broker:
+    image: confluentinc/cp-kafka:7.0.1
+    container_name: broker
+    ports:
+      - "9092:9092"
+      - "29092:29092"
+    depends_on:
+      - zookeeper
+    environment:
+      KAFKA_BROKER_ID: 1
+      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
+      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://broker:29092
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
+      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
+      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
+
+  tx_kafka_events:
+    build:
+      context: .
+      dockerfile: kafka_demo/Dockerfile
+    depends_on:
+      - broker
+    container_name: tx_kafka_events
+
   redis:
     image: redis
     container_name: redis
 
@@ -11,25 +11,23 @@
     timestamp_field="TIMESTAMP",
 )
 
-aggregate_transactions_source = SnowflakeSource(
-    name="transactions_7d_source",
+aggregate_transactions_batch = SnowflakeSource(
+    name="transactions_7d_batch",
     database=yaml.safe_load(open("feature_store.yaml"))["offline_store"]["database"],
     table="AGGREGATE_TRANSACTION_FEATURES",
     schema="FRAUD",
     timestamp_field="TIMESTAMP",
     tags={"dbtModel": "models/example/aggregate_transaction_features.sql"},
 )
 
+aggregate_transactions_push = PushSource(
+    name="transactions_7d", batch_source=aggregate_transactions_batch
+)
+
 credit_scores = SnowflakeSource(
     name="credit_scores_source",
     database=yaml.safe_load(open("feature_store.yaml"))["offline_store"]["database"],
     query="SELECT USER_ID, DATE, CREDIT_SCORE, TIMESTAMP FROM CREDIT_SCORES",
     schema="FRAUD",
     timestamp_field="TIMESTAMP",
 )
-
-# A push source is useful if you have upstream systems that transform features (e.g. stream processing jobs)
-driver_stats_push_source = PushSource(
-    name="driver_stats_push_source",
-    batch_source=transactions_source,
-)
@@ -34,7 +34,7 @@
         Field(name="7D_AVG_AMT", dtype=Float32),
     ],
     online=True,
-    source=aggregate_transactions_source,
+    source=aggregate_transactions_push,
     tags={"production": "True"},
     owner="test2@gmail.com",
 )
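
Review note: with `aggregate_transactions_features` now reading from the `transactions_7d` push source, an upstream stream job could write fresh values through the Feast push server started by Docker Compose. The sketch below only builds the JSON body for the feature server's `/push` endpoint, using the standard library. The column names `USER_ID` and `TIMESTAMP`, the sample values, and the URL in the closing comment are assumptions that do not come from this change. The feature view may also declare more fields than `7D_AVG_AMT`, and a real push would likely need to include all of them.

```python
import json
from datetime import datetime, timezone


def build_push_payload(user_id, avg_amt_7d, event_time):
    """Build the JSON body for a POST to the Feast feature server's /push endpoint.

    "df" maps each column name to a list of values, one entry per row.
    """
    return {
        # PushSource name as declared in data_sources.py
        "push_source_name": "transactions_7d",
        "df": {
            # Assumed entity join key column (hypothetical for this view)
            "USER_ID": [user_id],
            # Feature declared in features.py
            "7D_AVG_AMT": [avg_amt_7d],
            # Assumed event timestamp column, formatted as a plain string
            "TIMESTAMP": [event_time.strftime("%Y-%m-%d %H:%M:%S")],
        },
    }


# Hypothetical sample row
payload = build_push_payload(
    "user_123", 101.5, datetime(2022, 8, 1, 12, 0, tzinfo=timezone.utc)
)
body = json.dumps(payload)
print(body)

# To send it, POST `body` to the push server, for example at
# http://localhost:6566/push (assumed host and port; check docker-compose.yml).
```

Building the payload separately from sending it keeps the schema easy to check against the feature view before any request reaches the online store.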