Commit 8d206dc

Merge pull request #15 from BigDataIA-Spring2025-4/feature-cleanup
Updated on structure, diagrams addition and readme updation
2 parents ba30b5b + 2a3d350 commit 8d206dc

10 files changed

Lines changed: 143 additions & 9 deletions

FRED_0_START.ipynb

Lines changed: 14 additions & 6 deletions
@@ -8,7 +8,15 @@
     "name": "cell4"
    },
    "source": [
-    "# Setting Up Snowflake"
+    "### Setting Up Snowflake"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e70b640",
+   "metadata": {},
+   "source": [
+    "Welcome to the beginning of the Quickstart! Please refer to the Snowflake Notebook Data Engineering Quickstarter for all the details, including setup steps. The same has been provided in the GitHub repository's README.md."
    ]
   },
   {
@@ -24,11 +32,11 @@
    "source": [
     "SET MY_USER = CURRENT_USER();\n",
     "\n",
-    "-- Check on this \n",
-    "SET GITHUB_SECRET_USERNAME = '##############';\n",
-    "SET GITHUB_SECRET_PASSWORD = '#####################';\n",
-    "SET GITHUB_URL_PREFIX = 'https://github.com/#####################';\n",
-    "SET GITHUB_REPO_ORIGIN = 'https://github.com/##############################';"
+    "-- Update as per your GitHub repository values\n",
+    "SET GITHUB_SECRET_USERNAME = 'username';\n",
+    "SET GITHUB_SECRET_PASSWORD = 'personal access token';\n",
+    "SET GITHUB_URL_PREFIX = 'https://github.com/BigDataIA-Spring2025-4';\n",
+    "SET GITHUB_REPO_ORIGIN = 'https://github.com/BigDataIA-Spring2025-4/DAMG7245_Assignment03_Part02.git';"
    ]
   },
   {
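
Review note: the second hunk asks each user to paste a username and personal access token directly into the notebook cell. A minimal sketch of one way to keep real values out of the committed file is to build the same `SET` statements from environment variables at run time. The variable names `GITHUB_USERNAME` and `GITHUB_TOKEN` and the helpers `sql_literal` and `build_set_statements` are hypothetical and not part of this commit.

```python
import os


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_set_statements(env=None):
    """Return the notebook's secret-related SET statements.

    Values come from the environment instead of the notebook source.
    GITHUB_USERNAME and GITHUB_TOKEN are assumed (hypothetical) names.
    """
    env = os.environ if env is None else env
    missing = [key for key in ("GITHUB_USERNAME", "GITHUB_TOKEN") if not env.get(key)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    values = {
        "GITHUB_SECRET_USERNAME": env["GITHUB_USERNAME"],
        "GITHUB_SECRET_PASSWORD": env["GITHUB_TOKEN"],
    }
    return [f"SET {name} = {sql_literal(value)};" for name, value in values.items()]
```

The generated statements can then be executed in the session before the rest of the setup cell, so the notebook itself only ever contains placeholders.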

README.md

Lines changed: 59 additions & 2 deletions
@@ -1,2 +1,59 @@
-# DAMG7245_Assignment03_Part02
-Snowflake Pipeline
+# FRED (Federal Reserve Economic Data) - Snowflake Pipelines
+
+### Project Overview :
+This repository contains the implementation of a robust data engineering pipeline designed to process financial data from the Federal Reserve Economic Data (FRED) platform, specifically focusing on U.S. Treasury yields for 10-Year and 2-Year bonds. Leveraging Snowflake's Snowpark for Python, the project provides an efficient system for extracting, transforming, and validating financial data to enable advanced analysis and reporting.
+
+
+### Team Members :
+- Vedant Mane
+- Abhinav Gangurde
+- Yohan Markose
+
+### Resources :
+- **Streamlit Application** : [Streamlit App]()
+
+- **Google Codelab**: [Codelab]()
+
+- **Google Docs**: [Project Document]()
+
+- **Video Walkthrough**: [Video]()
+
+### Technologies Used :
+- **Streamlit** : Frontend Visual Dataset for analytics
+- **AWS S3**: External Cloud Storage
+- **Cloud & Storage**: Snowflake, AWS S3
+- **ELT & Pipeline**: Snowflake Tasks
+- **Snowflake** : Snowpark, UDF, Stored Procedures, Streams, Notebook
+
+### Architecture Diagram :
+
+
+
+### Workflow :
+
+1. **Initial Account Creation and Setup** -
+To get started with the project, you need to set up the necessary accounts and configurations for both Snowflake and FRED. These accounts form the foundation for creating the data pipelines, storing data, and accessing the required datasets.
+
+- **Snowflake** : The pipelines rely heavily on Snowflake for data storage, transformation, and orchestration.
+- **FRED** : The project uses publicly available APIs from FRED (Federal Reserve Economic Data) to fetch U.S. Treasury yield data. To access these APIs, you need an API key from FRED.
+
+2. **Automation of data extraction** -
+GitHub Actions schedules the daily data extraction and loads the data into secondary storage, in this case an AWS S3 bucket.
+
+3. **Snowflake Account Setup** -
+Upload the provided FRED_0_START.ipynb from this repository to your Snowflake account and run it to create the required database-level objects.
+
+4. **Deploy Notebooks** -
+Deploy the notebooks from the external Git repository to Snowflake and use them for data processing and updates in the created Snowflake DAGs.
+
+5. **Schedule the Snowflake DAGs** -
+Run the created Snowflake tasks and observe the runs to monitor data processing.
+
+6. **Streamlit** -
+The processed data can be viewed in the rendered Streamlit application.
+
+For a detailed guide and the steps to run the data pipelines, follow the official quickstarter guide -
+[FRED (Federal Reserve Economic Data) - Snowflake Data Pipelines Quickstarter Guide](https://docs.google.com/document/d/1jTG4u1Wsswd29oEoYj2Cy0oAIexVLM-iuCtUTEH-1QU/edit?tab=t.0)
+
+### Attestation :
+WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK
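
Review note: workflow step 2 in the README describes a daily extraction from the FRED API into S3 but shows no code for it. The sketch below illustrates the two pure-Python pieces such a job would likely need: building a request URL for the FRED `series/observations` endpoint and flattening the JSON response into CSV text ready for upload. The series IDs `DGS10` and `DGS2`, the `fred/<series>/<date>.csv` key layout, and all function names are assumptions for illustration; the commit does not show the actual extraction script, and the HTTP call and S3 upload are intentionally left out.

```python
import csv
import io
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"

# Assumed series for the 10-Year and 2-Year Treasury yields; the README does not name them.
SERIES_IDS = ("DGS10", "DGS2")


def build_request_url(series_id: str, api_key: str) -> str:
    """Build the observations URL for one series, requesting a JSON response."""
    query = urlencode({"series_id": series_id, "api_key": api_key, "file_type": "json"})
    return f"{FRED_OBSERVATIONS_URL}?{query}"


def observations_to_csv(series_id: str, payload: dict) -> str:
    """Flatten a FRED observations payload into CSV text.

    FRED reports missing values as ".", so those rows are skipped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["series_id", "date", "value"])
    for obs in payload.get("observations", []):
        if obs["value"] == ".":
            continue
        writer.writerow([series_id, obs["date"], float(obs["value"])])
    return buffer.getvalue()


def s3_object_key(series_id: str, run_date: str) -> str:
    """Return a hypothetical S3 key layout: one CSV per series per run date."""
    return f"fred/{series_id}/{run_date}.csv"
```

A scheduled job would fetch each URL, pass the decoded JSON to `observations_to_csv`, and upload the result under `s3_object_key(...)`; keeping those steps separate makes the parsing testable without network access.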
File renamed without changes.
File renamed without changes.
Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ snowpark:
 handler: "procedure.main"
 runtime: "3.10"
 signature: ""
-returns: string
+returns: string
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+from diagrams import Diagram, Cluster, Edge
+from diagrams.custom import Custom
+from diagrams.aws.storage import S3
+from diagrams.saas.analytics import Snowflake  # Correct import for Snowflake
+from diagrams.programming.flowchart import PredefinedProcess  # Use for GitHub Actions
+
+# Create the diagram
+with Diagram("Snowflake Data Pipeline", show=True, direction="LR"):
+    # GitHub Actions for scheduling
+    github_actions = PredefinedProcess("GitHub Actions (Scheduler)")
+
+    # Fred Website Cluster (Frontend)
+    with Cluster("Fred Website"):
+        frontend = Custom("Fred Website", "./services/diagrams/src/fred-logo.png")
+
+    # AWS S3 Bucket for storing data
+    with Cluster("AWS"):
+        s3_bucket = S3("AWS S3 Bucket")
+
+    # Snowflake environment
+    with Cluster("Snowflake"):
+        raw_table = Snowflake("Raw Table")
+        harmonized_table = Snowflake("Harmonized Table")
+        snowflake_task = Snowflake("Snowflake Task (ETL Processing)")
+        analytics_table = Snowflake("Analytics Table")
+
+    # Streamlit Dashboard Cluster (Frontend)
+    with Cluster("Frontend (Streamlit)") as frontend_cluster:
+        streamlit_app = Custom("Streamlit UI", "./services/diagrams/src/streamlit.png")
+
+    # Data pipeline flow
+    github_actions >> Edge(label="Extract FRED API Data (Daily)") >> s3_bucket
+    s3_bucket >> Edge(label="Load to Raw Tables") >> raw_table
+    raw_table >> Edge(label="Transform to Harmonized Schema") >> harmonized_table
+    harmonized_table >> Edge(label="Trigger Snowflake Task") >> snowflake_task
+    snowflake_task >> Edge(label="Load to Analytics") >> analytics_table
+    analytics_table >> Edge(label="Visualize Data") >> streamlit_app
+
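
Review note: the script declares eight nodes but the flow section only chains seven of them; the `frontend` node in the "Fred Website" cluster never appears in an edge, so it renders as an isolated box. A small sketch, independent of the `diagrams` library, that restates the declared nodes and edges as plain data and reports any node left unconnected. The `NODES` and `EDGES` lists mirror the variable names in the script above; the helper `unconnected` is hypothetical.

```python
# Node variable names as declared in the diagram script.
NODES = [
    "github_actions",
    "frontend",
    "s3_bucket",
    "raw_table",
    "harmonized_table",
    "snowflake_task",
    "analytics_table",
    "streamlit_app",
]

# (source, target) pairs matching the ">>" chains in the script's flow section.
EDGES = [
    ("github_actions", "s3_bucket"),
    ("s3_bucket", "raw_table"),
    ("raw_table", "harmonized_table"),
    ("harmonized_table", "snowflake_task"),
    ("snowflake_task", "analytics_table"),
    ("analytics_table", "streamlit_app"),
]


def unconnected(nodes, edges):
    """Return the nodes that do not appear in any edge, in declaration order."""
    used = {name for edge in edges for name in edge}
    return [name for name in nodes if name not in used]


print(unconnected(NODES, EDGES))
```

If the intent is to show FRED as the data source, one option would be an edge from `frontend` to `github_actions` in the script; whether that matches the authors' architecture is not stated in the commit.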

snowflake_data_pipeline

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+digraph "Snowflake Data Pipeline" {
+    graph [fontcolor="#2D3436" fontname="Sans-Serif" fontsize=15 label="Snowflake Data Pipeline" nodesep=0.60 pad=2.0 rankdir=LR ranksep=0.75 splines=ortho]
+    node [fixedsize=true fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13 height=1.4 imagescale=true labelloc=b shape=box style=rounded width=1.4]
+    edge [color="#7B8894"]
+    "6e662c23572c410eadfdfd13bfd71179" [label="GitHub Actions (Scheduler)" height=1.9 image="D:\BigData\Assignment03\DAMG7245_Assignment03_Part02\venv\Lib\site-packages\resources/programming/flowchart\predefined-process.png" shape=none]
+    subgraph "cluster_Fred Website" {
+        graph [bgcolor="#E5F5FD" fontname="Sans-Serif" fontsize=12 label="Fred Website" labeljust=l pencolor="#AEB6BE" rankdir=LR shape=box style=rounded]
+        "7b29f1ab9af141aab2340988517b05da" [label="Fred Website" height=1.9 image="./services/diagrams/src/fred-logo.png" shape=none]
+    }
+    subgraph cluster_AWS {
+        graph [bgcolor="#E5F5FD" fontname="Sans-Serif" fontsize=12 label=AWS labeljust=l pencolor="#AEB6BE" rankdir=LR shape=box style=rounded]
+        "40e7f5444c934b8b8632cd06fa4bad73" [label="AWS S3 Bucket" height=1.9 image="D:\BigData\Assignment03\DAMG7245_Assignment03_Part02\venv\Lib\site-packages\resources/aws/storage\simple-storage-service-s3.png" shape=none]
+    }
+    subgraph cluster_Snowflake {
+        graph [bgcolor="#E5F5FD" fontname="Sans-Serif" fontsize=12 label=Snowflake labeljust=l pencolor="#AEB6BE" rankdir=LR shape=box style=rounded]
+        "174d15eccf174b6caf2181e09ae1a442" [label="Raw Table" height=1.9 image="D:\BigData\Assignment03\DAMG7245_Assignment03_Part02\venv\Lib\site-packages\resources/saas/analytics\snowflake.png" shape=none]
+        b53da7b78ca142248e319b2a27c6c701 [label="Harmonized Table" height=1.9 image="D:\BigData\Assignment03\DAMG7245_Assignment03_Part02\venv\Lib\site-packages\resources/saas/analytics\snowflake.png" shape=none]
+        "08859dd452f444a4a7f2ab00e997c5be" [label="Snowflake Task (ETL Processing)" height=1.9 image="D:\BigData\Assignment03\DAMG7245_Assignment03_Part02\venv\Lib\site-packages\resources/saas/analytics\snowflake.png" shape=none]
+        ae1e2a968948448daf6384367ce2ae26 [label="Analytics Table" height=1.9 image="D:\BigData\Assignment03\DAMG7245_Assignment03_Part02\venv\Lib\site-packages\resources/saas/analytics\snowflake.png" shape=none]
+    }
+    subgraph "cluster_Frontend (Streamlit)" {
+        graph [bgcolor="#E5F5FD" fontname="Sans-Serif" fontsize=12 label="Frontend (Streamlit)" labeljust=l pencolor="#AEB6BE" rankdir=LR shape=box style=rounded]
+        "77db1f9f56e943019d5aa0df32f8cc78" [label="Streamlit UI" height=1.9 image="./services/diagrams/src/streamlit.png" shape=none]
+    }
+    "6e662c23572c410eadfdfd13bfd71179" -> "40e7f5444c934b8b8632cd06fa4bad73" [label="Extract FRED API Data (Daily)" dir=forward fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13]
+    "40e7f5444c934b8b8632cd06fa4bad73" -> "174d15eccf174b6caf2181e09ae1a442" [label="Load to Raw Tables" dir=forward fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13]
+    "174d15eccf174b6caf2181e09ae1a442" -> b53da7b78ca142248e319b2a27c6c701 [label="Transform to Harmonized Schema" dir=forward fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13]
+    b53da7b78ca142248e319b2a27c6c701 -> "08859dd452f444a4a7f2ab00e997c5be" [label="Trigger Snowflake Task" dir=forward fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13]
+    "08859dd452f444a4a7f2ab00e997c5be" -> ae1e2a968948448daf6384367ce2ae26 [label="Load to Analytics" dir=forward fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13]
+    ae1e2a968948448daf6384367ce2ae26 -> "77db1f9f56e943019d5aa0df32f8cc78" [label="Visualize Data" dir=forward fontcolor="#2D3436" fontname="Sans-Serif" fontsize=13]
+}
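
Review note: the committed DOT file embeds `image=` paths that point into one contributor's local `D:\...\venv` directory, so it will likely not render on any other machine. A minimal sketch of a check that could flag this before committing, assuming the only paths of interest are the `image="..."` attributes; the function name `absolute_image_paths` and both regular expressions are hypothetical and not part of the repository.

```python
import re

# Matches the value of every image="..." attribute in DOT source.
IMAGE_ATTR = re.compile(r'image="([^"]*)"')

# Treats Windows drive paths (D:\ or D:/) and POSIX root paths (/...) as absolute.
ABSOLUTE = re.compile(r"^(?:[A-Za-z]:[\\/]|/)")


def absolute_image_paths(dot_text: str) -> list[str]:
    """Return every image path in the DOT text that is machine-specific (absolute)."""
    return [path for path in IMAGE_ATTR.findall(dot_text) if ABSOLUTE.match(path)]
```

Run against the file above, this would report the six built-in icon paths under `D:\BigData\...` and leave the two relative `./services/diagrams/src/...` icons alone. Since the DOT file is an intermediate artifact of rendering, excluding it from version control may be the simpler fix.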
