This guide walks you through setting up dbt to work with the BigQuery data warehouse you created in Module 3.
Note
This guide assumes you've completed Module 3: Data Warehouse where you:
- Created a GCP project and enabled the BigQuery API
- Created a service account with BigQuery permissions
- Learned how to load data into BigQuery (in the nytaxi dataset)
Module 4 uses different data than Module 3 (green and yellow taxi data for 2019-2020 instead of yellow-only 2024). You'll load the new data in Step 1 below.
Before setting up dbt Cloud, confirm you have the required data and credentials from Module 3.
You should already have a service account JSON key file from Module 3. Make sure it has these permissions:
- BigQuery Data Editor
- BigQuery Job User
- BigQuery User
If you don't have the JSON key file, or you need to create a new service account or download a new key, follow these steps:
- Go to Google Cloud Console
- Navigate to IAM & Admin > Service Accounts
  - Or use the search bar and type "Service Accounts"
- Find your service account in the list
  - It should look like: service-account-name@project-id.iam.gserviceaccount.com
  - If you don't have a service account yet, click + CREATE SERVICE ACCOUNT and:
    - Enter a name (e.g., dbt-bigquery-service-account)
    - Click CREATE AND CONTINUE
    - Add these roles:
      - BigQuery Admin (or at minimum: BigQuery Data Editor, BigQuery Job User, BigQuery User)
    - Click CONTINUE > DONE
- Click on your service account name to open its details
- Go to the KEYS tab
- Click ADD KEY > Create new key
- Select JSON as the key type
- Click CREATE
- The JSON key file will automatically download to your computer
  - Save it in a secure location
  - Never commit this file to Git or share it publicly: it contains credentials to access your GCP resources
The downloaded JSON file will look something like this:
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-name@project-id.iam.gserviceaccount.com",
  ...
}
You'll use this JSON file in Step 4 to connect dbt Cloud to BigQuery.
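Before uploading the key, it can help to confirm the file is intact. The following is a minimal sketch using only the Python standard library. The function name check_service_account_key is hypothetical, and the field list covers only the fields shown in the sample above (real key files contain more).

```python
import json
from pathlib import Path

# Fields shown in the sample key above; real key files contain additional ones.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (an empty list means none found)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Report every expected field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    if not str(data.get("client_email", "")).endswith(".iam.gserviceaccount.com"):
        problems.append("client_email does not look like a service account address")
    return problems
```

Calling check_service_account_key("path/to/key.json") returns an empty list when the basic structure looks right. It does not check whether the key is still active or which roles the account holds.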
This module uses yellow and green taxi data for 2019-2020, which is different from the data you loaded in Module 3. Using the same approach you learned in Module 3, load the following data into your BigQuery nytaxi dataset:
- Yellow taxi trip records for all months of 2019 and 2020
- Green taxi trip records for all months of 2019 and 2020
Important
Download the data from the DataTalksClub NYC TLC Data repository, not from the official NYC TLC website. The official site has been retroactively updated over the years, so its data differs from what the homework answers are based on.
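The full set is 48 monthly files (2 taxi types x 2 years x 12 months). As a rough sketch, the list of download URLs can be generated rather than typed by hand. The URL layout below is an assumption about how the DataTalksClub repository publishes its release files, so confirm it against the repository before relying on it. The function name tripdata_urls is hypothetical.

```python
# Assumed URL layout of the DataTalksClub/nyc-tlc-data GitHub releases;
# verify it against the repository before use.
BASE_URL = "https://github.com/DataTalksClub/nyc-tlc-data/releases/download"


def tripdata_urls(colors=("yellow", "green"), years=(2019, 2020)):
    """Build one URL per taxi color, year, and month."""
    return [
        f"{BASE_URL}/{color}/{color}_tripdata_{year}-{month:02d}.csv.gz"
        for color in colors
        for year in years
        for month in range(1, 13)
    ]
```

Printing each entry of tripdata_urls() gives a list you can feed into whichever loading approach you used in Module 3.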
After loading, verify your data:
- Go to BigQuery Console
- In the Explorer panel on the left, expand your project
- You should see the nytaxi dataset
- Expand the nytaxi dataset. You should see these tables:
  - green_tripdata
  - yellow_tripdata
When you created your BigQuery datasets in Module 3, you chose a location (e.g., US, EU, us-central1). You'll need to use the same location when configuring dbt.
To check your dataset location:
- In BigQuery Console, click on the nytaxi dataset
- Look for Data location in the dataset details
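Both console checks (tables present, dataset location) can also be done from a script. This is a sketch, assuming the google-cloud-bigquery package is installed and that credentials are available, for example through the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing at your key file. The helper names missing_tables and inspect_dataset are hypothetical.

```python
EXPECTED_TABLES = {"green_tripdata", "yellow_tripdata"}


def missing_tables(found, expected=EXPECTED_TABLES):
    """Return the expected table names that are absent from `found`, sorted."""
    return sorted(set(expected) - set(found))


def inspect_dataset(project_id, dataset_id="nytaxi"):
    """Return (location, table_ids) for a dataset.

    Needs the third-party google-cloud-bigquery package and valid credentials.
    """
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    dataset_ref = f"{project_id}.{dataset_id}"
    location = client.get_dataset(dataset_ref).location
    table_ids = {table.table_id for table in client.list_tables(dataset_ref)}
    return location, table_ids


# Example usage (replace the placeholder project ID with your own):
# location, tables = inspect_dataset("your-project-id")
# print("Location:", location)
# print("Missing tables:", missing_tables(tables))
```

The location it reports is the value to enter later when configuring the dbt connection.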
dbt Platform (called dbt Cloud throughout this guide) is dbt's cloud-based development environment, with a web IDE, scheduler, and collaboration features. It offers a free Developer plan, which is more than enough to learn dbt and follow the course.
Now you'll create a fresh dbt project from scratch in dbt Cloud.
- Navigate to Account settings (gear icon in the top-right corner) and click + New Project
- Enter a project name:
  - Project name: taxi_rides_ny
- Click Continue
After clicking Continue in the previous step, dbt Cloud will prompt you to configure your data warehouse connection.
Tip
If you're not automatically taken to the connection setup, you can also configure it from Account settings > Projects > taxi_rides_ny > Connection.
- For the connection type, select BigQuery
- Click Upload a Service Account JSON file
- Select the service account JSON key file from Module 3
- dbt will automatically extract:
  - Your GCP project ID
  - Authentication credentials
- Dataset: Enter dbt_prod
  - This is the base schema name where dbt will create datasets
  - dbt will organize your models into schemas like:
    - dbt_prod_staging - for staging models
    - dbt_prod_intermediate - for intermediate models
    - dbt_prod_marts - for final analytics tables
- Location: Select the same location as your nytaxi dataset from Module 3
  - Example: US, EU, or us-central1
  - This must match your nytaxi dataset location
  - You can find this under Optional Settings or Advanced Settings depending on your UI version
- Timeout: 300 seconds
- Maximum Bytes Billed: (optional)
  - Leave blank for unlimited, OR
  - Set a limit like 1000000000 (1 GB) to prevent runaway queries
- Click Test Connection
- You should see a success message: "Connection test succeeded"
- Click Continue
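The Maximum Bytes Billed value in the steps above is a plain byte count, so 1 GB is written as 1000000000 (decimal gigabytes, 10**9 bytes). If you want a different cap, a tiny helper avoids miscounting zeros. The name gb_to_bytes is hypothetical, and this is only an arithmetic sketch.

```python
def gb_to_bytes(gigabytes):
    """Convert decimal gigabytes (1 GB = 10**9 bytes) to a whole number of bytes."""
    return round(gigabytes * 10**9)


# Value used in the steps above.
ONE_GB = gb_to_bytes(1)
```

For example, gb_to_bytes(5) gives the number to enter for a 5 GB cap.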
dbt Cloud needs a Git repository to store your project code. You have two options:
- Let dbt Manage the Repository (Recommended for Beginners)
- Connect Your Own GitHub Repository (Recommended for Production)
Either option works for this course.
In dbt, environments define different contexts where your data transformations run:
- Development Environment: Your personal workspace for building and testing models
  - Uses your personal credentials
  - Creates temporary schemas with your name (e.g., dbt_<your_name>)
  - Changes only affect your work, not production
  - Used when working in the dbt Cloud IDE
- Deployment Environment: The production workspace where final models run on schedule
  - Uses service account credentials
  - Creates production schemas (e.g., dbt_prod_staging, dbt_prod_marts)
  - Used by scheduled jobs that keep your data warehouse updated
Think of it like having a draft folder (development) and a published folder (deployment) for your analytics code.
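To see how the same model lands in different datasets per environment, here is a small Python sketch of the naming rule. It mirrors what dbt's default generate_schema_name macro is understood to do (base schema, plus an underscore and the custom schema when one is set). Projects can override that macro, so treat this as an illustration. The function name resolve_schema and the custom schema names are assumptions drawn from the examples in this guide.

```python
def resolve_schema(target_schema, custom_schema=None):
    """Combine the environment's base schema with a model's custom schema, if any."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


# Deployment environment: base schema is dbt_prod.
prod_staging = resolve_schema("dbt_prod", "staging")

# Development environment: base schema is your personal one, e.g. dbt_john_smith.
dev_staging = resolve_schema("dbt_john_smith", "staging")
```

Under these assumptions a staging model builds into dbt_prod_staging in production and dbt_john_smith_staging in development, which is why development work never touches production tables.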
dbt Cloud automatically creates a development environment when you set up a project. You don't need to create one manually.
To verify it was created:
- Navigate to Deploy > Environments in the top navigation bar
- You should see a Development environment already listed
If you need to change how dbt connects to BigQuery during development, or adjust your development schema:
- Click your profile icon (bottom-left corner) > Your Profile > Credentials
- Select the credential linked to your project
- From here you can update:
  - Development Schema: Where your personal development models will be created
    - dbt automatically suggests: dbt_<your_name> (e.g., dbt_john_smith)
    - This schema is separate from production (dbt_prod)
  - Target Name: Leave as dev (default)
Once your project, connection, and repository are configured, you're ready to start building dbt models.
- Click Start developing in the Studio IDE
  - If you don't see this option, navigate to Develop in the top navigation bar
- dbt Cloud will initialize your workspace (this may take a minute)
- Once the IDE loads, you'll have a fresh project ready for development!