This project is meant for experimenting with dbt and the dbt-redshift adapter for Analytics,
using NYC TLC Trip Record dataset as the datasource, with Kimball dimensional modeling technique.
IMPORTANT NOTE: To access awsdatacatalog from RedShift, IAM auth method is required. It also explicitly needs USAGE grants that DB, therefore, on Redshift Query Editor, run:
GRANT USAGE ON DATABASE awsdatacatalog to "IAM:my_iam_user";
GRANT ALL ON DATABASE <DATABASE_NAME> to "IAM:my_iam_user";1. Install dependencies from pyproject.toml and activate the created virtualenv:
uv sync && source .venv/bin/activate2. (Optional) Install pre-commit:
brew install pre-commit
# From root folder where `.pre-commit-config.yaml` is located, run:
pre-commit install3. Setup dbt profiles.yaml accordingly (use the profiles.tmpl.yaml as template)
3.1. By default, the profiles_dir is the user '$HOME/.dbt/'
mkdir -p ~/.dbt/
cat profiles.tmpl.yml >> ~/.dbt/profiles.yml3.2. Set the environment variables for dbt-bigquery:
export DBT_REDSHIFT_HOST=redshift.[id].[region].redshift-serverless.amazonaws.com
export DBT_REDSHIFT_DATABASE=dev
export DBT_REDSHIFT_USE_DATA_CATALOG=1
export DBT_REDSHIFT_SOURCE_GLUE_CATALOG_DB=raw_nyc_tlc_tripdata
export DBT_REDSHIFT_TARGET_SCHEMA=nyc_tlc_record_data3.3. Also, either have your AWS credentials set on ~/.aws/credentials or set them as well:
export AWS_ACCESS_KEY_ID=AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=AWS_SECRET_ACCESS_KEY4. Install dbt dependencies and trigger the pipeline
4.1. Run dbt deps to install dbt plugins
dbt deps4.2. Run dbt build to trigger the dbt models to run
dbt build
# Alternatively you can run only a subset of the models with:
## +models/staging: Runs the dependencies/preceding models first that lead
## to 'models/staging', and then the target models
dbt [build|run] --select +models/staging
## models/staging+: Runs the target models first, and then all models that depend on it
dbt [build|run] --select models/staging+5. Generate the Docs and the Data Lineage graph with:
dbt docs generate
dbt docs serveAccess the generated docs at:
open http://localhost:80801. Build the Docker Image with:
docker build -t dbt-redshift:latest . --no-cache2. Start a container with it:
docker run -d --rm \
-e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY} \
-e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
-e DBT_REDSHIFT_HOST=${DBT_REDSHIFT_HOST} \
-e DBT_REDSHIFT_DATABASE=dev \
-e DBT_REDSHIFT_USE_DATA_CATALOG=1 \
-e DBT_REDSHIFT_SOURCE_GLUE_CATALOG_DB=raw_nyc_tlc_tripdata \
-e DBT_REDSHIFT_TARGET_SCHEMA=nyc_tlc_record_data \
--name dbt-redshift \
dbt-redshift- PEP-517: Packaging and dependency management with
uv - Bootstrap dbt with Redshift Adapter (dbt-redshift)
- Add dbt macro to configure target schemas dinamically
- Run
dbt-corein Docker - Terraform AWS Glue Catalog and Crawler