Module 5: Data Platforms

Overview

In this module, you'll learn about data platforms: tools that help you manage the entire data lifecycle, from ingestion to analytics.

We'll use Bruin as an example of a data platform. Bruin brings multiple tools together under one platform:

  • Data ingestion (extract from sources to your warehouse)
  • Data transformation (cleaning, modeling, aggregating)
  • Data orchestration (scheduling and dependency management)
  • Data quality (built-in checks and validation)
  • Metadata management (lineage, documentation)

Tutorial

Follow the complete hands-on tutorial at:

Bruin Data Engineering Zoomcamp Template

The template is a TODO-based learning exercise — run bruin init zoomcamp my-taxi-pipeline and fill in the configuration and code guided by inline comments. The notes contain completed reference implementations.
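The tutorial workflow can be sketched with the CLI commands this module covers. The init command comes from the template instructions above; validate and run are covered in the Commands video below (exact flags and paths may differ, so check the template's README):

```shell
# Scaffold the tutorial project from the zoomcamp template
bruin init zoomcamp my-taxi-pipeline
cd my-taxi-pipeline

# After filling in the TODOs guided by the inline comments,
# check the configuration, then execute the pipeline
bruin validate
bruin run
```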

Videos

🎥 5.1 - Introduction to Bruin

Introduction to the Bruin data platform: what it is, what a modern data stack looks like (ETL/ELT, orchestration, data quality), and how Bruin brings all of these together into a single project.

🎥 5.2 - Getting Started with Bruin

Install Bruin, set up the VS Code/Cursor extension and Bruin MCP, and create a first project using bruin init. Walk through environments, connections (DuckDB, Chess.com), pipeline YAML configuration, and running Python, YAML ingestor, and SQL assets.

🎥 5.3 - Building an End-to-End Pipeline with NYC Taxi Data

Build a full pipeline with a three-layered architecture (ingestion, staging, reports) using NYC taxi data and DuckDB.
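As a rough orientation, a project with the three layers above might be laid out like this (folder names other than the three layers are illustrative, not prescribed by the video):

```
my-taxi-pipeline/
├── pipeline.yml        # pipeline name and schedule
└── assets/
    ├── ingestion/      # raw NYC taxi data loaded into DuckDB
    ├── staging/        # cleaned and typed intermediate models
    └── reports/        # aggregated tables for analytics
```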

🎥 5.4 - Using Bruin MCP with AI Agents

Install the Bruin MCP in Cursor/VS Code and use an AI agent to build the entire NYC taxi pipeline end to end. Query data conversationally, ask questions about pipeline logic, and troubleshoot issues — all through natural language.

🎥 5.5 - Deploying to Bruin Cloud

Register for Bruin Cloud, connect your GitHub repository, set up data warehouse connections, and deploy and monitor your pipelines on fully managed infrastructure.

Bruin Core Concepts

Short videos covering the fundamental concepts of Bruin: projects, pipelines, assets, variables, and commands.

🎥 Projects

The root directory where you create your Bruin data pipeline. Learn about project initialization, the .bruin.yml configuration file, environments, and connections.
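To make the pieces concrete, here is an illustrative sketch of a .bruin.yml defining one environment with a DuckDB connection. Names, paths, and values are placeholders; consult Bruin's documentation for the authoritative schema:

```yaml
# Illustrative .bruin.yml sketch — placeholder names and paths
default_environment: default
environments:
  default:
    connections:
      duckdb:
        - name: duckdb-default
          path: ./data/taxi.duckdb
```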

🎥 Pipelines

A grouping mechanism for organizing assets based on their execution schedule. Each pipeline has a single schedule and its own configuration file.
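A minimal pipeline configuration file might look like the sketch below; the field names follow Bruin's conventions, but treat the values as placeholders:

```yaml
# Illustrative pipeline.yml sketch — one schedule per pipeline
name: my-taxi-pipeline
schedule: daily
start_date: "2024-01-01"
```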

🎥 Assets

Single files that perform specific tasks, creating or updating tables/views in your database. Covers SQL, Python, and YAML asset types with examples.
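For example, a SQL asset embeds its configuration in a comment block at the top of the file and the query body below it. The asset and table names here are hypothetical:

```sql
/* @bruin
name: reports.trips_daily
type: duckdb.sql
materialization:
  type: table
@bruin */

select pickup_date, count(*) as trips
from staging.trips
group by pickup_date
```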

🎥 Variables

Dynamic values initialized at each pipeline run. Learn about built-in variables (start_date, end_date) and custom variables for parameterizing your pipelines.
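The built-in variables can be referenced inside an asset via templating, which is useful for incremental, date-bounded runs. A hedged sketch (table and column names are made up for illustration):

```sql
/* @bruin
name: staging.trips
type: duckdb.sql
@bruin */

-- start_date / end_date are built-in variables set at each pipeline run
select *
from ingestion.trips
where pickup_datetime >= '{{ start_date }}'
  and pickup_datetime < '{{ end_date }}'
```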

🎥 Commands

CLI commands for interacting with your Bruin project: bruin run, bruin validate, bruin lineage, and more with practical examples.
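The commands named above can be sketched as follows; the asset path passed to lineage is a placeholder:

```shell
# Execute the pipeline in the current project
bruin run

# Check asset definitions and pipeline configuration for errors
bruin validate

# Inspect upstream/downstream dependencies of an asset
bruin lineage assets/reports/trips_daily.sql
```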

Resources

Homework

Community notes

Did you take notes? You can share them here
  • Add your notes here (above this line)