Module 5: Data Platforms

Overview

In this module, you'll learn about data platforms: tools that help you manage the entire data lifecycle, from ingestion to analytics.

We'll use Bruin as an example of a data platform. Bruin brings multiple tools together under one platform:

  • Data ingestion (extract from sources to your warehouse)
  • Data transformation (cleaning, modeling, aggregating)
  • Data orchestration (scheduling and dependency management)
  • Data quality (built-in checks and validation)
  • Metadata management (lineage, documentation)

Tutorial

Follow the complete hands-on tutorial at:

Bruin Data Engineering Zoomcamp Template

The template is a TODO-based learning exercise — run bruin init zoomcamp my-taxi-pipeline and fill in the configuration and code guided by inline comments. The notes contain completed reference implementations.
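The tutorial workflow can be sketched with the CLI commands this module covers. The init command comes from the template instructions above; validate and run are covered in the Commands video below (exact flags and paths may differ, so check the template's README):

```shell
# Scaffold the tutorial project from the zoomcamp template
bruin init zoomcamp my-taxi-pipeline
cd my-taxi-pipeline

# After filling in the TODOs guided by the inline comments,
# check the configuration, then execute the pipeline
bruin validate
bruin run
```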

Videos

🎥 5.1 - Introduction to Bruin

Introduction to the Bruin data platform: what it is, what a modern data stack looks like (ETL/ELT, orchestration, data quality), and how Bruin brings all of these together into a single project.

🎥 5.2 - Getting Started with Bruin

Install Bruin, set up the VS Code/Cursor extension and Bruin MCP, and create a first project using bruin init. Walk through environments, connections (DuckDB, Chess.com), pipeline YAML configuration, and running Python, YAML ingestor, and SQL assets.

🎥 5.3 - Building an End-to-End Pipeline with NYC Taxi Data

Build a full pipeline with a three-layered architecture (ingestion, staging, reports) using NYC taxi data and DuckDB.
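As a rough orientation, a project with the three layers above might be laid out like this (folder names other than the three layers are illustrative, not prescribed by the video):

```
my-taxi-pipeline/
├── pipeline.yml        # pipeline name and schedule
└── assets/
    ├── ingestion/      # raw NYC taxi data loaded into DuckDB
    ├── staging/        # cleaned and typed intermediate models
    └── reports/        # aggregated tables for analytics
```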

🎥 5.4 - Using Bruin MCP with AI Agents

Install the Bruin MCP in Cursor/VS Code and use an AI agent to build the entire NYC taxi pipeline end to end. Query data conversationally, ask questions about pipeline logic, and troubleshoot issues — all through natural language.

🎥 5.5 - Deploying to Bruin Cloud

Register for Bruin Cloud, connect your GitHub repository, set up data warehouse connections, and deploy and monitor your pipelines on fully managed infrastructure.

Bruin Core Concepts

Short videos covering the fundamental concepts of Bruin: projects, pipelines, assets, variables, and commands.

🎥 Projects

The root directory where you create your Bruin data pipeline. Learn about project initialization, the .bruin.yml configuration file, environments, and connections.
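To make the pieces concrete, here is an illustrative sketch of a .bruin.yml defining one environment with a DuckDB connection. Names, paths, and values are placeholders; consult Bruin's documentation for the authoritative schema:

```yaml
# Illustrative .bruin.yml sketch — placeholder names and paths
default_environment: default
environments:
  default:
    connections:
      duckdb:
        - name: duckdb-default
          path: ./data/taxi.duckdb
```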

🎥 Pipelines

A grouping mechanism for organizing assets based on their execution schedule. Each pipeline has a single schedule and its own configuration file.
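A minimal pipeline configuration file might look like the sketch below; the field names follow Bruin's conventions, but treat the values as placeholders:

```yaml
# Illustrative pipeline.yml sketch — one schedule per pipeline
name: my-taxi-pipeline
schedule: daily
start_date: "2024-01-01"
```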

🎥 Assets

Single files that perform specific tasks, creating or updating tables/views in your database. Covers SQL, Python, and YAML asset types with examples.
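For example, a SQL asset embeds its configuration in a comment block at the top of the file and the query body below it. The asset and table names here are hypothetical:

```sql
/* @bruin
name: reports.trips_daily
type: duckdb.sql
materialization:
  type: table
@bruin */

select pickup_date, count(*) as trips
from staging.trips
group by pickup_date
```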

🎥 Variables

Dynamic values initialized at each pipeline run. Learn about built-in variables (start_date, end_date) and custom variables for parameterizing your pipelines.
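The built-in variables can be referenced inside an asset via templating, which is useful for incremental, date-bounded runs. A hedged sketch (table and column names are made up for illustration):

```sql
/* @bruin
name: staging.trips
type: duckdb.sql
@bruin */

-- start_date / end_date are built-in variables set at each pipeline run
select *
from ingestion.trips
where pickup_datetime >= '{{ start_date }}'
  and pickup_datetime < '{{ end_date }}'
```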

🎥 Commands

CLI commands for interacting with your Bruin project: bruin run, bruin validate, bruin lineage, and more with practical examples.
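The commands named above can be sketched as follows; the asset path passed to lineage is a placeholder:

```shell
# Execute the pipeline in the current project
bruin run

# Check asset definitions and pipeline configuration for errors
bruin validate

# Inspect upstream/downstream dependencies of an asset
bruin lineage assets/reports/trips_daily.sql
```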

Resources

Homework

Community notes

Did you take notes? You can share them here
  • Add your notes here (above this line)