In this module, you'll learn about data platforms: tools that help you manage the entire data lifecycle, from ingestion to analytics.
We'll use Bruin as an example of a data platform. Bruin brings multiple tools together under one platform:
- Data ingestion (extract from sources to your warehouse)
- Data transformation (cleaning, modeling, aggregating)
- Data orchestration (scheduling and dependency management)
- Data quality (built-in checks and validation)
- Metadata management (lineage, documentation)
Follow the complete hands-on tutorial at:
Bruin Data Engineering Zoomcamp Template
The template is a TODO-based learning exercise: run `bruin init zoomcamp my-taxi-pipeline` and fill in the configuration and code, guided by the inline comments. The notes contain completed reference implementations.
Introduction to the Bruin data platform: what it is, what a modern data stack looks like (ETL/ELT, orchestration, data quality), and how Bruin brings all of these together into a single project.
Install Bruin, set up the VS Code/Cursor extension and the Bruin MCP, and create a first project with `bruin init`. Walk through environments, connections (DuckDB, Chess.com), pipeline YAML configuration, and running Python, YAML ingestor, and SQL assets.
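As a quick sketch of that workflow (the `zoomcamp` template name comes from the hands-on tutorial above; command behavior may vary slightly across Bruin versions):

```shell
# Scaffold a new project from the zoomcamp template
bruin init zoomcamp my-taxi-pipeline
cd my-taxi-pipeline

# Execute every asset in the pipeline against the default environment
bruin run .
```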
Build a full pipeline with a three-layered architecture (ingestion, staging, reports) using NYC taxi data and DuckDB.
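To illustrate the layered layout: each layer is a folder of assets, and a staging asset is a single SQL file whose Bruin metadata lives in a comment header. The asset and column names below (`ingestion.trips`, `staging.trips`) are placeholders, and the exact header schema should be checked against the Bruin docs; `duckdb.sql` matches the DuckDB setup used here:

```sql
/* @bruin
name: staging.trips
type: duckdb.sql
materialization:
  type: table
depends:
  - ingestion.trips
@bruin */

-- Clean the raw ingestion-layer table: drop obviously bad rows
SELECT
    pickup_datetime,
    dropoff_datetime,
    trip_distance,
    total_amount
FROM ingestion.trips
WHERE trip_distance > 0
  AND total_amount >= 0
```

The `depends` list is what lets Bruin order ingestion → staging → reports automatically at run time.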
Install the Bruin MCP in Cursor/VS Code and use an AI agent to build the entire NYC taxi pipeline end to end. Query data conversationally, ask questions about pipeline logic, and troubleshoot issues — all through natural language.
Register for Bruin Cloud, connect your GitHub repository, set up data warehouse connections, and deploy and monitor your pipelines on fully managed infrastructure.
Short videos covering the fundamental concepts of Bruin: projects, pipelines, assets, variables, and commands.
The root directory where you create your Bruin data pipeline. Learn about project initialization, the `.bruin.yml` configuration file, environments, and connections.
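A minimal `.bruin.yml` sketch with two environments, each defining named connections. The connection names and DuckDB file paths here are placeholders, and the exact schema for each connection type should be checked against the Bruin docs:

```yaml
default_environment: default
environments:
  default:
    connections:
      duckdb:
        - name: duckdb-default
          path: ./duckdb.db
  prod:
    connections:
      duckdb:
        - name: duckdb-default
          path: /data/warehouse.db
```

Because both environments expose a connection with the same name, assets can stay unchanged while you switch environments at run time.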
A grouping mechanism for organizing assets based on their execution schedule. Each pipeline has a single schedule and its own configuration file.
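Each pipeline is described by a `pipeline.yml` at its root; a minimal sketch (the pipeline name, schedule, and connection name are illustrative, and the exact key set may vary by Bruin version):

```yaml
name: taxi_pipeline
schedule: daily
start_date: "2024-01-01"
default_connections:
  duckdb: duckdb-default
```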
Single files that perform specific tasks, creating or updating tables/views in your database. Covers SQL, Python, and YAML asset types with examples.
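For example, a YAML asset can declare an ingestr-based ingestion step entirely declaratively, with no code at all. The connection and table names below are placeholders; consult the ingestr docs for the tables each source actually exposes:

```yaml
name: ingestion.profiles
type: ingestr
parameters:
  source_connection: chess-default
  source_table: profiles
  destination: duckdb
```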
Dynamic values initialized at each pipeline run. Learn about built-in variables (`start_date`, `end_date`) and custom variables for parameterizing your pipelines.
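Custom variables are declared in `pipeline.yml` and referenced from assets via Jinja templating. A sketch, assuming a made-up `taxi_type` variable (the exact `variables:` schema may differ between Bruin versions):

```yaml
# pipeline.yml (excerpt)
variables:
  taxi_type:
    type: string
    default: yellow
```

In an asset you would then reference it as `{{ var.taxi_type }}`, alongside the built-in `{{ start_date }}` and `{{ end_date }}`, and override it at run time with something like `bruin run --var taxi_type=green`.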
CLI commands for interacting with your Bruin project: `bruin run`, `bruin validate`, `bruin lineage`, and more, with practical examples.
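A few common invocations as a sketch; the paths below are placeholders, and `bruin --help` lists the full set of commands and flags:

```shell
# Check the project for configuration and query errors without running anything
bruin validate ./my-taxi-pipeline

# Run a single asset plus everything downstream of it
bruin run --downstream ./my-taxi-pipeline/assets/staging/trips.sql

# Show the dependency lineage of an asset
bruin lineage ./my-taxi-pipeline/assets/staging/trips.sql
```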
- Bruin Documentation
- Bruin GitHub Repository
- Bruin MCP (AI Integration)
- Bruin Cloud — managed deployment and monitoring
Did you take notes? You can share them here
- Add your notes here (above this line)