🎥 Bruin Core Concepts | Pipelines (3:13)
A Pipeline is a grouping mechanism for organizing assets based on their execution schedule and configuration requirements. Within a project, you can have multiple pipelines.
Each pipeline has exactly one schedule, which is the primary reason to group assets together:
- Assets with the same schedule belong in the same pipeline
- Common schedules: hourly, daily, monthly, or cron expressions
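The grouping rule above can be sketched in a few lines of Python. The asset names and schedules below are hypothetical, and the function only illustrates the idea of "one schedule, one pipeline"; it is not something Bruin runs for you.

```python
from collections import defaultdict

# Hypothetical assets and the schedule each one needs (illustrative names only).
asset_schedules = {
    "raw_trips": "monthly",
    "trip_summary": "monthly",
    "exchange_rates": "daily",
    "alerts": "hourly",
}


def group_by_schedule(schedules):
    """Group asset names by schedule: one group per prospective pipeline."""
    groups = defaultdict(list)
    for asset, schedule in schedules.items():
        groups[schedule].append(asset)
    return dict(groups)


pipelines = group_by_schedule(asset_schedules)
for schedule, assets in pipelines.items():
    print(f"{schedule}: {assets}")
```

Each resulting group is a candidate for its own pipeline folder with a single schedule setting.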
Each pipeline has its own folder containing a pipeline.yml file:
project/
├── .bruin.yml
├── pipelines/
│   ├── nyc-taxi/
│   │   ├── pipeline.yml
│   │   └── assets/
│   └── another-pipeline/
│       ├── pipeline.yml
│       └── assets/
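To make the layout concrete, here is a minimal sketch that lists pipeline folders by looking for a pipeline.yml one level below pipelines/. The helper name `find_pipelines` is hypothetical and the code assumes exactly the folder structure shown above; it is not part of Bruin.

```python
from pathlib import Path


def find_pipelines(project_root):
    """Return directories under pipelines/ that contain a pipeline.yml."""
    pipelines_dir = Path(project_root) / "pipelines"
    if not pipelines_dir.is_dir():
        return []
    # Each match is <project>/pipelines/<name>/pipeline.yml; keep the folder.
    return sorted(p.parent for p in pipelines_dir.glob("*/pipeline.yml"))


if __name__ == "__main__":
    for folder in find_pipelines("project"):
        print(folder.name, "->", folder / "pipeline.yml")
```

A folder under pipelines/ without a pipeline.yml is skipped, which mirrors the rule that the file is what marks a folder as a pipeline.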
An example pipeline.yml:
name: nyc_taxi
schedule: monthly
start_date: "2019-01-01"
default_connections:
  duckdb: duckdb-default

| Setting | Description |
|---|---|
| name | Pipeline identifier |
| schedule | When to run (cron, daily, monthly, etc.) |
| start_date | When the pipeline starts being active |
| default_connections | Which connections to use |
| variables | Custom variables for the pipeline |
Even though connections are defined at the project level (.bruin.yml), each pipeline specifies which connections it uses.
Why this matters:
- In large organizations, different teams may need different credentials
- Prevents unnecessary exposure of secrets
- Only initializes connections needed for the specific pipeline run
- Security isolation between departments
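The isolation idea above can be illustrated with a small sketch. The dictionaries below are a hypothetical, simplified stand-in for project-level connections and a pipeline's default_connections; they are not Bruin's actual .bruin.yml schema or its internal logic, and the connection names other than duckdb-default are invented for the example.

```python
# Hypothetical, simplified view of project-level connections.
project_connections = {
    "duckdb-default": {"type": "duckdb", "path": "taxi.db"},
    "finance-warehouse": {"type": "postgres", "host": "finance.internal"},
    "marketing-warehouse": {"type": "postgres", "host": "marketing.internal"},
}

# default_connections as in the pipeline.yml example: platform -> connection name.
pipeline_default_connections = {"duckdb": "duckdb-default"}


def connections_for_pipeline(project, defaults):
    """Return only the project connections that the pipeline names."""
    missing = [name for name in defaults.values() if name not in project]
    if missing:
        raise KeyError(f"pipeline references undefined connections: {missing}")
    return {name: project[name] for name in defaults.values()}


needed = connections_for_pipeline(project_connections, pipeline_default_connections)
print(sorted(needed))  # ['duckdb-default']
```

The two warehouse connections are never touched for this pipeline, so their credentials do not need to be loaded or exposed during its run.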
# Validate a pipeline
bruin validate ./pipelines/nyc-taxi/pipeline.yml
# View pipeline lineage
bruin lineage ./pipelines/nyc-taxi/pipeline.yml
# Run the entire pipeline
bruin run ./pipelines/nyc-taxi/pipeline.yml
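When scripting these commands, it can be useful to run the pipeline only if validation passes. The sketch below wraps the `bruin validate` and `bruin run` invocations shown above. It assumes that `bruin` is on the PATH and exits with a non-zero status on failure; the `validate_then_run` helper and its `runner` parameter are hypothetical and exist only so the logic can be exercised without the CLI installed.

```python
import subprocess


def validate_then_run(pipeline_path, runner=subprocess.run):
    """Run `bruin validate`, then `bruin run` only if validation succeeded.

    Assumes a non-zero exit code signals failure (an assumption, not confirmed
    by the text above).
    """
    for subcommand in ("validate", "run"):
        result = runner(["bruin", subcommand, pipeline_path])
        if result.returncode != 0:
            return False  # stop at the first failing step
    return True
```

Calling `validate_then_run("./pipelines/nyc-taxi/pipeline.yml")` would then mirror the first and last commands above, skipping the run when validation fails.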