---
title: "Data processing & ETL workflows"
sidebarTitle: "Data processing & ETL"
description: "Learn how to use Trigger.dev for data processing and ETL (Extract, Transform, Load), including web scraping, database synchronization, batch enrichment and more."
---

import UseCasesCards from "/snippets/use-cases-cards.mdx";

## Overview

Build complex data pipelines that process large datasets without timeouts. Handle streaming analytics, batch enrichment, web scraping, database sync, and file processing with automatic retries and progress tracking.

## Featured examples

- Import CSV files with progress streamed live to the frontend.
- Scrape websites using BrowserBase and Puppeteer.
- Trigger tasks from Supabase database webhooks.

## Benefits of using Trigger.dev for data processing & ETL workflows

- **Process datasets for hours without timeouts**: Handle multi-hour transformations, large file processing, or complete database exports with no execution time limits.
- **Parallel processing with built-in rate limiting**: Process thousands of records simultaneously while respecting API rate limits. Scale efficiently without overwhelming downstream services.
- **Stream progress to your users in real time**: Show row-by-row processing status updating live in your dashboard, so users see exactly where processing is and how long remains.

## Production use cases

- Read how MagicSchool AI uses Trigger.dev to generate insights from millions of student interactions.
- Read how Comp AI uses Trigger.dev to automate evidence collection at scale, powering their open source, AI-driven compliance platform.
- Read how Midday uses Trigger.dev to sync large volumes of bank transactions in their financial management platform.

## Example workflow patterns

**Simple CSV import pipeline**. Receives file upload, parses CSV rows, validates data, imports to database with progress tracking.

```mermaid
graph TB
    A[importCSV] --> B[parseCSVFile]
    B --> C[validateRows]
    C --> D[bulkInsertToDB]
    D --> E[notifyCompletion]
```
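The pipeline steps above can be sketched in plain TypeScript. In a real Trigger.dev project each step would live inside a task's `run` function; here the DB insert is simulated and the `onProgress` callback is a hypothetical stand-in for progress reporting:

```typescript
// Minimal sketch of the CSV import pipeline: parse -> validate -> insert.
// The insert and progress callback are illustrative stand-ins, not the SDK.

type Row = { email: string; name: string };

function parseCSV(csv: string): Row[] {
  const [header, ...lines] = csv.trim().split("\n");
  const cols = header.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(cols.map((c, i) => [c, values[i]])) as Row;
  });
}

function validateRows(rows: Row[]): { valid: Row[]; invalid: Row[] } {
  const valid: Row[] = [];
  const invalid: Row[] = [];
  for (const row of rows) {
    // Minimal validation rule for the sketch: email must contain "@".
    (row.email.includes("@") ? valid : invalid).push(row);
  }
  return { valid, invalid };
}

async function importCSV(
  csv: string,
  onProgress: (done: number, total: number) => void
): Promise<Row[]> {
  const { valid } = validateRows(parseCSV(csv));
  const inserted: Row[] = [];
  for (const [i, row] of valid.entries()) {
    inserted.push(row); // stand-in for a bulk database insert
    onProgress(i + 1, valid.length); // row-by-row progress for the frontend
  }
  return inserted;
}
```

In production the progress callback would feed Trigger.dev's realtime features so the frontend can show live import status.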
**Coordinator pattern with parallel extraction**. Batch triggers parallel extraction from multiple sources (APIs, databases, S3), transforms and validates data, loads to data warehouse with monitoring.
```mermaid
graph TB
    A[runETLPipeline] --> B[coordinateExtraction]
    B --> C[batchTriggerAndWait]
    C --> D[extractFromAPI]
    C --> E[extractFromDatabase]
    C --> F[extractFromS3]
    D --> G[transformData]
    E --> G
    F --> G
    G --> H[validateData]
    H --> I[loadToWarehouse]
```
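The fan-out/fan-in shape of this coordinator can be sketched with plain Promises; in Trigger.dev, `batchTriggerAndWait` performs the same role across separate, independently retried task runs. The extractor functions below are hypothetical stand-ins for the API, database, and S3 sources:

```typescript
// Coordinator sketch: fan out to parallel extractors, fan in to
// transform + validate. Promise.all stands in for batchTriggerAndWait.

type SourceRecord = { source: string; value: number };

async function extractFromAPI(): Promise<SourceRecord[]> {
  return [{ source: "api", value: 1 }]; // stub: would call a paginated API
}
async function extractFromDatabase(): Promise<SourceRecord[]> {
  return [{ source: "db", value: 2 }]; // stub: would run a query
}
async function extractFromS3(): Promise<SourceRecord[]> {
  return [{ source: "s3", value: 3 }]; // stub: would read objects
}

function transformData(records: SourceRecord[]): SourceRecord[] {
  return records.map((r) => ({ ...r, value: r.value * 10 }));
}

function validateData(records: SourceRecord[]): SourceRecord[] {
  return records.filter((r) => Number.isFinite(r.value));
}

async function runETLPipeline(): Promise<SourceRecord[]> {
  // Fan out: run every extractor in parallel and wait for all results.
  const batches = await Promise.all([
    extractFromAPI(),
    extractFromDatabase(),
    extractFromS3(),
  ]);
  // Fan in: merge, transform, validate; a load step would follow.
  return validateData(transformData(batches.flat()));
}
```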
**Coordinator pattern with browser automation**. Launches headless browsers in parallel to scrape multiple pages, extracts structured data, cleans and normalizes content, stores in database.
```mermaid
graph TB
    A[scrapeSite] --> B[coordinateScraping]
    B --> C[batchTriggerAndWait]
    C --> D[scrapePage1]
    C --> E[scrapePage2]
    C --> F[scrapePageN]
    D --> G[cleanData]
    E --> G
    F --> G
    G --> H[normalizeData]
    H --> I[storeInDatabase]
```
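The clean and normalize stages can be illustrated with a plain-TypeScript sketch. `scrapePage` here is a stub standing in for a BrowserBase/Puppeteer session, and the parallel map stands in for batch-triggered scrape tasks:

```typescript
// Scraping coordinator sketch: parallel page fetches, then clean + normalize.

async function scrapePage(url: string): Promise<string> {
  // Stub: a real task would drive a headless browser and return page HTML.
  return `<h1>  Title of ${url} </h1>`;
}

function cleanData(html: string): string {
  // Strip tags and surrounding whitespace from the raw HTML fragment.
  return html.replace(/<[^>]+>/g, "").trim();
}

function normalizeData(text: string): string {
  // Lowercase and collapse internal whitespace into single spaces.
  return text.toLowerCase().replace(/\s+/g, " ");
}

async function scrapeSite(urls: string[]): Promise<string[]> {
  const pages = await Promise.all(urls.map(scrapePage)); // parallel fan-out
  return pages.map(cleanData).map(normalizeData); // store step would follow
}
```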
**Coordinator pattern with rate limiting**. Fetches records needing enrichment, batch triggers parallel API calls with configurable concurrency to respect rate limits, validates enriched data, updates database.
```mermaid
graph TB
    A[enrichRecords] --> B[fetchRecordsToEnrich]
    B --> C[coordinateEnrichment]
    C --> D[batchTriggerAndWait]
    D --> E[enrichRecord1]
    D --> F[enrichRecord2]
    D --> G[enrichRecordN]
    E --> H[validateEnrichedData]
    F --> H
    G --> H
    H --> I[updateDatabase]
```
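The rate-limiting idea can be sketched as a small worker pool in plain TypeScript. In Trigger.dev you would normally cap concurrency with queue settings rather than hand-rolling a pool; `enrich` below is a hypothetical stand-in for a third-party API call:

```typescript
// Concurrency-limited enrichment sketch: at most `concurrency` enrich
// calls are in flight at once, which keeps downstream API usage bounded.

async function enrich(id: number): Promise<{ id: number; enriched: true }> {
  return { id, enriched: true }; // stub: would call an enrichment API
}

async function enrichRecords(
  ids: number[],
  concurrency: number
): Promise<{ id: number; enriched: true }[]> {
  const results: { id: number; enriched: true }[] = [];
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index; because
  // claiming is synchronous, no two workers take the same record.
  async function worker(): Promise<void> {
    while (next < ids.length) {
      const i = next++;
      results[i] = await enrich(ids[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(concurrency, ids.length) },
    worker
  );
  await Promise.all(workers);
  return results; // validation and the database update would follow
}
```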