---
title: "Puppeteer"
sidebarTitle: "Puppeteer"
description: "These examples demonstrate how to use Puppeteer with Trigger.dev."
---
import LocalDevelopment from "/snippets/local-development-extensions.mdx";
import ScrapingWarning from "/snippets/web-scraping-warning.mdx";
- A project with Trigger.dev initialized
- Puppeteer installed on your machine
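There are 3 example tasks to follow on this page:

1. Log the title of a web page
2. Generate a PDF from a web page and upload it to Cloudflare R2
3. Scrape content from a web page through a proxy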
To use all the examples on this page, you'll first need to add these build settings to your `trigger.config.ts` file:
```ts
import { defineConfig } from "@trigger.dev/sdk/v3";
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    // This is required to use the Puppeteer library
    extensions: [puppeteer()],
  },
});
```

Learn more about the `trigger.config.ts` file, including setting default retry settings, customizing the build environment, and more.
Set the following environment variable in your Trigger.dev dashboard or using the SDK:

```bash
PUPPETEER_EXECUTABLE_PATH="/usr/bin/google-chrome-stable"
```

This example uses Puppeteer to log the title of a web page, in this case the Trigger.dev landing page.
```ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      await page.goto("https://trigger.dev");

      // Read the <title> of the loaded page
      const title = await page.title();
      logger.info("Page title", { title });
    } finally {
      // Always close the browser, even if navigation fails
      await browser.close();
    }
  },
});
```

There's no payload required for this task, so you can click "Run test" from the Test page in the dashboard. Learn more about testing tasks here.
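In any task that launches a browser, an exception thrown before `browser.close()` leaves the browser process running. To avoid repeating `try`/`finally` in every task, you could use a small helper. The sketch below assumes a hypothetical `withBrowser` function (it is not part of Puppeteer or the Trigger.dev SDK) that launches a browser, runs your callback, and always closes the browser afterwards:

```typescript
// Structural type: anything with an async close(), such as a Puppeteer Browser.
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: launches a browser, runs `fn` with it, and always closes it.
export async function withBrowser<B extends Closable, T>(
  launch: () => Promise<B>,
  fn: (browser: B) => Promise<T>,
): Promise<T> {
  const browser = await launch();
  try {
    return await fn(browser);
  } finally {
    // Runs whether `fn` resolved or threw; errors from `fn` still propagate.
    await browser.close();
  }
}
```

Inside a task's `run` function you would call it as `withBrowser(() => puppeteer.launch(), async (browser) => { ... })` and return whatever the page logic produces. Taking a `launch` callback rather than importing Puppeteer directly keeps the helper independent of any particular library version.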
This example uses Puppeteer to generate a PDF of the Trigger.dev landing page and upload it to Cloudflare R2 through R2's S3-compatible API.
```ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize an S3 client pointed at R2 (R2 exposes an S3-compatible API)
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto("https://trigger.dev");
    const url = response?.url() ?? "No URL found";

    // Generate a PDF from the web page
    const generatePdf = await page.pdf();
    logger.info("PDF generated from URL", { url });
    await browser.close();

    // Upload to R2
    const s3Key = `pdfs/test.pdf`;
    const uploadParams = {
      Bucket: process.env.S3_BUCKET,
      Key: s3Key,
      Body: generatePdf,
      ContentType: "application/pdf",
    };

    // Log only the destination, not the PDF bytes in `Body`
    logger.log("Uploading to R2", { bucket: uploadParams.Bucket, key: s3Key });

    await s3Client.send(new PutObjectCommand(uploadParams));

    // Path-style object URL on the R2 S3 endpoint. Requests to it must be
    // authenticated (e.g. presigned); public access goes through the
    // bucket's r2.dev subdomain or a custom domain instead.
    const s3Url = `${process.env.S3_ENDPOINT}/${process.env.S3_BUCKET}/${s3Key}`;
    logger.log("PDF uploaded to R2", { url: s3Url });

    return { pdfUrl: s3Url };
  },
});
```

There's no payload required for this task, so you can click "Run test" from the Test page in the dashboard. Learn more about testing tasks here.
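The task above writes every PDF to the fixed key `pdfs/test.pdf`, so each run overwrites the previous file. A minimal sketch of a hypothetical `makePdfKey` helper that derives a unique key from the page's hostname and a timestamp follows. The `pdfs/<host>-<timestamp>.pdf` layout is an assumption for illustration, not an R2 requirement.

```typescript
// Hypothetical helper: builds a unique object key for a PDF rendered from `pageUrl`,
// e.g. "pdfs/trigger.dev-2024-01-01T12-30-45-123Z.pdf".
export function makePdfKey(pageUrl: string, now: Date = new Date()): string {
  // Throws a TypeError if `pageUrl` is not a valid absolute URL.
  const host = new URL(pageUrl).hostname;

  // ISO timestamps contain ":" and "."; replace them with "-" to keep keys URL-friendly.
  const stamp = now.toISOString().replace(/[:.]/g, "-");

  return `pdfs/${host}-${stamp}.pdf`;
}
```

In the task you could replace the fixed key with `const s3Key = makePdfKey("https://trigger.dev");`. Passing the target URL rather than `response?.url()` avoids calling `new URL` on the `"No URL found"` fallback, which would throw.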
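This example uses Puppeteer with a Browserbase proxy to scrape the GitHub star count from the Trigger.dev landing page and log it. It imports `puppeteer-core`, which connects to the remote Browserbase browser instead of launching a bundled one. See the list of recommended proxy services at the end of this page for alternatives.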
When web scraping, you MUST use the technique below, which uses a proxy with Puppeteer. Direct scraping without using `browserWSEndpoint` is prohibited and will result in account suspension.

```ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    // Connect to a remote browser through Browserbase instead of launching one locally
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    try {
      const page = await browser.newPage();

      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent ?? "0";
        const numberText = text.replace(/[^0-9]/g, "");
        const count = parseInt(numberText, 10);
        // Fall back to 0 if the element contains no digits
        return Number.isNaN(count) ? 0 : count;
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```

There's no payload required for this task, so you can click "Run test" from the Test page in the dashboard. Learn more about testing tasks here.
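Stripping every non-digit means an abbreviated count such as `12.3k` (if the page displays one; that format is an assumption here) would be read as `123`. The callback passed to `page.evaluate` is serialized and run inside the browser, so it cannot call helpers defined in your task file. A more robust pattern is to return the raw text from `page.evaluate` and parse it in Node. A sketch of a hypothetical `parseStarCount` helper:

```typescript
// Hypothetical helper: converts star-count text such as "12,345", "12.3k" or "1.2M"
// into a number. Returns 0 when no number can be found.
export function parseStarCount(text: string | null | undefined): number {
  if (!text) return 0;

  // Normalize case and drop thousands separators ("12,345" -> "12345").
  const cleaned = text.trim().toLowerCase().replace(/,/g, "");

  // First number, optionally followed directly by a "k" or "m" suffix.
  const match = cleaned.match(/(\d+(?:\.\d+)?)([km])?\b/);
  if (!match) return 0;

  const value = parseFloat(match[1]);
  const multiplier = match[2] === "k" ? 1_000 : match[2] === "m" ? 1_000_000 : 1;

  // Round away floating-point noise, e.g. 12.3 * 1000 = 12300.000000000002.
  return Math.round(value * multiplier);
}
```

In the task, the evaluate call would then return only the text, for example `page.evaluate(() => document.querySelector(".github-star-count")?.textContent ?? null)`, and you would pass the result to `parseStarCount`.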
<LocalDevelopment packages={"the Puppeteer library."} />
If you're using Trigger.dev Cloud with Puppeteer or any other tool to scrape content from websites you don't own, you'll need to proxy your requests. If you don't, you risk getting our IP address blocked, and we will ban you from our service. You must always have permission from the website owner to scrape their content.
Here is a list of proxy services we recommend: