This document provides detailed explanations of the included examples.
The news scraper demonstrates basic scraping functionality:
```typescript
// Basic configuration
const newsScraperDefinition: ScraperDefinition<string, NewsArticle> = {
  id: 'news-scraper',
  name: 'News Article Scraper',
  url: 'https://httpbin.org/html',
  navigation: { type: 'direct' },
  waitStrategy: { type: 'selector', config: { selector: 'body' } },
  parse: async (context) => {
    // Extract content from the page supplied on the context
    const { page } = context;
    return {
      title: await page.textContent('h1'),
      content: await page.textContent('p'),
      // ... more fields
    };
  }
};
```

Key Features:
- Simple URL navigation
- Content extraction
- Error handling
- TypeScript types
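To make the snippet above concrete, here is a minimal, self-contained sketch of the shapes involved. The `ScraperDefinition` and `NewsArticle` fields shown are inferred from the example, not the library's actual type declarations, and the stub page stands in for a real browser page:

```typescript
// Hypothetical shapes inferred from the example above; the real
// library's types may differ.
interface NewsArticle {
  title: string | null;
  content: string | null;
}

interface PageLike {
  textContent(selector: string): Promise<string | null>;
}

interface ScraperDefinition<TParams, TResult> {
  id: string;
  name?: string;
  url: string;
  navigation: { type: 'direct' };
  waitStrategy: { type: 'selector'; config: { selector: string } };
  parse: (context: { page: PageLike }) => Promise<TResult>;
}

// A stub page lets us exercise parse() without launching a browser.
const stubPage: PageLike = {
  async textContent(selector: string): Promise<string | null> {
    return selector === 'h1' ? 'Herman Melville - Moby-Dick' : 'Call me Ishmael.';
  },
};

const def: ScraperDefinition<string, NewsArticle> = {
  id: 'news-scraper',
  url: 'https://httpbin.org/html',
  navigation: { type: 'direct' },
  waitStrategy: { type: 'selector', config: { selector: 'body' } },
  parse: async (ctx) => ({
    title: await ctx.page.textContent('h1'),
    content: await ctx.page.textContent('p'),
  }),
};

def.parse({ page: stubPage }).then((article) => console.log(article));
```

Testing `parse` against a stub like this keeps extraction logic verifiable without any network or browser dependency.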
The product scraper shows advanced features:
```typescript
const productScraperDefinition: ScraperDefinition<string, ProductSearchResult> = {
  id: 'product-scraper',
  hooks: {
    beforeRequest: [/* custom hooks */],
    afterRequest: [/* cleanup hooks */],
    onError: [/* error handling */],
    onRetry: [/* retry logic */]
  },
  parse: async (context) => {
    // Complex data extraction
  }
};
```

Advanced Features:
- Plugin system (Retry, Cache, Metrics)
- Custom hooks
- Screenshot capture
- Performance metrics
- Complex data structures
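As an illustration of what entries in the `hooks` arrays might look like, here is a hedged sketch of two hooks. The `Hook` signature and the context fields (`headers`, `attempt`) are assumptions for this example, not the library's actual API:

```typescript
// Hypothetical hook signature; the actual library API may differ.
type Hook = (context: {
  headers: Record<string, string>;
  attempt: number;
}) => Promise<void> | void;

// beforeRequest hook: attach a custom header before each request.
// 'demo-key' is a placeholder value for illustration.
const addAuthHeader: Hook = (ctx) => {
  ctx.headers['x-api-key'] = 'demo-key';
};

// onRetry hook: wait with exponential backoff based on the attempt count.
const backoff: Hook = async (ctx) => {
  const delayMs = 100 * 2 ** ctx.attempt;
  await new Promise((resolve) => setTimeout(resolve, delayMs));
};
```

Keeping hooks as small, single-purpose functions like these makes them easy to compose in the `beforeRequest` / `onRetry` arrays and to unit-test in isolation.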
```bash
# Run news scraper
pnpm run example:news

# Run product scraper
pnpm run example:products "search term"

# With custom parameters
pnpm run example:news "custom search"
```

Both examples can be customized by:
- Modifying the scraper definitions
- Adding custom hooks
- Changing URLs and selectors
- Adding validation logic
- Implementing custom plugins
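For the "adding validation logic" point, one simple approach is to validate the parsed result before returning it from `parse`. The `validateArticle` helper below is a hypothetical example, not part of the library:

```typescript
// Hypothetical validation helper: reject parsed results that are
// missing required fields before they leave parse().
interface NewsArticle {
  title: string | null;
  content: string | null;
}

function validateArticle(article: NewsArticle): NewsArticle {
  if (!article.title || article.title.trim() === '') {
    throw new Error('validation failed: missing title');
  }
  return article;
}
```

Failing fast inside `parse` this way lets the library's `onError` / retry machinery handle bad pages the same way it handles network errors.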