A lightweight Node.js web scraper that uses Playwright to render JavaScript-heavy pages and converts them to clean Markdown.
- 🎭 Playwright-powered headless browser — handles JS-rendered pages
- 📄 Readability for main content extraction
- ✨ Clean Markdown output via Turndown
- 🔓 Basic anti-scraping bypass (realistic user agent, viewport)
- 🚀 Zero config, single script

Requirements:

- Node.js 18+
- Playwright's Chromium browser

Install the dependencies and the Chromium browser:

```bash
npm install
npx playwright install chromium
```

Usage:

```bash
# Extract main article content (default)
node scrape.mjs https://example.com

# Full page content
node scrape.mjs https://example.com --full
```

The script writes clean Markdown to stdout. To save it, redirect to a file:

```bash
node scrape.mjs https://example.com > output.md
```

How it works:

- Launches a headless Chromium browser via Playwright
- Navigates to the URL and waits for JS to render
- Extracts the main content using Mozilla's Readability
- Converts HTML to Markdown using Turndown
- Outputs to stdout

License: MIT