mcp-playwright-scraper

A lightweight Node.js web scraper that uses Playwright to render JavaScript-heavy pages and converts them to clean Markdown.

Features

🎭 Playwright-powered headless browser — handles JS-rendered pages
📄 Readability for main content extraction
✨ Clean Markdown output via Turndown
🔓 Basic anti-scraping bypass (realistic user agent, viewport)
🚀 Zero config, single script
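
The "realistic user agent, viewport" item can be illustrated with Playwright's browser context options. This is a minimal sketch, not the code in scrape.mjs: the helper names `buildContextOptions` and `openPage`, the user agent string, the viewport size, and the locale are all assumed example values.

```javascript
// Hypothetical helper: the values below are illustrative and are not taken from scrape.mjs.
function buildContextOptions(overrides = {}) {
  return {
    // Example desktop Chrome user agent string (assumed value).
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    // A common desktop resolution, larger than Playwright's 1280x720 default.
    viewport: { width: 1920, height: 1080 },
    locale: 'en-US',
    // Caller-supplied values win over the defaults above.
    ...overrides,
  };
}

// Hypothetical helper: opens a page in a context that uses the options above.
// The caller is responsible for closing the returned browser.
async function openPage(url) {
  const { chromium } = await import('playwright');
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext(buildContextOptions());
  const page = await context.newPage();
  await page.goto(url);
  return { browser, page };
}
```

Settings like these only address the simplest checks; sites with dedicated bot detection will likely still block a headless browser.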

Requirements

Node.js 18+
Playwright Chromium browser

Installation

npm install
npx playwright install chromium

Usage

# Extract main article content (default)
node scrape.mjs https://example.com

# Full page content
node scrape.mjs https://example.com --full
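
For readers curious how a command line like the ones above might be interpreted, here is a small sketch of argument handling. `parseArgs` is a hypothetical function written for illustration; scrape.mjs may parse its arguments differently, and the error messages are assumptions.

```javascript
// Hypothetical parser: takes the arguments after the script name
// and returns the target URL plus the --full flag.
function parseArgs(argv) {
  const flags = argv.filter((arg) => arg.startsWith('--'));
  const positional = argv.filter((arg) => !arg.startsWith('--'));

  const url = positional[0];
  if (!url) {
    throw new Error('Usage: node scrape.mjs <url> [--full]');
  }

  // The URL constructor throws a TypeError on malformed input.
  const parsed = new URL(url);
  if (!['http:', 'https:'].includes(parsed.protocol)) {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  return { url: parsed.href, full: flags.includes('--full') };
}
```

In a script it would be called as `parseArgs(process.argv.slice(2))`, since the first two entries of `process.argv` are the Node.js binary and the script path.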

Output

The script writes clean Markdown to stdout. To save it, redirect the output to a file:

node scrape.mjs https://example.com > output.md

How it works

1. Launches a headless Chromium browser via Playwright
2. Navigates to the URL and waits for JS to render
3. Extracts the main content using Mozilla's Readability
4. Converts HTML to Markdown using Turndown
5. Outputs to stdout
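
The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the contents of scrape.mjs: the function names `chooseHtml` and `scrapeToMarkdown`, the use of jsdom to give Readability a DOM, the `networkidle` wait strategy, the 30-second timeout, the Turndown options, and the fallback to the full page when extraction fails are all choices made for this example.

```javascript
// Pick the HTML to convert. `article` is the result of Readability's parse(),
// which is null when no article could be extracted.
function chooseHtml(article, pageHtml, full) {
  if (full || !article || !article.content) return pageHtml;
  return article.content;
}

// Hypothetical end-to-end pipeline: render, extract, convert.
async function scrapeToMarkdown(url, { full = false } = {}) {
  // Dynamic imports keep the sketch loadable as an ES module or as CommonJS.
  const { chromium } = await import('playwright');
  const { JSDOM } = await import('jsdom'); // assumed DOM provider for Readability
  const { Readability } = await import('@mozilla/readability');
  const { default: TurndownService } = await import('turndown');

  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // 'networkidle' is one way to wait for JS rendering; the script may use another.
    await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
    const pageHtml = await page.content();

    // Readability works on a DOM document, so parse the rendered HTML first.
    const dom = new JSDOM(pageHtml, { url });
    const article = new Readability(dom.window.document).parse();

    const turndown = new TurndownService({
      headingStyle: 'atx',
      codeBlockStyle: 'fenced',
    });
    return turndown.turndown(chooseHtml(article, pageHtml, full));
  } finally {
    // Always release the browser, even if navigation or parsing fails.
    await browser.close();
  }
}
```

A caller could print the result with `scrapeToMarkdown('https://example.com').then((md) => process.stdout.write(md + '\n'))`, which matches the last step of the list.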

License

MIT
