futron-scrape

Universal free OSS web-scraping cascade.
Drop-in replacement for paid Firecrawl / billed WebFetch in any Claude Code, Codex, or agent harness.

A single `futron-scrape <url>` call walks up to nine tiers in priority order and returns clean markdown (or JSON, or raw HTML) as soon as one tier produces useful content. It works on:

🟢 Public APIs the open web already exposes — Reddit, Hacker News, Bluesky, Mastodon, GitHub, Stack Exchange, Wikipedia, npm, PyPI, Substack
🟢 X / Twitter posts without login (publish.twitter.com oembed)
🟢 YouTube videos with full description + auto-caption transcripts (yt-dlp)
🟢 PDFs (poppler pdftotext) and images (tesseract OCR)
🟢 Generic blogs / docs / articles via the free r.jina.ai reader
🟢 Paywalled / dead pages via Wayback + archive.is + 12ft.io
🟢 JS-heavy SPAs via headless Playwright
🟢 Login-walled / fingerprinted sites via macOS osascript driving your already-logged-in Chrome or Safari

When every OSS path is exhausted (a real auth wall and no logged-in browser available), futron-scrape writes a structured queue file and exits with code 3 so that a parent agent with a browser MCP (Claude-in-Chrome, Playwright MCP, computer-use MCP) can take over.

Why?

Cost. Firecrawl credits cost money. WebFetch in Claude Code is metered.
Auth. Half the sites that matter sit behind login walls. The cleanest answer is the user's already-open browser.
Coverage. "Just use jina" only takes you so far — Reddit, X, GitHub all return better data via their public APIs than via a generic reader.
Agent-friendly. Single binary, predictable exit codes, JSON mode, structured agent-handoff queue. Drop into any harness.

Install

git clone https://github.com/FutronPrime/futron-scrape.git
cd futron-scrape
./install.sh

install.sh symlinks the four binaries into ~/.local/bin/ (creating it if needed) and prints which optional system deps you should install:

brew install yt-dlp poppler tesseract       # YouTube + PDF + image OCR (optional)
pip install playwright && playwright install chromium    # T8 headless browser (optional)

Each dependency is optional: if it is missing, its tier is simply skipped.

Quick start

# Standard fetch — auto-cascades
futron-scrape https://news.ycombinator.com/item?id=39000000
futron-scrape https://en.wikipedia.org/wiki/Model_Context_Protocol
futron-scrape https://x.com/karpathy/status/1234567890

# Force JS-rendered fetch (skip cheap tiers)
futron-scrape --browser https://some-spa.example.com

# Allow paywall bypass
futron-scrape --paywall https://nytimes.com/article-url

# JSON output {url, source, length, body}
futron-scrape --json https://github.com/anthropics/anthropic-sdk-python

# Probe — which tier WOULD work, no body fetch
futron-scrape --probe https://example.com

# Pipe URL via stdin
echo "https://en.wikipedia.org/wiki/AI" | futron-scrape -
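
The `--json` mode is the easiest one to consume from a script. The following is a minimal sketch, assuming the JSON object carries exactly the four keys named in the comment above (`url`, `source`, `length`, `body`) and that `length` is numeric. `ScrapeResult` and `parse_result` are hypothetical names used for illustration; they are not part of futron-scrape.

```python
import json
from dataclasses import dataclass


@dataclass
class ScrapeResult:
    """Hypothetical container for the fields printed by `futron-scrape --json`."""
    url: str
    source: str   # which tier or source produced the body (assumed to be a string)
    length: int   # assumed to be a number
    body: str


def parse_result(stdout_text: str) -> ScrapeResult:
    """Parse the JSON document captured from `futron-scrape --json <url>`."""
    data = json.loads(stdout_text)
    return ScrapeResult(
        url=data["url"],
        source=data["source"],
        length=int(data["length"]),
        body=data["body"],
    )
```

A caller would typically capture stdout with `subprocess.run(["futron-scrape", "--json", url], capture_output=True, text=True)` and pass `.stdout` to `parse_result`.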

Tier order

| Tier | Path | Best for |
| --- | --- | --- |
| T1 | Site-specific public APIs (Reddit `.json`, HN Algolia, Bluesky AT proto, Mastodon `/api/v1`, GitHub REST, Stack Exchange, Wikipedia, npm, PyPI, Substack) | Social posts + structured docs |
| T2 | `publish.twitter.com/oembed` | X / Twitter |
| T3 | `yt-dlp` (description + auto-captions) | YouTube |
| T4 | `pdftotext` / `tesseract` | PDFs + images |
| T5 | `r.jina.ai` reader | Generic blogs / docs / JS pages |
| T6 | Wayback / archive.is / 12ft.io (`--paywall`) | Paywalls + dead pages |
| T7 | Raw curl + HTML strip | Static HTML last resort |
| T8 | Playwright headless (`--browser`) | JS that jina can't render |
| T9 | osascript-driven logged-in Chrome / Safari (`futron-cua-fetch`) | macOS, login walls, fingerprinting |
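
The cascade idea behind this table can be sketched in a few lines. This is an illustration of the "first useful tier wins" pattern, not the tool's actual implementation: the `Tier` type, `run_cascade`, and the `MIN_USEFUL_CHARS` threshold are all assumptions, since this README does not spell out how "useful content" is judged.

```python
from typing import Callable, Optional, Sequence, Tuple

# A tier takes a URL and returns extracted text, or None when it cannot help.
Tier = Callable[[str], Optional[str]]

# Hypothetical threshold; the real "useful content" heuristic is not documented here.
MIN_USEFUL_CHARS = 200


def run_cascade(url: str, tiers: Sequence[Tuple[str, Tier]]) -> Optional[Tuple[str, str]]:
    """Try each (name, tier) in order; return (name, body) for the first useful result."""
    for name, tier in tiers:
        try:
            body = tier(url)
        except Exception:
            # A failing tier (missing dependency, timeout, HTTP error) is skipped.
            continue
        if body and len(body.strip()) >= MIN_USEFUL_CHARS:
            return name, body
    return None  # every tier failed; the caller decides what to do next
```

Keeping each tier behind the same small callable signature means a missing optional dependency only removes one entry from the list rather than breaking the whole run.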

When all 9 tiers fail because of a login wall or fingerprinting, futron-scrape exits with code 3 and writes a queue file to ${FUTRON_SCRAPE_STATE:-~/.config/futron-scrape}/scrape-requests/<hash>.json containing {url, mode, created, status: "pending"}. A parent agent with a browser MCP (Claude-in-Chrome, Playwright MCP, computer-use MCP) can pick it up and fulfil it. If the page is simply empty or unreachable, the exit code is 1 instead (see Exit codes).
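
A parent agent could discover outstanding work by scanning that directory. The sketch below assumes only what is stated above: one JSON file per request, with a `status` field equal to `"pending"`. The helper names `queue_dir` and `pending_requests`, and the extra `_path` key added to each returned entry, are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict, List


def queue_dir() -> Path:
    """Resolve the scrape-requests directory, mirroring ${FUTRON_SCRAPE_STATE:-...}."""
    state = os.environ.get("FUTRON_SCRAPE_STATE") or "~/.config/futron-scrape"
    return Path(state).expanduser() / "scrape-requests"


def pending_requests(directory: Path) -> List[Dict]:
    """Return parsed queue entries whose status is "pending"."""
    entries = []
    for path in sorted(directory.glob("*.json")):
        try:
            entry = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or half-written files
        if isinstance(entry, dict) and entry.get("status") == "pending":
            # Remember the source file; this key is not part of the file format.
            entry["_path"] = str(path)
            entries.append(entry)
    return entries
```

Calling `pending_requests(queue_dir())` returns an empty list when nothing is waiting or the directory does not exist yet.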

Companion tools

`futron-mcp-discover` — find an MCP server for any "trouble" site

futron-mcp-discover linkedin       # ranked GitHub repos + npm + curated tip
futron-mcp-discover slack          # same for Slack
futron-mcp-discover notion --install     # also stage top GitHub repo to ~/.config/futron-scrape/staged/
futron-mcp-discover --list         # full curated trouble-site map
futron-mcp-discover --json reddit  # JSON output

Searches the GitHub Search API (uses GITHUB_TOKEN if set), the npm registry, and a curated list of well-known OSS MCPs / clients per topic.

`futron-bookmark-watcher` — poll a Dewey public bookmark folder hourly

If you use Dewey for your X bookmarks, point this watcher at any public folder URL:

export FUTRON_BOOKMARK_URL="https://getdewey.co/users/<your-handle>/<folder-slug>/"
futron-bookmark-watcher                  # one-shot scan
futron-bookmark-watcher --install-safe   # also clone safe Skill repos
futron-bookmark-watcher --watch 3600     # continuous (or use launchd / cron)

Each run diffs the folder against ${FUTRON_BOOKMARK_STATE:-~/.config/futron-scrape/bookmark-watcher}/seen.json, classifies new tweets (skill / mcp / cli-tool / repo / model / other), and writes per-tweet markdown to ${FUTRON_BOOKMARK_DIR:-~/.config/futron-scrape/bookmarks}/<date>/.
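
The "diff against seen.json" step follows a common pattern that can be sketched as below. The on-disk format of seen.json is not documented in this README, so the sketch assumes a plain JSON array of tweet IDs; `load_seen` and `diff_new` are hypothetical helpers and may not match what the watcher actually does.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_seen(path: Path) -> Set[str]:
    """Load previously seen IDs; a missing file means nothing has been seen yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def diff_new(current_ids: Iterable[str], seen_path: Path) -> List[str]:
    """Return IDs not seen before (in input order) and record them in seen.json."""
    seen = load_seen(seen_path)
    # dict.fromkeys removes duplicates while keeping the original order.
    new = [i for i in dict.fromkeys(current_ids) if i not in seen]
    seen_path.parent.mkdir(parents=True, exist_ok=True)
    seen_path.write_text(json.dumps(sorted(seen | set(new))), encoding="utf-8")
    return new
```

Only the IDs returned by `diff_new` would then need to be classified and written out, which keeps repeated hourly scans cheap.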

`futron-cua-fetch` — drive your logged-in browser via osascript

futron-cua-fetch <url>                 # markdown of body innerText
futron-cua-fetch --raw <url>           # outerHTML
futron-cua-fetch --queue <url>         # write request file only, agent picks up
futron-cua-fetch --browser chrome <url>

macOS only. Uses your live Chrome / Safari session — works on any login-walled site you're already signed into.

Environment variables

All optional, except where noted:

FUTRON_SCRAPE_TIMEOUT     per-source timeout in seconds (default 12)
FUTRON_SCRAPE_LOG         log path (default /tmp/futron-scrape.log)
FUTRON_SCRAPE_STATE       state dir (default ~/.config/futron-scrape)
FUTRON_BOOKMARK_URL       Dewey public folder URL (required by watcher)
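FUTRON_BOOKMARK_DIR       per-tweet markdown output dir (watcher; default ~/.config/futron-scrape/bookmarks)
FUTRON_BOOKMARK_STATE     watcher state dir (default ~/.config/futron-scrape/bookmark-watcher)
GITHUB_TOKEN              raises GitHub API rate limit to 5000 requests/hr (T1 + discover)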
TELEGRAM_BOT_TOKEN        watcher --notify
TELEGRAM_CHAT_ID          watcher --notify

No keys are required to start. Set what you have; the rest skips gracefully.
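
A wrapper script that wants to honour the same variables could resolve them as follows. This is a sketch under the defaults listed above; the `Settings` class and `load_settings` function are hypothetical and only cover a subset of the variables.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass
class Settings:
    timeout: int                  # FUTRON_SCRAPE_TIMEOUT, seconds
    log_path: Path                # FUTRON_SCRAPE_LOG
    state_dir: Path               # FUTRON_SCRAPE_STATE
    github_token: Optional[str]   # GITHUB_TOKEN, None when unset


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        timeout=int(env.get("FUTRON_SCRAPE_TIMEOUT") or 12),
        log_path=Path(env.get("FUTRON_SCRAPE_LOG") or "/tmp/futron-scrape.log"),
        state_dir=Path(env.get("FUTRON_SCRAPE_STATE") or "~/.config/futron-scrape").expanduser(),
        github_token=env.get("GITHUB_TOKEN") or None,
    )
```

Treating an empty value the same as an unset one matches the `${VAR:-default}` shell expansion used elsewhere in this README.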

Agent integration

For Claude Code / Codex / any harness, drop the included Skill into ~/.claude/skills/futron-scrape/ (see skill/SKILL.md). The agent will then call futron-scrape automatically when it needs to fetch a URL and your default fetch tool is unavailable / billed / blocked.

When futron-scrape exits 3, the agent should:

1. Read the queue file under ${FUTRON_SCRAPE_STATE}/scrape-requests/.
2. Use whatever browser MCP is available (Claude-in-Chrome MCP for the user's live cookies, Playwright MCP, computer-use MCP).
3. Save the result and update the queue file's status to "fulfilled".
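
The final status update can be sketched as follows. It assumes the queue file is a JSON object with a `status` field, as described in the Tier order section; `mark_fulfilled` is a hypothetical helper. Where the fetched result itself should be saved is not specified here, so the sketch leaves that to the agent.

```python
import json
from pathlib import Path


def mark_fulfilled(queue_file: Path) -> dict:
    """Set status to "fulfilled" in a queue file, preserving its other fields."""
    entry = json.loads(queue_file.read_text(encoding="utf-8"))
    entry["status"] = "fulfilled"
    # Write to a sibling temp file first, then rename over the original,
    # so a concurrent reader never sees a partially written file.
    tmp = queue_file.with_name(queue_file.name + ".tmp")
    tmp.write_text(json.dumps(entry, indent=2), encoding="utf-8")
    tmp.replace(queue_file)
    return entry
```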

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Body printed |
| 1 | All OSS tiers exhausted (page genuinely empty / unreachable) |
| 2 | Usage error |
| 3 | Login wall or fingerprinting — queue file written, agent should escalate via a browser MCP |
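
A harness can branch on these codes directly. The sketch below wraps the CLI with `subprocess`; the `ExitCode` enum mirrors the table above, while `next_action`, `scrape`, and the action labels they return are hypothetical names chosen for illustration.

```python
import subprocess
from enum import IntEnum
from typing import Sequence, Tuple


class ExitCode(IntEnum):
    OK = 0          # body printed
    EXHAUSTED = 1   # all OSS tiers exhausted
    USAGE = 2       # usage error
    ESCALATE = 3    # login wall / fingerprinting; queue file written


def next_action(code: int) -> str:
    """Map an exit code to what a calling agent might do next (hypothetical labels)."""
    actions = {
        ExitCode.OK: "use-body",
        ExitCode.EXHAUSTED: "give-up",
        ExitCode.USAGE: "fix-invocation",
        ExitCode.ESCALATE: "escalate-to-browser-mcp",
    }
    return actions.get(code, "unexpected")


def scrape(url: str, extra_args: Sequence[str] = ()) -> Tuple[str, str]:
    """Run futron-scrape (must be on PATH) and return (action, stdout)."""
    proc = subprocess.run(
        ["futron-scrape", *extra_args, url],
        capture_output=True,
        text=True,
    )
    return next_action(proc.returncode), proc.stdout
```

For example, `scrape("https://example.com", ["--json"])` returns the suggested action together with whatever the tool printed.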

License

MIT. See LICENSE.

Origin

Extracted and sanitized from the FUTRON / OpenClaw stack. Designed to be drop-in elsewhere.
