Command-line interface for ScrapeGraph AI — AI-powered web scraping, data extraction, search, and crawling.
| Concern | Tool |
|---|---|
| Language | TypeScript 5.8 |
| Dev Runtime | Bun |
| Build | tsup (esbuild) |
| CLI Framework | citty (unjs) |
| Prompts | @clack/prompts |
| Styling | chalk v5 (ESM) |
| Validation | zod v4 |
| Env | dotenv |
| Lint / Format | Biome |
| Testing | Bun test (built-in) |
| Target | Node.js 22+, ESM-only |
bun installThe CLI needs a ScrapeGraph API key. Get one at dashboard.scrapegraphai.com.
Four ways to provide it (checked in order):
- Environment variable:
export SGAI_API_KEY="sgai-..." .envfile: create a.envfile in the project root withSGAI_API_KEY=sgai-...- Config file: stored in
~/.scrapegraphai/config.json - Interactive prompt: if none of the above are set, the CLI prompts you and saves it to the config file
Set SGAI_CLI_TIMEOUT_S to override the default 120s request/polling timeout:
export SGAI_CLI_TIMEOUT_S=300Set SGAI_CLI_DEBUG=1 to enable debug logging (outputs to stderr):
SGAI_CLI_DEBUG=1 scrapegraphai smart-scraper https://example.com -p "Extract data"smart-scraper — Extract structured data from a URL docs
scrapegraphai smart-scraper <url> -p "Extract all product names and prices"
# With JSON schema
scrapegraphai smart-scraper https://example.com/products -p "Extract products" \
--schema '{"type":"object","properties":{"products":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}}}}'
# With options
scrapegraphai smart-scraper https://example.com -p "Extract data" \
--stealth --render-js --scrolls 10 --pages 5| Option | Description |
|---|---|
-p, --prompt |
Extraction prompt (required) |
--schema |
Output JSON schema (JSON string) |
--scrolls |
Infinite scroll count (0-100) |
--pages |
Total pages to scrape (1-100) |
--render-js |
Enable JS rendering (+1 credit) |
--stealth |
Bypass bot detection (+4 credits) |
--cookies |
Cookies as JSON object string |
--headers |
Custom headers as JSON object string |
--plain-text |
Return plain text instead of JSON |
search-scraper — Search the web and extract data docs
scrapegraphai search-scraper "What are the top Python web frameworks?"
# Markdown only (cheaper)
scrapegraphai search-scraper "Python frameworks" --no-extraction --num-results 5| Option | Description |
|---|---|
--num-results |
Number of websites (3-20, default 3) |
--no-extraction |
Markdown only (2 credits/site vs 10) |
--schema |
Output JSON schema (JSON string) |
--stealth |
Bypass bot detection (+4 credits) |
--headers |
Custom headers as JSON object string |
markdownify — Convert a webpage to markdown docs
scrapegraphai markdownify https://example.com/article
scrapegraphai markdownify https://example.com --render-js --stealth| Option | Description |
|---|---|
--render-js |
Enable JS rendering (+1 credit) |
--stealth |
Bypass bot detection (+4 credits) |
--headers |
Custom headers as JSON object string |
crawl — Crawl and extract from multiple pages docs
scrapegraphai crawl https://example.com -p "Extract article titles" --max-pages 5 --depth 2
# Markdown only
scrapegraphai crawl https://example.com --no-extraction --max-pages 10
# With crawl rules
scrapegraphai crawl https://example.com -p "Extract data" \
--rules '{"include_paths":["/blog/*"],"same_domain":true}'| Option | Description |
|---|---|
-p, --prompt |
Extraction prompt (required when extraction is on) |
--no-extraction |
Markdown only (2 credits/page vs 10) |
--max-pages |
Max pages to crawl (default 10) |
--depth |
Crawl depth (default 1) |
--schema |
Output JSON schema (JSON string) |
--rules |
Crawl rules as JSON object string |
--no-sitemap |
Disable sitemap-based discovery |
--render-js |
Enable JS rendering (+1 credit/page) |
--stealth |
Bypass bot detection (+4 credits) |
sitemap — Get all URLs from a website's sitemap docs
scrapegraphai sitemap https://example.comscrape — Get raw HTML content docs
scrapegraphai scrape https://example.com
scrapegraphai scrape https://example.com --stealth --branding --country-code US| Option | Description |
|---|---|
--render-js |
Enable JS rendering (+1 credit) |
--stealth |
Bypass bot detection (+4 credits) |
--branding |
Extract branding info (+2 credits) |
--country-code |
ISO country code for geo-targeting |
agentic-scraper — Browser automation with AI docs
scrapegraphai agentic-scraper https://example.com/login \
-s "Fill email with user@test.com,Fill password with pass123,Click Sign In" \
--ai-extraction -p "Extract dashboard data"| Option | Description |
|---|---|
-s, --steps |
Comma-separated browser steps |
-p, --prompt |
Extraction prompt (with --ai-extraction) |
--schema |
Output JSON schema (JSON string) |
--ai-extraction |
Enable AI extraction after steps |
--use-session |
Persist browser session |
scrapegraphai generate-schema "Schema for an e-commerce product with name, price, and reviews"| Option | Description |
|---|---|
--existing-schema |
Existing schema to modify (JSON string) |
scrapegraphai creditsscrapegraphai validateTests use Bun's built-in test runner with spyOn(globalThis, "fetch") to mock all API calls — no network requests, no API key needed.
bun testCovers all SDK functions: success paths, polling, HTTP error mapping (401/402/422/429/500), Zod validation, timeouts, and network failures.
scrapegraph-cli/
├── src/
│ ├── cli.ts # Entry point, citty main command + subcommands
│ ├── lib/
│ │ ├── env.ts # Zod-parsed env config (API key, debug, timeout)
│ │ ├── folders.ts # API key resolution + interactive prompt
│ │ ├── scrapegraphai.ts # SDK layer — all API functions
│ │ ├── schemas.ts # Zod validation schemas
│ │ └── log.ts # Syntax-highlighted JSON output
│ ├── types/
│ │ └── index.ts # Zod-derived types + ApiResult
│ ├── commands/
│ │ ├── smart-scraper.ts
│ │ ├── search-scraper.ts
│ │ ├── markdownify.ts
│ │ ├── crawl.ts
│ │ ├── sitemap.ts
│ │ ├── scrape.ts
│ │ ├── agentic-scraper.ts
│ │ ├── generate-schema.ts
│ │ ├── credits.ts
│ │ └── validate.ts
│ └── utils/
│ └── banner.ts # ASCII banner + version from package.json
├── tests/
│ └── scrapegraphai.test.ts # SDK layer tests (mocked fetch)
├── dist/ # Build output (git-ignored)
│ └── cli.mjs # Bundled ESM with shebang
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── biome.json
└── .gitignore
| Script | Command | Description |
|---|---|---|
dev |
bun run src/cli.ts |
Run CLI from TS source |
build |
tsup |
Bundle ESM to dist/cli.mjs |
lint |
biome check . |
Lint + format check |
format |
biome format . --write |
Auto-format |
test |
bun test |
Run tests |
check |
tsc --noEmit && biome check . |
Type-check + lint |
All commands output pretty-printed JSON to stdout (pipeable). Errors go to stderr via @clack/prompts.
# Pipe output to jq
scrapegraphai credits | jq '.remaining_credits'
# Save to file
scrapegraphai smart-scraper https://example.com -p "Extract data" > result.jsonISC
