# SmartCrawler (Crawl)

Multi-page crawling with AI extraction or markdown conversion.

**Endpoint:** `POST /v1/crawl`
**Poll:** `GET /v1/crawl/{crawl_id}`
**Credits:** 10/page (AI) or 2/page (markdown)
**Docs:** https://docs.scrapegraphai.com/services/smartcrawler

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL |
| prompt | string | No | Extraction instructions |
| extraction_mode | boolean | No | AI extraction (`true`) or markdown conversion (`false`) |
| max_pages | number | No | Maximum number of pages to crawl |
| depth | number | No | Link depth to follow |
| schema | object | No | JSON schema for output structure |
| rules | object | No | Crawl rules (`include_paths`, `exclude_paths`, `same_domain`) |
| sitemap | boolean | No | Use the sitemap for URL discovery |
| stealth | boolean | No | Anti-bot bypass (+4 credits) |
| webhook_url | string | No | Webhook for completion notification |

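A minimal sketch of starting a crawl with the parameters above, using only the Python standard library. The `API_BASE` constant and the `SGAI-APIKEY` header name are assumptions not stated in this section; check the linked docs for the exact base URL and authentication scheme.

```python
import json
import urllib.request

API_BASE = "https://api.scrapegraphai.com/v1"  # assumed base URL

def build_crawl_payload(url, prompt=None, extraction_mode=True,
                        max_pages=10, depth=2, sitemap=False):
    """Assemble a request body from the parameters in the table above."""
    payload = {
        "url": url,
        "extraction_mode": extraction_mode,
        "max_pages": max_pages,
        "depth": depth,
        "sitemap": sitemap,
    }
    if prompt is not None:
        payload["prompt"] = prompt
    return payload

def start_crawl(payload, api_key):
    """POST the payload to /v1/crawl and return the crawl_id."""
    # The "SGAI-APIKEY" header name is an assumption; verify against the docs.
    req = urllib.request.Request(
        f"{API_BASE}/crawl",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "SGAI-APIKEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["crawl_id"]
```

Separating payload construction from the HTTP call keeps the request shape easy to inspect before any credits are spent.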
## Response (CompletedCrawlResponse)

| Field | Type | Description |
|---|---|---|
| crawl_id | string | Unique crawl identifier |
| status | string | `queued` \| `processing` \| `done` \| `failed` |
| result | object \| null | AI-extracted data (when `extraction_mode=true`) |
| crawled_urls | string[] | All visited URLs |
| pages | CrawlPage[] | Crawled pages with markdown content |
| error | string | Error message (empty on success) |

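Since `status` moves through `queued` and `processing` before reaching a terminal state, a client polls `GET /v1/crawl/{crawl_id}` until `done` or `failed`. A sketch of that loop, with the HTTP fetch injected as a callable so any client (and any test) can drive it:

```python
import time

TERMINAL_STATES = {"done", "failed"}

def poll_crawl(crawl_id, fetch_status, interval=5.0, timeout=300.0):
    """Poll until the crawl reaches a terminal status, then return it.

    `fetch_status` is any callable taking a crawl_id and returning the
    parsed JSON status dict from GET /v1/crawl/{crawl_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(crawl_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish within {timeout}s")
```

If a `webhook_url` was supplied in the request, the completion notification can replace polling entirely.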
### CrawlPage

| Field | Type | Description |
|---|---|---|
| url | string | Page URL |
| markdown | string | Page content as markdown |
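In markdown mode (`extraction_mode=false`) the useful output is the `pages` array. A small helper to stitch the crawled pages into a single markdown document, assuming only the response shape described above:

```python
def pages_to_markdown(response):
    """Join each CrawlPage's markdown into one document,
    prefixing every page with its source URL as a heading."""
    parts = []
    for page in response.get("pages", []):
        parts.append(f"## {page['url']}\n\n{page['markdown']}")
    return "\n\n".join(parts)
```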