
Commit f40c5f9

feat: add typed API responses and fix sync/async endpoint classification
Replace ApiResult<unknown> with proper response types for all endpoints. Only crawl uses polling now — all other endpoints are direct POST calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 17ef0a5

19 files changed

Lines changed: 455 additions & 130 deletions

README.md

Lines changed: 12 additions & 2 deletions
````diff
@@ -11,16 +11,26 @@ Command-line interface for [ScrapeGraph AI](https://scrapegraphai.com) — AI-po
 
 ```
 just-scrape/
+├── docs/                      # API response docs per endpoint
+│   ├── smartscraper.md
+│   ├── searchscraper.md
+│   ├── markdownify.md
+│   ├── crawl.md
+│   ├── scrape.md
+│   ├── agenticscraper.md
+│   ├── generate-schema.md
+│   ├── sitemap.md
+│   └── credits.md
 ├── src/
 │   ├── cli.ts                 # Entry point, citty main command + subcommands
 │   ├── lib/
 │   │   ├── env.ts             # Zod-parsed env config (API key, debug, timeout)
 │   │   ├── folders.ts         # API key resolution + interactive prompt
-│   │   ├── scrapegraphai.ts   # SDK layer — all API functions
+│   │   ├── scrapegraphai.ts   # SDK layer — all API functions (typed responses)
 │   │   ├── schemas.ts         # Zod validation schemas
 │   │   └── log.ts             # Logger factory + syntax-highlighted JSON output
 │   ├── types/
-│   │   └── index.ts           # Zod-derived types + ApiResult
+│   │   └── index.ts           # Zod-derived types + ApiResult + response types
 │   ├── commands/
 │   │   ├── smart-scraper.ts
 │   │   ├── search-scraper.ts
````

docs/agenticscraper.md

Lines changed: 27 additions & 0 deletions
# Agentic Scraper

Browser automation with AI — login, click, navigate, fill forms, extract data.

**Endpoint:** `POST /v1/agentic-scrapper`
**Poll:** `GET /v1/agentic-scrapper/{request_id}`
**Docs:** https://docs.scrapegraphai.com/services/agenticscraper

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL |
| steps | string[] | No | Browser action sequence |
| user_prompt | string | No | AI extraction instructions |
| output_schema | object | No | JSON schema for output structure |
| ai_extraction | boolean | No | Enable AI-powered structuring |
| use_session | boolean | No | Persist browser session |

## Response (CompletedAgenticScraperResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | `queued` \| `processing` \| `completed` \| `failed` |
| result | object \| null | Extracted data or page markdown |
| error | string | Error message (empty on success) |

docs/crawl.md

Lines changed: 41 additions & 0 deletions
# SmartCrawler (Crawl)

Multi-page crawling with AI extraction or markdown conversion.

**Endpoint:** `POST /v1/crawl`
**Poll:** `GET /v1/crawl/{crawl_id}`
**Credits:** 10/page (AI) or 2/page (markdown)
**Docs:** https://docs.scrapegraphai.com/services/smartcrawler

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL |
| prompt | string | No | Extraction instructions |
| extraction_mode | boolean | No | AI extraction (true) or markdown (false) |
| max_pages | number | No | Max pages to crawl |
| depth | number | No | Link depth to follow |
| schema | object | No | JSON schema for output structure |
| rules | object | No | Crawl rules (include_paths, exclude_paths, same_domain) |
| sitemap | boolean | No | Use sitemap for discovery |
| stealth | boolean | No | Anti-bot bypass (+4 credits) |
| webhook_url | string | No | Webhook for completion notification |

## Response (CompletedCrawlResponse)

| Field | Type | Description |
|---|---|---|
| crawl_id | string | Unique crawl identifier |
| status | string | `queued` \| `processing` \| `done` \| `failed` |
| result | object \| null | AI-extracted data (when extraction_mode=true) |
| crawled_urls | string[] | All visited URLs |
| pages | CrawlPage[] | Crawled pages with markdown content |
| error | string | Error message (empty on success) |

### CrawlPage

| Field | Type | Description |
|---|---|---|
| url | string | Page URL |
| markdown | string | Page content as markdown |
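Crawl is the one endpoint that still polls (per the commit message). A minimal sketch of such a polling loop, terminating on the `done` / `failed` statuses documented above; the injected `getStatus` stands in for `GET /v1/crawl/{crawl_id}`, and the interval and attempt cap are assumptions, not the CLI's actual values:

```typescript
// Minimal crawl-polling sketch. Not the CLI's real implementation; the
// injected `getStatus` stands in for a GET /v1/crawl/{crawl_id} request.
interface CrawlStatus {
  crawl_id: string;
  status: "queued" | "processing" | "done" | "failed";
  result: object | null;
  error: string;
}

async function pollCrawl(
  crawlId: string,
  getStatus: (id: string) => Promise<CrawlStatus>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<CrawlStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await getStatus(crawlId);
    // `done` and `failed` are the terminal statuses in the table above
    if (res.status === "done" || res.status === "failed") return res;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`crawl ${crawlId} did not finish in time`);
}
```

Injecting the status fetcher keeps the loop testable without network access and makes the timeout policy explicit at the call site.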

docs/credits.md

Lines changed: 13 additions & 0 deletions
# Credits

Check API credit balance.

**Endpoint:** `GET /v1/credits`
**Sync**

## Response (CreditsResponse)

| Field | Type | Description |
|---|---|---|
| remaining_credits | number | Credits available |
| total_credits_used | number | Total credits consumed |

docs/generate-schema.md

Lines changed: 26 additions & 0 deletions
# Generate Schema

AI-powered JSON schema generation from natural language descriptions.

**Endpoint:** `POST /v1/generate_schema`
**Poll:** `GET /v1/generate_schema/{request_id}`

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| user_prompt | string | Yes | Schema description |
| existing_schema | object | No | Existing schema to modify/extend |

## Response (SchemaGenerationResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | `pending` \| `processing` \| `completed` \| `failed` |
| user_prompt | string | Original prompt |
| refined_prompt | string \| null | AI-refined version of prompt |
| generated_schema | object \| null | Generated JSON schema |
| error | string \| null | Error message |
| created_at | string \| null | ISO 8601 timestamp |
| updated_at | string \| null | ISO 8601 timestamp |

docs/markdownify.md

Lines changed: 27 additions & 0 deletions
# Markdownify

Converts any webpage to clean, well-formatted markdown.

**Endpoint:** `POST /v1/markdownify`
**Poll:** `GET /v1/markdownify/{request_id}`
**Credits:** 2/page
**Docs:** https://docs.scrapegraphai.com/services/markdownify

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| website_url | string | Yes | Target URL |
| stealth | boolean | No | Anti-bot bypass (+4 credits) |
| headers | object | No | Custom HTTP headers |
| webhook_url | string | No | Webhook for completion notification |

## Response (CompletedMarkdownifyResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | `queued` \| `processing` \| `completed` \| `failed` |
| website_url | string | Processed URL |
| result | string \| null | Converted markdown content |
| error | string | Error message (empty on success) |

docs/scrape.md

Lines changed: 27 additions & 0 deletions
# Scrape

Raw HTML extraction with optional branding analysis.

**Endpoint:** `POST /v1/scrape`
**Poll:** `GET /v1/scrape/{request_id}`
**Credits:** 1/page (+2 branding, +4 stealth)
**Docs:** https://docs.scrapegraphai.com/services/scrape

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| website_url | string | Yes | Target URL |
| stealth | boolean | No | Anti-bot bypass (+4 credits) |
| branding | boolean | No | Extract design elements (+2 credits) |
| country_code | string | No | ISO country code for geo-targeting |

## Response (CompletedScrapeResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | `queued` \| `processing` \| `completed` \| `failed` |
| html | string | Complete HTML content |
| branding | object | Design elements (when branding=true) |
| error | string | Error message (empty on success) |

docs/searchscraper.md

Lines changed: 31 additions & 0 deletions
# SearchScraper

AI-powered web search that aggregates information from multiple sources.

**Endpoint:** `POST /v1/searchscraper`
**Poll:** `GET /v1/searchscraper/{request_id}`
**Credits:** 10/page (AI extraction) or 2/page (markdown mode)
**Docs:** https://docs.scrapegraphai.com/services/searchscraper

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| user_prompt | string | Yes | Search query |
| num_results | number | No | Sources to scrape (3-20) |
| extraction_mode | boolean | No | AI extraction (true) or markdown mode (false) |
| output_schema | object | No | JSON schema for output structure |
| stealth | boolean | No | Anti-bot bypass (+4 credits) |
| headers | object | No | Custom HTTP headers |
| webhook_url | string | No | Webhook for completion notification |

## Response (CompletedSearchScraperResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | `queued` \| `processing` \| `completed` \| `failed` |
| user_prompt | string | Original search query |
| result | object \| null | Extracted/structured data |
| reference_urls | string[] | Source URLs used |
| error | string | Error message (empty on success) |

docs/sitemap.md

Lines changed: 23 additions & 0 deletions
# Sitemap

Extracts all URLs from a website's sitemap.xml.

**Endpoint:** `POST /v1/sitemap`
**Sync** (no polling)
**Docs:** https://docs.scrapegraphai.com/services/sitemap

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| website_url | string | Yes | Target website URL |

## Response (SitemapResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Request identifier |
| status | string | Completion status |
| website_url | string | Processed URL |
| urls | string[] | Discovered URLs |
| error | string | Error message (empty on success) |
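Response shapes like the table above can also be checked at runtime. The repo does this with Zod schemas in `schemas.ts`; purely as a dependency-free sketch (not the repo's actual validator), a hand-written type guard for SitemapResponse looks like:

```typescript
// Dependency-free runtime check for the SitemapResponse shape documented
// above. The repo itself uses Zod; this guard is only an illustrative sketch.
interface SitemapResponse {
  request_id: string;
  status: string;
  website_url: string;
  urls: string[];
  error: string;
}

function isSitemapResponse(value: unknown): value is SitemapResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.request_id === "string" &&
    typeof v.status === "string" &&
    typeof v.website_url === "string" &&
    Array.isArray(v.urls) &&
    v.urls.every((u) => typeof u === "string") &&
    typeof v.error === "string"
  );
}
```

A Zod schema buys the same narrowing with less boilerplate plus error messages, which is presumably why the repo centralizes validation there.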

docs/smartscraper.md

Lines changed: 36 additions & 0 deletions
# SmartScraper

AI-powered web scraping that extracts structured data from any website.

**Endpoint:** `POST /v1/smartscraper`
**Poll:** `GET /v1/smartscraper/{request_id}`
**Credits:** 10/page
**Docs:** https://docs.scrapegraphai.com/services/smartscraper

## Request

| Parameter | Type | Required | Description |
|---|---|---|---|
| website_url | string | Yes* | Target URL (*one of three inputs) |
| website_html | string | No | Raw HTML (max 2MB, mutually exclusive) |
| website_markdown | string | No | Markdown content (max 2MB, mutually exclusive) |
| user_prompt | string | Yes | Extraction instructions |
| output_schema | object | No | JSON schema for output structure |
| number_of_scrolls | number | No | Infinite scroll iterations (0-100) |
| total_pages | number | No | Pagination depth (1-100) |
| stealth | boolean | No | Anti-bot bypass (+4 credits) |
| cookies | object | No | Session cookies |
| headers | object | No | Custom HTTP headers |
| plain_text | boolean | No | Return plaintext instead of JSON |
| webhook_url | string | No | Webhook for completion notification |

## Response (CompletedSmartscraperResponse)

| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | `queued` \| `processing` \| `completed` \| `failed` |
| website_url | string | Processed URL |
| user_prompt | string | Original prompt |
| result | object \| null | Extracted data matching schema |
| error | string | Error message (empty on success) |
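Per the commit message, smartscraper is now a direct POST rather than a poll loop. A hedged sketch of such a call, with the transport injected so authentication and HTTP details stay out of scope; the base URL is an assumption and `smartScrape` is not the CLI's actual function:

```typescript
// Direct-POST sketch for /v1/smartscraper. Not the CLI's real code; the
// base URL is an assumption and `post` abstracts away auth and HTTP.
interface CompletedSmartscraperResponse {
  request_id: string;
  status: "queued" | "processing" | "completed" | "failed";
  website_url: string;
  user_prompt: string;
  result: object | null;
  error: string;
}

type PostFn = (url: string, body: unknown) => Promise<CompletedSmartscraperResponse>;

async function smartScrape(
  websiteUrl: string,
  userPrompt: string,
  post: PostFn,
): Promise<CompletedSmartscraperResponse> {
  // Body fields follow the Request table above (website_url + user_prompt)
  const res = await post("https://api.scrapegraphai.com/v1/smartscraper", {
    website_url: websiteUrl,
    user_prompt: userPrompt,
  });
  if (res.status === "failed") throw new Error(res.error || "smartscraper failed");
  return res;
}
```

Because the response is typed, callers get `result` and `error` with no casting, which is the point of the `ApiResult<unknown>` removal in this commit.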
