Binary file removed .DS_Store
226 changes: 67 additions & 159 deletions README.md
[![npm version](https://badge.fury.io/js/scrapegraph-js.svg)](https://badge.fury.io/js/scrapegraph-js)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

Official TypeScript SDK for the [ScrapeGraph AI API](https://scrapegraphai.com) v2.

## Install

```bash
bun add scrapegraph-js
```
## Quick Start

```ts
import { scrapegraphai } from "scrapegraph-js";

const sgai = scrapegraphai({ apiKey: "your-api-key" });

const result = await sgai.scrape("https://example.com", { format: "markdown" });

console.log(result.data);
console.log(result._requestId);
```

Every method returns an `ApiResult<T>`:

```ts
type ApiResult<T> = {
  data: T;
  _requestId: string;
};
```
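Because every call shares this envelope, a small generic helper can unwrap results uniformly. `unwrap` below is a hypothetical utility for illustration, not part of the SDK:

```typescript
// Hypothetical helper (not part of the SDK) showing how the shared
// envelope lets one generic function handle any result.
type ApiResult<T> = {
  data: T;
  _requestId: string;
};

function unwrap<T>(result: ApiResult<T>): T {
  // Log the request id so it can be quoted in support tickets.
  console.log(`request id: ${result._requestId}`);
  return result.data;
}

const sample: ApiResult<{ title: string }> = {
  data: { title: "Example Domain" },
  _requestId: "req_123",
};

console.log(unwrap(sample).title); // prints "Example Domain"
```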

## API

Create a client once, then call the available v2 endpoints:

```ts
const sgai = scrapegraphai({
  apiKey: "your-api-key",
  baseUrl: "https://api.scrapegraphai.com", // optional
  timeout: 30000, // optional
  maxRetries: 2, // optional
});
```
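To give `maxRetries` some intuition, here is a sketch of exponential backoff; the base delay and doubling policy are assumptions for illustration, not the SDK's documented retry behavior:

```typescript
// Hypothetical sketch: retry attempt i waits baseMs * 2^i before firing.
// The SDK's actual retry policy may differ.
function backoffDelays(maxRetries: number, baseMs = 500): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

console.log(backoffDelays(2)); // [ 500, 1000 ]
```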

### scrape

Fetch a page's content.

```ts
await sgai.scrape("https://example.com", {
  format: "markdown",
  fetchConfig: {
    mock: false, // optional, testing mode
  },
});
```

### extract

Extract structured data from a page using a prompt and a schema. With raw JSON Schema:

```ts
await sgai.extract("https://example.com", {
  prompt: "Extract the page title",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
    },
  },
});
```

Or with a Zod schema:

```ts
import { z } from "zod";

await sgai.extract("https://example.com", {
  prompt: "Extract the page title",
  schema: z.object({
    title: z.string(),
  }),
});
```

### search

Search the web with a natural-language query.

```ts
await sgai.search("What is the capital of France?", {
  numResults: 5,
});
```

### schema

Generate a JSON schema from a natural-language description.

```ts
await sgai.schema("A product with name and price");
```

### credits

Check your remaining credits.

```ts
const res = await generateSchema("key", {
user_prompt: "Schema for a product with name, price, and rating",
existing_schema: { /* modify this */ }, // optional
});
await sgai.credits();
```

### history

Fetch request history for any service.

```ts
const res = await sitemap("key", {
website_url: "https://example.com",
headers: { /* */ }, // optional
stealth: true, // optional, +4 credits
mock: true, // optional, testing mode
await sgai.history({
page: 1,
limit: 10,
service: "scrape",
});
// res.data.urls is string[]
```
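If you need more than one page of history, a generic walker can loop until an empty page comes back. `allPages` below is hypothetical, and the response shape it assumes (an array of items per page) is a guess, not the documented API:

```typescript
// Hypothetical pagination walk: fetch pages until one comes back empty.
async function allPages<T>(
  fetchPage: (page: number) => Promise<T[]>,
): Promise<T[]> {
  const items: T[] = [];
  for (let page = 1; ; page++) {
    const batch = await fetchPage(page);
    if (batch.length === 0) break; // assumed end-of-history signal
    items.push(...batch);
  }
  return items;
}

// Simulated three-page source standing in for sgai.history:
const pages = [["a", "b"], ["c"], []];
allPages(async (p) => pages[p - 1] ?? []).then((all) => console.log(all)); // [ 'a', 'b', 'c' ]
```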

### crawl

Crawl a website and its linked pages. Start a job, then poll its status.

```ts
const crawl = await sgai.crawl.start("https://example.com", {
  maxPages: 10,
  maxDepth: 2,
});

await sgai.crawl.status((crawl.data as { id: string }).id);
```
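Since crawl jobs run asynchronously, callers typically poll the status until the job finishes. The sketch below assumes a hypothetical `CrawlStatus` shape and injects a fake status source in place of `sgai.crawl.status` so it runs standalone:

```typescript
// Hypothetical status shape; substitute whatever the API actually returns.
type CrawlStatus = { state: "queued" | "running" | "done" };

// Poll a status source until it reports completion or attempts run out.
async function pollUntilDone(
  getStatus: () => Promise<CrawlStatus>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<CrawlStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (status.state === "done") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl did not finish in time");
}

// Simulated status source that completes on the third poll:
let calls = 0;
const fake = async (): Promise<CrawlStatus> =>
  ++calls < 3 ? { state: "running" } : { state: "done" };

pollUntilDone(fake, 10).then((s) => console.log(s.state)); // prints "done"
```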

### monitor

Watch a page for changes on a schedule.

```ts
const res = await history("key", {
service: "smartscraper",
page: 1, // optional, default 1
page_size: 10, // optional, default 10
await sgai.monitor.create({
url: "https://example.com",
prompt: "Notify me when the price changes",
interval: "1h",
});
```
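The `"1h"` interval above suggests a duration shorthand. A hypothetical parser for such strings might look like this; the formats the API actually accepts are not documented here:

```typescript
// Hypothetical parser for interval strings such as "30m" or "1h".
// The accepted units (s/m/h/d) are a guess, not the API's specification.
function intervalToMs(interval: string): number {
  const match = /^(\d+)(s|m|h|d)$/.exec(interval);
  if (!match) throw new Error(`unrecognized interval: ${interval}`);
  const unitMs = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 } as const;
  return Number(match[1]) * unitMs[match[2] as keyof typeof unitMs];
}

console.log(intervalToMs("1h")); // 3600000
```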


## Development

```bash
bun install
bun test
bun run check
bun run build
```

## License
MIT.
14 changes: 9 additions & 5 deletions biome.json
```diff
@@ -1,7 +1,11 @@
 {
-  "$schema": "https://biomejs.dev/schemas/1.9.4/schema.json",
-  "organizeImports": {
-    "enabled": true
+  "$schema": "https://biomejs.dev/schemas/2.4.9/schema.json",
+  "assist": {
+    "actions": {
+      "source": {
+        "organizeImports": "on"
+      }
+    }
   },
   "formatter": {
     "enabled": true,
@@ -16,7 +20,7 @@
   },
   "overrides": [
     {
-      "include": ["tests/**"],
+      "includes": ["tests/**"],
       "linter": {
         "rules": {
           "suspicious": {
@@ -27,6 +31,6 @@
     }
   ],
   "files": {
-    "ignore": ["node_modules", "dist", "bun.lock", ".claude", "examples"]
+    "includes": ["**", "!dist", "!node_modules", "!bun.lock"]
   }
 }
```
1 change: 0 additions & 1 deletion examples/.env.example (file deleted)
35 changes: 0 additions & 35 deletions examples/agenticscraper/agenticscraper_ai_extraction.ts (file deleted)
22 changes: 0 additions & 22 deletions examples/agenticscraper/agenticscraper_basic.ts (file deleted)