Merged
121 changes: 90 additions & 31 deletions README.md
@@ -4,6 +4,65 @@

Command-line interface for [ScrapeGraph AI](https://scrapegraphai.com) — AI-powered web scraping, data extraction, search, and crawling.

+## Installation
+
+### From npm (recommended)
+
+Install globally to use `just-scrape` from anywhere:
+
+```bash
+npm install -g just-scrape
+```
+
+Or use it directly without installing via `npx`:
+
+```bash
+npx just-scrape --help
+```
+
+You can also install with other package managers:
+
+```bash
+# pnpm
+pnpm add -g just-scrape
+
+# yarn
+yarn global add just-scrape
+
+# bun
+bun add -g just-scrape
+```
+
+Package: [just-scrape](https://www.npmjs.com/package/just-scrape) on npm.
+
+### From source (local development)
+
+Requires [Bun](https://bun.sh) and Node.js 22+.
+
+```bash
+# Clone the repository
+git clone https://github.com/ScrapeGraphAI/just-scrape.git
+cd just-scrape
+
+# Install dependencies
+bun install
+
+# Run directly from source (no build needed)
+bun run dev --help
+
+# Or build and link globally
+bun run build
+npm link
+just-scrape --help
+```
+
+### Verify installation
+
+```bash
+just-scrape --help
+just-scrape validate # check your API key
+```
+
## Tech Stack

| Concern | Tool |
@@ -37,35 +96,31 @@ Four ways to provide it (checked in order):
3. **Config file**: stored in `~/.scrapegraphai/config.json`
4. **Interactive prompt**: if none of the above are set, the CLI prompts you and saves it to the config file

-### Timeout
-
-Set `SGAI_CLI_TIMEOUT_S` to override the default 120s request/polling timeout:
-
-```bash
-export SGAI_CLI_TIMEOUT_S=300
-```
+### Environment Variables

-### Debug Logging
-
-Set `SGAI_CLI_DEBUG=1` to enable debug logging (outputs to stderr):
+| Variable | Default | Description |
+|---|---|---|
+| `JUST_SCRAPE_TIMEOUT_S` | `120` | Request/polling timeout in seconds |
+| `JUST_SCRAPE_DEBUG` | `0` | Set to `1` to enable debug logging (outputs to stderr) |

```bash
-SGAI_CLI_DEBUG=1 scrapegraphai smart-scraper https://example.com -p "Extract data"
+export JUST_SCRAPE_TIMEOUT_S=300
+JUST_SCRAPE_DEBUG=1 just-scrape smart-scraper https://example.com -p "Extract data"
```

## Commands

### `smart-scraper` — Extract structured data from a URL [docs](https://docs.scrapegraphai.com/services/smartscraper)

```bash
-scrapegraphai smart-scraper <url> -p "Extract all product names and prices"
+just-scrape smart-scraper <url> -p "Extract all product names and prices"

# With JSON schema
-scrapegraphai smart-scraper https://example.com/products -p "Extract products" \
+just-scrape smart-scraper https://example.com/products -p "Extract products" \
--schema '{"type":"object","properties":{"products":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}}}}'

# With options
-scrapegraphai smart-scraper https://example.com -p "Extract data" \
+just-scrape smart-scraper https://example.com -p "Extract data" \
--stealth --render-js --scrolls 10 --pages 5
```

@@ -84,10 +139,10 @@ scrapegraphai smart-scraper https://example.com -p "Extract data" \
### `search-scraper` — Search the web and extract data [docs](https://docs.scrapegraphai.com/services/searchscraper)

```bash
-scrapegraphai search-scraper "What are the top Python web frameworks?"
+just-scrape search-scraper "What are the top Python web frameworks?"

# Markdown only (cheaper)
-scrapegraphai search-scraper "Python frameworks" --no-extraction --num-results 5
+just-scrape search-scraper "Python frameworks" --no-extraction --num-results 5
```

| Option | Description |
@@ -101,8 +156,8 @@ scrapegraphai search-scraper "Python frameworks" --no-extraction --num-results 5
### `markdownify` — Convert a webpage to markdown [docs](https://docs.scrapegraphai.com/services/markdownify)

```bash
-scrapegraphai markdownify https://example.com/article
-scrapegraphai markdownify https://example.com --render-js --stealth
+just-scrape markdownify https://example.com/article
+just-scrape markdownify https://example.com --render-js --stealth
```

| Option | Description |
@@ -114,13 +169,13 @@ scrapegraphai markdownify https://example.com --render-js --stealth
### `crawl` — Crawl and extract from multiple pages [docs](https://docs.scrapegraphai.com/services/smartcrawler)

```bash
-scrapegraphai crawl https://example.com -p "Extract article titles" --max-pages 5 --depth 2
+just-scrape crawl https://example.com -p "Extract article titles" --max-pages 5 --depth 2

# Markdown only
-scrapegraphai crawl https://example.com --no-extraction --max-pages 10
+just-scrape crawl https://example.com --no-extraction --max-pages 10

# With crawl rules
-scrapegraphai crawl https://example.com -p "Extract data" \
+just-scrape crawl https://example.com -p "Extract data" \
--rules '{"include_paths":["/blog/*"],"same_domain":true}'
```

@@ -139,14 +194,14 @@ scrapegraphai crawl https://example.com -p "Extract data" \
### `sitemap` — Get all URLs from a website's sitemap [docs](https://docs.scrapegraphai.com/services/sitemap)

```bash
-scrapegraphai sitemap https://example.com
+just-scrape sitemap https://example.com
```

### `scrape` — Get raw HTML content [docs](https://docs.scrapegraphai.com/services/scrape)

```bash
-scrapegraphai scrape https://example.com
-scrapegraphai scrape https://example.com --stealth --branding --country-code US
+just-scrape scrape https://example.com
+just-scrape scrape https://example.com --stealth --branding --country-code US
```

| Option | Description |
@@ -159,7 +214,7 @@ scrapegraphai scrape https://example.com --stealth --branding --country-code US
### `agentic-scraper` — Browser automation with AI [docs](https://docs.scrapegraphai.com/services/agenticscraper)

```bash
-scrapegraphai agentic-scraper https://example.com/login \
+just-scrape agentic-scraper https://example.com/login \
-s "Fill email with user@test.com,Fill password with pass123,Click Sign In" \
--ai-extraction -p "Extract dashboard data"
```
@@ -175,7 +230,7 @@ scrapegraphai agentic-scraper https://example.com/login \
### `generate-schema` — Generate JSON schema from a prompt

```bash
-scrapegraphai generate-schema "Schema for an e-commerce product with name, price, and reviews"
+just-scrape generate-schema "Schema for an e-commerce product with name, price, and reviews"
```

| Option | Description |
@@ -185,13 +240,13 @@ scrapegraphai generate-schema "Schema for an e-commerce product with name, price
### `credits` — Check credit balance

```bash
-scrapegraphai credits
+just-scrape credits
```

### `validate` — Validate your API key

```bash
-scrapegraphai validate
+just-scrape validate
```

## Testing
@@ -207,7 +262,7 @@ Covers all SDK functions: success paths, polling, HTTP error mapping (401/402/42
## Project Structure

```
-scrapegraph-cli/
+just-scrape/
├── src/
│ ├── cli.ts # Entry point, citty main command + subcommands
│ ├── lib/
@@ -259,12 +314,16 @@ All commands output pretty-printed JSON to stdout (pipeable). Errors go to stder

```bash
# Pipe output to jq
-scrapegraphai credits | jq '.remaining_credits'
+just-scrape credits | jq '.remaining_credits'

# Save to file
-scrapegraphai smart-scraper https://example.com -p "Extract data" > result.json
+just-scrape smart-scraper https://example.com -p "Extract data" > result.json
```

## License

ISC
+
+---
+
+Made with love by the [ScrapeGraphAI](https://scrapegraphai.com) team.
Binary file modified assets/demo.gif
Binary file modified assets/demo.mp4
1 change: 1 addition & 0 deletions bun.lock


4 changes: 2 additions & 2 deletions package.json
@@ -1,11 +1,11 @@
{
  "name": "just-scrape",
-  "version": "0.1.0",
+  "version": "0.1.1",
  "description": "ScrapeGraph AI CLI tool",
  "type": "module",
  "main": "dist/cli.mjs",
  "bin": {
-    "scrapegraphai": "dist/cli.mjs"
+    "just-scrape": "dist/cli.mjs"
  },
  "scripts": {
    "dev": "bun run src/cli.ts",
2 changes: 1 addition & 1 deletion src/cli.ts
@@ -6,7 +6,7 @@ showBanner();

const main = defineCommand({
  meta: {
-    name: "scrapegraphai",
+    name: "just-scrape",
    version: getVersion(),
    description: "ScrapeGraph AI CLI tool",
  },
6 changes: 4 additions & 2 deletions src/lib/env.ts
@@ -28,8 +28,10 @@ function resolve(): Env {

  return EnvSchema.parse({
    apiKey: process.env.SGAI_API_KEY || (config["api-key"] as string) || undefined,
-    debug: process.env.SGAI_CLI_DEBUG === "1",
-    timeoutS: process.env.SGAI_CLI_TIMEOUT_S ? Number(process.env.SGAI_CLI_TIMEOUT_S) : undefined,
+    debug: process.env.JUST_SCRAPE_DEBUG === "1",
+    timeoutS: process.env.JUST_SCRAPE_TIMEOUT_S
+      ? Number(process.env.JUST_SCRAPE_TIMEOUT_S)
+      : undefined,
  });
}

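The `src/lib/env.ts` change renames the two CLI variables to `JUST_SCRAPE_DEBUG` and `JUST_SCRAPE_TIMEOUT_S`. `SGAI_API_KEY` keeps its name. Below is a minimal standalone sketch of that lookup. The names `resolveEnv` and `ResolvedEnv`, and the explicit `vars`/`config` parameters, are hypothetical. The `EnvSchema.parse` step from the real code is left out, so no defaults or validation are applied here.

```typescript
// Hypothetical standalone version of the lookup in src/lib/env.ts.
// The real code passes this object to EnvSchema.parse, which is omitted here.
type Vars = Record<string, string | undefined>;
type Config = Record<string, unknown>;

interface ResolvedEnv {
  apiKey: string | undefined;
  debug: boolean;
  timeoutS: number | undefined;
}

function resolveEnv(vars: Vars, config: Config): ResolvedEnv {
  return {
    // SGAI_API_KEY keeps its name; the config file's "api-key" is the fallback.
    apiKey: vars.SGAI_API_KEY || (config["api-key"] as string) || undefined,
    // Only the exact string "1" turns debug logging on.
    debug: vars.JUST_SCRAPE_DEBUG === "1",
    // Unset or empty means "no override"; a non-numeric value becomes NaN,
    // and whether that is rejected depends on EnvSchema (not shown).
    timeoutS: vars.JUST_SCRAPE_TIMEOUT_S ? Number(vars.JUST_SCRAPE_TIMEOUT_S) : undefined,
  };
}

// The pre-rename SGAI_CLI_* names are no longer read.
const env = resolveEnv(
  { SGAI_CLI_DEBUG: "1", JUST_SCRAPE_TIMEOUT_S: "300" },
  { "api-key": "key-from-config" },
);
console.log(env.apiKey, env.debug, env.timeoutS); // key-from-config false 300
```

The diff shows no fallback to the old names. Going by that, anyone who still exports `SGAI_CLI_TIMEOUT_S` or `SGAI_CLI_DEBUG` after this change gets the default timeout and no debug output.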
17 changes: 9 additions & 8 deletions src/utils/banner.ts
@@ -14,19 +14,20 @@ export function getVersion(): string {
}

const BANNER = [
-  "╔═╗╔═╗╦═╗╔═╗╔═╗╔═╗╔═╗╦═╗╔═╗╔═╗╦ ╦╔═╗╦",
-  "╚═╗║ ╠╦╝╠═╣╠═╝║╣ ║ ╦╠╦╝╠═╣╠═╝╠═╣╠═╣║",
-  "╚═╝╚═╝╩╚═╩ ╩╩ ╚═╝╚═╝╩╚═╩ ╩╩ ╩ ╩╩ ╩╩",
-].join("\n");
+  "╔═╗╦ ╦╔═╗╔╦╗ ╔═╗╔═╗╦═╗╔═╗╔═╗╔═╗",
+  " ║║ ║╚═╗ ║ ═══╚═╗║ ╠╦╝╠═╣╠═╝║╣ ",
+  "╚═╝╚═╝╚═╝ ╩ ╚═╝╚═╝╩╚═╩ ╩╩ ╚═╝",
+];
+
+const TAGLINE = " made with ♥ from scrapegraphai team";

const BANNER_COLOR = "#bd93f9";

export function showBanner() {
-  const colored = BANNER.split("\n")
-    .map((line) => chalk.hex(BANNER_COLOR)(line))
-    .join("\n");
+  const text = BANNER.map((line) => chalk.hex(BANNER_COLOR)(line)).join("\n");

-  console.log(colored);
+  console.log(text);
+  console.log(chalk.hex(BANNER_COLOR)(TAGLINE));
  console.log(chalk.hex(BANNER_COLOR)(`v${getVersion()}`));
  console.log();
}
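The `banner.ts` refactor stores `BANNER` as an array of lines instead of a pre-joined string. As a result, `showBanner` no longer needs to split the string back into lines. The sketch below shows why the printed art should be unchanged. It swaps `chalk.hex(BANNER_COLOR)` for a hypothetical `colorize` stand-in and uses placeholder rows in place of the real art, so it runs without dependencies.

```typescript
// Hypothetical stand-in for chalk.hex(BANNER_COLOR): brackets make the
// per-line wrapping visible without a terminal-color dependency.
const colorize = (line: string): string => `[${line}]`;

// Placeholder rows standing in for the ASCII-art lines.
const bannerLines: string[] = ["row one", "row two", "row three"];

// Before: BANNER was one joined string, split back into lines to color them.
const bannerString = bannerLines.join("\n");
const before = bannerString
  .split("\n")
  .map((line) => colorize(line))
  .join("\n");

// After: BANNER is already an array, so each line is mapped directly.
const after = bannerLines.map((line) => colorize(line)).join("\n");

console.log(before === after); // true
console.log(after);
```

The equivalence holds as long as no individual banner row contains a newline itself. That is true of the three rows in the diff.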