feat!: migrate scrapegraph-js to api v2 #11

Open
lurenss wants to merge 6 commits into main from feat/sdk-v2-migration

Conversation


@lurenss lurenss commented Mar 30, 2026

Summary

  • Port the standalone JS SDK to the new v2 client surface from sgai-stack feat/sdk-v2
  • Replace the old flat function API with scrapegraphai(config) factory
  • Move requests to the /api/v2/* endpoints and add crawl / monitor namespaces
  • Add the v2 request layer with retry logic, Zod-to-JSON-schema support, and a full unit test suite
  • Rewrite the README and integration script to document the new SDK shape

Breaking Changes

  • smartScraper, searchScraper, markdownify, agenticScraper, sitemap, generateSchema, getCredits, checkHealth are removed from the public API
  • Consumers must create a client with scrapegraphai({ apiKey, ...config })
  • crawl and monitor are now namespaced methods on the client (crawl.start, crawl.status, crawl.stop, monitor.create, monitor.delete, etc.)
  • Requests now target the /api/v2 surface
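To make the new client surface concrete, here is a minimal sketch of the factory shape described above. This is not the real SDK implementation: the transport is injected so the example is self-contained, and the endpoint paths and option names are taken from this PR's description but simplified.

```typescript
// Hypothetical minimal sketch of the v2 factory pattern -- not the shipped client.
// The real SDK wires the transport to fetch against the /api/v2/* surface.
type Transport = (
  path: string,
  init: { method: string; body?: unknown },
) => Promise<{ data: unknown; requestId: string }>;

function scrapegraphai(config: { apiKey: string; transport: Transport }) {
  const { transport } = config;
  return {
    scrape: (url: string, options?: { format?: string }) =>
      transport("/api/v2/scrape", { method: "POST", body: { url, ...options } }),
    // crawl is a namespace on the client rather than a flat function
    crawl: {
      start: (url: string) =>
        transport("/api/v2/crawl", { method: "POST", body: { url } }),
      status: (jobId: string) =>
        transport(`/api/v2/crawl/${jobId}`, { method: "GET" }),
      stop: (jobId: string) =>
        transport(`/api/v2/crawl/${jobId}/stop`, { method: "POST" }),
    },
  };
}

// Usage with a stub transport (illustrative only):
const client = scrapegraphai({
  apiKey: "sgai-...",
  transport: async (path) => ({ data: { path }, requestId: "req_123" }),
});
```

Every method resolves to the `{ data, requestId }` shape the PR settles on, so callers can correlate responses with server-side request logs.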

Unit Tests

14 tests run with bun test against local mock HTTP servers (no real API calls).

tests/http.test.ts — request layer

  • POST with JSON body returns { data, requestId }
  • Retries on 502 (verified attempt count)
  • Sends Authorization, SGAI-APIKEY, and X-SDK-Version headers
  • Throws on 401 with error message from response body
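The retry behavior these tests exercise can be sketched as a small generic helper. This is a hedged illustration of retry-on-5xx with a capped attempt count; the actual request layer in this PR may differ in backoff strategy and which statuses it treats as retryable.

```typescript
// Illustrative retry wrapper: retry an operation while the caller's
// predicate says the failure is retryable, up to maxAttempts tries.
async function withRetry<T>(
  attempt: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!isRetryable(err) || i === maxAttempts - 1) throw err;
    }
  }
  throw lastError;
}
```

A test can then count attempts exactly as the unit suite does: fail twice with a 502-style error, succeed on the third call, and assert three attempts were made.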

tests/client.test.ts — client methods

  • scrape() returns markdown + requestId
  • extract() sends prompt + schema in body
  • search() sends query in body
  • credits() returns balance
  • crawl.start() returns job id
  • crawl.status() returns status
  • crawl.stop() sends POST
  • monitor.create() sends body
  • monitor.delete() sends DELETE method
  • history() appends query params (page, limit, service)
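The header set asserted by the request-layer tests can be sketched as below. The three header names come from the test list; the `Bearer` prefix and the `Content-Type` entry are assumptions added for illustration, not confirmed by this PR.

```typescript
// Sketch of the auth headers the unit tests check for.
// "Bearer" formatting and Content-Type are assumed, not verified here.
function buildHeaders(apiKey: string, sdkVersion: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    "SGAI-APIKEY": apiKey,
    "X-SDK-Version": sdkVersion,
  };
}
```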

Integration Tests (Dev Server)

22 tests total, 22 pass, 0 failures — tested against https://sgai-api-dev-v2.onrender.com/api/v2

1. Scrape — 7/7 ✅

| # | Test | Config | Time | Status |
|---|------|--------|------|--------|
| 1 | Simple page (markdown) | `sgai.scrape("https://example.com")` | 154ms | ✅ |
| 2 | Complex page (markdown) | `sgai.scrape("https://news.ycombinator.com")` | 2013ms | ✅ |
| 3 | HTML format | `sgai.scrape(url, { format: "html" })` | 202ms | ✅ |
| 4 | Screenshot format | `sgai.scrape(url, { format: "screenshot" })` | 52ms | ✅ |
| 5 | FetchConfig (mock) | `{ fetchConfig: { mock: true } }` | 150ms | ✅ |
| 6 | FetchConfig (stealth + wait) | `{ fetchConfig: { stealth: true, wait: 500 } }` | 251ms | ✅ |
| 7 | Heavy page (Wikipedia) | `sgai.scrape("https://en.wikipedia.org/wiki/Web_scraping")` | 775ms | ✅ |

2. Extract — 5/5 ✅

| # | Test | Config | Time | Status |
|---|------|--------|------|--------|
| 1 | Basic prompt | `sgai.extract(url, { prompt: "Extract title and description" })` | 1115ms | ✅ |
| 2 | With JSON schema | `{ schema: { type: "object", properties: {...} } }` | 672ms | ✅ |
| 3 | Complex (Hacker News) | Extract top 5 posts with title/points/author | 3224ms | ✅ |
| 4 | With fetchConfig | `{ fetchConfig: { mock: true } }` | 980ms | ✅ |
| 5 | With llmConfig | `{ llmConfig: { temperature: 0 } }` | 748ms | ✅ |
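For test 2 above, the request body that `extract()` sends pairs the prompt with a plain JSON Schema. The exact field names are illustrative (the unit tests only confirm that prompt and schema travel in the body):

```json
{
  "url": "https://example.com",
  "prompt": "Extract title and description",
  "schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "description": { "type": "string" }
    }
  }
}
```

Per the PR summary, a Zod schema passed at the SDK boundary is converted to this JSON Schema form via `toJsonSchema` before the request is sent.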

3. Search — 4/4 ✅

| # | Test | Config | Time | Status |
|---|------|--------|------|--------|
| 1 | Basic search | `sgai.search("what is web scraping")` | 1818ms | ✅ |
| 2 | numResults=3 | `sgai.search("Python SDK", { numResults: 3 })` | 3684ms | ✅ |
| 3 | numResults=10 | `sgai.search("ScrapeGraph AI", { numResults: 10 })` | 9921ms | ✅ |
| 4 | With llmConfig | `{ llmConfig: { temperature: 0 } }` | 2310ms | ✅ |

4. History — 4/4 ✅

| # | Test | Config | Time | Status |
|---|------|--------|------|--------|
| 1 | No filters | `sgai.history()` | 74ms | ✅ |
| 2 | With limit | `sgai.history({ limit: 5 })` | 66ms | ✅ |
| 3 | Filter by service (scrape) | `sgai.history({ service: "scrape", limit: 3 })` | 107ms | ✅ |
| 4 | Filter by service (extract) | `sgai.history({ service: "extract", limit: 3 })` | 55ms | ✅ |
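The query-string handling these tests cover can be sketched with `URLSearchParams`. Parameter names (`page`, `limit`, `service`) come from the PR; the base URL and the `/history` path segment here are illustrative.

```typescript
// Sketch of history()'s query-param handling: only append params the
// caller actually provided, so sgai.history() yields a bare URL.
function historyUrl(
  base: string,
  params: { page?: number; limit?: number; service?: string } = {},
): string {
  const qs = new URLSearchParams();
  if (params.page !== undefined) qs.set("page", String(params.page));
  if (params.limit !== undefined) qs.set("limit", String(params.limit));
  if (params.service !== undefined) qs.set("service", params.service);
  const query = qs.toString();
  return query ? `${base}/history?${query}` : `${base}/history`;
}
```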

5. Credits — 1/1 ✅

| # | Test | Response | Time | Status |
|---|------|----------|------|--------|
| 1 | Get credits | `{"remaining": 249347, "used": 651, "plan": "Pro Plan"}` | 63ms | ✅ |

6. Error Handling — 1/1 ✅

| # | Test | Expected Behavior | Result |
|---|------|-------------------|--------|
| 1 | Invalid API key | Server returns error | `Invalid or deprecated API key` ✅ |

Summary

| Endpoint | Tests | Status |
|----------|-------|--------|
| scrape (markdown, html, screenshot, mock, stealth) | 7 | ✅ |
| extract (basic, schema, complex, fetchConfig, llmConfig) | 5 | ✅ |
| search (basic, numResults, llmConfig) | 4 | ✅ |
| history (no filters, limit, service filter) | 4 | ✅ |
| credits | 1 | ✅ |
| Error handling | 1 | ✅ |

Validation

```sh
bun test tests/   # 14 unit tests, all passing
bun run check     # type-check
bun run build     # bundle
```

🤖 Generated with Claude Code

src/client.ts Outdated

```ts
import { DEFAULT_BASE_URL } from "./types/index.js";
import { toJsonSchema } from "./zod.js";

/** Create a ScrapeGraphAI client. All methods return `{ data, _requestId }`. */
```
Member:
Why is requestId private?

Member Author:
Changed this to requestId and updated the docs/tests to match. The response shape is now { data, requestId }.

src/client.ts Outdated

```ts
return {
  async scrape(
    targetUrl: string,
```
Member:
targetUrl? That's Java-style naming -> use url. You have the zod api schemas from the main repo; match them 100%.

Member Author:
Renamed targetUrl to url. I also copied the shared contract layer from sgai-stack into src/schemas.ts / src/models.ts / src/url.ts so the request types are derived from the same schema source.

src/client.ts Outdated

```ts
return {
  async scrape(
    targetUrl: string,
    options?: { format?: string; fetchConfig?: Record<string, unknown> },
```
Member:
Inline types are forbidden here; match the ones in the main repo. @lorenzo, did you prompt it to use the types from the main repo?

Member Author:
Removed the inline method types. src/types/index.ts now uses named types inferred from the copied shared schemas from sgai-stack instead of ad hoc shapes in client.ts.

src/client.ts Outdated

```ts
},

async extract(
  targetUrl: string,
```
Member:
Same issue here, and what is ro?

Member Author:
Renamed ro to requestOptions throughout client.ts.

src/client.ts Outdated

```ts
async extract(
  targetUrl: string,
  options: { prompt: string; schema?: unknown; llmConfig?: Record<string, unknown> },
  ro?: RequestOptions,
```
Member:
You have both options and requestOptions? Use the types from sgai-stack.

Member Author:
Updated this to use named types from the copied sgai-stack contract layer. src/schemas.ts / src/models.ts / src/url.ts are now ported over, and src/types/index.ts infers the request types from those schemas. The only SDK-specific adaptation left is keeping extract.schema widened at the SDK boundary so Zod input can still flow through toJsonSchema.

VinciGit00 and others added 2 commits March 30, 2026 10:25
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VinciGit00 added a commit to ScrapeGraphAI/just-scrape that referenced this pull request Mar 31, 2026
Align the CLI with ScrapeGraphAI/scrapegraph-js#11 (v2 SDK migration):

- Rename smart-scraper → extract, search-scraper → search
- Remove commands dropped from the API: agentic-scraper, generate-schema, sitemap, validate
- Add client factory (src/lib/client.ts) using the new scrapegraphai({ apiKey }) pattern
- Update scrape command with --format flag (markdown, html, screenshot, branding)
- Update crawl to use crawl.start/status polling lifecycle
- Update history to use v2 service names and parameters
- All commands now use try/catch (v2 throws on error) and self-timed elapsed

BREAKING CHANGE: CLI commands have been renamed and removed to match the v2 API surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
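The crawl.start/status polling lifecycle the CLI change describes can be sketched as a small loop. The status function is injected so the example runs without the API, and the terminal state names ("done"/"failed") are illustrative, not the API's documented values.

```typescript
// Illustrative polling loop for an async crawl job: call the injected
// status function until a terminal state is returned or we give up.
async function pollUntilDone(
  getStatus: () => Promise<string>,
  { intervalMs = 10, maxPolls = 50 } = {},
): Promise<string> {
  for (let i = 0; i < maxPolls; i++) {
    const status = await getStatus();
    if (status === "done" || status === "failed") return status;
    // Wait before the next poll to avoid hammering the endpoint.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl polling timed out");
}
```

In the CLI this would wrap `crawl.status(jobId)` after a `crawl.start(url)` call, with the try/catch the commit message mentions handling v2's throw-on-error behavior.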