title: Anthropic
description: Use ScrapeGraphAI with Claude for web scraping + AI workflows

Integrate ScrapeGraphAI with Claude to build AI applications powered by web data.

Setup

npm install scrapegraph-js @anthropic-ai/sdk zod zod-to-json-schema

Create .env file:

SGAI_APIKEY=your_scrapegraph_key
ANTHROPIC_API_KEY=your_anthropic_key
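
Node does not load `.env` on its own. On Node 20.6 or later, run your script with `node --env-file=.env`; on older versions, install `dotenv` and add `import 'dotenv/config'` to your code.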
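
Both clients read their keys from `process.env`, so a missing variable tends to surface only later as an authentication error. A minimal sketch of a fail-fast check is shown here; `requireEnv` is a hypothetical helper for illustration, not part of either SDK.

```typescript
// Hypothetical helper (not part of scrapegraph-js or @anthropic-ai/sdk):
// returns the named variable, or throws a descriptive error if it is unset or blank.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
    const value = env[name];
    if (value === undefined || value.trim() === '') {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// Usage (assumed): const sgaiKey = requireEnv('SGAI_APIKEY', process.env);
```

Calling it once per key at startup keeps configuration problems separate from API errors.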

Scrape + Summarize

This example demonstrates a simple workflow: scrape a website and summarize the content using Claude.

import { scrapegraphai } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';

const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const { data } = await sgai.extract('https://scrapegraphai.com', {
    prompt: 'Extract all content from this page',
});

console.log('Scraped content length:', JSON.stringify(data).length);

const message = await anthropic.messages.create({
    model: 'claude-haiku-4-5',
    max_tokens: 1024,
    messages: [
        { role: 'user', content: `Summarize in 100 words: ${JSON.stringify(data)}` }
    ]
});

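// message.content is an array of content blocks; print the first text block
const summary = message.content.find(block => block.type === 'text');
console.log('Summary:', summary?.type === 'text' ? summary.text : message);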
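
The reply from `messages.create` carries its output in `content`, an array of typed blocks rather than a single string. The sketch below joins the text blocks into one string. `Block` is a simplified local type assumed for illustration (the SDK's own types are richer), and `extractText` is a hypothetical helper.

```typescript
// Simplified local view of a response content block (assumption: the SDK's
// real block types carry more fields than shown here).
interface Block {
    type: string;
    text?: string;
}

// Hypothetical helper: concatenate every text block, ignoring other block types.
function extractText(content: Block[]): string {
    return content
        .filter(block => block.type === 'text' && typeof block.text === 'string')
        .map(block => block.text as string)
        .join('\n');
}

// Usage (assumed): console.log(extractText(message.content));
```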

Tool Use

This example uses Claude's tool use feature to let the model decide when to scrape a website based on the user's request. It stops once the scrape has run; sending the result back to Claude requires a follow-up message.

import { scrapegraphai } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const anthropic = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY
});

const ScrapeArgsSchema = z.object({
    url: z.string()
});

console.log("Sending user message to Claude and requesting tool use if necessary...");
const response = await anthropic.messages.create({
    model: 'claude-haiku-4-5',
    max_tokens: 1024,
    tools: [{
        name: 'scrape_website',
        description: 'Scrape and extract structured data from a website URL',
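        // No schema name here: a named schema is emitted as a $ref wrapper, but
        // input_schema needs `type: 'object'` at the top level
        input_schema: zodToJsonSchema(ScrapeArgsSchema) as any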
    }],
    messages: [{
        role: 'user',
        content: 'What is ScrapeGraphAI? Check scrapegraphai.com'
    }]
});

const toolUse = response.content.find(block => block.type === 'tool_use');

if (toolUse && toolUse.type === 'tool_use') {
    const input = toolUse.input as { url: string };
    console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`);

    const { data } = await sgai.extract(input.url, {
        prompt: 'Extract all content from this page',
    });

    console.log(`Scraped content preview: ${JSON.stringify(data)?.substring(0, 300)}...`);
    // Continue with the conversation or process the scraped content as needed
}
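
To continue the conversation, the Messages API expects the assistant's tool-use turn to be echoed back, followed by a user turn containing a `tool_result` block whose `tool_use_id` matches the tool call. The sketch below only builds that message array. The types are simplified local stand-ins, `buildToolResultTurn` is a hypothetical helper, and the `maxChars` cap of 20000 is an arbitrary assumption to keep large pages from overflowing the request.

```typescript
// Simplified local types for illustration; the SDK exports fuller definitions.
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type Message = { role: 'user' | 'assistant'; content: unknown };

// Hypothetical helper: append the assistant's tool-use turn and the matching
// tool_result turn to the existing conversation history.
function buildToolResultTurn(
    history: Message[],
    assistantContent: unknown[],
    toolUseId: string,
    scraped: unknown,
    maxChars = 20000, // arbitrary cap, an assumption rather than an API limit
): Message[] {
    const result: ToolResultBlock = {
        type: 'tool_result',
        tool_use_id: toolUseId,
        content: (JSON.stringify(scraped) ?? '').slice(0, maxChars),
    };
    return [
        ...history,
        { role: 'assistant', content: assistantContent },
        { role: 'user', content: [result] },
    ];
}
```

In the example above, this would be called with the original user message as `history`, `response.content`, `toolUse.id`, and `data`; the returned array is then passed as `messages` to a second `anthropic.messages.create` call that includes the same `tools` definition.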

Structured Extraction

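This example uses Claude to extract structured data from scraped website content. The assistant turn is prefilled with `{` so the reply continues a JSON object, and the reassembled JSON is validated against a zod schema.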

import { scrapegraphai } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const CompanyInfoSchema = z.object({
    name: z.string(),
    industry: z.string().optional(),
    description: z.string().optional()
});

const { data } = await sgai.extract('https://stripe.com', {
    prompt: 'Extract all content from this page',
});

const prompt = `Extract company information from this website content.

Output ONLY valid JSON in this exact format (no markdown, no explanation):

{
  "name": "Company Name",
  "industry": "Industry",
  "description": "One sentence description"
}

Website content:
${JSON.stringify(data)}`;

const message = await anthropic.messages.create({
    model: 'claude-haiku-4-5',
    max_tokens: 1024,
    messages: [
        { role: 'user', content: prompt },
        { role: 'assistant', content: '{' }
    ]
});

const textBlock = message.content.find(block => block.type === 'text');

if (textBlock && textBlock.type === 'text') {
    const jsonText = '{' + textBlock.text;
    const companyInfo = CompanyInfoSchema.parse(JSON.parse(jsonText));
  
    console.log(companyInfo);
}
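
`JSON.parse` throws if the model adds anything after the closing brace or returns truncated output. A minimal sketch of a more forgiving parse step follows; `parsePrefilledJson` is a hypothetical helper that assumes the reply was prefilled with `{`, as in the example above.

```typescript
// Hypothetical helper: rebuild and parse a JSON object from a reply that was
// prefilled with '{'. Returns null instead of throwing on malformed output.
function parsePrefilledJson(replyText: string): unknown {
    const candidate = '{' + replyText;
    // Drop anything the model may have written after the final closing brace.
    const end = candidate.lastIndexOf('}');
    if (end === -1) return null;
    try {
        return JSON.parse(candidate.slice(0, end + 1));
    } catch {
        return null;
    }
}
```

The parsed value can then be passed to `CompanyInfoSchema.safeParse(...)`, which reports failure through its `success` flag rather than throwing.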

For more examples, check the Claude documentation.