
Commit f6d8b9c

davila7 and claude authored
feat: add Bright Data web data template as featured integration (#413)
* feat: add Bright Data web data template as featured integration

  Add Bright Data skills (search, scrape, data-feeds, bright-data-mcp, bright-data-best-practices, design-mirror), MCP component, and featured page with complete documentation covering 60+ tools for web search, scraping, structured data extraction, and browser automation.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: regenerate components.json with Bright Data components

  Updates catalog to include new web-data skills (search, scrape, data-feeds, bright-data-mcp, bright-data-best-practices, design-mirror) and brightdata MCP component.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(dashboard): improve Bright Data featured page UX

  - Move Bright Data to first position in featured list
  - Add View buttons linking to each skill detail page
  - Add rounded corners to logo images across featured cards, header, and sidebar

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2ce9a68 commit f6d8b9c

28 files changed

Lines changed: 7728 additions & 3083 deletions


.claude/launch.json

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
{
2+
"version": "0.0.1",
3+
"configurations": [
4+
{
5+
"name": "dashboard",
6+
"runtimeExecutable": "npx",
7+
"runtimeArgs": ["astro", "dev", "--port", "4321"],
8+
"port": 4321,
9+
"cwd": "dashboard"
10+
}
11+
]
12+
}
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
{
2+
"mcpServers": {
3+
"brightdata": {
4+
"description": "Bright Data MCP server providing 60+ tools for web search, scraping, structured data extraction, and browser automation across major platforms",
5+
"url": "https://mcp.brightdata.com/mcp?token=YOUR_BRIGHTDATA_API_TOKEN&pro=1"
6+
}
7+
}
8+
}
Lines changed: 368 additions & 0 deletions
@@ -0,0 +1,368 @@
1+
---
2+
name: bright-data-best-practices
3+
description: "Build production-ready Bright Data integrations with best practices baked in. Reference documentation for developers using coding assistants (Claude Code, Cursor, etc.) to implement web scraping, search, browser automation, and structured data extraction. Covers Web Unlocker API, SERP API, Web Scraper API, and Browser API (Scraping Browser)."
4+
user-invocable: false
5+
---
6+
7+
# Bright Data APIs
8+
9+
Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.
10+
11+
## Choosing the Right API
12+
13+
| Use Case | API | Why |
14+
|----------|-----|-----|
15+
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
16+
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
17+
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
18+
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
19+
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |
20+
21+
## Authentication Pattern (All APIs)
22+
23+
All APIs share the same authentication model:
24+
25+
```bash
26+
export BRIGHTDATA_API_KEY="your-api-key" # From Control Panel > Account Settings
27+
export BRIGHTDATA_UNLOCKER_ZONE="zone-name" # Web Unlocker zone name
28+
export BRIGHTDATA_SERP_ZONE="serp-zone-name" # SERP API zone name
29+
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD" # Browser API credentials
30+
```
31+
32+
REST API authentication header for Web Unlocker and SERP API:
33+
```
34+
Authorization: Bearer YOUR_API_KEY
35+
```
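
As an illustration of the authentication pattern above, here is a minimal sketch that turns the `BRIGHTDATA_API_KEY` environment variable into the Bearer header. The helper name `auth_headers` and its error handling are assumptions made for this example; only the variable name and the header format come from the text above.

```python
import os


def auth_headers(env=None):
    """Build the Bearer header from BRIGHTDATA_API_KEY (hypothetical helper)."""
    # Accept an explicit mapping so the helper can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    api_key = env.get("BRIGHTDATA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("BRIGHTDATA_API_KEY is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

Failing early on a missing key tends to give a clearer error than an authentication failure returned by the remote service.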
36+
37+
---
38+
39+
## Web Unlocker API
40+
41+
HTTP-based scraping proxy. Best for simple page fetches without browser interaction.
42+
43+
**Endpoint:** `POST https://api.brightdata.com/request`
44+
45+
```python
46+
import requests
47+
48+
response = requests.post(
49+
"https://api.brightdata.com/request",
50+
headers={"Authorization": f"Bearer {API_KEY}"},
51+
json={
52+
"zone": "YOUR_ZONE_NAME",
53+
"url": "https://example.com/product/123",
54+
"format": "raw"
55+
}
56+
)
57+
html = response.text
58+
```
59+
60+
### Key Parameters
61+
62+
| Parameter | Type | Description |
63+
|-----------|------|-------------|
64+
| `zone` | string | Zone name (required) |
65+
| `url` | string | Target URL with `http://` or `https://` (required) |
66+
| `format` | string | `"raw"` (HTML) or `"json"` (structured wrapper) (required) |
67+
| `method` | string | HTTP verb, default `"GET"` |
68+
| `country` | string | 2-letter ISO for geo-targeting (e.g., `"us"`, `"de"`) |
69+
| `data_format` | string | Transform: `"markdown"` or `"screenshot"` |
70+
| `async` | boolean | `true` for async mode |
71+
72+
### Quick Patterns
73+
74+
```python
75+
# Get markdown (best for LLM input)
76+
response = requests.post(
77+
"https://api.brightdata.com/request",
78+
headers={"Authorization": f"Bearer {API_KEY}"},
79+
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
80+
)
81+
82+
# Geo-targeted request
83+
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}
84+
85+
# Screenshot for debugging
86+
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}
87+
88+
# Async for bulk processing
89+
json={"zone": ZONE, "url": url, "format": "raw", "async": True}
90+
```
91+
92+
**Critical rule:** Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.
93+
94+
See **[references/web-unlocker.md](references/web-unlocker.md)** for complete reference including proxy interface, special headers, async flow, features, and billing.
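
The parameter table and quick patterns above could be wrapped in a small request-body builder. This is a sketch under stated assumptions: the function name `build_unlocker_payload` and its validation rules are hypothetical and only mirror the required fields and allowed values listed in this section.

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = {"raw", "json"}
ALLOWED_DATA_FORMATS = {"markdown", "screenshot"}


def build_unlocker_payload(zone, url, fmt="raw", country=None, data_format=None):
    """Assemble a Web Unlocker request body (hypothetical helper)."""
    if not zone:
        raise ValueError("zone is required")
    # The table above requires the URL to include http:// or https://.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must start with http:// or https://")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {"zone": zone, "url": url, "format": fmt}
    if country is not None:
        payload["country"] = country  # 2-letter ISO code, e.g. "us" or "de"
    if data_format is not None:
        if data_format not in ALLOWED_DATA_FORMATS:
            raise ValueError(f"data_format must be one of {sorted(ALLOWED_DATA_FORMATS)}")
        payload["data_format"] = data_format
    return payload
```

The returned dictionary is meant to be passed as the `json=` argument of the `requests.post` calls shown earlier.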
95+
96+
---
97+
98+
## SERP API
99+
100+
Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.
101+
102+
**Endpoint:** `POST https://api.brightdata.com/request` (same as Web Unlocker)
103+
104+
```python
105+
response = requests.post(
106+
"https://api.brightdata.com/request",
107+
headers={"Authorization": f"Bearer {API_KEY}"},
108+
json={
109+
"zone": "YOUR_SERP_ZONE",
110+
"url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
111+
"format": "raw"
112+
}
113+
)
114+
data = response.json()
115+
for result in data.get("organic", []):
116+
    print(result["rank"], result["title"], result["link"])
117+
```
118+
119+
### Essential Google URL Parameters
120+
121+
| Parameter | Description | Example |
122+
|-----------|-------------|---------|
123+
| `q` | Search query | `q=python+web+scraping` |
124+
| `brd_json` | Parsed JSON output | `brd_json=1` (always use for data pipelines) |
125+
| `gl` | Country for search | `gl=us` |
126+
| `hl` | Language | `hl=en` |
127+
| `start` | Pagination offset | `start=10` (page 2), `start=20` (page 3) |
128+
| `tbm` | Search type | `tbm=nws` (news), `tbm=isch` (images), `tbm=vid` (videos) |
129+
| `brd_mobile` | Device | `brd_mobile=1` (mobile), `brd_mobile=ios` |
130+
| `brd_browser` | Browser | `brd_browser=chrome` |
131+
| `brd_ai_overview` | Trigger AI Overview | `brd_ai_overview=2` |
132+
| `uule` | Encoded geo location | for precise location targeting |
133+
134+
**Note:** `num` parameter is **deprecated** as of September 2025. Use `start` for pagination.
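
To make the pagination rule concrete, the following sketch builds a Google search URL from the parameters in the table above and derives `start` from a page number. The function name `google_serp_url` and its defaults are assumptions for illustration.

```python
from urllib.parse import urlencode


def google_serp_url(query, page=1, country="us", language="en"):
    """Build a Google search URL with parsed JSON output (hypothetical helper)."""
    if page < 1:
        raise ValueError("page starts at 1")
    params = {"q": query, "brd_json": 1, "gl": country, "hl": language}
    # start=10 is page 2, start=20 is page 3; page 1 needs no offset.
    start = (page - 1) * 10
    if start:
        params["start"] = start
    return "https://www.google.com/search?" + urlencode(params)
```

The result is intended for the `url` field of the request body shown earlier in this section.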
135+
136+
### Parsed JSON Response Structure
137+
138+
```json
139+
{
140+
"organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
141+
"paid": [],
142+
"people_also_ask": [],
143+
"knowledge_graph": {},
144+
"related_searches": [],
145+
"general": {"results_cnt": 1240000000, "query": "..."}
146+
}
147+
```
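
A short sketch of reading this structure defensively, since some keys may be absent for a given query. The sample payload and the helper name `organic_rows` are made up for this example; only the key names come from the structure above.

```python
import json

# Made-up sample that follows the documented shape.
sample = json.loads("""
{
  "organic": [
    {"rank": 1, "global_rank": 1, "title": "A", "link": "https://a.example", "description": "..."},
    {"rank": 2, "global_rank": 2, "title": "B", "link": "https://b.example"}
  ],
  "general": {"results_cnt": 2, "query": "demo"}
}
""")


def organic_rows(serp):
    """Return (rank, title, link) tuples, skipping entries without a link."""
    rows = []
    for item in serp.get("organic", []):
        link = item.get("link")
        if link:
            rows.append((item.get("rank"), item.get("title", ""), link))
    return rows


rows = organic_rows(sample)
```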
148+
149+
### Bing Key Parameters
150+
151+
| Parameter | Description |
152+
|-----------|-------------|
153+
| `q` | Search query |
154+
| `setLang` | Language (prefer 4-letter: `en-US`) |
155+
| `cc` | Country code |
156+
| `first` | Pagination (increment by 10: 1, 11, 21...) |
157+
| `safesearch` | `off`, `moderate`, `strict` |
158+
| `brd_mobile` | Device type |
159+
160+
### Async for Bulk SERP
161+
162+
```python
163+
# Submit
164+
response = requests.post(
165+
"https://api.brightdata.com/request",
166+
params={"async": "1"},
167+
headers={"Authorization": f"Bearer {API_KEY}"},
168+
json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
169+
)
170+
response_id = response.headers.get("x-response-id")
171+
172+
# Retrieve (retrieve calls are NOT billed)
173+
result = requests.get(
174+
"https://api.brightdata.com/serp/get_result",
175+
params={"response_id": response_id},
176+
headers={"Authorization": f"Bearer {API_KEY}"}
177+
)
178+
```
179+
180+
**Billing:** Pay per 1,000 successful requests only. Async retrieve calls are not billed.
181+
182+
See **[references/serp-api.md](references/serp-api.md)** for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.
183+
184+
---
185+
186+
## Web Scraper API
187+
188+
Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.
189+
190+
**Sync Endpoint:** `POST https://api.brightdata.com/datasets/v3/scrape`
191+
**Async Endpoint:** `POST https://api.brightdata.com/datasets/v3/trigger`
192+
193+
```python
194+
# Sync (up to 20 URLs, returns immediately)
195+
response = requests.post(
196+
"https://api.brightdata.com/datasets/v3/scrape",
197+
params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
198+
headers={"Authorization": f"Bearer {API_KEY}"},
199+
json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
200+
)
201+
202+
if response.status_code == 200:
203+
    data = response.json()  # Results ready
204+
elif response.status_code == 202:
205+
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion
206+
```
207+
208+
### Parameters
209+
210+
| Parameter | Type | Description |
211+
|-----------|------|-------------|
212+
| `dataset_id` | string | Scraper identifier from the Scraper Library (required) |
213+
| `format` | string | `json` (default), `ndjson`, `jsonl`, `csv` |
214+
| `custom_output_fields` | string | Pipe-separated fields: `url\|title\|price` |
215+
| `include_errors` | boolean | Include error info in results |
216+
217+
### Request Body
218+
219+
```json
220+
{
221+
"input": [
222+
{ "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
223+
{ "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
224+
]
225+
}
226+
```
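
The query parameters and request body can be assembled programmatically. The sketch below joins output fields with a pipe and splits a URL list into bodies of at most 20 entries, following the sync limit mentioned in the example above. Both helper names are hypothetical.

```python
def scrape_params(dataset_id, fmt="json", output_fields=None):
    """Build query parameters for a scrape or trigger call (hypothetical helper)."""
    params = {"dataset_id": dataset_id, "format": fmt}
    if output_fields:
        # custom_output_fields is pipe-separated, e.g. url|title|price
        params["custom_output_fields"] = "|".join(output_fields)
    return params


def input_batches(urls, size=20):
    """Yield request bodies holding at most `size` URLs each."""
    for i in range(0, len(urls), size):
        yield {"input": [{"url": u} for u in urls[i:i + size]]}
```

Each yielded body matches the `{"input": [...]}` shape shown above, so it can be sent as the `json=` argument of one request.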
227+
228+
### Poll for Async Results
229+
230+
```python
231+
import time
232+
233+
# Trigger
234+
snapshot_id = requests.post(
235+
"https://api.brightdata.com/datasets/v3/trigger",
236+
params={"dataset_id": DATASET_ID, "format": "json"},
237+
headers={"Authorization": f"Bearer {API_KEY}"},
238+
json={"input": [{"url": u} for u in urls]}
239+
).json()["snapshot_id"]
240+
241+
# Poll
242+
while True:
243+
    status = requests.get(
244+
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
245+
        headers={"Authorization": f"Bearer {API_KEY}"}
246+
    ).json()["status"]
247+

248+
    if status == "ready": break
249+
    if status == "failed": raise Exception("Job failed")
250+
    time.sleep(10)
251+
252+
# Download
253+
data = requests.get(
254+
f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
255+
params={"format": "json"},
256+
headers={"Authorization": f"Bearer {API_KEY}"}
257+
).json()
258+
```
259+
260+
**Progress status values:** `starting` → `running` → `ready` | `failed`
261+
**Data retention:** 30 days.
262+
**Billing:** Per delivered record. Invalid input URLs that fail are still billable.
263+
264+
See **[references/web-scraper-api.md](references/web-scraper-api.md)** for complete reference including scraper types, output formats, delivery options, and billing details.
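
The polling loop above runs until the job reaches `ready` or `failed`. A variant with an upper bound on total wait time may be safer for unattended jobs. The sketch below is one way to do that. The function name `wait_until_ready`, the `max_wait` default, and the injected `get_status` callable are assumptions for this example, not part of the API.

```python
import time


def wait_until_ready(get_status, interval=10.0, max_wait=600.0, sleep=time.sleep):
    """Poll get_status() until it returns "ready".

    Raises RuntimeError on "failed" and TimeoutError once max_wait seconds
    have been spent sleeping. Returns the number of seconds waited.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return waited
        if status == "failed":
            raise RuntimeError("Job failed")
        if waited >= max_wait:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

In practice `get_status` would wrap the progress request shown above and return its `status` field. Passing it in as a callable keeps the loop free of network code.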
265+
266+
---
267+
268+
## Browser API (Scraping Browser)
269+
270+
Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.
271+
272+
**Connection:**
273+
- Playwright/Puppeteer: `wss://${AUTH}@brd.superproxy.io:9222`
274+
- Selenium: `https://${AUTH}@brd.superproxy.io:9515`
275+
276+
```javascript
277+
const { chromium } = require("playwright-core");
278+
279+
const AUTH = process.env.BROWSER_AUTH;
280+
const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
281+
const page = await browser.newPage();
282+
page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes
283+
284+
await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
285+
const html = await page.content();
286+
await browser.close();
287+
```
288+
289+
```python
290+
from playwright.async_api import async_playwright
291+
292+
async with async_playwright() as p:
293+
    browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
294+
    page = await browser.new_page()
295+
    page.set_default_navigation_timeout(120000)
296+
    await page.goto("https://example.com", wait_until="domcontentloaded")
297+
    html = await page.content()
298+
    await browser.close()
299+
```
300+
301+
### Custom CDP Functions
302+
303+
| Function | Purpose |
304+
|----------|---------|
305+
| `Captcha.solve` | Manually trigger CAPTCHA solving |
306+
| `Captcha.setAutoSolve` | Enable/disable auto CAPTCHA solving |
307+
| `Proxy.setLocation` | Set precise geo location (call BEFORE goto) |
308+
| `Proxy.useSession` | Maintain same IP across sessions |
309+
| `Emulation.setDevice` | Apply device profile (iPhone 14, etc.) |
310+
| `Emulation.getSupportedDevices` | List available device profiles |
311+
| `Unblocker.enableAdBlock` | Block ads to save bandwidth |
312+
| `Unblocker.disableAdBlock` | Re-enable ads |
313+
| `Input.type` | Fast text input for bulk form filling |
314+
| `Browser.addCertificate` | Install client SSL cert for session |
315+
| `Page.inspect` | Get DevTools debug URL for live session |
316+
317+
```javascript
318+
// CDP session pattern for custom functions
319+
const client = await page.target().createCDPSession();
320+
321+
// CAPTCHA solve with timeout
322+
const result = await client.send("Captcha.solve", { timeout: 30000 });
323+
324+
// Precise geo location (must be before goto)
325+
await client.send("Proxy.setLocation", {
326+
latitude: 37.7749,
327+
longitude: -122.4194,
328+
distance: 10,
329+
strict: true
330+
});
331+
332+
// Block unnecessary resources
333+
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });
334+
335+
// Device emulation
336+
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });
337+
```
338+
339+
### Session Rules
340+
- **One initial navigation per session** — new URL = new session
341+
- **Idle timeout:** 5 minutes
342+
- **Max duration:** 30 minutes
343+
344+
### Geolocation
345+
- Country-level: append `-country-us` to credentials username
346+
- EU-wide: append `-country-eu` (routes through 29+ European countries)
347+
- Precise: use `Proxy.setLocation` CDP command (before navigation)
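
Country-level targeting amounts to a string edit on the credentials. The sketch below appends the country suffix to the username part and builds the CDP endpoint from the connection string given earlier. Both helper names are hypothetical, and the credentials are assumed to be in the `USER:PASSWORD` form used for `BROWSER_AUTH`.

```python
def with_country(auth, country):
    """Append -country-<code> to the username of USER:PASSWORD credentials."""
    username, sep, password = auth.partition(":")
    if not sep:
        raise ValueError("expected credentials in USER:PASSWORD form")
    return f"{username}-country-{country}:{password}"


def cdp_endpoint(auth):
    """Return the Playwright/Puppeteer WebSocket endpoint for the credentials."""
    return f"wss://{auth}@brd.superproxy.io:9222"
```

For example, `cdp_endpoint(with_country(auth, "eu"))` would yield an endpoint using the EU-wide suffix described above.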
348+
349+
### Error Codes
350+
351+
| Code | Issue | Fix |
352+
|------|-------|-----|
353+
| `407` | Wrong port | Playwright/Puppeteer → `9222`, Selenium → `9515` |
354+
| `403` | Bad auth | Check credentials format and zone type |
355+
| `503` | Service scaling | Wait 1 minute, reconnect |
356+
357+
**Billing:** Traffic-based only. Block images/CSS/fonts to reduce costs.
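
One way to act on the billing note is a Playwright route handler that aborts heavy resource types. This is a sketch for Playwright's sync API. The names `BLOCKED_RESOURCE_TYPES`, `should_block`, and `block_heavy_resources` are assumptions for this example, and whether blocking stylesheets is acceptable depends on the target page.

```python
# Playwright resource types that correspond to images, CSS, and fonts.
BLOCKED_RESOURCE_TYPES = {"image", "stylesheet", "font"}


def should_block(resource_type):
    """Return True for resource types that only add traffic."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def block_heavy_resources(route):
    """Route handler: abort images, CSS, and fonts; let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


# Usage with a connected page (sync API):
#   page.route("**/*", block_heavy_resources)
```

With the async API the handler would need to be a coroutine that awaits `route.abort()` and `route.continue_()`.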
358+
359+
See **[references/browser-api.md](references/browser-api.md)** for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.
360+
361+
---
362+
363+
## Detailed References
364+
365+
- **[references/web-unlocker.md](references/web-unlocker.md)** — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
366+
- **[references/serp-api.md](references/serp-api.md)** — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
367+
- **[references/web-scraper-api.md](references/web-scraper-api.md)** — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
368+
- **[references/browser-api.md](references/browser-api.md)** — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes
