|
27 | 27 | | `brightdata discover` | AI-powered web discovery - find and rank results by intent with optional full-page content | |
28 | 28 | | `brightdata scraper create` | Build a Bright Data scraper from a natural-language description using AI | |
29 | 29 | | `brightdata scraper run` | Run a Bright Data scraper on a URL and return the data | |
| 30 | +| `brightdata scraper heal` | Fix an existing scraper in place via AI self-healing (stops at an approval gate) | |
| 31 | +| `brightdata scraper approve` | Approve (or reject) a self-healing fix that is awaiting approval | |
30 | 32 | | `brightdata pipelines` | Extract structured data from 40+ platforms (Amazon, LinkedIn, TikTok…) | |
31 | 33 | | `brightdata browser` | Control a real browser via Bright Data's Scraping Browser — navigate, snapshot, click, type, and more | |
32 | 34 | | `brightdata zones` | List and inspect your Bright Data proxy zones | |
|
50 | 52 | - [discover](#discover) |
51 | 53 | - [scraper create](#scraper-create) |
52 | 54 | - [scraper run](#scraper-run) |
| 55 | + - [scraper heal](#scraper-heal) |
| 56 | + - [scraper approve](#scraper-approve) |
53 | 57 | - [pipelines](#pipelines) |
54 | 58 | - [browser](#browser) |
55 | 59 | - [status](#status) |
@@ -475,6 +479,105 @@ brightdata scraper run c_mp3tuab31lswoxvpws --input-file urls.json |
475 | 479 |
|
476 | 480 | --- |
477 | 481 |
|
| 482 | +### `scraper heal` |
| 483 | + |
| 484 | +Fix an existing scraper **in place** when it ran but returned wrong, empty, or partial data. The `collector_id` stays the same — the scraper is improved, not replaced. This is the maintenance twin of `scraper create`: it triggers Bright Data's AI self-healing flow (`POST /dca/collectors/{id}/refactor_template`), then polls progress. |
| 485 | + |
| 486 | +```bash |
| 487 | +brightdata scraper heal <collector_id> "<prompt>" [options] |
| 488 | +``` |
| 489 | + |
| 490 | +**You are the detector.** The CLI never decides on its own that a scraper is broken — you inspect the run output and decide. The `<prompt>` is required (max 1000 chars); name exactly what is wrong and what the correct output should be. Vague prompts produce vague heals. |
| 491 | + |
| 492 | +| Flag | Description | |
| 493 | +|---|---| |
| 494 | +| `--url <url>` | Verify target woven into the success `next_step` hint (not sent to the heal call) | |
| 495 | +| `--auto-approve` | When the heal hits the approval gate, approve it automatically and poll through to `done` (default: stop and let you review) | |
| 496 | +| `--timeout <seconds>` | Polling timeout (default: `600`) | |
| 497 | +| `--max-retries <n>` | Max retries on the AI-Flow concurrent-job-cap `429` (default: `4`) | |
| 498 | +| `--no-retry` | Fail immediately on `429` instead of waiting through the cap | |
| 499 | +| `-o, --output <path>` | Write output to file | |
| 500 | +| `--json` / `--pretty` | JSON output (raw / indented) | |
| 501 | +| `--legacy-output` | Emit the bare AI-progress payload instead of the envelope | |
| 502 | +| `--timing` | Show request timing | |
| 503 | +| `-k, --api-key <key>` | Override API key | |
| 504 | + |
| 505 | +**The approval gate** |
| 506 | + |
| 507 | +Self-healing is human-in-the-loop. Without `--auto-approve`, `heal` runs the fix and then **stops at an approval gate** rather than committing it, exiting `0` with a `status: "awaiting_approval"` envelope: |
| 508 | + |
| 509 | +```json |
| 510 | +{ |
| 511 | + "collector_id": "c_mp3tuab31lswoxvpws", |
| 512 | + "status": "awaiting_approval", |
| 513 | + "prompt": "Price returns null — the selector moved …", |
| 514 | + "preview_result": [ { "title": "…", "price": { "value": 51.77, "currency": "GBP" } }, … ], |
| 515 | + "diff_summary": "proposed template has 1 step(s) — review at view_url", |
| 516 | + "view_url": "https://brightdata.com/cp/scrapers/c_mp3tuab31lswoxvpws", |
| 517 | + "next_step": "bdata scraper approve c_mp3tuab31lswoxvpws --url https://example.com/product/1" |
| 518 | +} |
| 519 | +``` |
| 520 | + |
| 521 | +`preview_result` shows the sample rows the fixed scraper would produce — review them, then run the `next_step` (`scraper approve`) to commit. `awaiting_approval` is **not** a failure; it means the fix is ready and waiting for your decision. A failed heal (`429` cap exhausted, timeout, terminal `failed`) is **non-destructive** — the existing scraper is unchanged and still works as before. |
| 522 | + |
| 523 | +**Examples** |
| 524 | + |
| 525 | +```bash |
| 526 | +# Heal a scraper, stop at the gate, and get a ready-to-run verify command back |
| 527 | +brightdata scraper heal c_mp3tuab31lswoxvpws \ |
| 528 | + "The price field returns null — the selector moved into a span with \ |
| 529 | + data-testid. Capture price and currency again." \ |
| 530 | + --url https://example.com/product/1 --pretty -o heal.json |
| 531 | + |
| 532 | +# Fully autonomous: heal and approve in one command (no manual review) |
| 533 | +brightdata scraper heal c_mp3tuab31lswoxvpws \ |
| 534 | + "Reviews stopped extracting after the page redesign" --auto-approve |
| 535 | +``` |
| 536 | + |
| 537 | +--- |
| 538 | + |
| 539 | +### `scraper approve` |
| 540 | + |
| 541 | +Commit (or reject) a self-healing fix that `scraper heal` left **awaiting approval**. Calls `POST /dca/collectors/{id}/resume_automation_job`, then polls the refactor job to `done`. |
| 542 | + |
| 543 | +```bash |
| 544 | +brightdata scraper approve <collector_id> [options] |
| 545 | +``` |
| 546 | + |
| 547 | +| Flag | Description | |
| 548 | +|---|---| |
| 549 | +| `--reject` | Reject the proposed fix instead of approving it | |
| 550 | +| `--url <url>` | Verify target woven into the success `next_step` hint | |
| 551 | +| `--timeout <seconds>` | Polling timeout (default: `600`) | |
| 552 | +| `-o, --output <path>` | Write output to file | |
| 553 | +| `--json` / `--pretty` | JSON output (raw / indented) | |
| 554 | +| `--legacy-output` | Emit the bare AI-progress payload instead of the envelope | |
| 555 | +| `--timing` | Show request timing | |
| 556 | +| `-k, --api-key <key>` | Override API key | |
| 557 | + |
| 558 | +On success the job advances to `status: "done"` and the envelope hands back a `next_step` = `scraper run <id> <url>` so you can verify the committed fix. `--reject` discards the proposed fix (`status: "rejected"`) — re-run `scraper heal` with a sharper prompt to try again. If a heal needs multiple approvals, `approve` may stop at `awaiting_approval` again — just run it once more. |
| 559 | + |
| 560 | +**The self-healing loop** |
| 561 | + |
| 562 | +```bash |
| 563 | +# 1. Run and inspect the data |
| 564 | +brightdata scraper run c_mp3tuab31lswoxvpws https://example.com/product/1 --json -o out.json |
| 565 | + |
| 566 | +# 2. If the data is wrong, heal (stops at the approval gate) |
| 567 | +brightdata scraper heal c_mp3tuab31lswoxvpws \ |
| 568 | + "Price returns null — the selector moved; capture price + currency." \ |
| 569 | + --url https://example.com/product/1 --pretty -o heal.json |
| 570 | + |
| 571 | +# 3. Review heal.json's preview_result, then approve |
| 572 | +brightdata scraper approve c_mp3tuab31lswoxvpws \ |
| 573 | + --url https://example.com/product/1 --pretty -o approve.json |
| 574 | + |
| 575 | +# 4. Verify the committed fix |
| 576 | +brightdata scraper run c_mp3tuab31lswoxvpws https://example.com/product/1 --pretty |
| 577 | +``` |
| 578 | + |
| 579 | +--- |
| 580 | + |
478 | 581 | ### `pipelines` |
479 | 582 |
|
480 | 583 | Extract structured data from 40+ platforms using Bright Data's Web Scraper API. Triggers an async collection job, polls until ready, and returns results. |
|
0 commit comments