feat(scraper-create): auto-backoff on AI-Flow concurrent-job cap (429)#7
Open
anil-bd wants to merge 1 commit into
Bright Data's AI Flow caps concurrent scraper-template generations
per account (currently 3, undocumented). When exceeded, the
automate_template POST returns:
429 Cannot run more than 3 jobs in parallel
Today the CLI applies one generic exponential backoff (500ms base,
~3.5s total) to every 429. That is far too short for this case,
since freeing a slot takes 2-11 minutes (a full AI-Flow generation).
Users who launch ten parallel `scraper create` invocations see
seven of them fail within seconds, leaving seven half-built stub
collectors in the dashboard.
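A quick arithmetic check of the mismatch described above. This is a hypothetical helper, not code from the PR. It assumes the old generic schedule is 3 retries doubling from 500ms, which is what "~3.5s total" implies. It also assumes retry i (0-based) sleeps at most min(max_ms, base_ms * 2^i).

```typescript
// Hypothetical helper (not part of this PR): worst-case cumulative wait of a
// doubling backoff schedule, where retry i (0-based) sleeps at most
// min(max_ms, base_ms * 2^i).
function max_total_wait_ms(base_ms: number, max_ms: number, retries: number): number {
  let total = 0;
  for (let i = 0; i < retries; i++) {
    total += Math.min(max_ms, base_ms * 2 ** i);
  }
  return total;
}

// Old generic schedule (assumed: 500ms base, 3 retries, no effective cap).
const old_ms = max_total_wait_ms(500, Infinity, 3);
// AI-Flow schedule proposed in this PR: base 30s, ceiling 240s, 4 retries.
const new_ms = max_total_wait_ms(30_000, 240_000, 4);
// Shortest time for a slot to free up, per the numbers above: ~2 minutes.
const min_slot_free_ms = 2 * 60_000;

console.log(old_ms, new_ms, old_ms >= min_slot_free_ms, new_ms >= min_slot_free_ms);
// 3500 450000 false true
```

Even the best case for a slot to free up (2 minutes) is more than 30x the old worst-case wait. The new schedule's worst case (7.5 min) covers most of the 2-11 minute window.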
This change is the first half of PR-11 (from the audit's must-fix list).
The second half (programmatic stub cleanup via DELETE
/dca/collector/{id}) needs an API endpoint that doesn't yet exist
and has been forwarded to the product team.
Mechanism:
* src/utils/client.ts gains an optional `retry: Retry_config` field
on Request_opts. Callers can override max_attempts, base_ms,
max_ms, and supply an on_retry callback fired before each sleep.
Defaults to the existing short schedule when omitted, so every
other command (scrape, search, discover, pipelines, browser) is
unaffected.
* The new compute_backoff() implements jittered exponential
backoff with delay ∈ [exp/2, exp] (the "equal jitter" variant;
"full jitter" would draw from [0, exp]). This spreads herds of
concurrent processes that all receive a 429 on the same tick.
Without jitter, ten processes would all back off exactly 30s and
re-collide.
* src/commands/scraper.ts owns the AI-Flow-specific schedule
(base 30s, ceiling 240s, 4 retries; worst-case total wait
30 + 60 + 120 + 240 = 450s ≈ 7.5 min) and passes it to the
automate_template POST via the new `retry` opt. Two new flags
expose it:
--max-retries <n> override the count
--no-retry disable retries (fail-fast on 429)
* on_retry fires a stderr status line during each wait so the
user knows the CLI isn't hung:
"Hit AI-Flow concurrent-job cap (429). Waiting 32s before
retry 1/4..."
Non-429 transient errors get a generic line that names the
status code.
* On terminal failure paths that leave a half-built collector
(AI-trigger ultimately fails after retries, poll status != done,
polling exception), print_stub_recovery_note() writes a stderr
block pointing at the dashboard URL for the stub and explaining
that Bright Data does not yet expose programmatic deletion.
Composes with the PR-2 envelope, which also surfaces the
view_url in -o.
* The AI Scraper Studio vocabulary stays in the scraper command;
client.ts knows only the generic retry mechanism. Same
architectural boundary as PR-12 (per-command hints).
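A minimal sketch of the generic client.ts side, under stated assumptions. The field names (`Retry_config`, `max_attempts`, `base_ms`, `max_ms`, `on_retry`, `compute_backoff`) come from the description. The following are all hypothetical and not taken from the PR:

* the exact signatures;
* the short-schedule defaults (500ms base, 3 retries, 2s cap);
* the rule that 429, 5xx and network errors (status 0) are retryable;
* the `with_retry` wrapper itself.

Here `max_attempts` counts retries after the first request, so `max_attempts: 0` fails fast.

```typescript
// Hypothetical sketch of the generic retry mechanism in src/utils/client.ts.
// Field names follow the PR description; signatures and defaults are assumed.

interface Retry_info {
  attempt: number;      // 1-based number of the retry about to happen
  max_attempts: number; // retries allowed after the first request
  delay_ms: number;     // how long we are about to sleep
  status: number;       // HTTP status, or 0 for a network error
}

interface Retry_config {
  max_attempts?: number; // 0 = fail fast on the first retryable error
  base_ms?: number;
  max_ms?: number;
  on_retry?: (info: Retry_info) => void; // fired before each sleep
}

// Assumed defaults for the existing short schedule (~3.5s worst case).
const DEFAULT_MAX_ATTEMPTS = 3;
const DEFAULT_BASE_MS = 500;
const DEFAULT_MAX_MS = 2_000;

// Exponential growth capped at max_ms, then jitter in [exp/2, exp).
function compute_backoff(attempt: number, base_ms: number, max_ms: number,
                         random: () => number = Math.random): number {
  const exp = Math.min(max_ms, base_ms * 2 ** (attempt - 1));
  return exp / 2 + random() * (exp / 2);
}

// Read an HTTP status off a thrown error; anything without one is treated
// as a network failure (status 0).
function status_of(err: unknown): number {
  const s = (err as { status?: unknown } | null | undefined)?.status;
  return typeof s === "number" ? s : 0;
}

// Assumed policy: 429, 5xx and network errors are transient.
function is_retryable(status: number): boolean {
  return status === 429 || status === 0 || status >= 500;
}

async function with_retry<T>(
  send: () => Promise<T>,
  cfg: Retry_config = {},
  sleep: (ms: number) => Promise<void> =
    (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const max_attempts = cfg.max_attempts ?? DEFAULT_MAX_ATTEMPTS;
  const base_ms = cfg.base_ms ?? DEFAULT_BASE_MS;
  const max_ms = cfg.max_ms ?? DEFAULT_MAX_MS;
  let attempt = 0; // failures seen so far == number of the next retry
  while (true) {
    try {
      return await send();
    } catch (err) {
      const status = status_of(err);
      attempt++;
      if (!is_retryable(status) || attempt > max_attempts) throw err;
      const delay_ms = compute_backoff(attempt, base_ms, max_ms);
      cfg.on_retry?.({ attempt, max_attempts, delay_ms, status });
      await sleep(delay_ms);
    }
  }
}
```

Injecting `sleep` (and `random` in compute_backoff) lets tests exercise a 30s-base schedule without real waits. With base 30s, ceiling 240s and 4 retries, the per-retry upper bounds sum to 450s.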
Tests:
* 4 unit tests for client.compute_backoff (exponential growth,
max_ms ceiling, jitter range distribution, default
constants).
* 6 unit tests for build_ai_trigger_retry (default schedule,
--max-retries override, --no-retry → max_attempts=0, on_retry
emits 429-specific line, on_retry emits generic transient line,
on_retry handles status=0 network error).
* 3 unit tests for parse_max_retries (default, non-negative
integers, rejects negatives/floats/non-numeric).
* 2 unit tests for print_stub_recovery_note (content + empty-id
guard).
* 6 command-level integration tests covering: retry config flows
to post, --max-retries respected, --no-retry → 0, stub-recovery
note emitted on AI-trigger failure, on poll status != done, and
on polling exception.
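The scraper-side behaviour these tests exercise can be sketched as follows. This is a hypothetical shape, not the PR's code, and it rests on several assumptions:

* Parsed options arrive as `{ max_retries?: string; retry?: boolean }`, with `--no-retry` setting `retry` to false.
* The caller supplies the stub's `view_url`; no dashboard URL format is assumed.
* The wording of the non-429 line and of the recovery note is illustrative. Only the 429 line is quoted from the description.

```typescript
// Hypothetical sketch of the AI-Flow retry wiring in src/commands/scraper.ts.
// Option shape, non-429 wording and stub-note text are illustrative assumptions.

interface Retry_info { attempt: number; max_attempts: number; delay_ms: number; status: number }

interface Retry_config {
  max_attempts: number;
  base_ms: number;
  max_ms: number;
  on_retry: (info: Retry_info) => void;
}

type Write = (line: string) => void;
const to_stderr: Write = (line) => console.error(line);

// AI-Flow schedule from the PR: base 30s, ceiling 240s, 4 retries.
const AI_TRIGGER_MAX_RETRIES = 4;
const AI_TRIGGER_BASE_MS = 30_000;
const AI_TRIGGER_MAX_MS = 240_000;

// undefined (flag absent) -> default; otherwise digits only, so "-1",
// "2.5" and "abc" are rejected.
function parse_max_retries(raw: string | undefined): number {
  if (raw === undefined) return AI_TRIGGER_MAX_RETRIES;
  const s = raw.trim();
  if (!/^\d+$/.test(s)) {
    throw new Error(`--max-retries must be a non-negative integer, got "${raw}"`);
  }
  return Number(s);
}

// 429 gets the concurrent-cap message; everything else names its cause.
function format_retry_line(info: Retry_info): string {
  const secs = Math.round(info.delay_ms / 1000);
  const next = `retry ${info.attempt}/${info.max_attempts}`;
  if (info.status === 429) {
    return `Hit AI-Flow concurrent-job cap (429). Waiting ${secs}s before ${next}...`;
  }
  const cause = info.status === 0 ? "network error" : `HTTP ${info.status}`;
  return `Transient ${cause}. Waiting ${secs}s before ${next}...`;
}

// retry === false (from --no-retry) wins over any --max-retries value.
function build_ai_trigger_retry(
  opts: { max_retries?: string; retry?: boolean },
  write: Write = to_stderr,
): Retry_config {
  return {
    max_attempts: opts.retry === false ? 0 : parse_max_retries(opts.max_retries),
    base_ms: AI_TRIGGER_BASE_MS,
    max_ms: AI_TRIGGER_MAX_MS,
    on_retry: (info) => write(format_retry_line(info)),
  };
}

function print_stub_recovery_note(collector_id: string, view_url: string,
                                  write: Write = to_stderr): void {
  if (!collector_id) return; // no stub was created, nothing to point at
  write(`Scraper creation did not complete and left a half-built collector: ${collector_id}`);
  write(`Inspect or delete it in the dashboard: ${view_url}`);
  write("Bright Data does not yet expose programmatic collector deletion.");
}
```

Passing `write` in explicitly keeps every stderr line assertable in unit tests, while the default still goes to stderr at runtime.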
66 / 66 tests in affected files pass. The 9 pre-existing failures
in unrelated suites (daemon, add-mcp, browser, discover, scrape) on
main are unchanged.
What is NOT in this PR (split out as a follow-up server-side ask):
* Programmatic stub deletion (needs DELETE /dca/collector/{id}).
* Pre-emptive rejection at the template POST step when the user is
already at the cap (avoids stub creation entirely; cleaner than
client-side cleanup).
Both items are filed in the skills repo proposal
skills/scraper-studio/proposals/PR-11-backoff.md.