fix(scraper): replace misleading 403 hint for AI Scraper Studio errors #5
Open
anil-bd wants to merge 1 commit into
When `bdata scraper create` succeeds on the template POST but the
subsequent AI-trigger POST returns 429 (e.g. because the user hit the
AI Flow parallel-job cap), the half-built `collector_id` is still
printed. If that id is then passed to `bdata scraper run`, the API
returns 403 with the body `{"error":"Collector does not have a template"}`.
Today the CLI maps any 403 to a fixed hint:
    Hint: Access denied. Check your zone permissions in the control
    panel.
This sends the user 30+ minutes down a zone-permission rabbit hole
that has nothing to do with the actual problem (the AI Flow never
finished generating selectors for this collector). Observed multiple
times during stress testing.
This change is structured so the AI Scraper Studio error vocabulary
stays in the scraper command and does NOT leak into the shared HTTP
client. `scrape`, `search`, `discover`, `pipelines`, and `browser`
are unaffected.
Mechanism:
* `src/utils/client.ts` gains a generic `hints?: Body_hint[]` field
  on `Request_opts`. The pure helper `pick_hint(status, body, hints)`
  consults the caller's list first and falls back to the existing
  `ERROR_HINTS` status-code map. The shared client ships ZERO
  command-specific patterns.
* `src/commands/scraper.ts` defines `SCRAPER_BODY_HINTS`, which holds
  two patterns:
    - /collector does not have a template/i → AI generation didn't
      complete; re-run `scraper create`; web-UI URL for manual
      recovery.
    - /cannot run more than \d+ jobs in parallel/i → AI-Flow
      concurrent-job cap; serialise launches.
  Every `post`/`get` call in `handle_create_scraper`,
  `handle_run_scraper`, and `run_batch` passes `hints:
  SCRAPER_BODY_HINTS` so a 4xx from any of them is translated with
  the right vocabulary.
* Real zone-permission 403s (any body that doesn't match the
  scraper patterns) still get the original "Access denied" hint;
  test 'does not consult ERROR_HINTS when an extra-hint pattern
  matches' locks this in.
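The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the shape of `Body_hint` (a `pattern` plus a `hint` string), the contents of `ERROR_HINTS`, and the hint wording attached to the two scraper patterns are all assumptions; only the names and the two regular expressions come from the description.

```typescript
// Assumed shape: the description names Body_hint but does not show its fields.
type Body_hint = {pattern: RegExp; hint: string};

// Stand-in for the existing status-code map in the shared client.
const ERROR_HINTS: Record<number, string> = {
    403: 'Access denied. Check your zone permissions in the control panel.',
};

// Caller-supplied body patterns are checked first; the status-code map is
// only consulted when none of them matches.
function pick_hint(status: number, body: string,
    hints: Body_hint[] = []): string|undefined
{
    for (const h of hints)
    {
        if (h.pattern.test(body))
            return h.hint;
    }
    return ERROR_HINTS[status];
}

// The two regular expressions are quoted from the description; the hint
// texts are paraphrases for illustration only.
const SCRAPER_BODY_HINTS: Body_hint[] = [
    {
        pattern: /collector does not have a template/i,
        hint: 'AI generation did not complete. Re-run `scraper create`.',
    },
    {
        pattern: /cannot run more than \d+ jobs in parallel/i,
        hint: 'AI-Flow concurrent-job cap reached. Serialise launches.',
    },
];
```

With this ordering, a 403 whose body matches a scraper pattern gets the scraper hint, while a 403 with any other body falls through to the generic "Access denied" text, and commands that pass no `hints` behave as before.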
Tests: 8 unit tests for `client.pick_hint` using mock generic
patterns (covers mechanism + asserts the shared client carries no
scraper vocabulary in ERROR_HINTS), plus 5 scraper command tests
asserting the scraper patterns are well-formed and travel via
`hints` to client.post on every AI-Flow call. Two existing tests were
relaxed from strict opts-object matches to objectContaining-style matches.
58 / 58 tests in the affected files pass. The 9 pre-existing
failures in unrelated suites (daemon, add-mcp, browser, discover,
scrape) on main are unchanged by this PR.
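The relaxation of the two existing tests is presumably needed because the options object now carries an extra `hints` key. A framework-free sketch of the difference between a strict and a partial match follows; the helper names and the `timing` field are hypothetical and are not taken from the test suite.

```typescript
type Opts = Record<string, unknown>;

// Hypothetical helper: true when every key in `expected` has an identical
// value in `actual`; extra keys in `actual` are ignored.
function contains_opts(actual: Opts, expected: Opts): boolean
{
    return Object.keys(expected).every(k=>Object.is(actual[k], expected[k]));
}

// Hypothetical helper: like contains_opts, but extra keys are not allowed.
function equals_opts(actual: Opts, expected: Opts): boolean
{
    return Object.keys(actual).length === Object.keys(expected).length
        && contains_opts(actual, expected);
}

// Options as a call site might pass them once `hints` is added.
const opts_after: Opts = {timing: true, hints: [{pattern: /x/, hint: 'y'}]};

// The old strict expectation no longer holds because of the extra key,
// while the partial expectation still does.
const strict_ok = equals_opts(opts_after, {timing: true});
const partial_ok = contains_opts(opts_after, {timing: true});
```

A partial match keeps the older tests focused on the fields they were written to check, and the new scraper tests cover `hints` separately.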
a6957fa to
5157b51