Validated web extraction contracts as an MCP tool for Continue agents? #12446
Rashadamom started this conversation in Feedback.
I am testing whether this is useful for Continue-style agent workflows:
https://github.com/manchittlab/TheCrawler
TheCrawler is a crawler and extractor that can be used as an MCP tool, a CLI, or on Apify. Its current focus is validated extraction contracts: first run a diagnostic to see whether a URL is extract-ready, then extract against a schema and report which required fields were found and which were missing.
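To make "required-field and missing-field evidence" concrete, here is a minimal Python sketch of what such a contract check could look like on the agent side. `ExtractionResult` and `check_contract` are hypothetical names, not TheCrawler's API, and treating `None` or blank strings as missing is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractionResult:
    """Hypothetical contract result: the extracted record plus missing-field evidence."""
    record: dict[str, Any]
    missing_required: list[str] = field(default_factory=list)
    missing_optional: list[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        # The contract holds only when every required field was found.
        return not self.missing_required


def check_contract(record: dict[str, Any], required: list[str], optional: list[str]) -> ExtractionResult:
    def is_missing(name: str) -> bool:
        value = record.get(name)
        # Assumption: absent keys, None, and blank strings all count as missing.
        return value is None or (isinstance(value, str) and not value.strip())

    return ExtractionResult(
        record=record,
        missing_required=[f for f in required if is_missing(f)],
        missing_optional=[f for f in optional if is_missing(f)],
    )


result = check_contract({"title": "2 bed flat", "price": ""}, required=["title", "price"], optional=["agent"])
print(result.valid, result.missing_required, result.missing_optional)
```

Here `price` is reported as a missing required field, so the result is not valid even though a partial record came back; that is the "evidence instead of a false success" idea applied at the field level.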
I am not claiming it works on every site or bypasses blockers. In local validation, Rightmove, Apple, React.dev, and Framer returned useful output, while G2 returned a structured 403 blocker report instead of a false success.
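A readiness gate in front of extraction might look like the following sketch, where a blocked page (such as a 403) is reported as a blocker rather than handed to the extractor. `Diagnostic`, `classify`, and `should_extract` are hypothetical, and treating 401, 403, and 429 as blocker codes is an assumption, not TheCrawler's documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Readiness(Enum):
    READY = "ready"
    BLOCKED = "blocked"
    UNKNOWN = "unknown"


@dataclass
class Diagnostic:
    """Hypothetical per-URL diagnostic report (not TheCrawler's real schema)."""
    url: str
    http_status: Optional[int]
    readiness: Readiness
    reason: str = ""


def classify(url: str, http_status: Optional[int], has_content: bool) -> Diagnostic:
    if http_status is None:
        return Diagnostic(url, None, Readiness.UNKNOWN, "no response")
    # Assumption: auth failures and rate limits are reported as blockers.
    if http_status in (401, 403, 429):
        return Diagnostic(url, http_status, Readiness.BLOCKED, f"HTTP {http_status}: access blocked")
    if 200 <= http_status < 300 and has_content:
        return Diagnostic(url, http_status, Readiness.READY)
    return Diagnostic(url, http_status, Readiness.UNKNOWN, f"HTTP {http_status}, content present: {has_content}")


def should_extract(diag: Diagnostic) -> bool:
    # Only spend extraction tokens when the diagnostic says the page is ready.
    return diag.readiness is Readiness.READY


for url, status, content in [("https://example.com/a", 200, True), ("https://example.com/b", 403, False)]:
    d = classify(url, status, content)
    print(d.url, d.readiness.value, should_extract(d), d.reason)
```

The design choice is that the agent reads `readiness` and `reason` before calling any extractor, so a blocked URL costs one cheap diagnostic call instead of a full extraction attempt.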
Question for Continue users: would a web extraction tool be more useful if it returned readiness/blocker evidence before the agent spends tokens trying to extract?
If yes, I can run one public test URL through the diagnostic and post the report back here.