Context
Part of the Architecture Refactoring epic (see design doc, previously PR #535).
Problem
The current parser is tightly coupled to HTML/BeautifulSoup, so adding JSON API support (a frequently requested feature) would require invasive changes to the parser.
Proposed Solution
Use the ParserFactory and pluggable parser strategies introduced in #552:
class HtmlParser:
    def parse(self, response: Response) -> HtmlContent: ...
class JsonParser:
    def parse(self, response: Response) -> JsonContent: ...
class ParserFactory:
    def get_parser(self, content_type: str) -> Parser: ...
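As a rough illustration of the interfaces above, here is a minimal sketch of how the factory could dispatch on content type. The `Response` dataclass, the registry dict, and the media-type normalization are assumptions made for this example, not the actual code from #552, and the HTML parser is stubbed out instead of calling BeautifulSoup.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Protocol


@dataclass
class Response:
    """Hypothetical stand-in for the project's response type."""
    body: str
    content_type: str


class Parser(Protocol):
    def parse(self, response: Response) -> Any: ...


class HtmlParser:
    def parse(self, response: Response) -> str:
        # The real parser would build a BeautifulSoup tree; this stub
        # just returns the raw markup.
        return response.body


class JsonParser:
    def parse(self, response: Response) -> Any:
        return json.loads(response.body)


class ParserFactory:
    def __init__(self) -> None:
        # Registry keyed by bare media type (an assumed design).
        self._parsers: Dict[str, Parser] = {
            "text/html": HtmlParser(),
            "application/json": JsonParser(),
        }

    def get_parser(self, content_type: str) -> Parser:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        media_type = content_type.split(";", 1)[0].strip().lower()
        try:
            return self._parsers[media_type]
        except KeyError:
            raise ValueError(f"no parser registered for {media_type!r}") from None


response = Response(
    body='{"items": [{"name": "a"}]}',
    content_type="application/json; charset=utf-8",
)
parser = ParserFactory().get_parser(response.content_type)
data = parser.parse(response)
```

With a registry like this, supporting a new format would mean adding one entry and leaving the callers untouched.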
Add JSONPath-like selector support in ValueExtractor alongside existing CSS selectors.
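To make "JSONPath-like" concrete, the sketch below resolves a small subset of the syntax (dotted keys and bracketed integer indices) against parsed JSON. The function name `extract_json_path` and the choice to return `None` on a miss are assumptions for illustration only; the real `ValueExtractor` integration may look different.

```python
import re
from typing import Any

# Matches either a key (a run of characters other than '.', '[' and ']')
# or a bracketed integer index such as [0] or [-1].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(-?\d+)\]")


def extract_json_path(data: Any, path: str) -> Any:
    """Resolve a small JSONPath-like subset, e.g. "$.a.b[0].c".

    Returns None when any step of the path does not exist.
    """
    if path.startswith("$"):
        path = path[1:]
    current = data
    for key, index in _TOKEN.findall(path):
        try:
            current = current[int(index)] if index else current[key]
        except (KeyError, IndexError, TypeError):
            return None
    return current


payload = {"items": [{"name": "a"}, {"name": "b"}]}
second_name = extract_json_path(payload, "$.items[1].name")  # "b"
```

Returning `None` on a missing key keeps the sketch short; whether a miss should instead raise, as an unmatched CSS selector might, is a design decision left open here.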
Dependencies
Benefits
- Native JSON API scraping without workarounds
- Clean extension point for future parsers (XML, CSV, etc.)
- CSS and JSONPath selectors work uniformly through ValueExtractor