Extract Parser Strategy to enable native JSON API support #555

@danieldotnl

Context

Part of the Architecture Refactoring epic (see design doc, previously PR #535).

Problem

The current parser is tightly coupled to HTML/BeautifulSoup, so adding JSON API support (a frequently requested feature) would require significant changes to the parser itself.

Proposed Solution

Use the ParserFactory and pluggable parser strategies introduced in #552:

class HtmlParser:
    def parse(self, response: Response) -> HtmlContent: ...

class JsonParser:
    def parse(self, response: Response) -> JsonContent: ...

class ParserFactory:
    def get_parser(self, content_type: str) -> Parser: ...
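A minimal sketch of how these interfaces could fit together, to make the dispatch concrete. It is not the implementation from #552. `Response`, `HtmlContent` and `JsonContent` are hypothetical stand-in dataclasses, the `Parser` protocol is an assumed shape, and the HTML fallback for unknown content types is an assumption. The HTML parser only stores the raw markup here so the example has no third-party dependency.

```python
import json
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Response:
    """Hypothetical stand-in for the project's response object."""
    content_type: str
    text: str


@dataclass
class HtmlContent:
    markup: str  # a real implementation would likely hold a BeautifulSoup tree


@dataclass
class JsonContent:
    data: Any


class Parser(Protocol):
    def parse(self, response: Response) -> Any: ...


class HtmlParser:
    def parse(self, response: Response) -> HtmlContent:
        return HtmlContent(markup=response.text)


class JsonParser:
    def parse(self, response: Response) -> JsonContent:
        return JsonContent(data=json.loads(response.text))


class ParserFactory:
    def get_parser(self, content_type: str) -> Parser:
        # Strip parameters such as "; charset=utf-8" before matching.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "application/json" or media_type.endswith("+json"):
            return JsonParser()
        # Assumption: anything else falls back to HTML, the current behaviour.
        return HtmlParser()


response = Response("application/json; charset=utf-8", '{"temp": 21.5}')
content = ParserFactory().get_parser(response.content_type).parse(response)
print(content.data["temp"])
```

Keeping the content-type matching inside the factory means a future XML or CSV parser would only need a new class and one extra branch.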

Add JSONPath-like selector support in ValueExtractor alongside existing CSS selectors.
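To illustrate what "JSONPath-like" could mean in practice, here is a rough sketch of a selector that resolves only dotted keys and numeric indexes. The function name `select_json` and the supported syntax subset are assumptions for illustration, not the planned ValueExtractor API; full JSONPath (wildcards, filters, slices) is out of scope.

```python
import re
from typing import Any

# One path step: either ".key" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][\w-]*)|\[(\d+)\]")


def select_json(data: Any, path: str) -> Any:
    """Resolve a path such as "$.sensors[0].value" against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    pos = 1
    current = data
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported selector syntax at position {pos}")
        key, index = match.groups()
        # A missing key or index raises KeyError/IndexError to the caller.
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


payload = {"sensors": [{"name": "kitchen", "value": 21.5}]}
print(select_json(payload, "$.sensors[0].value"))
```

A ValueExtractor could then choose between a CSS selector and a path like this based on the type of content the parser returned.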

Dependencies

Benefits

  • Native JSON API scraping without workarounds
  • Clean extension point for future parsers (XML, CSV, etc.)
  • CSS and JSONPath selectors work uniformly through ValueExtractor

Metadata

Assignees

No one assigned

    Labels

    architecture: Architectural changes and refactoring
    epic: Large feature or refactor tracking multiple sub-issues

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
