Ryan lippman solution #377

Closed
lippmanry wants to merge 3 commits into serpapi:master from lippmanry:ryan_lippman_solution
Conversation

@lippmanry

Created parser solution to:

  • search script tags for base64 image data, handle Unicode characters, and validate the image
  • get a header for the object (e.g. "artworks", "books")
  • deal with different page structures (html vs wp-grid-tile)
  • search and parse all .html files in /files
  • dynamically name the .json output based on the page title
  • output an item object with name, extensions, link, and image
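
The base64 step in the list above could look roughly like the sketch below. It is not the code from this PR: the function names, the regular expression, and the choice of escape sequences (`\xNN`, `\uNNNN`) are all assumptions made for illustration, and "validate image" is interpreted here as "decodes cleanly and starts with a known image signature".

```python
import base64
import binascii
import re

# Hypothetical pattern: a data URI embedded in script text. The backslash in
# the character class lets escaped padding such as \x3d stay inside the match.
DATA_URI_RE = re.compile(r"data:image/(?:jpeg|png|gif|webp);base64,[A-Za-z0-9+/=\\]+")

# Leading bytes of JPEG, PNG, and GIF files.
IMAGE_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def unescape_js(text):
    """Replace \\xNN and \\uNNNN escapes with the characters they stand for."""
    text = re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), text)
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)


def is_valid_image(data_uri):
    """True if the payload is strict base64 and decodes to a known image type."""
    payload = data_uri.split("base64,", 1)[-1]
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(IMAGE_SIGNATURES)


def extract_images(script_text):
    """Return every data URI in script_text that passes is_valid_image."""
    found = []
    for match in DATA_URI_RE.finditer(script_text):
        uri = unescape_js(match.group(0))
        if is_valid_image(uri):
            found.append(uri)
    return found
```

Unescaping before decoding matters because padding written as `\x3d` would otherwise make a strict base64 decode fail.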

Created tests to check:

  • item types
  • URL structure
  • base64 validity
  • year validity
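
As a rough illustration of the kinds of checks listed above, the helpers below validate a link, a base64 payload, and a year. The helper names are hypothetical, and the rules they apply (http/https links with a host, strict base64, a four-digit year string) are assumptions, not the PR's actual test suite.

```python
import base64
import binascii
import re
from urllib.parse import urlparse


def is_valid_url(value):
    """True for absolute http(s) URLs that include a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_valid_base64(value):
    """True if the text after an optional 'base64,' prefix is strict base64."""
    payload = value.split("base64,", 1)[-1]
    if not payload:
        return False
    try:
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def is_valid_year(value):
    """True if value is exactly four digits (assumed year format)."""
    return bool(re.fullmatch(r"\d{4}", value))
```

A test could then assert these on each parsed item, for example `is_valid_url(item["link"])`, where `item` is a dict with the name, extensions, link, and image keys described in the PR.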

Added 2 additional .html files to test parser:

  • chuck.html
  • monet.html

The parser writes its output to .json files:

  • chuck-wendig-books.json
  • monet-paintings.json
  • van-gogh-paintings.json
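
File names like these could be derived from a page title with a small slug helper. This is a sketch under assumptions: `output_filename` and its `suffix` parameter are hypothetical, and the PR does not show how its titles are actually cleaned up.

```python
import re


def output_filename(page_title, suffix=""):
    """Turn a page title into a lowercase, hyphen-separated .json file name.

    suffix is an optional trailing string to strip from the title first
    (hypothetical; for titles that carry a site name at the end).
    """
    if suffix and page_title.endswith(suffix):
        page_title = page_title[: -len(suffix)]
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", page_title.lower()).strip("-")
    return f"{slug}.json"
```

With this helper, a title of `"Van Gogh paintings"` maps to `van-gogh-paintings.json`.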

@trusche trusche closed this Jun 9, 2026

2 participants