63 changes: 35 additions & 28 deletions README.md
@@ -1,7 +1,9 @@
# Multiscrape

---

> [!TIP]
>
> ## 👋 A quick note
>
> I run **[Smart Home Newsletter](https://smarthomenewsletter.com/?utm_source=github&utm_medium=readme&utm_campaign=multiscrape)** — a weekly curated digest for smart home enthusiasts.
@@ -11,9 +13,8 @@
> Since you're here, you're clearly into home automation — so you might genuinely enjoy the newsletter.
>
> 👉 [Subscribe at smarthomenewsletter.com](https://smarthomenewsletter.com/?utm_source=github&utm_medium=readme&utm_campaign=multiscrape)
>
---

---

[![GitHub Release][releases-shield]][releases]
[![License][license-shield]](LICENSE)
@@ -33,11 +34,12 @@
## Need help with Multiscrape?

### Personal (paid) support option

I very often get asked for help, for example with finding the right CSS selectors or with a login. In fact, more often than I can handle, so I'm running an experiment with a paid support option!

**Sponsor me [here](https://github.com/sponsors/danieldotnl/sponsorships?tier_id=432422), and I'll try to assist you with your `multiscrape` configuration within 1-2 days.** The support funds will go towards family time, making up for the hours I spend on Home Assistant ☺️.

**Note:** Scraping isn't always possible. I'd love to offer a "no cure, no pay" service, but GitHub Sponsoring doesn't support that. If you're concerned about sponsoring without guarentee, please reach out by email before sponsoring!
**Note:** Scraping isn't always possible. I'd love to offer a "no cure, no pay" service, but GitHub Sponsoring doesn't support that. If you're concerned about sponsoring without guarantee, please reach out by email before sponsoring!

### Other options

@@ -67,7 +69,8 @@ It is based on both the existing [Rest sensor](https://www.home-assistant.io/int
Install via HACS (default store) or install manually by copying the files in a new 'custom_components/multiscrape' directory.

## Example configuration (YAML)
*This code example is to be placed into /config/configuration.yaml*

_This code example is to be placed into /config/configuration.yaml_

```yaml
multiscrape:
@@ -100,17 +103,21 @@ multiscrape:
select: ".release-date"
attribute: href
```

### Advanced Example Configuration (YAML)

For background on splitting the HA configuration, see the [HA Documentation](https://www.home-assistant.io/docs/configuration/splitting_configuration/).

*Inside the configuration.yaml file*
_Inside the configuration.yaml file_

```yaml
multiscrape: !include multiscrape.yaml
```

Make a new file named /config/multiscrape.yaml

*Inside the multiscrape.yaml file. Syntax is the same but starting at the resource level*
_Inside the multiscrape.yaml file. Syntax is the same but starting at the resource level_

```yaml
- resource: https://www.home-assistant.io
scan_interval: 3600
@@ -145,28 +152,28 @@

Based on latest (pre) release.

| name | description | required | default | type |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- |
| name | The name for the integration. | False | | string |
| resource | The url for retrieving the site or a template that will output an url. Not required when `resource_template` is provided. | True | | string |
| resource_template | A template that will output an url after being rendered. Only required when `resource` is not provided. | True | | template |
| authentication | Configure HTTP authentication. `basic` or `digest`. Use this with username and password fields. | False | | string |
| username | The username for accessing the url. | False | | string |
| password | The password for accessing the url. | False | | string |
| headers | The headers for the requests. | False | | template - list |
| params | The query params for the requests. | False | | template - list |
| method | The method for the request. Either `POST` or `GET`. | False | GET | string |
| name | description | required | default | type |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | ----------------- |
| name | The name for the integration. | False | | string |
| resource | The url for retrieving the site or a template that will output an url. Not required when `resource_template` is provided. | True | | string |
| resource_template | A template that will output an url after being rendered. Only required when `resource` is not provided. | True | | template |
| authentication | Configure HTTP authentication. `basic` or `digest`. Use this with username and password fields. | False | | string |
| username | The username for accessing the url. | False | | string |
| password | The password for accessing the url. | False | | string |
| headers | The headers for the requests. | False | | template - list |
| params | The query params for the requests. | False | | template - list |
| method | The method for the request. Either `POST` or `GET`. | False | GET | string |
| payload | Optional payload to send with a POST request. | False | | template - string |
| verify_ssl | Verify the SSL certificate of the endpoint. | False | True | boolean |
| log_response | Log the HTTP responses and HTML parsed by BeautifulSoup in files. (Will be written to/config/multiscrape/name_of_config) | False | False | boolean |
| timeout | Defines max time to wait data from the endpoint. | False | 10 | int |
| scan_interval | Determines how often the url will be requested. | False | 60 | int |
| parser | Determines the parser to be used with beautifulsoup. `lxml-xml` for xml recommended and `lxml` for everything else. | False | lxml | string |
| list_separator | Separator to be used in combination with `select_list` features. | False | , | string |
| form_submit | See [Form-submit](#form-submit) | False | | |
| sensor | See [Sensor](#sensorbinary-sensor) | False | | list |
| binary_sensor | See [Binary sensor](#sensorbinary-sensor) | False | | list |
| button | See [Refresh button](#refresh-button) | False | | list |
| verify_ssl | Verify the SSL certificate of the endpoint. | False | True | boolean |
| log_response | Log the HTTP responses and HTML parsed by BeautifulSoup in files. (Will be written to/config/multiscrape/name_of_config) | False | False | boolean |
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix file path typo in log_response description.

There’s a missing space in `written to/config/...`; this reads like an invalid path and can confuse users copying the instructions.

✏️ Suggested docs fix
-| log_response      | Log the HTTP responses and HTML parsed by BeautifulSoup in files. (Will be written to/config/multiscrape/name_of_config)  | False    | False   | boolean           |
+| log_response      | Log the HTTP responses and HTML parsed by BeautifulSoup in files. (Will be written to /config/multiscrape/name_of_config) | False    | False   | boolean           |

| timeout | Defines max time to wait data from the endpoint. | False | 10 | int |
| scan_interval | Determines how often the url will be requested. | False | 60 | int |
| parser | Determines the parser to be used with beautifulsoup. `lxml-xml` for xml recommended and `lxml` for everything else. | False | lxml | string |
| list_separator | Separator to be used in combination with `select_list` features. | False | , | string |
| form_submit | See [Form-submit](#form-submit) | False | | |
| sensor | See [Sensor](#sensorbinary-sensor) | False | | list |
| binary_sensor | See [Binary sensor](#sensorbinary-sensor) | False | | list |
| button | See [Refresh button](#refresh-button) | False | | list |

### Sensor/Binary Sensor

@@ -273,7 +280,7 @@ Configure what should happen in case of a scraping error (the css selector does
For each multiscrape instance, a service will be created to trigger a scrape run through an automation. (For manual triggering, the button entity can now be configured.)
The services are named `multiscrape.trigger_{name of integration}`.

Multiscrape also offers a `get_content` and a `scrape` service. `get_content` retrieves the content of the website you want to scrape. It shows the same data for which you now need to enable `log_response` and open the page_soup.txt file.\
Multiscrape also offers a `get_content` and a `scrape` service. `get_content` retrieves the content of the website you want to scrape. It shows the same data for which you now need to enable `log_response` and open the `page_soup.txt` file (or `page_json.txt` when the response is JSON).\
`scrape` does what it says. It scrapes a website and provides the sensors and attributes.

Both services accept the same configuration you would provide in your configuration YAML (as described above), with a small but important caveat: if the service input contains templates, Home Assistant automatically renders them when the service is called. That is fine for templates like `resource` and `select`, but templates that need to be applied to the scraped data itself (like `value_template`) cannot be rendered at call time. Therefore you need to slightly alter the syntax and add a `!` in the middle. E.g. `{{` becomes `{!{` and `%}` becomes `%!}`. Multiscrape will then understand that this string needs to be handled as a template after the service has been called.\
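For illustration, a `scrape` service call with an escaped `value_template` might look like this (the selector and template below are made up, and the closing braces are assumed to be escaped symmetrically, i.e. `}}` becomes `}!}`):

```yaml
service: multiscrape.scrape
data:
  resource: https://www.home-assistant.io
  sensor:
    - name: Current version
      select: ".current-version h1"
      # Escaped so Home Assistant does not render the template when the
      # service is called; multiscrape applies it to the scraped value.
      value_template: "{!{ value.split(':')[1] }!}"
```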
19 changes: 13 additions & 6 deletions custom_components/multiscrape/parsers.py
@@ -1,6 +1,8 @@
"""Content parsers for multiscrape using the Strategy pattern."""

from __future__ import annotations

import json
import logging
from abc import ABC, abstractmethod
from typing import Any
@@ -55,8 +57,13 @@ async def parse(self, content: str, hass: Any) -> BeautifulSoup:
)


class JsonDetector(ContentParser):
"""Detects JSON content. Does not parse it (JSON uses value_template only)."""
class JsonParser(ContentParser):
"""Parse JSON content into a Python structure.

Values are typically extracted via Jinja value_template (the canonical
Home Assistant pattern); the parsed structure is used for pretty-printing
and file logging.
"""

@property
def name(self) -> str:
@@ -68,9 +75,9 @@ def can_parse(self, content: str) -> bool:
content_stripped = content.lstrip() if content else ""
return bool(content_stripped) and content_stripped[0] in ("{", "[")

async def parse(self, content: str, hass: Any) -> None:
"""JSON is not parsed into a queryable structure."""
return None
async def parse(self, content: str, hass: Any) -> dict | list:
"""Parse JSON content. Raises json.JSONDecodeError on malformed input."""
return await hass.async_add_executor_job(json.loads, content)


class ParserFactory:
@@ -79,7 +86,7 @@ class ParserFactory:
def __init__(self, parser_name: str):
"""Initialize with the HTML parser name."""
self._parsers: list[ContentParser] = [
JsonDetector(),
JsonParser(),
HtmlParser(parser_name),
]

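The first-match strategy behind `ParserFactory` (JSON checked before HTML) can be sketched in isolation. This is a simplified stand-in for illustration, not the component's actual classes:

```python
import json


def looks_like_json(content: str) -> bool:
    # Same heuristic as JsonParser.can_parse: a JSON payload starts
    # with '{' or '[' once leading whitespace is stripped.
    stripped = content.lstrip() if content else ""
    return bool(stripped) and stripped[0] in ("{", "[")


def pick_parser(content: str) -> str:
    # First matching strategy wins; HTML is the fallback.
    return "json" if looks_like_json(content) else "html"


print(pick_parser('  {"release": "2024.6"}'))  # → json
print(pick_parser("<html><body></body></html>"))  # → html
```

Note the heuristic only detects JSON-shaped text; malformed content that starts with `{` still routes to the JSON parser, which is why the parse step raises `json.JSONDecodeError` on bad input.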
33 changes: 22 additions & 11 deletions custom_components/multiscrape/scraper.py
@@ -1,11 +1,13 @@
"""Support for multiscrape requests."""

import json
import logging

from bs4 import BeautifulSoup

from .const import CONF_PARSER, CONF_SEPARATOR
from .extractors import ValueExtractor
from .parsers import JsonDetector, ParserFactory
from .parsers import JsonParser, ParserFactory
from .scrape_context import ScrapeContext

DEFAULT_TIMEOUT = 10
@@ -64,22 +66,32 @@ def reset(self):

@property
def formatted_content(self):
"""Property for getting the content. HTML will be prettified."""
"""Return the content for display: HTML prettified, JSON pretty-printed, or raw."""
if self._soup:
return self._soup.prettify()
if self._is_json and self._data:
try:
return json.dumps(
json.loads(self._data), indent=2, ensure_ascii=False
)
except (json.JSONDecodeError, RecursionError):
# Detected as JSON-shaped but unparsable — fall back to raw.
return self._data
return self._data

async def set_content(self, content):
"""Set the content to be scraped."""
self._data = content
parser = self._parser_factory.get_parser(content)

if isinstance(parser, JsonDetector):
if isinstance(parser, JsonParser):
_LOGGER.debug(
"%s # Response seems to be json. Skip parsing with BeautifulSoup.",
"%s # Response detected as JSON; skipping BeautifulSoup parsing.",
self._config_name,
)
self._is_json = True
if self._file_manager:
await self._async_file_log("page_json", self.formatted_content)
return

try:
@@ -101,7 +113,9 @@ async def set_content(self, content):
)
raise

def scrape(self, selector, sensor, attribute=None, context: ScrapeContext | None = None):
def scrape(
self, selector, sensor, attribute=None, context: ScrapeContext | None = None
):
"""Scrape based on given selector the data."""
if context is None:
context = ScrapeContext.empty()
@@ -123,25 +137,22 @@ def scrape(self, selector, sensor, attribute=None, context: ScrapeContext | None
value = self._extract_value(selector, log_prefix)

if value is not None and selector.value_template is not None:
_LOGGER.debug(
"%s # Applying value_template on selector result", log_prefix)
_LOGGER.debug("%s # Applying value_template on selector result", log_prefix)
render_ctx = context.with_current_value(value)
value = selector.value_template.async_render(
variables=render_ctx.to_template_variables(), parse_result=True
)

_LOGGER.debug(
"%s # Final selector value: %s of type %s", log_prefix, value, type(
value)
"%s # Final selector value: %s of type %s", log_prefix, value, type(value)
)
return value

def _extract_value(self, selector, log_prefix):
"""Delegate extraction to ValueExtractor."""
if selector.is_list:
tags = self._soup.select(selector.list)
_LOGGER.debug("%s # List selector selected tags: %s",
log_prefix, tags)
_LOGGER.debug("%s # List selector selected tags: %s", log_prefix, tags)
return self._extractor.extract_list(tags, selector)
else:
tag = self._soup.select_one(selector.element)
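The pretty-print-with-fallback behavior added to `formatted_content` can be exercised on its own. A standalone sketch of the same logic, outside the class:

```python
import json


def pretty_json_or_raw(data: str) -> str:
    # Mirrors the formatted_content fallback: pretty-print when the
    # payload parses as JSON, return it untouched otherwise.
    try:
        return json.dumps(json.loads(data), indent=2, ensure_ascii=False)
    except (json.JSONDecodeError, RecursionError):
        # JSON-shaped but unparsable content falls back to the raw string.
        return data


print(pretty_json_or_raw('{"a": 1}'))
print(pretty_json_or_raw("{broken"))  # → {broken
```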