|
1 | 1 | # Multiscrape |
2 | 2 |
|
3 | 3 | --- |
| 4 | + |
4 | 5 | > [!TIP] |
| 6 | +> |
5 | 7 | > ## 👋 A quick note |
6 | 8 | > |
7 | 9 | > I run **[Smart Home Newsletter](https://smarthomenewsletter.com/?utm_source=github&utm_medium=readme&utm_campaign=multiscrape)** — a weekly curated digest for smart home enthusiasts. |
|
11 | 13 | > Since you're here, you're clearly into home automation — so you might genuinely enjoy the newsletter. |
12 | 14 | > |
13 | 15 | > 👉 [Subscribe at smarthomenewsletter.com](https://smarthomenewsletter.com/?utm_source=github&utm_medium=readme&utm_campaign=multiscrape) |
14 | | -> |
15 | | ---- |
16 | 16 |
|
| 17 | +--- |
17 | 18 |
|
18 | 19 | [![GitHub Release][releases-shield]][releases] |
19 | 20 | [![License][license-shield]](LICENSE) |
|
33 | 34 | ## Need help with Multiscrape? |
34 | 35 |
|
35 | 36 | ### Personal (paid) support option |
| 37 | + |
36 | 38 | I very often get asked for help, for example with finding the right CSS selectors or with a login. Actually more often than I can handle, so I'm running an experiment with a paid support option! |
37 | 39 |
|
38 | 40 | **Sponsor me [here](https://github.com/sponsors/danieldotnl/sponsorships?tier_id=432422), and I'll try to assist you with your `multiscrape` configuration within 1-2 days.** The support funds will go towards family time, making up for the hours I spend on Home Assistant ☺️. |
39 | 41 |
|
40 | | -**Note:** Scraping isn't always possible. I'd love to offer a "no cure, no pay" service, but GitHub Sponsoring doesn't support that. If you're concerned about sponsoring without guarentee, please reach out by email before sponsoring! |
| 42 | +**Note:** Scraping isn't always possible. I'd love to offer a "no cure, no pay" service, but GitHub Sponsoring doesn't support that. If you're concerned about sponsoring without guarantee, please reach out by email before sponsoring! |
41 | 43 |
|
42 | 44 | ### Other options |
43 | 45 |
|
@@ -67,7 +69,8 @@ It is based on both the existing [Rest sensor](https://www.home-assistant.io/int |
67 | 69 | Install via HACS (default store) or install manually by copying the files into a new 'custom_components/multiscrape' directory. |
68 | 70 |
|
69 | 71 | ## Example configuration (YAML) |
70 | | -*This code example is to be placed into /config/configuration.yaml* |
| 72 | + |
| 73 | +_This code example is to be placed into /config/configuration.yaml_ |
71 | 74 |
|
72 | 75 | ```yaml |
73 | 76 | multiscrape: |
@@ -100,17 +103,21 @@ multiscrape: |
100 | 103 | select: ".release-date" |
101 | 104 | attribute: href |
102 | 105 | ``` |
| 106 | +
|
103 | 107 | ### Advanced Example Configuration (YAML) |
| 108 | +
|
104 | 109 | For background on splitting the HA configuration, see the [HA Documentation](https://www.home-assistant.io/docs/configuration/splitting_configuration/). |
105 | 110 |
|
106 | | -*Inside the configuration.yaml file* |
| 111 | +_Inside the configuration.yaml file_ |
| 112 | +
|
107 | 113 | ```yaml |
108 | 114 | multiscrape: !include multiscrape.yaml |
109 | 115 | ``` |
110 | 116 |
|
111 | 117 | Make a new file named /config/multiscrape.yaml |
112 | 118 |
|
113 | | -*Inside the multiscrape.yaml file. Syntax is the same but starting at the resource level* |
| 119 | +_Inside the multiscrape.yaml file. Syntax is the same but starting at the resource level_ |
| 120 | +
|
114 | 121 | ```yaml |
115 | 122 | - resource: https://www.home-assistant.io |
116 | 123 | scan_interval: 3600 |
@@ -145,28 +152,28 @@ Make a new file named /config/multiscrape.yaml |
145 | 152 |
|
146 | 153 | Based on latest (pre) release. |
147 | 154 |
|
148 | | -| name | description | required | default | type | |
149 | | -| ----------------- | ------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------- | |
150 | | -| name | The name for the integration. | False | | string | |
151 | | -| resource | The url for retrieving the site or a template that will output an url. Not required when `resource_template` is provided. | True | | string | |
152 | | -| resource_template | A template that will output an url after being rendered. Only required when `resource` is not provided. | True | | template | |
153 | | -| authentication | Configure HTTP authentication. `basic` or `digest`. Use this with username and password fields. | False | | string | |
154 | | -| username | The username for accessing the url. | False | | string | |
155 | | -| password | The password for accessing the url. | False | | string | |
156 | | -| headers | The headers for the requests. | False | | template - list | |
157 | | -| params | The query params for the requests. | False | | template - list | |
158 | | -| method | The method for the request. Either `POST` or `GET`. | False | GET | string | |
| 155 | +| name | description | required | default | type | |
| 156 | +| ----------------- | ------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | ----------------- | |
| 157 | +| name | The name for the integration. | False | | string | |
| 158 | +| resource | The url for retrieving the site or a template that will output an url. Not required when `resource_template` is provided. | True | | string | |
| 159 | +| resource_template | A template that will output an url after being rendered. Only required when `resource` is not provided. | True | | template | |
| 160 | +| authentication | Configure HTTP authentication. `basic` or `digest`. Use this with username and password fields. | False | | string | |
| 161 | +| username | The username for accessing the url. | False | | string | |
| 162 | +| password | The password for accessing the url. | False | | string | |
| 163 | +| headers | The headers for the requests. | False | | template - list | |
| 164 | +| params | The query params for the requests. | False | | template - list | |
| 165 | +| method | The method for the request. Either `POST` or `GET`. | False | GET | string | |
159 | 166 | | payload | Optional payload to send with a POST request. | False | | template - string | |
160 | | -| verify_ssl | Verify the SSL certificate of the endpoint. | False | True | boolean | |
161 | | -| log_response | Log the HTTP responses and HTML parsed by BeautifulSoup in files. (Will be written to/config/multiscrape/name_of_config) | False | False | boolean | |
162 | | -| timeout | Defines max time to wait data from the endpoint. | False | 10 | int | |
163 | | -| scan_interval | Determines how often the url will be requested. | False | 60 | int | |
164 | | -| parser | Determines the parser to be used with beautifulsoup. `lxml-xml` for xml recommended and `lxml` for everything else. | False | lxml | string | |
165 | | -| list_separator | Separator to be used in combination with `select_list` features. | False | , | string | |
166 | | -| form_submit | See [Form-submit](#form-submit) | False | | | |
167 | | -| sensor | See [Sensor](#sensorbinary-sensor) | False | | list | |
168 | | -| binary_sensor | See [Binary sensor](#sensorbinary-sensor) | False | | list | |
169 | | -| button | See [Refresh button](#refresh-button) | False | | list | |
| 167 | +| verify_ssl | Verify the SSL certificate of the endpoint. | False | True | boolean | |
| 168 | +| log_response | Log the HTTP responses and HTML parsed by BeautifulSoup in files. (Will be written to/config/multiscrape/name_of_config) | False | False | boolean | |
| 169 | +| timeout | Defines max time to wait data from the endpoint. | False | 10 | int | |
| 170 | +| scan_interval | Determines how often the url will be requested. | False | 60 | int | |
| 171 | +| parser | Determines the parser to be used with beautifulsoup. `lxml-xml` for xml recommended and `lxml` for everything else. | False | lxml | string | |
| 172 | +| list_separator | Separator to be used in combination with `select_list` features. | False | , | string | |
| 173 | +| form_submit | See [Form-submit](#form-submit) | False | | | |
| 174 | +| sensor | See [Sensor](#sensorbinary-sensor) | False | | list | |
| 175 | +| binary_sensor | See [Binary sensor](#sensorbinary-sensor) | False | | list | |
| 176 | +| button | See [Refresh button](#refresh-button) | False | | list | |
170 | 177 |
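
As a rough illustration of how several of the options in the table combine, here is a minimal sketch. The integration name, the sensor `name` and `unique_id`, and the `timeout` value are made up for illustration; the resource and selector are borrowed from the example configuration earlier in this README. Adjust everything to the site you are scraping.

```yaml
multiscrape:
  - name: HA scraper # hypothetical name for this integration instance
    resource: https://www.home-assistant.io
    method: GET # the default, shown only for clarity
    scan_interval: 3600
    timeout: 30 # illustrative value; the default is 10
    parser: lxml # the default; `lxml-xml` is recommended for XML
    verify_ssl: true
    log_response: true # writes the response files under /config/multiscrape/
    sensor:
      - unique_id: ha_release_date # hypothetical
        name: Release date # hypothetical
        select: ".release-date"
```

Enabling `log_response` is mainly useful while you are still working out selectors; it can be switched off again once the sensor returns what you expect.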
|
171 | 178 | ### Sensor/Binary Sensor |
172 | 179 |
|
@@ -273,7 +280,7 @@ Configure what should happen in case of a scraping error (the css selector does |
273 | 280 | For each multiscrape instance, a service will be created to trigger a scrape run through an automation. (For manual triggering, the button entity can now be configured.) |
274 | 281 | The services are named `multiscrape.trigger_{name of integration}`. |
275 | 282 |
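
A trigger service could be called from an automation roughly as follows. This is only a sketch: it assumes an integration named `HA scraper`, and that this name ends up in the service name as `ha_scraper`. Check Developer Tools in Home Assistant for the exact service name on your system.

```yaml
automation:
  - alias: "Scrape on schedule" # hypothetical automation
    trigger:
      - platform: time
        at: "07:00:00"
    action:
      # Assumed service name, following the multiscrape.trigger_{name of integration} pattern
      - service: multiscrape.trigger_ha_scraper
```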
|
276 | | -Multiscrape also offers a `get_content` and a `scrape` service. `get_content` retrieves the content of the website you want to scrape. It shows the same data for which you now need to enable `log_response` and open the page_soup.txt file.\ |
| 283 | +Multiscrape also offers a `get_content` and a `scrape` service. `get_content` retrieves the content of the website you want to scrape. It shows the same data for which you now need to enable `log_response` and open the `page_soup.txt` file (or `page_json.txt` when the response is JSON).\ |
277 | 284 | `scrape` does what it says. It scrapes a website and provides the sensors and attributes. |
278 | 285 |
|
279 | 286 | Both services accept the same configuration you would provide in your configuration YAML (as described above), with a small but important caveat: if the service input contains templates, Home Assistant renders them automatically when the service is called. That is fine for templates like `resource` and `select`, but templates that must be applied to the scraped data itself (like `value_template`) cannot be rendered at that point. You therefore need to alter the syntax slightly by adding a `!` in the middle. E.g. `{{` becomes `{!{` and `%}` becomes `%!}`. Multiscrape will then understand that this string needs to be handled as a template after the service has been called.\ |
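
To make the escaping concrete, a call to the `scrape` service might look like the sketch below. The sensor `name` and `unique_id` are hypothetical, the `value` variable and `trim` filter are only an illustrative template, and the closing `}!}` is assumed by analogy with the `{!{` and `%!}` forms mentioned above.

```yaml
service: multiscrape.scrape
data:
  resource: https://www.home-assistant.io
  sensor:
    - unique_id: ha_release_date # hypothetical
      name: Release date # hypothetical
      select: ".release-date"
      # Escaped so Home Assistant does not render it when the service is called;
      # multiscrape applies it to the scraped value afterwards.
      value_template: "{!{ value | trim }!}"
```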
|