Commit c57bedc

fix: fix image URLs in scraper tutorials
1 parent 56a1307 commit c57bedc

4 files changed

Lines changed: 23 additions & 23 deletions
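Every changed line shown below swaps a `.jpg` extension for `.webp` in an image URL under `docs/img/`. As a rough illustration only, the same substitution could be scripted along these lines. The `toWebp` helper and the `IMG_URL` pattern are hypothetical names introduced here, not part of this commit, and the sketch assumes that only URLs under this one raw.githubusercontent.com path should be touched.

```js
// Hypothetical helper, not part of the commit: rewrites tutorial image URLs
// under docs/img/ from .jpg to .webp, mirroring the substitution in the diff.
const IMG_URL = /(https:\/\/raw\.githubusercontent\.com\/apifytech\/actor-scraper\/master\/docs\/img\/[\w-]+)\.jpg/g;

function toWebp(markdown) {
    // '$1' in the replacement refers to the captured URL without its extension.
    return markdown.replace(IMG_URL, '$1.webp');
}

const before = '![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.jpg)';
console.log(toWebp(before));
// prints ![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
```

Anchoring the pattern to the full `docs/img/` prefix keeps unrelated `.jpg` links elsewhere in the Markdown untouched.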

sources/academy/tutorials/apify_scrapers/cheerio_scraper.md

Lines changed: 7 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -47,14 +47,14 @@ Before we start, let's do a quick recap of the data we chose to scrape:
4747
5. **Last modification date** - When the actor was last modified.
4848
6. **Number of runs** - How many times the actor was run.
4949

50-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.jpg)
50+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.webp)
5151

5252
We've already scraped number 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
5353
tutorial, so let's get to the next one on the list: title.
5454

5555
### [](#title) Title
5656

57-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.jpg)
57+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
5858

5959
By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be.
6060
Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking.
@@ -81,7 +81,7 @@ Getting the actor's description is a little more involved, but still pretty stra
8181
there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within
8282
the `<header>` element too, same as the title. Moreover, the actual description is nested inside a `<span>` tag with a class `actor-description`.
8383

84-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.jpg)
84+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.webp)
8585

8686
```js
8787
return {
@@ -94,7 +94,7 @@ return {
9494

9595
The DevTools tell us that the `modifiedDate` can be found in a `<time>` element.
9696

97-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/modified-date.jpg)
97+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/modified-date.webp)
9898

9999
```js
100100
return {
@@ -258,7 +258,7 @@ the Network tab of the Chrome DevTools.
258258
We want to know what happens when we click the **Show more** button, so we open the DevTools **Network** tab and clear it.
259259
Then we click the **Show more** button and wait for incoming requests to appear in the list.
260260

261-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/inspect-network.jpg)
261+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/inspect-network.webp)
262262

263263
Now, this is interesting. It seems that we've only received two images after clicking the button and no additional
264264
data. This means that the data about actors must already be available in the page and the **Show more** button only displays it. This is good news.
@@ -271,7 +271,7 @@ few hits do not provide any interesting information, but in the end, we find our
271271
with the ID `__NEXT_DATA__` that seems to hold a lot of information about Web Scraper. In DevTools,
272272
you can right click an element and click **Store as global variable** to make this element available in the **Console**.
273273

274-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/find-data.jpg)
274+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/find-data.webp)
275275

276276
A `temp1` variable is now added to your console. We're mostly interested in its contents and we can get that using
277277
the `temp1.textContent` property. You can see that it's a rather large JSON string. How do we know?
@@ -285,7 +285,7 @@ const data = JSON.parse(temp1.textContent);
285285
After entering the above command into the console, we can inspect the `data` variable and see that all the information
286286
we need is there, in the `data.props.pageProps.items` array. Great!
287287

288-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/inspect-data.jpg)
288+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/inspect-data.webp)
289289

290290
> It's obvious that all the information we set to scrape is available in this one data object,
291291
so you might already be wondering, can I just make one request to the store to get this JSON

sources/academy/tutorials/apify_scrapers/getting_started.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ Depending on how you arrived at this tutorial, you may already have your first t
2727

2828
> This tutorial covers the use of **Web**, **Cheerio**, and **Puppeteer** scrapers, but a lot of the information here can be used with all actors. For this tutorial, we will select **Web Scraper**.
2929
30-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/actor-selection.jpg)
30+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/actor-selection.webp)
3131

3232
### [](#running-a-task) Running a task
3333

@@ -47,15 +47,15 @@ After clicking **Save & Run**, the window will change to the run detail. Here, y
4747
4848
Now that the run has `SUCCEEDED`, click on the glowing **Results** card to see the scrape's results. This takes you to the **Dataset** tab, where you can display or download the results in various formats. For now, just click the **Preview** button. Voila, the scraped data!
4949

50-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/the-run-detail.jpg)
50+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/the-run-detail.webp)
5151

5252
Good job! We've run our first task and got some results. Let's learn how to change the default configuration to scrape something more interesting than just the page's `<title>`.
5353

5454
## [](#creating-your-own-task) Creating your own task
5555

5656
Before we jump into the scraping itself, let's just have a quick look at the user interface that's available to us. Click on the task's name in the top-left corner to visit the task's configuration.
5757

58-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/task-name.jpg)
58+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/task-name.webp)
5959

6060
### [](#input) Input and options
6161

@@ -204,7 +204,7 @@ The DevTools window will pop up and display a lot of, perhaps unfamiliar, inform
204204

205205
You'll see that the Element tab jumps to the first `<title>` element of the current page and that the title is **Store · Apify**. It's always good practice to do your research using the DevTools before writing the `pageFunction` and running your task.
206206

207-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/using-devtools.jpg)
207+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/using-devtools.webp)
208208

209209
> For the sake of brevity, we won't go into the details of using the DevTools in this tutorial. If you're just starting out with DevTools, this [Google tutorial](https://developers.google.com/web/tools/chrome-devtools/) is a good place to begin.
210210

sources/academy/tutorials/apify_scrapers/puppeteer_scraper.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -63,14 +63,14 @@ Before we start, let's do a quick recap of the data we chose to scrape:
6363
5. **Last modification date** - When the actor was last modified.
6464
6. **Number of runs** - How many times the actor was run.
6565

66-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.jpg)
66+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.webp)
6767

6868
We've already scraped number 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
6969
tutorial, so let's get to the next one on the list: title.
7070

7171
### [](#title) Title
7272

73-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.jpg)
73+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
7474

7575
By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be.
7676
Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking.
@@ -107,7 +107,7 @@ Getting the actor's description is a little more involved, but still pretty stra
107107
there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within
108108
the `<header>` element too, same as the title. Moreover, the actual description is nested inside a `<span>` tag with a class `actor-description`.
109109

110-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.jpg)
110+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.webp)
111111

112112
```js
113113
const title = await page.$eval(
@@ -129,7 +129,7 @@ return {
129129

130130
The DevTools tell us that the `modifiedDate` can be found in a `<time>` element.
131131

132-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/modified-date.jpg)
132+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/modified-date.webp)
133133

134134
```js
135135
const title = await page.$eval(
@@ -415,7 +415,7 @@ div.show-more > button
415415

416416
> Don't forget to confirm our assumption in the DevTools finder tool (CTRL/CMD + F).
417417

418-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/waiting-for-the-button.jpg)
418+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/waiting-for-the-button.webp)
419419

420420
Now that we know what to wait for, we just plug it into the `waitFor()` function.
421421

@@ -568,7 +568,7 @@ through all the actors and then scrape all of their data. After it succeeds, ope
568568
You've successfully scraped Apify Store. And if not, no worries, just go through the code examples again,
569569
it's probably just some typo.
570570

571-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/plugging-it-into-the-pagefunction.jpg)
571+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/plugging-it-into-the-pagefunction.webp)
572572

573573
## [](#downloading-our-scraped-data) Downloading the scraped data
574574

sources/academy/tutorials/apify_scrapers/web_scraper.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -45,14 +45,14 @@ Before we start, let's do a quick recap of the data we chose to scrape:
4545
5. **Last modification date** - When the actor was last modified.
4646
6. **Number of runs** - How many times the actor was run.
4747

48-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.jpg)
48+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/scraping-practice.webp)
4949

5050
We've already scraped number 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
5151
tutorial, so let's get to the next one on the list: title.
5252

5353
### [](#title) Title
5454

55-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.jpg)
55+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
5656

5757
By using the element selector tool, we find out that the title is there under an `<h1>` tag, as titles should be.
5858
Maybe surprisingly, we find that there are actually two `<h1>` tags on the detail page. This should get us thinking.
@@ -79,7 +79,7 @@ Getting the actor's description is a little more involved, but still pretty stra
7979
there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the actor description is nested within
8080
the `<header>` element too, same as the title. Moreover, the actual description is nested inside a `<span>` tag with a class `actor-description`.
8181

82-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.jpg)
82+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/description.webp)
8383

8484
```js
8585
return {
@@ -92,7 +92,7 @@ return {
9292

9393
The DevTools tell us that the `modifiedDate` can be found in a `<time>` element.
9494

95-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/modified-date.jpg)
95+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/modified-date.webp)
9696

9797
```js
9898
return {
@@ -302,7 +302,7 @@ div.show-more > button
302302

303303
> Don't forget to confirm our assumption in the DevTools finder tool (CTRL/CMD + F).
304304
305-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/waiting-for-the-button.jpg)
305+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/waiting-for-the-button.webp)
306306

307307
Now that we know what to wait for, we just plug it into the `waitFor()` function.
308308

@@ -435,7 +435,7 @@ through all the actors and then scrape all of their data. After it succeeds, ope
435435
You've successfully scraped Apify Store. And if not, no worries, just go through the code examples again,
436436
it's probably just some typo.
437437

438-
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/plugging-it-into-the-pagefunction.jpg)
438+
![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/plugging-it-into-the-pagefunction.webp)
439439

440440
## [](#downloading-our-scraped-data) Downloading the scraped data
441441
