sources/academy/platform/scraping_with_apify_and_ai/01_creating_actor.md
+14 -12 (Lines changed: 14 additions & 12 deletions)
@@ -9,7 +9,7 @@ unlisted: true
 
 ---
 
-Want to get data about prices on [this Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales)? Even without knowing how to code, you can open [ChatGPT](https://chatgpt.com/), type the following, and you'll have a scraper ready:
+Want to get data about prices on [this Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales)? Even without knowing how to code, we can open [ChatGPT](https://chatgpt.com/), type the following, and we'll have a scraper ready:
 
 ```text
 Create a scraper in JavaScript which downloads
@@ -50,11 +50,11 @@ With AI, we don't need to learn coding before we build a scraper. AI writes the
 
 We'll develop our scraper in a mainstream programming language called JavaScript. To run command line programs written in JavaScript, we'll need a tool called Node.js.
 
-Let's head to the [Download Node.js](https://nodejs.org/en/download) web page. You should see a row of configuration dropdowns and a rather large code block below, with quite a few commands. Check if the website guessed your operating system correctly, and copy the whole block to the clipboard:
+Let's head to the [Download Node.js](https://nodejs.org/en/download) web page. We should see a row of configuration dropdowns and a rather large code block below, with quite a few commands. Let's check if the website guessed our operating system correctly, then copy the whole block to the clipboard:
 
 
 
-Now paste it as-is to your Terminal (macOS/Linux) or Command Prompt (Windows) and let it execute using the <kbd>↵</kbd> key. Once the installation finishes, you should see versions of Node.js and npm (another related tool) printed:
+Now let's paste it as-is to our Terminal (macOS/Linux) or Command Prompt (Windows) and let it execute using the <kbd>↵</kbd> key. Once the installation finishes, we should see versions of Node.js and npm (another related tool) printed:
 
 ```text
 ...
@@ -64,7 +64,7 @@ $ npm -v
 11.6.2
 ```
 
-The exact version numbers are not really important. If you see the versions printed, it means we've successfully installed Node.js and npm.
+The exact version numbers are not really important. If we see the versions printed, it means we've successfully installed Node.js and npm.
 
 ## Installing Apify CLI
 
@@ -96,7 +96,7 @@ Now let's use the Apify CLI to help us kick off a new Actor:
 apify create warehouse-scraper
 ```
 
-It starts a wizard where you can choose from various options. For each option, press <kbd>↵</kbd> to accept the default:
+It starts a wizard where we can choose from various options. For each option, let's press <kbd>↵</kbd> to accept the default:
 
 ```text
 ✔ Choose the programming language of your new Actor: JavaScript
@@ -133,7 +133,7 @@ Out of the box, the template includes a sample Actor that walks through the [cra
 apify run
 ```
 
-If you see a flood of output mentioning something called `CheerioCrawler`, it means the template works and we can move on to editing its files so that it does what we want.
+If we see a flood of output mentioning something called `CheerioCrawler`, it means the template works and we can move on to editing its files so that it does what we want.
 
 ```text
 ...
@@ -146,7 +146,9 @@ INFO CheerioCrawler: Finished! Total 107 requests: 107 succeeded, 0 failed. {"t
 
 We're done with commands for now, but do not close the Terminal or Command Prompt window yet, as we'll soon need it again.
 
-If you run into issues with the template wizard or the sample Actor, share this tutorial with [ChatGPT](https://chatgpt.com/), include the errors you saw, and ask for help debugging.
+:::caution Debugging
+If we run into issues with the template wizard or the sample Actor, let's share this tutorial with [ChatGPT](https://chatgpt.com/), include the errors we saw, and ask for help debugging.
+:::
 
 ## Scraping products
 
@@ -155,7 +157,7 @@ Now we're ready to get our own scraper done. We'll open the `src` directory insi
 We'll open it in a _plain text editor_. Every operating system includes one: Notepad on Windows, TextEdit on macOS, and similar tools on Linux.
 
 :::danger Avoid rich text editors
-Do not use a _rich text editor_, such as Microsoft Word. They're great for human-readable documents with rich formatting, but for code editing, use either dedicated coding editors, or the simplest tool possible.
+Let's not use a _rich text editor_, such as Microsoft Word. They're great for human-readable documents with rich formatting, but for code editing, we'll use either dedicated coding editors, or the simplest tool possible.
 :::
 
 In the editor, we can see JavaScript code. Let's select all the code and copy to our clipboard. Then we'll open a _new ChatGPT conversation_ and start with a prompt like this:
@@ -188,7 +190,7 @@ When we're done, we must not forget to _save the change_ with <kbd>Ctrl+S</kbd>
 apify run
 ```
 
-If we are lucky, the output should be similar to this:
+If all goes well, the output should be similar to this:
 
 ```text
 Run: npm run start
@@ -207,15 +209,15 @@ INFO Total products collected: 24
 
 This output says `Total products collected: 24`. The Sales page displays 24 products per page and contains 50 products in total.
 
-Depending on whether ChatGPT decided to walk through all pages or scrape just the first one, you might get 24 or more products. For now, any sign that it scraped products is good news.
+Depending on whether ChatGPT decided to walk through all pages or scrape just the first one, we might get 24 or more products. For now, any sign that it scraped products is good news.
 
 :::caution Debugging
-If your program crashes instead, copy the error message, send it to your ChatGPT conversation, and ask for a fix.
+If our program crashes instead, let's copy the error message, send it to our ChatGPT conversation, and ask for a fix.
 :::
 
 ## Exporting to CSV
 
-Our program likely works, but we haven't seen the data yet. Let's add a CSV export. CSV is a format most data apps can read, including Microsoft Excel, Google Sheets, and Apple Numbers. Continue your ChatGPT conversation with:
+Our program likely works, but we haven't seen the data yet. Let's add a CSV export. CSV is a format most data apps can read, including Microsoft Excel, Google Sheets, and Apple Numbers. Let's continue our ChatGPT conversation with:
 
 ```text
 Before the program ends, I want it to export all data
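The tutorial's "Exporting to CSV" step hands the export itself to ChatGPT. For a reviewer who wants to see what such an export roughly involves, here is a minimal JavaScript sketch. It assumes the scraped products are plain objects with hypothetical `title` and `price` fields, and the `toCsv` helper is illustrative only. It is not the code ChatGPT or the tutorial produces. Every field is quoted and embedded quotes are doubled, following RFC 4180, so commas and quotes inside product names stay within a single cell.

```javascript
// Minimal CSV serializer: an illustrative sketch, not the tutorial's generated code.
// Wraps every field in double quotes and doubles embedded quotes (RFC 4180 style),
// so commas, quotes, and line breaks inside a value stay within one cell.
function toCsv(rows, columns) {
  const escape = (value) => `"${String(value ?? '').replace(/"/g, '""')}"`;
  const header = columns.map(escape).join(',');
  const lines = rows.map((row) => columns.map((column) => escape(row[column])).join(','));
  // RFC 4180 specifies CRLF as the record separator.
  return [header, ...lines].join('\r\n');
}

// Hypothetical records: the field names `title` and `price` are assumptions,
// and the scraper ChatGPT writes may use different ones.
const products = [
  { title: 'Example speaker', price: 74.95 },
  { title: 'Cable, 2 m "braided"', price: 9.5 },
];

const csv = toCsv(products, ['title', 'price']);
console.log(csv);
```

In Node.js, the resulting string could be saved with the built-in `fs.writeFileSync('products.csv', csv)`. The `products.csv` filename is likewise an assumption.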