**In this lesson we'll use ChatGPT and a few commands to create an application for watching prices on an e-commerce website.**
---
@@ -57,7 +54,7 @@ Let's head to the [Download Node.js](https://nodejs.org/en/download) web page. Y

-
Now paste it as-is to your Terminal (macOS/Linux) or Command Prompt (Windows) and run it using the <kbd>↵</kbd> key. Once the installation finishes, you should see versions of Node.js and npm (another related tool) printed:
+
Now paste it as-is into your Terminal (macOS/Linux) or Command Prompt (Windows) and run it by pressing the <kbd>↵</kbd> key. Once the installation finishes, you should see the versions of Node.js and npm (another related tool) printed:
```text
...
@@ -120,7 +117,7 @@ apify run
🌱 Git repository initialized in 'warehouse-scraper'. You can now commit and push your Actor to Git.
```
-
Now that's a lot of output, but no worries, the important part is that we've successfully used a template to set up a new Actor project.
+
Now that's a lot of output, but no worries, the important part is that we've successfully used a template to set up a new Actor project!
A new directory `warehouse-scraper` has been created for us, with a variety of files and directories inside. The output instructs us to go to this new project directory, so let's do it:
INFO CheerioCrawler: Finished! Total 107 requests: 107 succeeded, 0 failed. {"terminal":true}
```
+
We're done with commands for now, but do not close the Terminal or Command Prompt window yet, as we'll soon need it again.
+
If you struggle to use the template wizard or to run the sample Actor, share this tutorial with [ChatGPT](https://chatgpt.com/), add any errors you've encountered, and see if it can help you debug the issue.
## Scraping products
-
<!--
-
Save it to the template, setup Node/npm environment, run it, get results. If the student gets stuck setting up Node/npm, they ask ChatGPT. Roughly explaining what the program does, establishing basic terms.
-
-->
+
Now we're ready to get our own scraper done. We'll open the `src` directory inside the Actor project and find a file called `main.js`.
+
+
We'll open it in a *plain text editor*. Every operating system includes one out of the box: on Windows it's Notepad, on macOS it's TextEdit, and so on.
+
+
:::danger Avoid rich text editors
+
Do not use a *rich text editor*, such as Microsoft Word. They're great for documents aimed at humans with all their formatting and advanced features, but for editing code we'll be better off with a tool as straightforward as possible.
+
:::
+
+
In the editor, we can see JavaScript code. Let's select all the code and copy it to our clipboard. Then we'll open a *new ChatGPT conversation* and start with a prompt like this:
+
+
```text
+
I'm building an Apify Actor which will run on the Apify platform.
+
I need to modify the sample template project so that it downloads
+
https://warehouse-theme-metal.myshopify.com/collections/sales,
+
extracts all the products in Sales. The data should contain
+
the following information for each product:
+
+
- Product name
+
- Product detail page URL
+
- Price
+
+
Before the program ends, it should log how many products got collected.
+
Code of main.js follows. You'll reply with a code block containing
+
a new version of that file.
+
```
+
+
Use <kbd>Shift+↵</kbd> to add a few more empty lines and then paste the code from your clipboard. After submitting, the AI chat should return a large code block with a new version of `main.js`. We'll copy its contents. Now we'll go back to our text editor, and replace the original contents of `main.js` with the version of the file from ChatGPT.
-
:::note Course under construction
-
This section hasn't been written yet. Come later, please!
+
:::info Code and colors
+
Code is really just plain text, but some tools can display it in color. They analyze the code and show its different parts in different colors so that human coders can find their way around it more easily. This is what ChatGPT does, so you'll see the code colored there. The plain text editor you're using isn't really meant as a tool for coders, so it'll display the code in plain black and white. That's okay!
:::
-
## Scraping stock units
+
When we're done, we must not forget to *save the change* with <kbd>Ctrl+S</kbd> or, on macOS, <kbd>Cmd+S</kbd>. Now let's see if the new code works! To run our program, let's go back to the Terminal (macOS/Linux) or Command Prompt (Windows) and use the Apify CLI again:
+
+
```text
+
apify run
+
```
+
+
If we are lucky, the output should be similar to this:
+
+
```text
+
Run: npm run start
+
+
> warehouse-scraper@0.0.1 start
+
> node src/main.js
-
<!--
-
Prompt ChatGPT to modify the program so that it scrapes stock units. Technically, modifying the program like this proves to be cumbersome, but doable. Run the program again, get better results.
+
INFO System info {"apifyVersion":"3.6.0","apifyClientVersion":"2.22.2","crawleeVersion":"3.16.0","osType":"Darwin","nodeVersion":"v25.6.1"}
+
...
+
INFO CheerioCrawler: Starting the crawler.
+
INFO CheerioCrawler: Processing page: https://warehouse-theme-metal.myshopify.com/collections/sales
+
...
+
INFO CheerioCrawler: Finished!
+
INFO Total products collected: 24
+
```
-
Teaser: In one of the next lessons we'll get rid of copying and pasting and updating the files ourselves, but first, let's see how we can deploy the scraper and run it periodically.
-
-->
+
This particular output says `Total products collected: 24`. The Sales page displays 24 products per page, and contains 50 products in total.
-
:::note Course under construction
-
This section hasn't been written yet. Come later, please!
+
Depending on whether ChatGPT decided to walk through the pages or scrape just the first one, we might get 24 or more products, but for a start, any indication that it scrapes the products is good news!
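
The arithmetic behind those numbers can be sketched in a few lines of JavaScript. This is only an illustration, not the code ChatGPT will produce: the variable names are made up for this example, and the `page` query parameter is an assumption about how the shop paginates its listing.

```javascript
// Numbers taken from the lesson: 50 products in Sales, 24 shown per page.
const totalProducts = 50;
const productsPerPage = 24;

// Round up, because the last page is only partially filled.
const pageCount = Math.ceil(totalProducts / productsPerPage);
const productsOnLastPage = totalProducts - (pageCount - 1) * productsPerPage;

// ASSUMPTION: the listing paginates with a `page` query parameter.
const baseUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
const pageUrls = [];
for (let page = 1; page <= pageCount; page++) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(page));
  pageUrls.push(url.href);
}

console.log(`Pages to visit: ${pageCount}, products on the last page: ${productsOnLastPage}`);
console.log(pageUrls.join('\n'));
```

So a scraper that stops after the first page reports 24 products, while one that follows all three pages should report 50.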
+
+
:::caution Debugging
+
If we saw our program crashing instead, we'd have to copy any error message and send it to the conversation with ChatGPT to nail down the issue and get it working.
:::
+
+
## Exporting to CSV
+
+
Our program supposedly works, but we haven't seen the data yet. Let's add an export to CSV, a format that any data app can read, including Microsoft Excel, Google Sheets, or Numbers by Apple. Let's continue our conversation with ChatGPT:
+
+
```text
+
Before the program ends, I want it to export all data
+
as "dataset.csv" in the current working directory.
+
```
+
+
ChatGPT should return a new code block with the CSV export implemented. We'll replace the contents of `main.js` with it, and again, we won't forget to save our changes. Only then will we re-run the scraper:
+
+
```text
+
apify run
+
```
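
To get a feel for what a CSV export involves, here is a minimal sketch in plain JavaScript. It is not the code ChatGPT will return, and the helper names `escapeCsvField` and `toCsv` are hypothetical. It mainly shows why quoting matters: a product name such as `Sony SACS9 10" Active Subwoofer` contains a double quote, and a price such as `$1,398.00` contains a comma, so both would break a naive comma-joined line.

```javascript
// Hypothetical helper: quotes a single CSV field when needed (RFC 4180 style).
function escapeCsvField(value) {
  const text = value == null ? '' : String(value);
  // Fields containing commas, double quotes, or line breaks are wrapped
  // in double quotes, and every double quote inside is doubled.
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Hypothetical helper: turns an array of objects into CSV text with a header row.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvField).join(',');
  const lines = rows.map((row) => columns.map((column) => escapeCsvField(row[column])).join(','));
  return [header, ...lines].join('\n');
}

// Sample records shaped like the data this lesson collects.
const products = [
  {
    productName: 'Sony SACS9 10" Active Subwoofer',
    productUrl: 'https://warehouse-theme-metal.myshopify.com/products/sony-sacs9-10-inch-active-subwoofer',
    price: 'Sale price$158.00',
  },
  {
    productName: 'Sony XBR-950G BRAVIA 4K HDR Ultra HD TV',
    productUrl: 'https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv',
    price: 'Sale priceFrom $1,398.00',
  },
];

const csv = toCsv(products, ['productName', 'productUrl', 'price']);
console.log(csv);
```

In a real program, the resulting string could then be saved with Node's built-in `fs.writeFileSync('dataset.csv', csv)`. The code ChatGPT generates may well take a different route, and that's fine as long as the file opens correctly in a spreadsheet app.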
+
+
In the project directory, a new file called `dataset.csv` should appear. We can use any of the programs mentioned earlier to check what's inside:
+
+
| productName | productUrl | price |
+
|---|---|---|
+
| JBL Flip 4 Waterproof Portable Bluetooth Speaker |https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker| Sale price$74.95 |
+
| Sony XBR-950G BRAVIA 4K HDR Ultra HD TV |https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv| Sale priceFrom $1,398.00 |
+
| Sony SACS9 10" Active Subwoofer |https://warehouse-theme-metal.myshopify.com/products/sony-sacs9-10-inch-active-subwoofer| Sale price$158.00 |
+
+
…and so on. Looks good!
+
+
Well, does it? Looking more closely, we can see that the prices include some extra text, such as `Sale price` or `From`, which isn't ideal. We'll need to improve this part in one of the next lessons. And we'd better improve our workflow as well, so that we don't have to copy and paste all the time.
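
As a preview of what cleaning the prices could look like, here is a rough sketch. The helpers `parsePrice` and `isMinimumPrice` are hypothetical, not something the template or ChatGPT's answer provides, and the code assumes US-style formatting with a `$` sign, commas for thousands, and a dot for decimals.

```javascript
// Hypothetical helper: extracts the first dollar amount from scraped price text.
// ASSUMPTION: US-style formatting, e.g. "$1,398.00".
function parsePrice(text) {
  const match = text.match(/\$\s*([\d,]+(?:\.\d+)?)/);
  if (!match) return null; // no recognizable price in the text
  // Drop the thousands separators before converting to a number.
  return Number(match[1].replace(/,/g, ''));
}

// Hypothetical helper: "From" right before the amount means it is a minimum price.
function isMinimumPrice(text) {
  return /From\s*\$/.test(text);
}

// Price texts as they appear in the exported CSV.
const samples = ['Sale price$74.95', 'Sale priceFrom $1,398.00', 'Sale price$158.00'];
for (const text of samples) {
  console.log(text, '->', parsePrice(text), isMinimumPrice(text) ? '(minimum)' : '');
}
```

Keeping the "From" information as a separate flag, rather than throwing it away, preserves the fact that some products have several variants with different prices.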
+
+
Despite a few flaws, we managed to create a first working prototype of an application for watching prices, with no coding knowledge. And with minimal effort on the command line, we've got something we can immediately deploy to a platform where it can run regularly and reliably. In the next lesson we'll do exactly that.