Commit 1bd628d

better writing
1 parent 8ec00a0 commit 1bd628d

1 file changed

Lines changed: 27 additions & 27 deletions

sources/academy/platform/scraping_with_apify_and_ai/01_creating_actor.md

@@ -5,7 +5,7 @@ slug: /scraping-with-apify-and-ai/creating-actor-with-ai-chat
 unlisted: true
 ---
 
-**In this lesson we'll use ChatGPT and a few commands to create an application for watching prices on an e-commerce website.**
+**In this lesson, we'll use ChatGPT and a few commands to create an app for tracking prices on an e-commerce website.**
 
 ---
 
@@ -29,14 +29,14 @@ Try it! While the code generated will most likely work out of the box, the resul
 
 Some are technical challenges:
 
-- _No monitoring:_ Even if we knew how to setup a server or home installation so that our scraper runs regularly, we'd have little insight into whether it ran successfully, what errors or warnings occurred, how long it took, or what resources it used.
-- _Anti-scraping risks:_ If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually, they'd block that too—risking seriously annoying our barista.
+- _No monitoring:_ Even if we knew how to set up a server or home installation so our scraper runs regularly, we'd have little insight into whether it ran successfully, what errors or warnings occurred, how long it took, or what resources it used.
+- _Anti-scraping risks:_ If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually they'd block that too, and we'd seriously annoy our barista.
 
 <!-- TODO START rewrite this paragraph, it's really bad -->
-To address all of these, we'll use the [Apify](https://apify.com/) platform, where it's possible to deploy any program, as far as it's structured as a so-called Actor. We'll thank ourselves later if we start our program as an Actor from the very beginning.
+To address all of this, we'll use [Apify](https://apify.com/), where we can deploy any program as long as it's structured as an Actor. We'll thank ourselves later if we structure our program as an Actor from the beginning.
 <!-- TODO END rewrite this paragraph, it's really bad -->
 
-First, we'll use a few commands to setup an Actor template, and then we'll prompt ChatGPT to generate the code necessary for scraping that Sales page.
+First, we'll use a few commands to set up an Actor template, and then we'll prompt ChatGPT to generate the code for scraping that Sales page.
 
 :::info The Warehouse store
 
@@ -46,7 +46,7 @@ In this course, we'll scrape a real e-commerce site instead of artificial playgr
 
 ## Installing Node.js
 
-With AI we don't need to learn how to code to develop a scraper. AI will write the code for us. We still need to setup our environment to be able to run that code, though.
+With AI, we don't need to learn coding before we build a scraper. AI writes the code for us. We still need to set up our environment so we can run that code.
 
 We'll develop our scraper in a mainstream programming language called JavaScript. To run command line programs written in JavaScript, we'll need a tool called Node.js.
 
@@ -68,7 +68,7 @@ The exact version numbers are not really important. If you see the versions prin
 
 ## Installing Apify CLI
 
-Now another thing we'll need is Apify CLI. It's a command line program, which works as a remote control for the Apiary platform. It'll also help us with structuring our scraper as an Actor, so that it can run on the platform.
+Now we'll install Apify CLI. It's a command-line tool that works as a remote control for the Apify platform. It also helps us structure our scraper as an Actor so it can run on the platform.
 
 Apify CLI happens to be also made in JavaScript, so we can use the npm tool we just installed to get it on our computer:
 
@@ -96,7 +96,7 @@ Now let's use the Apify CLI to help us kick off a new Actor:
 apify create warehouse-scraper
 ```
 
-It starts a wizard where you can choose from various options. For each option, only repeatedly use the <kbd>↵</kbd> key to confirm whatever is set as the first or default:
+It starts a wizard where you can choose from various options. For each option, press <kbd>↵</kbd> to accept the default:
 
 ```text
 ✔ Choose the programming language of your new Actor: JavaScript
@@ -127,7 +127,7 @@ cd "warehouse-scraper"
 
 Now we can run commands which control this new project. We didn't change the template in any way though, so it won't scrape the Warehouse store for us yet.
 
-Out of the box, the template implements a sample Actor which walks through the [crawlee.dev](https://crawlee.dev/) website and downloads all of its pages. Such thing is called _crawling_, and Crawlee is a popular tool for crawling which this Actor internally uses. Let's see if it works for us:
+Out of the box, the template includes a sample Actor that walks through the [crawlee.dev](https://crawlee.dev/) website and downloads all its pages. This process is called _crawling_. The sample Actor uses a crawling tool called Crawlee, which is why the template targets Crawlee's own website. Let's see if we can run it:
 
 ```text
 apify run
@@ -146,43 +146,43 @@ INFO CheerioCrawler: Finished! Total 107 requests: 107 succeeded, 0 failed. {"t
 
 We're done with commands for now, but do not close the Terminal or Command Prompt window yet, as we'll soon need it again.
 
-If you struggle to use the template wizard or to run the sample Actor, share this tutorial with [ChatGPT](https://chatgpt.com/), add any errors you've encountered, and see if it can help you debug the issue.
+If you run into issues with the template wizard or the sample Actor, share this tutorial with [ChatGPT](https://chatgpt.com/), include the errors you saw, and ask for help debugging.
 
 ## Scraping products
 
 Now we're ready to get our own scraper done. We'll open the `src` directory inside the Actor project and find a file called `main.js`.
 
-We'll open it in a _plain text editor_. Every operating system contains one out of the box: For Windows it's Notepad, for macOS it's TextEdit, etc.
+We'll open it in a _plain text editor_. Every operating system includes one: Notepad on Windows, TextEdit on macOS, and similar tools on Linux.
 
 :::danger Avoid rich text editors
-Do not use a _rich text editor_, such as Microsoft Word. They're great for documents aimed at humans with all their formatting and advanced features, but for editing code we'll be better off with a tool as straightforward as possible.
+Do not use a _rich text editor_, such as Microsoft Word. They're great for human-readable documents with rich formatting, but for editing code, use either a dedicated code editor or the simplest tool possible.
 :::
 
 In the editor, we can see JavaScript code. Let's select all the code and copy to our clipboard. Then we'll open a _new ChatGPT conversation_ and start with a prompt like this:
 
 ```text
-I'm building Apify Actor which will run on the Apify platform.
-I need to modify sample template project so that it downloads
+I'm building an Apify Actor that will run on the Apify platform.
+I need to modify a sample template project so it downloads
 https://warehouse-theme-metal.myshopify.com/collections/sales,
-extracts all the products in Sales. The data should contain
+extracts all products in Sales, and returns data with
 the following information for each product:
 
 - Product name
 - Product detail page URL
 - Price
 
-Before the program ends, it should log how many products got collected.
-Code of main.js follows. You'll reply with a code block containing
+Before the program ends, it should log how many products it collected.
+Code from main.js follows. Reply with a code block containing
 a new version of that file.
 ```
 
-Use <kbd>Shift+↵</kbd> to add a few more empty lines and then paste the code from your clipboard. After submitting, the AI chat should return a large code block with a new version of `main.js`. We'll copy its contents. Now we'll go back to our text editor, and replace the original contents of `main.js` with the version of the file from ChatGPT.
+We'll use <kbd>Shift+↵</kbd> to add a few empty lines, then paste the code from our clipboard. After submitting, the AI chat should return a large code block with a new version of `main.js`. We'll copy it, go back to our text editor, and replace the original `main.js` content.
 
 :::info Code and colors
-Code is truly just a plain text, but some tools can display it colored. They analyze the code and display different parts of code in different colors so that human coders can better orientate in it. This is what ChatGPT does, so you'll see the code colored there. But the plain text editor you're using isn't really meant as a tool for coders, so it'll display the code just black and white. That's okay!
+Code is plain text. Some tools color it to make it easier to read, and ChatGPT does this by default. Plain text editors usually show code in black and white, and that's completely fine.
 :::
 
-When we're done, we must not forget to _save the change_ with <kbd>Ctrl+S</kbd> or, on macOS, <kbd>Cmd+S</kbd>. Now let's see if the new code works! To run our program, let's go back to the Terminal (macOS/Linux) or Command Prompt (Windows) and use the Apify CLI again:
+When we're done, we must not forget to _save the change_ with <kbd>Ctrl+S</kbd> or, on macOS, <kbd>Cmd+S</kbd>. Now let's see if the new code works. To run the program, let's go back to Terminal (macOS/Linux) or Command Prompt (Windows) and use Apify CLI again:
 
 ```text
 apify run
@@ -205,24 +205,24 @@ INFO CheerioCrawler: Finished!
 INFO Total products collected: 24
 ```
 
-This particular output says `Total products collected: 24`. The Sales page displays 24 products per page, and contains 50 products in total.
+This output says `Total products collected: 24`. The Sales page displays 24 products per page and contains 50 products in total.
 
-Depending on whether ChatGPT decided to walk through the pages or scrape just the first one, we might get 24 or more products, but for a start, any indication that it scrapes the products is good news!
+Depending on whether ChatGPT decided to walk through all pages or scrape just the first one, you might get 24 or more products. For now, any sign that it scraped products is good news.
 
 :::caution Debugging
-If we saw our program crashing instead, we'd have to copy any error message and send it to the conversation with ChatGPT to nail down the issue and get it working.
+If your program crashes instead, copy the error message, send it to your ChatGPT conversation, and ask for a fix.
 :::
 
 ## Exporting to CSV
 
-Our program supposedly works, but we haven't seen the data yet. Let's add an export to CSV, which is a format which any data app can read, including Microsoft Excel, Google Sheets, or Numbers by Apple. Let's continue our conversation with ChatGPT:
+Our program likely works, but we haven't seen the data yet. Let's add a CSV export. CSV is a format most data apps can read, including Microsoft Excel, Google Sheets, and Apple Numbers. Continue your ChatGPT conversation with:
 
 ```text
 Before the program ends, I want it to export all data
 as "dataset.csv" in the current working directory.
 ```
 
-ChatGPT should return a new code block with the CSV export implemented. We'll replace the contents of `main.js` with it and again, we won't forget to save our changes. Only then, we'll re-run the scraper:
+ChatGPT should return a new code block with CSV export added. Let's replace `main.js` with that version and save our changes. Then let's run the scraper again:
 
 ```text
 apify run
@@ -238,6 +238,6 @@ In the project directory, a new file called `dataset.csv` should emerge. We can
 
 …and so on. Looks good!
 
-Well, does it? With more attention to detail, we can see that the prices include some text, which isn't exactly ideal. We'll need to improve this part in one of the next lessons. And we'll better improve our workflow as well, so that we don't have to copy and paste something all the time.
+Well, does it? If we look closely, the prices include extra text, which isn't ideal. We'll improve this in one of the next lessons. We'll also improve the workflow so we don't have to keep copying and pasting.
 
-Despite a few flaws, we managed to create a first working prototype of an application for watching prices, with no coding knowledge. And with some minimal effort in command line, we've got something we can immediately to deploy to a platform where it can run regularly and reliably. In the next lesson we'll do exactly that.
+Despite a few flaws, we've successfully created a first working prototype of a price-watching app with no coding knowledge. And with a bit of extra command-line work, we now have something we can deploy to a platform where it can run regularly and reliably. In the next lesson, we'll do exactly that.
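The lesson edited in this commit has ChatGPT add an export to `dataset.csv`. Below is a minimal, hypothetical sketch of what such an export involves. It is written in JavaScript, the tutorial's language, as a standalone Node.js script. It is not the code ChatGPT returns.

Several names in the sketch are illustrative only:

- The helpers `toCsvField` and `toCsv` are made up.
- The sample rows are made up.
- The `name`/`url`/`price` keys only mirror the three fields the prompt asks for.

The sketch shows the quoting rule that spreadsheet apps such as Excel, Google Sheets, and Numbers expect. A value containing a comma, a double quote, or a line break is wrapped in double quotes, and any embedded quotes are doubled.

```javascript
// Hypothetical sketch of a CSV export step; not the code ChatGPT generates.
// Assumes every row is a plain object with the same keys.
const fs = require('node:fs');

// Wrap a value in double quotes only when it contains a comma, a double
// quote, or a line break, and double any embedded quotes (RFC 4180 style).
function toCsvField(value) {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a header row from the first object's keys, then one line per row.
// RFC 4180 specifies CRLF line endings, so rows are joined with "\r\n".
function toCsv(rows) {
    if (rows.length === 0) return '';
    const headers = Object.keys(rows[0]);
    const lines = [headers.map(toCsvField).join(',')];
    for (const row of rows) {
        lines.push(headers.map((key) => toCsvField(row[key])).join(','));
    }
    return `${lines.join('\r\n')}\r\n`;
}

// Made-up sample rows with the three fields the prompt asks for.
const products = [
    { name: 'Example speaker', url: 'https://example.com/products/speaker', price: 'Sale price$74.95' },
    { name: 'Cable, 2 m', url: 'https://example.com/products/cable', price: '$9.99' },
];

// Write dataset.csv into the current working directory.
fs.writeFileSync('dataset.csv', toCsv(products), 'utf8');
console.log(`Exported ${products.length} products to dataset.csv`);
```

Running it with `node` should leave a `dataset.csv` file in the current directory. The second row's name is quoted because it contains a comma. The made-up `price` value deliberately keeps extra text, which mirrors the issue the lesson defers to a later lesson.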
