Commit b5f7f68

chore: test Academy exercises (#2097)
This is a proof of concept of how we could test exercises in the academy:

- Exercises target real-world websites, so they can easily break without us knowing
- We could test them e.g. weekly, so it's not too noisy
- The solution should be as simple as possible, and capable of running both JavaScript and Python
- While working on the PoC I actually discovered one exercise which is broken 95% of the time due to aggressive anti-scraping protections, so I changed it
- The test can be executed with `bats -r --print-output-on-failure .`
- Bats is a simple testing framework based on Bash, which allows running arbitrary programs and evaluating their output
- https://github.com/bats-core/bats-core

Not sure yet how to get it inside the CI, but on macOS it's just `brew install bats`

Todo:

- [x] Implement Python exercise
- [x] Implement JS exercise
- [x] Implement GitHub Action
- [x] Discuss with the team whether we want this
- [x] Document the solution
- [x] Port the rest of the exercises
- [x] Change the YAML to do just crons

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds a monthly GitHub Action and Bats-based suite to run Academy JS/Python exercise solutions, embeds solutions into lessons, and updates docs/ignore files.
>
> - **CI**:
>   - Add monthly and manual workflow `.github/workflows/test-academy.yml` to run Academy exercises via Node (npm) and Python (uv).
> - **Testing**:
>   - Introduce Bats-based tests for Academy exercises: `sources/academy/**/exercises/test.bats` (JS & Python) executing solutions and asserting outputs.
>   - New npm script `test:academy` and dev dependency `bats`.
> - **Docs/Academy content**:
>   - Embed executable exercise solutions using `CodeBlock` + `!!raw-loader` across JS/Python lessons (`04–12`), replacing inline snippets.
>   - Update several exercises (e.g., switch example from AliExpress to LEGO) and add multiple new solution files (Cheerio/BeautifulSoup/Crawlee).
> - **Repo**:
>   - Update `.gitignore` to exclude exercise artifacts (`storage`, `node_modules`, `package*.json`, `dataset.json`).
>   - Document testing process in `CONTRIBUTING.md` (broken links check + Academy exercises CI).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 118a550. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
1 parent 7068ad7 commit b5f7f68

61 files changed

Lines changed: 1332 additions & 932 deletions

.github/workflows/test-academy.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: Test Academy
+
+on:
+  schedule:
+    - cron: "0 3 1 * *" # at 3am UTC on 1st day of month
+  workflow_dispatch: # allows running this workflow manually from the Actions tab
+
+jobs:
+  test-exercises:
+    name: Test Academy Exercises
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout Source code
+        uses: actions/checkout@v6
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          cache: npm
+          cache-dependency-path: package-lock.json
+
+      - name: Setup Python
+        uses: astral-sh/setup-uv@v7
+
+      - name: Install Bats
+        run: |
+          corepack enable
+          npm install --only=dev
+
+      - name: Test
+        run: npm run test:academy

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -28,3 +28,7 @@ codegen/*/generated/
 codegen/*/go.sum
 .github/styles/Microsoft
 .github/styles/write-good
+sources/academy/**/exercises/storage
+sources/academy/**/exercises/node_modules
+sources/academy/**/exercises/package*.json
+sources/academy/**/exercises/dataset.json

CONTRIBUTING.md

Lines changed: 6 additions & 0 deletions
@@ -335,6 +335,12 @@ Add languages by adding new folders at the appropriate path level.
 - Run `vale sync` to download styles
 - Configure exceptions in `accepts.txt`
 
+### Testing
+
+- **Broken links**: [Periodic GitHub Action](.github/workflows/lychee.yml) checks broken links by [lychee](https://lychee.cli.rs/). If the Action fails, we manually fix the issues.
+
+- **Academy exercises**: At the end of each lesson in the academy courses, there are exercises that target real-world websites. Each exercise includes a solution, stored as a separate file containing executable code. These files are included in the docs using the `!!raw-loader` syntax. Each course has a [Bats](https://bats-core.readthedocs.io/) test file named `test.bats`. The tests run each solution as a standalone program and verify that it produces output matching the expected results. A [periodic GitHub Action](.github/workflows/test-academy.yml) runs all these tests using `npm run test:academy`. If the Action fails, we rework the exercises.
+
 ## Pull request process
 
 1. Follow [Conventional Commits](https://www.conventionalcommits.org/)
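
Note on the Testing paragraph above: the `test.bats` files themselves are not visible in this excerpt. As a rough illustration of what "run each solution as a standalone program and verify its output" means, here is a minimal JavaScript sketch. The real tests are written in Bats (Bash), not JavaScript; `runProgram`, `matchesExpected`, and the sample output are hypothetical and exist only for this illustration.

```js
// Hypothetical helper, not part of the commit: runs a program and captures
// its exit status and stdout, similar in spirit to what Bats' `run` provides.
async function runProgram(command, args) {
  const { spawnSync } = await import('node:child_process');
  const result = spawnSync(command, args, { encoding: 'utf8' });
  return { status: result.status, output: result.stdout.trim() };
}

// Hypothetical check: the program must exit with 0 and print the expected text.
function matchesExpected(result, expectedOutput) {
  return result.status === 0 && result.output === expectedOutput;
}

// Stand-in for an exercise solution. A real test would instead run a solution
// file from an `exercises` directory and compare against its expected output.
const demo = runProgram(process.execPath, ['-e', 'console.log("Algeria\\nAngola")'])
  .then((result) => matchesExpected(result, 'Algeria\nAngola'));
```

Because the exercises target live websites, exact-match checks like this one would likely be too strict in practice; how strictly the actual Bats tests compare output is not shown here.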

package-lock.json

Lines changed: 11 additions & 0 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,7 @@
         "lint:md:fix": "markdownlint '**/*.md' --fix",
         "lint:code": "eslint .",
         "lint:code:fix": "eslint . --fix",
+        "test:academy": "bats --print-output-on-failure -r .",
         "postinstall": "patch-package",
         "postbuild": "node ./scripts/joinLlmsFiles.mjs && node ./scripts/indentLlmsFile.mjs"
     },
@@ -48,6 +49,7 @@
         "@apify/tsconfig": "^0.1.0",
         "@types/react": "^19.0.0",
         "babel-plugin-styled-components": "^2.1.4",
+        "bats": "^1.13.0",
         "cross-env": "^10.0.0",
         "eslint": "^9.32.0",
         "eslint-plugin-react": "^7.37.5",

sources/academy/webscraping/scraping_basics_javascript/04_downloading_html.md

Lines changed: 6 additions & 15 deletions
@@ -5,8 +5,10 @@ description: Lesson about building a Node.js application for watching prices. Us
 slug: /scraping-basics-javascript/downloading-html
 ---
 
+import CodeBlock from '@theme/CodeBlock';
 import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
 import Exercises from '../scraping_basics/_exercises.mdx';
+import LegoExercise from '!!raw-loader!roa-loader!./exercises/lego.mjs';
 
 <LegacyJsCourseAdmonition />
 
@@ -184,28 +186,17 @@ Letting our program visibly crash on error is enough for our purposes. Now, let'
 
 <Exercises />
 
-### Scrape AliExpress
+### Scrape LEGO
 
-Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with AliExpress search results:
+Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with LEGO search results:
 
 ```text
-https://www.aliexpress.com/w/wholesale-darth-vader.html
+https://www.lego.com/en-us/themes/star-wars
 ```
 
 <details>
   <summary>Solution</summary>
-
-  ```js
-  const url = "https://www.aliexpress.com/w/wholesale-darth-vader.html";
-  const response = await fetch(url);
-
-  if (response.ok) {
-    console.log(await response.text());
-  } else {
-    throw new Error(`HTTP ${response.status}`);
-  }
-  ```
-
+  <CodeBlock language="js">{LegoExercise.code}</CodeBlock>
 </details>
 
 ### Save downloaded HTML as a file
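
Note: the new `exercises/lego.mjs` file is not shown in this excerpt. Assuming it mirrors the inline solution removed above with only the URL swapped, its logic might look roughly like the sketch below. The `downloadHtml` wrapper and its `fetchImpl` parameter are hypothetical, added so the logic can be exercised without network access; the removed snippet used top-level `await` instead.

```js
// Sketch only: assumed to resemble exercises/lego.mjs, which this excerpt
// doesn't show. `downloadHtml` and `fetchImpl` are hypothetical names.
async function downloadHtml(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Crash visibly on HTTP errors, as the removed inline solution did.
    throw new Error(`HTTP ${response.status}`);
  }
  return response.text();
}

// Usage (requires network access, so it's left commented out):
// downloadHtml("https://www.lego.com/en-us/themes/star-wars").then(console.log);
```

Keeping the solution in its own file is what lets the same code be both rendered in the lesson and executed by the test suite.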

sources/academy/webscraping/scraping_basics_javascript/05_parsing_html.md

Lines changed: 5 additions & 32 deletions
@@ -5,8 +5,11 @@ description: Lesson about building a Node.js application for watching prices. Us
 slug: /scraping-basics-javascript/parsing-html
 ---
 
+import CodeBlock from '@theme/CodeBlock';
 import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
 import Exercises from '../scraping_basics/_exercises.mdx';
+import F1AcademyTeamsExercise from '!!raw-loader!roa-loader!./exercises/f1academy_teams.mjs';
+import F1AcademyDriversExercise from '!!raw-loader!roa-loader!./exercises/f1academy_drivers.mjs';
 
 <LegacyJsCourseAdmonition />
 
@@ -183,22 +186,7 @@ https://www.f1academy.com/Racing-Series/Teams
 
 <details>
   <summary>Solution</summary>
-
-  ```js
-  import * as cheerio from 'cheerio';
-
-  const url = "https://www.f1academy.com/Racing-Series/Teams";
-  const response = await fetch(url);
-
-  if (response.ok) {
-    const html = await response.text();
-    const $ = cheerio.load(html);
-    console.log($(".teams-driver-item").length);
-  } else {
-    throw new Error(`HTTP ${response.status}`);
-  }
-  ```
-
+  <CodeBlock language="js">{F1AcademyTeamsExercise.code}</CodeBlock>
 </details>
 
 ### Scrape F1 Academy drivers
@@ -207,20 +195,5 @@ Use the same URL as in the previous exercise, but this time print a total count
 
 <details>
   <summary>Solution</summary>
-
-  ```js
-  import * as cheerio from 'cheerio';
-
-  const url = "https://www.f1academy.com/Racing-Series/Teams";
-  const response = await fetch(url);
-
-  if (response.ok) {
-    const html = await response.text();
-    const $ = cheerio.load(html);
-    console.log($(".driver").length);
-  } else {
-    throw new Error(`HTTP ${response.status}`);
-  }
-  ```
-
+  <CodeBlock language="js">{F1AcademyDriversExercise.code}</CodeBlock>
 </details>

sources/academy/webscraping/scraping_basics_javascript/06_locating_elements.md

Lines changed: 7 additions & 70 deletions
@@ -5,8 +5,12 @@ description: Lesson about building a Node.js application for watching prices. Us
 slug: /scraping-basics-javascript/locating-elements
 ---
 
+import CodeBlock from '@theme/CodeBlock';
 import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
 import Exercises from '../scraping_basics/_exercises.mdx';
+import WikipediaCountriesExercise from '!!raw-loader!roa-loader!./exercises/wikipedia_countries.mjs';
+import WikipediaCountriesSingleSelectorExercise from '!!raw-loader!roa-loader!./exercises/wikipedia_countries_single_selector.mjs';
+import GuardianF1TitlesExercise from '!!raw-loader!roa-loader!./exercises/guardian_f1_titles.mjs';
 
 <LegacyJsCourseAdmonition />
 
@@ -238,36 +242,7 @@ Djibouti
 
 <details>
   <summary>Solution</summary>
-
-  ```js
-  import * as cheerio from 'cheerio';
-
-  const url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
-  const response = await fetch(url);
-
-  if (response.ok) {
-    const html = await response.text();
-    const $ = cheerio.load(html);
-
-    for (const tableElement of $(".wikitable").toArray()) {
-      const $table = $(tableElement);
-      const $rows = $table.find("tr");
-
-      for (const rowElement of $rows.toArray()) {
-        const $row = $(rowElement);
-        const $cells = $row.find("td");
-
-        if ($cells.length > 0) {
-          const $thirdColumn = $($cells[2]);
-          const $link = $thirdColumn.find("a").first();
-          console.log($link.text());
-        }
-      }
-    }
-  } else {
-    throw new Error(`HTTP ${response.status}`);
-  }
-  ```
+  <CodeBlock language="js">{WikipediaCountriesExercise.code}</CodeBlock>
 
   Because some rows contain [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th), we skip processing a row if `table_row.select("td")` doesn't find any [table data](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td) cells.
 
@@ -288,27 +263,7 @@ You may want to check out the following pages:
 
 <details>
   <summary>Solution</summary>
-
-  ```js
-  import * as cheerio from 'cheerio';
-
-  const url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
-  const response = await fetch(url);
-
-  if (response.ok) {
-    const html = await response.text();
-    const $ = cheerio.load(html);
-
-    for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
-      const $nameCell = $(element);
-      const $link = $nameCell.find("a").first();
-      console.log($link.text());
-    }
-  } else {
-    throw new Error(`HTTP ${response.status}`);
-  }
-  ```
-
+  <CodeBlock language="js">{WikipediaCountriesSingleSelectorExercise.code}</CodeBlock>
 </details>
 
 ### Scrape F1 news
@@ -330,23 +285,5 @@ Max Verstappen wins Canadian Grand Prix: F1 – as it happened
 
 <details>
   <summary>Solution</summary>
-
-  ```js
-  import * as cheerio from 'cheerio';
-
-  const url = "https://www.theguardian.com/sport/formulaone";
-  const response = await fetch(url);
-
-  if (response.ok) {
-    const html = await response.text();
-    const $ = cheerio.load(html);
-
-    for (const element of $("#maincontent ul li h3").toArray()) {
-      console.log($(element).text());
-    }
-  } else {
-    throw new Error(`HTTP ${response.status}`);
-  }
-  ```
-
+  <CodeBlock language="js">{GuardianF1TitlesExercise.code}</CodeBlock>
 </details>
