Merged
26 commits
4954ee6
wip
honzajavorek Nov 18, 2025
b9c8cd9
feat: keep exercises as separate files, include them to Markdown
honzajavorek Nov 21, 2025
0125aae
chore: implement testing of JavaScript exercises
honzajavorek Nov 24, 2025
2401347
refactor: use shorter names
honzajavorek Nov 24, 2025
ac7e71a
chore: add GitHub Action to run tests automatically
honzajavorek Nov 24, 2025
26c9c2e
chore: ouch, wrong branch
honzajavorek Nov 24, 2025
390a890
chore: one does not simply npm install
honzajavorek Nov 24, 2025
44da821
style: make linter happy
honzajavorek Nov 24, 2025
ab61f89
chore: make sure there is no schedule until we merge this, add explan…
honzajavorek Nov 24, 2025
9dbdc11
refactor: simplify the tests
honzajavorek Nov 24, 2025
9ca5758
docs: document lychee and academy testing
honzajavorek Nov 24, 2025
c6fceb3
refactor: make exercises testable
honzajavorek Nov 25, 2025
9c589f3
fix: avoid the yes option, fix crawlee installation, improve readabil…
honzajavorek Nov 25, 2025
bd72f32
chore: make the tests more meaningful
honzajavorek Nov 25, 2025
d65a95e
chore: improve the JS test suite
honzajavorek Nov 25, 2025
820b1d4
style: make the code linter happy
honzajavorek Nov 25, 2025
f439180
style: condense and fix the solutions markup
honzajavorek Nov 25, 2025
d9182fd
chore: setup and teardown for Python
honzajavorek Nov 25, 2025
5fcc7a2
chore: fix the JS test suite not to rely on npx --package
honzajavorek Nov 25, 2025
1ad107f
fix: improve the Python test suite and fix solutions using Crawlee (s…
honzajavorek Nov 25, 2025
2f3cb5c
style: fix markup
honzajavorek Nov 25, 2025
3d9bbba
chore: fix typo
honzajavorek Nov 25, 2025
3239895
chore: enable only as a cron
honzajavorek Nov 25, 2025
971da08
chore: run monthly
honzajavorek Nov 25, 2025
1e413ef
style: fix markup
honzajavorek Nov 25, 2025
3bde6d6
fix: address bugbot comments
honzajavorek Nov 25, 2025
31 changes: 31 additions & 0 deletions .github/workflows/test-academy.yml
@@ -0,0 +1,31 @@
name: Test Academy

on:
  schedule:
    - cron: "0 3 1 * *" # at 3am UTC on 1st day of month
  workflow_dispatch: # allows running this workflow manually from the Actions tab

jobs:
  test-exercises:
    name: Test Academy Exercises
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Source code
        uses: actions/checkout@v6

      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          cache: npm
          cache-dependency-path: package-lock.json

      - name: Setup Python
        uses: astral-sh/setup-uv@v7

      - name: Install Bats
        run: |
          corepack enable
          npm install --only=dev

      - name: Test
        run: npm run test:academy
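
For reference, the schedule string in this workflow is a standard five-field cron expression. The following is a minimal sketch that labels each field; the helper name `describeCron` and the field labels are assumptions made for illustration and are not part of the PR.

```javascript
// Field names of a five-field cron expression, in order.
const FIELD_NAMES = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Hypothetical helper: split a cron expression and label each field.
function describeCron(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== FIELD_NAMES.length) {
    throw new Error(`Expected 5 fields, got ${fields.length}`);
  }
  return Object.fromEntries(FIELD_NAMES.map((name, i) => [name, fields[i]]));
}

// The expression used by the workflow's schedule trigger.
const schedule = describeCron('0 3 1 * *');
console.log(schedule);
```

Read this way, the expression means minute 0 of hour 3 on day 1 of every month, which agrees with the comment in the workflow file.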
4 changes: 4 additions & 0 deletions .gitignore
@@ -28,3 +28,7 @@ codegen/*/generated/
codegen/*/go.sum
.github/styles/Microsoft
.github/styles/write-good
sources/academy/**/exercises/storage
sources/academy/**/exercises/node_modules
sources/academy/**/exercises/package*.json
sources/academy/**/exercises/dataset.json
6 changes: 6 additions & 0 deletions CONTRIBUTING.md
@@ -335,6 +335,12 @@ Add languages by adding new folders at the appropriate path level.
- Run `vale sync` to download styles
- Configure exceptions in `accepts.txt`

### Testing
Collaborator Author


@TC-MO Can you take a look at this README change, please? Does it make sense this way?


- **Broken links**: A [periodic GitHub Action](.github/workflows/lychee.yml) checks for broken links using [lychee](https://lychee.cli.rs/). If the Action fails, we manually fix the issues.

- **Academy exercises**: At the end of each lesson in the academy courses, there are exercises that target real-world websites. Each exercise includes a solution, stored as a separate file containing executable code. These files are included in the docs using the `!!raw-loader` syntax. Each course has a [Bats](https://bats-core.readthedocs.io/) test file named `test.bats`. The tests run each solution as a standalone program and verify that it produces output matching the expected results. A [periodic GitHub Action](.github/workflows/test-academy.yml) runs all these tests using `npm run test:academy`. If the Action fails, we rework the exercises.
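
As an illustration of what such a test checks, the sketch below runs a program as a separate process and compares its output with an expectation. It is written in JavaScript rather than Bats, and the helper `runSolution`, the inline stand-in script, and the expected value are all assumptions for demonstration; the real tests live in each course's `test.bats`.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: run Node.js with the given arguments and
// return the non-empty lines the program printed to stdout.
function runSolution(args) {
  const stdout = execFileSync(process.execPath, args, { encoding: 'utf8' });
  return stdout.split('\n').filter((line) => line.trim() !== '');
}

// An inline script stands in for a real solution file here,
// so the sketch does not depend on the network or on the repository layout.
const lines = runSolution(['-e', 'console.log("Algeria"); console.log("Angola");']);

// The check: the program exited successfully and printed the expected line.
const passed = lines.length > 0 && lines.includes('Angola');
console.log(passed ? 'ok' : 'not ok');
```

Because `execFileSync` throws when the child process exits with a non-zero code, a crashing solution fails the check without any extra handling.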

## Pull request process

1. Follow [Conventional Commits](https://www.conventionalcommits.org/)
11 changes: 11 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions package.json
@@ -40,6 +40,7 @@
"lint:md:fix": "markdownlint '**/*.md' --fix",
"lint:code": "eslint .",
"lint:code:fix": "eslint . --fix",
"test:academy": "bats --print-output-on-failure -r .",
"postinstall": "patch-package",
"postbuild": "node ./scripts/joinLlmsFiles.mjs && node ./scripts/indentLlmsFile.mjs"
},
@@ -48,6 +49,7 @@
"@apify/tsconfig": "^0.1.0",
"@types/react": "^19.0.0",
"babel-plugin-styled-components": "^2.1.4",
"bats": "^1.13.0",
"cross-env": "^10.0.0",
"eslint": "^9.32.0",
"eslint-plugin-react": "^7.37.5",
@@ -5,8 +5,10 @@ description: Lesson about building a Node.js application for watching prices. Us
slug: /scraping-basics-javascript/downloading-html
---

import CodeBlock from '@theme/CodeBlock';
import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
import Exercises from '../scraping_basics/_exercises.mdx';
import LegoExercise from '!!raw-loader!roa-loader!./exercises/lego.mjs';

<LegacyJsCourseAdmonition />

@@ -184,28 +186,17 @@ Letting our program visibly crash on error is enough for our purposes. Now, let'

<Exercises />

### Scrape AliExpress
### Scrape LEGO
Collaborator Author


This is the only one I fixed. After this one I decided that fixing the exercises should happen in separate PRs, not in this one: #2113


Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with AliExpress search results:
Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with LEGO search results:

```text
https://www.aliexpress.com/w/wholesale-darth-vader.html
https://www.lego.com/en-us/themes/star-wars
```

<details>
<summary>Solution</summary>

```js
const url = "https://www.aliexpress.com/w/wholesale-darth-vader.html";
const response = await fetch(url);

if (response.ok) {
  console.log(await response.text());
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{LegoExercise.code}</CodeBlock>
</details>
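
The inline solution removed above throws on any non-OK status. As a minimal sketch of that check in isolation, the hypothetical helper `assertOk` below applies the same `response.ok` test to locally constructed `Response` objects, so it runs without network access. It assumes Node.js 18 or newer, where `Response` is available as a global.

```javascript
// Hypothetical helper: throw on non-2xx responses, mirroring the exercise solution.
function assertOk(response) {
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response;
}

// Locally constructed responses stand in for real fetch() results.
const okStatus = assertOk(new Response('<html></html>', { status: 200 })).status;

// A 404 response should make the helper throw.
let failure = null;
try {
  assertOk(new Response(null, { status: 404 }));
} catch (error) {
  failure = error.message;
}
console.log(okStatus, failure);
```

Letting the error propagate, as the lesson does, makes a failed download visible immediately instead of passing an error page on to later steps.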

### Save downloaded HTML as a file
@@ -5,8 +5,11 @@ description: Lesson about building a Node.js application for watching prices. Us
slug: /scraping-basics-javascript/parsing-html
---

import CodeBlock from '@theme/CodeBlock';
import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
import Exercises from '../scraping_basics/_exercises.mdx';
import F1AcademyTeamsExercise from '!!raw-loader!roa-loader!./exercises/f1academy_teams.mjs';
import F1AcademyDriversExercise from '!!raw-loader!roa-loader!./exercises/f1academy_drivers.mjs';

<LegacyJsCourseAdmonition />

@@ -183,22 +186,7 @@ https://www.f1academy.com/Racing-Series/Teams

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://www.f1academy.com/Racing-Series/Teams";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);
  console.log($(".teams-driver-item").length);
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{F1AcademyTeamsExercise.code}</CodeBlock>
</details>

### Scrape F1 Academy drivers
@@ -207,20 +195,5 @@ Use the same URL as in the previous exercise, but this time print a total count

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://www.f1academy.com/Racing-Series/Teams";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);
  console.log($(".driver").length);
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{F1AcademyDriversExercise.code}</CodeBlock>
</details>
@@ -5,8 +5,12 @@ description: Lesson about building a Node.js application for watching prices. Us
slug: /scraping-basics-javascript/locating-elements
---

import CodeBlock from '@theme/CodeBlock';
import LegacyJsCourseAdmonition from '@site/src/components/LegacyJsCourseAdmonition';
import Exercises from '../scraping_basics/_exercises.mdx';
import WikipediaCountriesExercise from '!!raw-loader!roa-loader!./exercises/wikipedia_countries.mjs';
import WikipediaCountriesSingleSelectorExercise from '!!raw-loader!roa-loader!./exercises/wikipedia_countries_single_selector.mjs';
import GuardianF1TitlesExercise from '!!raw-loader!roa-loader!./exercises/guardian_f1_titles.mjs';

<LegacyJsCourseAdmonition />

@@ -238,36 +242,7 @@ Djibouti

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);

  for (const tableElement of $(".wikitable").toArray()) {
    const $table = $(tableElement);
    const $rows = $table.find("tr");

    for (const rowElement of $rows.toArray()) {
      const $row = $(rowElement);
      const $cells = $row.find("td");

      if ($cells.length > 0) {
        const $thirdColumn = $($cells[2]);
        const $link = $thirdColumn.find("a").first();
        console.log($link.text());
      }
    }
  }
} else {
  throw new Error(`HTTP ${response.status}`);
}
```
<CodeBlock language="js">{WikipediaCountriesExercise.code}</CodeBlock>

Because some rows contain [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th), we skip processing a row if `$row.find("td")` doesn't find any [table data](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/td) cells.

@@ -288,27 +263,7 @@ You may want to check out the following pages:

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);

  for (const element of $(".wikitable tr td:nth-child(3)").toArray()) {
    const $nameCell = $(element);
    const $link = $nameCell.find("a").first();
    console.log($link.text());
  }
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{WikipediaCountriesSingleSelectorExercise.code}</CodeBlock>
</details>

### Scrape F1 news
@@ -330,23 +285,5 @@ Max Verstappen wins Canadian Grand Prix: F1 – as it happened

<details>
<summary>Solution</summary>

```js
import * as cheerio from 'cheerio';

const url = "https://www.theguardian.com/sport/formulaone";
const response = await fetch(url);

if (response.ok) {
  const html = await response.text();
  const $ = cheerio.load(html);

  for (const element of $("#maincontent ul li h3").toArray()) {
    console.log($(element).text());
  }
} else {
  throw new Error(`HTTP ${response.status}`);
}
```

<CodeBlock language="js">{GuardianF1TitlesExercise.code}</CodeBlock>
</details>