chore: test Academy exercises#2097
Preview for this PR was built for commit |
|
From my perspective, we should design them in a way that they run automatically and do not require manual start-up. Otherwise, we'll forget about it eventually. |
|
Sure, I'd make a GitHub Action that runs once a week or once a month (it depends on how quickly we can fix the exercises; it doesn't make sense to run them too often).
|
(Creating such a GitHub Action is a matter of a few lines, and I'll add it to this PR.)
|
I think this should be ready now. The Bats tests currently fail because some exercises are failing, so that's expected. But the testing infrastructure is solid, and this PR is about the infrastructure. I've set the tests to run monthly; let's see how that goes. I didn't want to snowball this PR, so I recorded the failures and other issues separately, and I'll work on them in subsequent PRs:
- Run `vale sync` to download styles
- Configure exceptions in `accepts.txt`

### Testing
@TC-MO Can you take a look at this README change, please? Does it make sense this way?
<Exercises />

### Scrape AliExpress
### Scrape LEGO
This is the only one I fixed. After this one, I decided that fixing the exercises should happen in separate PRs, not in this one: #2113
…ity of uv options
const crawler = new CheerioCrawler({
  async requestHandler({ $, request, enqueueLinks, pushData }) {
    if (request.label === 'DRIVER') {
      const info = {};
      for (const itemElement of $('.common-driver-info li').toArray()) {
        const name = $(itemElement).find('span').text().trim();
        const value = $(itemElement).find('h4').text().trim();
        info[name] = value;
      }

      const detail = {};
      for (const linkElement of $('.driver-detail--cta-group a').toArray()) {
        const name = $(linkElement).find('p').text().trim();
        const value = $(linkElement).find('h2').text().trim();
        detail[name] = value;
      }

      const dob = info.DOB ?? '';
      const [dobDay = '', dobMonth = '', dobYear = ''] = dob.split('/');

      await pushData({
        url: request.url,
        name: $('h1').text().trim(),
        team: detail.Team,
        nationality: info.Nationality,
        dob: dobYear && dobMonth && dobDay ? `${dobYear}-${dobMonth}-${dobDay}` : null,
        instagram_url: $(".common-social-share a[href*='instagram']").attr('href') ?? null,
      });
    } else {
      await enqueueLinks({ selector: '.teams-driver-item a', label: 'DRIVER' });
    }
  },
});
This would be better done via a router: https://crawlee.dev/js/api/core/class/Router
(Up to you if you want to keep the old version; there's nothing wrong with it functionality-wise.)
const crawler = new CheerioCrawler();

crawler.router.addDefaultHandler(async ({ enqueueLinks }) => {
  await enqueueLinks({ selector: '.teams-driver-item a', label: 'DRIVER' });
});

crawler.router.addHandler('DRIVER', async ({ $, request, enqueueLinks, pushData }) => {
  const info = {};
  for (const itemElement of $('.common-driver-info li').toArray()) {
    const name = $(itemElement).find('span').text().trim();
    const value = $(itemElement).find('h4').text().trim();
    info[name] = value;
  }

  const detail = {};
  for (const linkElement of $('.driver-detail--cta-group a').toArray()) {
    const name = $(linkElement).find('p').text().trim();
    const value = $(linkElement).find('h2').text().trim();
    detail[name] = value;
  }

  const dob = info.DOB ?? '';
  const [dobDay = '', dobMonth = '', dobYear = ''] = dob.split('/');

  await pushData({
    url: request.url,
    name: $('h1').text().trim(),
    team: detail.Team,
    nationality: info.Nationality,
    dob: dobYear && dobMonth && dobDay ? `${dobYear}-${dobMonth}-${dobDay}` : null,
    instagram_url: $(".common-social-share a[href*='instagram']").attr('href') ?? null,
  });
});
Oh! This is cool! I was wondering if there's a JS alternative to the project structure that Python Crawlee achieves with the @crawler.router.handler(...) decorators. And – surprisingly – it even has the same name, router 🤦‍♂️
This needs to be done throughout the whole lesson, so I filed #2181 and will tackle it separately. Linking this code suggestion so that it's not lost! 📁
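To make the idea behind the suggestion concrete, here is a minimal sketch of label-based dispatch. It is a toy stand-in written for illustration, not Crawlee's actual `Router`: `createRouter`, `dispatch`, and the example URLs are hypothetical names, and only `addHandler` and `addDefaultHandler` mirror the method names used in the suggestion above.

```javascript
// Toy stand-in for a label-based router. This is NOT Crawlee's implementation;
// it only illustrates how handlers can be selected by request.label.
function createRouter() {
  const handlers = new Map();
  let defaultHandler = null;

  return {
    addHandler(label, handler) {
      handlers.set(label, handler);
    },
    addDefaultHandler(handler) {
      defaultHandler = handler;
    },
    // Hypothetical method: pick the handler registered for request.label,
    // falling back to the default handler when the label is missing or unknown.
    async dispatch(context) {
      const handler = handlers.get(context.request.label) ?? defaultHandler;
      if (!handler) {
        throw new Error(`No handler for label: ${context.request.label}`);
      }
      return handler(context);
    },
  };
}

// Registers one default handler and one 'DRIVER' handler, then routes two
// made-up requests through them and records which handler ran.
async function demo() {
  const router = createRouter();
  const visited = [];

  router.addDefaultHandler(async ({ request }) => {
    visited.push(`listing:${request.url}`);
  });
  router.addHandler('DRIVER', async ({ request }) => {
    visited.push(`driver:${request.url}`);
  });

  await router.dispatch({ request: { url: 'https://example.com/drivers' } });
  await router.dispatch({ request: { url: 'https://example.com/drivers/1', label: 'DRIVER' } });
  return visited;
}

demo().then((visited) => console.log(visited));
```

Compared with branching on `request.label` inside one handler, each page type gets its own function, which is the structural benefit discussed in this thread.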
Fix #2181 Address #2097 (comment) by @B4nan
Fix #2181, address #2097 (comment) by @B4nan

> [!NOTE]
> **Low Risk**
> Low risk documentation/exercise refactor that changes sample code structure to the newer Crawlee `router` API without altering the scraping behavior.
>
> **Overview**
> Refactors the Crawlee lesson (`12_framework.md`) to construct `CheerioCrawler` without an inline `requestHandler` and instead register routing via `crawler.router.addDefaultHandler()` plus labeled `addHandler()` functions (e.g. `DETAIL`, `IMDB_SEARCH`, `IMDB`).
>
> Updates the associated exercise solutions (`crawlee_f1_drivers.mjs`, `crawlee_netflix_ratings.mjs`) to match the router-based pattern, removing `request.label` branching and splitting listing/search/detail logic into dedicated handlers while keeping the same data extraction and dataset export steps.
>
> Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 1aa2eb6.
This is a proof of concept of how we could test exercises in the Academy:
bats -r --print-output-on-failure .
brew install bats
Todo:
Note
Adds a monthly GitHub Action and Bats-based suite to run Academy JS/Python exercise solutions, embeds solutions into lessons, and updates docs/ignore files.
- `.github/workflows/test-academy.yml` to run Academy exercises via Node (npm) and Python (uv).
- `sources/academy/**/exercises/test.bats` (JS & Python) executing solutions and asserting outputs.
- `test:academy` and dev dependency `bats`.
- `CodeBlock` + `!!raw-loader` across JS/Python lessons (04–12), replacing inline snippets.
- `.gitignore` to exclude exercise artifacts (`storage`, `node_modules`, `package*.json`, `dataset.json`).
- `CONTRIBUTING.md` (broken links check + Academy exercises CI).

Written by Cursor Bugbot for commit 118a550.
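The summary describes tests that execute each solution and assert on its output. The PR does this with Bats; purely as an illustration of the same idea, a rough JavaScript analogue might look like the sketch below. Everything in it is an assumption made for the example: `runSolution`, `checkSolution`, and the inline script standing in for a real solution file are hypothetical and do not come from the PR.

```javascript
// Hypothetical helper, not part of the PR: runs a script with the current
// Node binary and returns its trimmed standard output.
async function runSolution(args) {
  // Dynamic import keeps the snippet usable from both ES modules and CommonJS.
  const { execFileSync } = await import('node:child_process');
  return execFileSync(process.execPath, args, { encoding: 'utf8' }).trim();
}

// Stand-in for a real exercise solution: an inline script that prints JSON.
const inlineSolution = 'console.log(JSON.stringify({ title: "Example", price: 10 }))';

// Executes the stand-in solution and asserts on what it printed, which is
// the same "run it, then check the output" shape the Bats tests follow.
async function checkSolution() {
  const output = await runSolution(['-e', inlineSolution]);
  const item = JSON.parse(output);
  if (item.title !== 'Example' || item.price !== 10) {
    throw new Error(`Unexpected output: ${output}`);
  }
  return item;
}

checkSolution().then((item) => console.log('ok', item.title));
```

Because exercises scrape live websites, a check like this can start failing without any change in the repository, which is why the thread settles on a scheduled run rather than a manual one.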