In this chapter, you will learn how to extract data from websites using BeautifulSoup, and why a professional approach starts by checking for an available API before scraping raw HTML.
- Understand what web scraping is and where to use it.
- Learn to parse HTML with `BeautifulSoup`.
- Learn to inspect websites and find hidden APIs.
- Use `requests` to fetch data from API endpoints.
- Build a real-world notifier program based on MAKAUT notices.
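
To make the HTML-parsing objective concrete, here is a minimal sketch of extracting notice titles and links with BeautifulSoup. The HTML snippet, the `notices` id, and the `extract_notices` helper are all hypothetical stand-ins; the markup of a real notice page will differ and the selector must be adapted after inspecting it.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a notice page; real markup will differ.
SAMPLE_HTML = """
<ul id="notices">
  <li><a href="/notice/101">Exam schedule published</a></li>
  <li><a href="/notice/102">Holiday list updated</a></li>
</ul>
"""


def extract_notices(html: str) -> list[dict[str, str]]:
    """Return the title and link of every notice found in the list."""
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    notices = []
    # CSS selector: every <a> inside an <li> inside the element with id="notices".
    for link in soup.select("#notices li a"):
        notices.append({"title": link.get_text(strip=True), "href": link["href"]})
    return notices


if __name__ == "__main__":
    for notice in extract_notices(SAMPLE_HTML):
        print(notice["title"], "->", notice["href"])
```

Keeping the parsing in one small function means that when the site changes its markup, only the selector needs to be updated.
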
┌────────────┐
│  API/HTML  │
└─────┬──────┘
      │ requests.get()
      ▼
┌────────────┐
│  Program   │
└─────┬──────┘
      │ Parse JSON or HTML
      ▼
┌──────────────┐
│ Compare Data │ ← Last notice stored in file
└─────┬────────┘
      │
      ├──▶ No change → Sleep → Repeat
      │
      ▼
┌─────────────────────┐
│ Show Notification   │
│ Print Latest Notices│
│ Save New Notice     │
└─────────────────────┘
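
The compare, notify, and save steps of this flow can be sketched roughly as follows. This is an illustration under assumptions, not the chapter's final program: the file name `last_notice.txt`, the helper names, and the convention that `fetch_titles` returns notice titles newest first are all hypothetical. The fetching step is passed in as a function so the loop works the same whether the titles come from JSON or from parsed HTML.

```python
import time
from pathlib import Path
from typing import Callable, Optional

STATE_FILE = Path("last_notice.txt")  # hypothetical file name for the saved state


def load_last_notice(path: Path = STATE_FILE) -> Optional[str]:
    """Return the previously saved notice title, or None on the first run."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip() or None
    return None


def save_last_notice(title: str, path: Path = STATE_FILE) -> None:
    """Remember the newest notice so the next run can compare against it."""
    path.write_text(title, encoding="utf-8")


def check_once(fetch_titles: Callable[[], list[str]], path: Path = STATE_FILE) -> list[str]:
    """Run one cycle of the flow and return the notices that are new."""
    titles = fetch_titles()  # assumed to be ordered newest first
    if not titles:
        return []
    last_seen = load_last_notice(path)
    if titles[0] == last_seen:
        return []  # no change
    # Everything listed above the last seen notice is new; if the last seen
    # notice is no longer in the list (or this is the first run), all are new.
    new = titles[: titles.index(last_seen)] if last_seen in titles else titles
    save_last_notice(titles[0], path)
    return new


def run_forever(fetch_titles: Callable[[], list[str]], interval_seconds: int = 300) -> None:
    """Repeat the cycle, sleeping between checks."""
    while True:
        for title in check_once(fetch_titles):
            print(f"New notice: {title}")  # a desktop notification could go here
        time.sleep(interval_seconds)
```

Calling `run_forever` with any function that returns a list of titles starts the loop; `check_once` can also be called on its own from an external scheduler such as cron.
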
When solving real-world problems:
- Inspect the website with the browser's developer tools (Inspect → Network tab).
- Look for hidden API endpoints to avoid brittle HTML scraping.
- Always save state (last notice, last price, last update).
- Notify the user when something new happens.
- Make it repeatable with loops or schedulers.
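
As an illustration of the first two points, once the Network tab reveals a JSON endpoint, fetching from it might look like the sketch below. The URL and the response shape (`{"notices": [{"title": ...}]}`) are assumptions for the sake of the example; the real endpoint and field names must be read from the site's actual network traffic. The `get` parameter is a hypothetical convenience that lets the function be exercised without a network connection.

```python
import requests

# Hypothetical endpoint; replace it with the URL observed in the Network tab.
API_URL = "https://example.com/api/notices"


def fetch_notice_titles(url: str = API_URL, get=requests.get) -> list[str]:
    """Fetch notices from a JSON endpoint and return their titles."""
    # A timeout keeps a stalled server from hanging the program indefinitely.
    response = get(url, timeout=10)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    payload = response.json()
    # Assumed response shape: {"notices": [{"title": "..."}, ...]}
    return [item["title"] for item in payload.get("notices", [])]
```

Compared with scraping, this approach depends only on the JSON field names, so a cosmetic redesign of the page is less likely to break it.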