Welcome to the AI Web Scraper project! In this first chapter, we'll explore how you interact with the tool — how you give it instructions and how it shows results. This is all about the Application Workflow & UI.
Think of it as the control panel of a smart robot 🧠🤖: You enter a request, press buttons, and get the information you need, neatly extracted.
Tired of copy-pasting ingredients from recipe websites 🍲 or pulling product details from 50+ pages manually? 😩
The AI Web Scraper automates this for you! Just tell it:
-
🔗 What URL to visit
-
🔍 What data to extract
The UI is the dashboard where you give these instructions and see the results.
-
Built with Streamlit
-
Provides input boxes, buttons, and output displays
-
Lets you enter URLs and instructions
-
The behind-the-scenes steps triggered by your input
-
From clicking "Scrape" to seeing a clean result
Let's walk through our recipe ingredient example. How would you use the UI to get the ingredients from a recipe page?
Here's how you'd use the scraper to extract ingredients from a recipe page:
-
🔗 Enter a URL – Paste the recipe page link
-
🧹 Click "Scrape Site" – The scraper grabs the page's content
-
📝 See Raw Text – Messy but useful, just an intermediate step
-
🔍 Describe What You Want – e.g., “List all ingredients in the recipe”
-
🤖 Click "Parse Content" – AI reads and extracts the info
-
✅ See the Result – A clean list of ingredients appears!
Here's a peek at the code from main.py that creates these parts of the UI:
import streamlit as st
# ... other imports ...
st.title("AI Web Scraper") # This is the main title you see
url = st.text_input("Enter the URL to scrape") # This creates the box for the URL
if st.button("Scrape Site"): # This creates the first button
# ... code for scraping happens here ...
st.write("Scraping...") # Shows a message
# ... code to show DOM content ...
if "dom_content" in st.session_state: # This part only shows up after scraping
parse_description = st.text_area("Describe what you want to parse?") # Box for your request
if st.button("Parse Content"): # The second button
if parse_description:
# ... code for parsing happens here ...
st.write("Parsing the content...") # Shows a message
# ... code to display final result ...This simple script uses streamlit commands like st.title, st.text_input, st.button, st.write, and st.text_area to build the interactive parts you see and use.
🧠 Simplified Breakdown
-
You enter a URL and hit scrape
-
main.py triggers scrape.py to fetch content
-
Content is cleaned and stored
-
You describe what to extract
-
main.py sends the content to parse.py
-
The parser returns clean, structured info
-
The UI shows it to you! 🎉
Note -> 🗒️ st.session_state keeps the scraped content stored between actions — like a sticky note for your app!
-
The UI is your main control center 🕹️
-
You give the what and where, and the app does the rest
-
The workflow coordinates scraping, cleaning, parsing, and displaying
-
You get accurate, clean results with just a few clicks
Next up: Let’s look at how the Web Scraper fetches content from a site! 🕸️