Skip to content

Latest commit

 

History

History
112 lines (63 loc) · 3.65 KB

File metadata and controls

112 lines (63 loc) · 3.65 KB

📘 Chapter 1: Application Workflow & UI

Welcome to the AI Web Scraper project! In this first chapter, we'll explore how you interact with the tool — how you give it instructions and how it shows results. This is all about the Application Workflow & UI.

Think of it as the control panel of a smart robot 🧠🤖: You enter a request, press buttons, and get the information you need, neatly extracted.

💡 Problem This Solves

Tired of copy-pasting ingredients from recipe websites 🍲 or pulling product details from 50+ pages manually? 😩

The AI Web Scraper automates this for you! Just tell it:

  • 🔗 What URL to visit

  • 🔍 What data to extract

The UI is the dashboard where you give these instructions and see the results.

🧱 Key Components

1. User Interface (UI) 🖥️

  • Built with Streamlit

  • Provides input boxes, buttons, and output displays

  • Lets you enter URLs and instructions

2. Workflow 🔄

  • The behind-the-scenes steps triggered by your input

  • From clicking "Scrape" to seeing a clean result

🚀 Your First Scrape (Step-by-Step)

Let's walk through our recipe ingredient example. How would you use the UI to get the ingredients from a recipe page?

Here's how you'd use the scraper to extract ingredients from a recipe page:

  1. 🔗 Enter a URL – Paste the recipe page link

  2. 🧹 Click "Scrape Site" – The scraper grabs the page's content

  3. 📝 See Raw Text – Messy but useful, just an intermediate step

  4. 🔍 Describe What You Want – e.g., “List all ingredients in the recipe”

  5. 🤖 Click "Parse Content" – AI reads and extracts the info

  6. ✅ See the Result – A clean list of ingredients appears!

Here's a peek at the code from main.py that creates these parts of the UI:

import streamlit as st
# ... other imports ...

st.title("AI Web Scraper") # This is the main title you see

url = st.text_input("Enter the URL to scrape") # This creates the box for the URL

if st.button("Scrape Site"): # This creates the first button
    # ... code for scraping happens here ...
    st.write("Scraping...") # Shows a message

# ... code to show DOM content ...

if "dom_content" in st.session_state: # This part only shows up after scraping
    parse_description = st.text_area("Describe what you want to parse?") # Box for your request

    if st.button("Parse Content"): # The second button
        if parse_description:
            # ... code for parsing happens here ...
            st.write("Parsing the content...") # Shows a message
            # ... code to display final result ...

This simple script uses streamlit commands like st.title, st.text_input, st.button, st.write, and st.text_area to build the interactive parts you see and use.

🔄 Behind the Scenes: The Workflow

🧠 Simplified Breakdown

  1. You enter a URL and hit scrape

  2. main.py triggers scrape.py to fetch content

  3. Content is cleaned and stored

  4. You describe what to extract

  5. main.py sends the content to parse.py

  6. The parser returns clean, structured info

  7. The UI shows it to you! 🎉

Note -> 🗒️ st.session_state keeps the scraped content stored between actions — like a sticky note for your app!

✅ Summary

  • The UI is your main control center 🕹️

  • You give the what and where, and the app does the rest

  • The workflow coordinates scraping, cleaning, parsing, and displaying

  • You get accurate, clean results with just a few clicks

Next up: Let’s look at how the Web Scraper fetches content from a site! 🕸️

Next Chapter: Web Scraper