Skip to content

Latest commit

 

History

History
230 lines (173 loc) · 5.81 KB

File metadata and controls

230 lines (173 loc) · 5.81 KB

🌸 Perfume Scraper

A Python-based web scraping project designed to extract structured perfume product data from Liliome.com.
The scraper collects brand and product information and supports multiple database backends, allowing the same data pipeline to be stored in relational and NoSQL databases.


📌 Features

✔ Robust HTTP session

  • Uses requests.Session with retry logic
  • Handles connection failures gracefully (safe_get())

✔ Web scraping

  • Extracts:
    • Brand name
    • English title
    • Persian title
    • Old price
    • New price
    • Product rating (Point)
    • Photo URL
  • Automatically discovers all available brands and their product pages

✔ Pagination handling

  • Detects number of pages for each brand using total_pages()

🗄️ Supported Databases

The project has been refactored to support multiple storage backends, making it easy to switch between databases:

  • SQLite – lightweight local storage
  • SQL Server – enterprise relational database
  • PostgreSQL – open-source relational database
  • MongoDB – NoSQL document-based storage

This design enables comparison between SQL and NoSQL data models using the same scraping logic. Two tables are created automatically:

Brands

Column Type Description
Brand_ID INTEGER Primary key
Brand_Link TEXT URL of brand page
Brand_Name TEXT Extracted brand name

Master

Column Type Description
ID INTEGER Primary key
Brand TEXT Brand slug
EnglishName TEXT Product English title
Name TEXT Product Persian title
Point FLOAT Product rating
OldPrice INTEGER Old price
NewPrice INTEGER New price
Photo TEXT Image URL

🛠 Technologies Used

  • Python 3
  • Requests
  • BeautifulSoup4
  • SQLite3
  • SQL Server
  • PostgreSQL
  • MongoDB
  • Retry & Timeout handling
  • Regex for price cleanup

📁 Project Structure

Perfume_Scraper/
│
├── assets/
│ └── mongodb_brands.png
│ └── mongodb_master.png
│ └── postgres_brands.png
│ └── postgres_master.png
│ └── sqlite_brands.png
│ └── sqlite_master.png
│ └── sqlserver_brands.png
│ └── sqlserver_master.png
│
├── db/
│ └── Perfume.db # Automatically created database for SQLite
│
├── Scraper_MongoDB.py
├── Scraper_MongoDB_Safe.py
├── Scraper_PostgreSQL.py
├── Scraper_PostgreSQL_Safe.py
├── Scraper_SQL.py
├── Scraper_SQL_Safe.py
├── Scraper_SQLite.py
├── Scraper_SQLite_Safe.py
├── README.md
├── requirements.txt

🚀 How It Works

1️⃣ Load Liliome brand list

The script visits:

https://liliome.com/برندها-عطر-ادکلن-فروشگاه-عطر-لیلیوم

It finds all brand links and stores them in the Brands table.


2️⃣ For each brand:

  • Detects how many pages of products exist
  • Extracts products from each page
  • Saves structured data into the Master table

▶️ How to Run

  1. Clone the repository:
git clone https://github.com/SamiraSiavash/Perfume_Scraper_Multi_DB.git
cd Perfume_Scraper_Multi_DB
  1. Install dependencies:
pip install -r requirements.txt
  1. Run one of the scrapers:
Scraper_MongoDB.py
Scraper_MongoDB_Safe.py
Scraper_PostgreSQL.py
Scraper_PostgreSQL_Safe.py
Scraper_SQL.py
Scraper_SQL_Safe.py
Scraper_SQLite.py
Scraper_SQLite_Safe.py

🖼 Screenshots

SQLite

![Brands Table](assets/sqlite_brands.png)
sqlite_brands
![Master Table](assets/sqlite_master.png)
sqlite_master

SQL Server

![Brands Table](assets/sqlserver_brands.png)
sqlserver_brands
![Master Table](assets/sqlserver_master.png)
sqlserver_master

PostgreSQL

![Brands Table](assets/postgres_brands.png)
postgres_brands
![Master Table](assets/postgres_master.png)
postgres_master

MongoDB

![Brands Collection](assets/mongodb_brands.png)
mongodb_brands
![Master Collection](assets/mongodb_master.png)
mongodb_master

📝 Notes

  • Adjust CSS selectors depending on website structure.
  • Website layouts may change; update selectors accordingly.
  • Always follow the target website’s Terms of Service.

📄 License

MIT License (optional)


✨ Author

Samira Siavash

🔗 GitHub: https://github.com/SamiraSiavash

🔗 LinkedIn: https://linkedin.com/in/samira-siavash