Smart Search Engine using Vector Space Model (VSM)

A Python-based search engine that retrieves and ranks documents from a local dataset using the **Vector Space Model (VSM)** and TF-IDF ranking.

This project demonstrates core concepts of Information Retrieval, Search Algorithms, and Text Processing used in real-world systems.

Core Features

Tokenization (splitting text into words)
Stopword Removal (removing common words like the, is, a)
Stemming (reducing words to root form)
Inverted Index (fast word → document lookup)
TF-IDF Ranking (relevance-based scoring)
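
The first four features can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the notebook's actual code: the names `STOPWORDS`, `tokenize`, `stem`, `preprocess`, and `build_inverted_index` are hypothetical, the stopword list is deliberately tiny, and the suffix-stripping `stem` is a crude stand-in for a real stemmer such as Porter's.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); a real list is much longer.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "in", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize, drop stopwords, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]

def build_inverted_index(docs):
    """Map each processed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index

docs = {
    1: "The market is rising",
    2: "Learning machines and markets",
}
index = build_inverted_index(docs)

print(preprocess("The market is rising"))  # ['market', 'ris']
print(sorted(index["market"]))             # [1, 2]
```

Because "market" and "markets" stem to the same term, a single index lookup finds both documents.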

Advanced enhancements

Title Boost (title matches weighted higher)
Category Filter (search within specific categories)
Boolean Search (AND, OR, NOT)
Autocomplete suggestions
Search History tracking
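
Boolean search maps naturally onto set operations over the inverted index. The sketch below is one possible approach, not necessarily what the notebook does: `boolean_search` is a hypothetical helper that evaluates a flat query left to right (no parentheses or precedence), treats `a NOT b` as `a AND NOT b`, and assumes that terms without an operator between them are combined with OR.

```python
def boolean_search(query, index, all_doc_ids):
    """Evaluate a flat Boolean query such as 'blockchain AND cryptocurrency'.

    index maps a term to a set of document ids; all_doc_ids is needed for NOT.
    """
    result = None
    op = None        # explicit operator waiting for its right-hand term
    negate = False   # True when the next term is preceded by NOT
    for token in query.split():
        if token in ("AND", "OR"):
            op = token
        elif token == "NOT":
            negate = True
        else:
            postings = set(index.get(token.lower(), ()))
            if negate:
                postings = all_doc_ids - postings
            if result is None:
                result = postings
            elif op == "AND" or (op is None and negate):
                result = result & postings
            else:
                # Explicit OR, or no operator at all (assumed to mean OR).
                result = result | postings
            op, negate = None, False
    return result or set()

# Hand-written index for illustration only.
index = {"blockchain": {1, 2}, "cryptocurrency": {2, 3}, "bank": {3}}
all_ids = {1, 2, 3}

print(boolean_search("blockchain AND cryptocurrency", index, all_ids))  # {2}
print(boolean_search("cryptocurrency NOT bank", index, all_ids))        # {2}
```

In a full pipeline the query terms would go through the same stopword removal and stemming as the documents before the lookup.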

What is the Vector Space Model (VSM)?

The Vector Space Model (VSM) represents documents and queries as vectors in a multi-dimensional space.

Key Idea

Each document is converted into a vector
Each distinct term (after preprocessing) is one dimension of the space
The weight of a term in a document is its TF-IDF score
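
As a rough illustration of the key idea, the sketch below turns already-tokenized documents into sparse TF-IDF vectors. It assumes raw term counts for TF and `log(N / df)` for IDF; other weighting variants exist, and the notebook may use a different one. The function name `tfidf_vectors` and the sample data are made up for this example.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Return one sparse {term: weight} vector per document."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized_docs.values():
        df.update(set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized_docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors

tokenized_docs = {
    "d1": ["stock", "market", "stock"],
    "d2": ["market", "climate"],
}
vectors = tfidf_vectors(tokenized_docs)

print(vectors["d1"]["market"])  # 0.0
```

Here "market" appears in every document, so its IDF is `log(2 / 2) = 0` and it carries no weight, while "stock" appears twice in only one document and gets the weight `2 * log(2)`.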

How It Works

User enters a query
Text is processed:
- Tokenization
- Stopword removal
- Stemming
Documents are converted into vectors using TF-IDF
Query vector is compared with document vectors
Similarity is calculated
Results are ranked based on relevance

Cosine Similarity

Used to measure similarity between query and documents: cos(θ) = (A · B) / (||A|| × ||B||)

A = Query vector
B = Document vector
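Because TF-IDF weights are non-negative, the value ranges from 0 (no terms in common) to 1 (vectors point in the same direction)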
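
The formula translates directly into code. The following sketch assumes vectors are stored as sparse `{term: weight}` dictionaries; `cosine_similarity`, `rank`, and the sample weights are illustrative names and values, not taken from the notebook.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for sparse dict vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero vector has no direction to compare.
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Score every document and return matches sorted by similarity, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

query = {"stock": 1.0, "market": 1.0}
doc_vecs = {
    "d1": {"stock": 1.0, "market": 1.0},
    "d2": {"stock": 3.0, "price": 4.0},
    "d3": {"climate": 2.0},
}
results = rank(query, doc_vecs)

print([doc_id for doc_id, _ in results])  # ['d1', 'd2']
```

`d1` has the same direction as the query and scores approximately 1, `d2` shares only "stock" and scores about 0.42, and `d3` shares no terms, so it scores 0 and is dropped.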

Tech Stack

Python
Jupyter Notebook
JSON (for dataset storage)

Project Structure

search-engine-project/
│
├── main.ipynb            # Main code
├── database.json         # Dataset
└── README.md             # Documentation

How to Run

Clone the repository:

git clone https://github.com/Swetalin26/search-engine-project.git

Go to project folder:

cd search-engine-project

Open Jupyter Notebook:

jupyter notebook

Open main.ipynb and run all cells

🧪 Example Queries

machine learning
climate change
blockchain AND cryptocurrency
mental health
stock market

Learning Outcomes

Understanding Vector Space Model
Implementing TF-IDF
Building search using inverted index
Applying Boolean logic in search
Improving user experience with enhancements

Future Improvements

Web interface (React / Next.js)
Better ranking using NLP
Voice-based search
Real-time data integration

Author

Swetalin Sahoo, B.Tech Student | Aspiring Developer

Note

This is a beginner-friendly project that demonstrates the basic ideas behind how search engines retrieve and rank information.
