Commit f1bc4b4

DSA Project: Search Engine with Stack + Inverted Index
1 parent b8f706a commit f1bc4b4

14 files changed

Lines changed: 246 additions & 196 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ NFC-Projects/
1414
├── PhoneBook-HashTable/
1515
│ └── README.md
1616
17-
├── Mini-Search-Engine/
17+
├── SearchEngine/
1818
│ └── README.md
1919
2020
└── README.md (this file)

SearchEngine/README.md

Lines changed: 78 additions & 49 deletions
@@ -1,78 +1,107 @@
1-
# 🔍 Mini Search Engine
1+
# 🔍 Mini Search Engine with Stack
22

3-
A simple **Search Engine project** built as part of my **2nd Semester DSA Lab Project (BS AI, NFC IET Multan)**.
4-
It demonstrates core **Data Structures and Algorithms concepts** such as **hash tables, string processing, searching, and modular coding**.
3+
A **Mini Search Engine project** developed as part of my **2nd Semester DSA Lab Project (BS AI, NFC IET Multan)**.
4+
It demonstrates **core Data Structures and Algorithms (DSA)** concepts such as:
5+
- **Stack** (for search history navigation)
6+
- **Inverted Index / Hash Map** (for efficient keyword-based searching)
7+
- **String processing & searching algorithms**
58

69
---
710

8-
## 📌 Features
9-
-Store and retrieve web pages using **Hash Tables**
10-
-Search queries with **best keyword matching**
11-
-Support for **multiple results** with user choice
12-
-Update and delete existing page links
13-
-Organized into **modular files** (`search_engine.py`, `main.py`)
14-
-Example usage in the `examples/` folder
11+
## Features
12+
- **Keyword-based Search** → finds documents that contain every query term
13+
- **Ranked Results** → ordered by the total frequency of the query terms
14+
- **Search History (Stack)** → supports a `back` command, like a browser
15+
- **Document Viewer** → open `.txt` files directly from search results
16+
- **Automatic Crawler** → indexes all `.txt` files in the `documents/` folder
17+
- **Clean modular structure** for GitHub
1518

1619
---
1720

1821
## 🗂️ Project Structure
1922
```txt
20-
NFC-Search-Engine/
23+
SearchEngine/
2124
22-
├── search_engine.py # Core SearchEngine class implementation
23-
├── main.py # Main program (semester project runner)
25+
├── stack.py # Stack implementation (push, pop, peek, empty)
26+
├── index.py # Inverted Index implementation
27+
├── search.py # Search Engine logic
28+
├── main.py # Entry point for running the project
2429
25-
├── examples/ # Example usage (optional demos)
26-
│ └── demo.py
30+
├── documents/ # Folder containing sample text files
31+
│ ├── doc1.txt
32+
│ ├── doc2.txt
33+
│ └── ...
2734
28-
└── README.md # Project documentation
35+
└── README.md # Project documentation
2936
```
30-
31-
---
32-
3337
## ⚡ How It Works
34-
1. Pages are added with a **keyword → link** mapping.
35-
2. A **hash function** stores them in a fixed-size hash table.
36-
3. The user can search with any query:
37-
- If one best match → the page opens directly.
38-
- If multiple matches → a numbered menu appears for choice.
39-
4. Extra features:
40-
- `update_page(keyword, new_link)` → update existing link.
41-
- `delete_page(keyword, link)` → delete specific page link.
38+
- The program scans the `documents/` folder and builds an **inverted index**.
39+
- When the user searches, queries are **cleaned** (lowercased, punctuation removed, split into words).
40+
- Documents that contain every query word are **ranked by the total number of times those words occur**.
41+
- The query is **pushed onto the Stack (history)**.
42+
- If the user types **back**, the last query is **popped** and the previous one is shown again.
43+
- The user can open a result to see the **full content of the file**.
4244

4345
---
4446

4547
## ▶️ Usage
4648

47-
### Run the main program:
49+
### Run the program:
4850
```bash
4951
python main.py
5052
```
51-
## Example Search
52-
```python
53-
Please, Enter any query to search: nfc and ai
54-
Multiple results found:
55-
1. ai → https://en.wikipedia.org/wiki/Artificial_intelligence
56-
2. nfc → https://www.nfciet.edu.pk/
57-
3. ai_department → https://www.nfciet.edu.pk/bs-artificial-intelligence/
53+
### Example Session:
5854
```
55+
Building index...
56+
Index built with 5 docs.
5957
60-
## 🏫 Academic Info
61-
- 📖 **Course**: Data Structures & Algorithms (DSA)
62-
- 🎓 **Semester**: 2nd Semester, BS Artificial Intelligence
63-
- 🏛️ **University**: NFC IET Multan
64-
- 👨‍💻 **Student**: Muawiya
58+
Enter search query, 'back', or 'quit': programming
59+
Searching: 'programming'
60+
Found 2 docs:
61+
1. doc3.txt | Score: 2
62+
2. doc1.txt | Score: 1
6563
66-
---
67-
## 👥 Team Members
68-
- 👨‍💻 **Muawiya** (Team Leader)
69-
- 👨‍💻 M. Umar
70-
- 👨‍💻 Hassan Khan
64+
Enter doc number to open, or 'next': 1
65+
66+
--- doc3.txt ---
67+
An algorithm is a step-by-step procedure or formula for solving a problem.
68+
Algorithms are crucial in computer programming...
69+
------------------
7170
71+
Enter search query, 'back', or 'quit': back
72+
Back to: 'programming'
73+
Found 2 docs:
74+
1. doc3.txt | Score: 2
75+
2. doc1.txt | Score: 1
76+
```
7277
---
73-
## 🚀 Future Improvements
74-
- Add **ranking system** for results (frequency & relevance).
75-
- Build a **GUI or Web-based interface**.
76-
- Support **export/import** of stored links.
78+
## 🏫 Academic Info
79+
80+
+ 📖 Course: Data Structures & Algorithms (DSA)
81+
82+
+ 🎓 Semester: 2nd Semester, BS Artificial Intelligence
83+
84+
+ 🏛️ University: NFC IET Multan
85+
86+
+ 👨‍💻 Student: Muawiya Amir
87+
88+
----
89+
### 👥 Team Members
90+
91+
+ 👨‍💻 Muawiya (Team Leader)
92+
93+
+ 👨‍💻 M. Umar
7794

7895
---
96+
97+
### 🚀 Future Improvements
98+
99+
+ Add ***synonym & fuzzy*** matching for queries
100+
101+
+ Implement ***OR / NOT*** search operators
102+
103+
+ Enhance ranking with ***TF-IDF*** instead of simple counts
104+
105+
+ Build a ***GUI or Web-based*** interface
106+
107+
------
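
The README's search history relies on a stack, but `stack.py` is not shown in this part of the commit. Below is a minimal sketch of how push, pop, and peek could drive a `back` command, assuming a list-backed stack. The `Stack` class and the `go_back` helper are illustrative names and may differ from the project's actual code.

```python
class Stack:
    """Minimal list-backed stack (illustrative; not the project's stack.py)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Return None instead of raising when the stack is empty.
        return self._items.pop() if self._items else None

    def peek(self):
        # Look at the top item without removing it.
        return self._items[-1] if self._items else None

    def empty(self):
        return not self._items


def go_back(history):
    """Discard the current query and return the previous one, or None."""
    if history.empty():
        return None
    history.pop()          # drop the query currently on screen
    return history.peek()  # the query before it, if there is one


history = Stack()
history.push("python")
history.push("stack")
print(go_back(history))  # python
print(go_back(history))  # None
```

Returning `None` when nothing is left lets the caller decide what to print, which is one of several reasonable choices for an empty history.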

SearchEngine/documents/doc1.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Python is a powerful programming language. It is widely used in data science, artificial intelligence, and web development. Its simple and readable syntax makes it an excellent choice for beginners. Many open-source libraries are available to extend its functionality.

SearchEngine/documents/doc2.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Data structures are a fundamental concept in computer science. They are used to organize and store data efficiently. Examples include arrays, linked lists, stacks, and queues. Understanding these structures is key to writing effective algorithms.

SearchEngine/documents/doc3.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
An algorithm is a step-by-step procedure or formula for solving a problem. Algorithms are crucial in computer programming and are independent of the programming language. Searching and sorting are classic examples of problems solved with various algorithms.

SearchEngine/documents/doc4.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
A stack is a linear data structure that follows the Last-In, First-Out (LIFO) principle. Think of it like a stack of plates. The last plate you put on top is the first one you take off. Operations on a stack include push (to add an item) and pop (to remove an item).

SearchEngine/documents/doc5.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
The field of artificial intelligence, or AI, involves the development of computer systems that can perform tasks that would normally require human intelligence. This includes things like visual perception, speech recognition, decision-making, and language translation. Machine learning is a subfield of AI.

SearchEngine/examples/demo.py

Lines changed: 0 additions & 14 deletions
This file was deleted.

SearchEngine/index.py

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# index.py
+import string
+
+class InvertedIndex:
+    """Inverted Index for storing word → doc → frequency."""
+
+    def __init__(self):
+        self.index = {}
+
+    def add_doc(self, doc, text):
+        text = text.lower().translate(str.maketrans('', '', string.punctuation))
+        words = [w.strip() for w in text.split() if w.strip()]
+        for w in words:
+            if w not in self.index:
+                self.index[w] = {}
+            self.index[w][doc] = self.index[w].get(doc, 0) + 1
+
+    def search(self, query):
+        q_words = self._clean_query(query)
+        if not q_words:
+            return []
+
+        if q_words[0] not in self.index:
+            return []
+        results = set(self.index[q_words[0]].keys())
+
+        for w in q_words[1:]:
+            if w not in self.index:
+                return []
+            results &= set(self.index[w].keys())
+
+        ranked = []
+        for doc in results:
+            score = sum(self.index[w].get(doc, 0) for w in q_words)
+            ranked.append({"doc": doc, "score": score})
+
+        return sorted(ranked, key=lambda x: x["score"], reverse=True)
+
+    def _clean_query(self, query):
+        query = query.lower().translate(str.maketrans('', '', string.punctuation))
+        return [w.strip() for w in query.split() if w.strip()]
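
To see how the index behaves on the sample documents, here is a compact, function-based sketch of the same idea: a word → doc → count map, an AND over all query words, and ranking by summed counts. The names `tokenize`, `build_index`, and `search` are illustrative and are not part of the commit. The two texts are short excerpts from `doc1.txt` and `doc3.txt`.

```python
import string
from collections import defaultdict


def tokenize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def build_index(docs):
    """Map each word to {doc_name: occurrence_count}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word][name] = index[word].get(name, 0) + 1
    return index


def search(index, query):
    """Return (doc, score) pairs for docs containing every query word."""
    words = tokenize(query)
    if not words or any(w not in index for w in words):
        return []
    matches = set(index[words[0]])
    for w in words[1:]:
        matches &= set(index[w])  # AND semantics: keep docs that have all words
    ranked = [(doc, sum(index[w].get(doc, 0) for w in words)) for doc in matches]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = {
    "doc1.txt": "Python is a powerful programming language.",
    "doc3.txt": "Algorithms are crucial in computer programming and are "
                "independent of the programming language.",
}
index = build_index(docs)
print(search(index, "programming"))        # [('doc3.txt', 2), ('doc1.txt', 1)]
print(search(index, "python algorithms"))  # []
```

The second query returns nothing because no single document contains both words, which is why the README lists OR / NOT operators as a future improvement.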

SearchEngine/main.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# main.py
+from search import SearchSim
+
+if __name__ == "__main__":
+    SearchSim().run()
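
`main.py` only delegates to `SearchSim`, whose source (`search.py`) is not part of this excerpt. The crawling step the README describes (indexing every `.txt` file in `documents/`) could look roughly like the sketch below. `load_documents` is a hypothetical helper, not a function from the project.

```python
from pathlib import Path


def load_documents(folder):
    """Return {filename: text} for every .txt file directly inside folder."""
    docs = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumes the sample files are UTF-8 encoded.
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```

Each `(name, text)` pair from `load_documents("documents")` could then be passed to `InvertedIndex.add_doc` to build the index at startup.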
