|
1 | | -# 🔍 Mini Search Engine |
| 1 | +# 🔍 Mini Search Engine with Stack |
2 | 2 |
|
3 | | -A simple **Search Engine project** built as part of my **2nd Semester DSA Lab Project (BS AI, NFC IET Multan)**. |
4 | | -It demonstrates core **Data Structures and Algorithms concepts** such as **hash tables, string processing, searching, and modular coding**. |
| 3 | +A **Mini Search Engine project** developed as part of my **2nd Semester DSA Lab Project (BS AI, NFC IET Multan)**. |
| 4 | +It demonstrates **core Data Structures and Algorithms (DSA)** concepts such as: |
| 5 | +- **Stack** (for search history navigation) |
| 6 | +- **Inverted Index / Hash Map** (for efficient keyword-based searching) |
| 7 | +- **String processing & searching algorithms** |
5 | 8 |
|
6 | 9 | --- |
7 | 10 |
|
8 | | -## 📌 Features |
9 | | -- ✅ Store and retrieve web pages using **Hash Tables** |
10 | | -- ✅ Search queries with **best keyword matching** |
11 | | -- ✅ Support for **multiple results** with user choice |
12 | | -- ✅ Update and delete existing page links |
13 | | -- ✅ Organized into **modular files** (`search_engine.py`, `main.py`) |
14 | | -- ✅ Example usage in the `examples/` folder |
| 11 | +## ✨ Features |
| 12 | +- ✅ **Keyword-based Search** → finds documents containing query terms |
| 13 | +- ✅ **Ranked Results** → ordered by how often the query terms occur in each document |
| 14 | +- ✅ **Search History (Stack)** → supports `back` command just like a browser |
| 15 | +- ✅ **Document Viewer** → open `.txt` files directly from search results |
| 16 | +- ✅ **Automatic Crawler** → indexes all `.txt` files in the `documents/` folder |
| 17 | +- ✅ **Clean modular structure** → separate files for the stack, the index, and the search logic |
15 | 18 |
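
The stack-backed search history listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual `stack.py`: the method names follow the operations named in the project structure (push, pop, peek, empty), while `go_back` is a hypothetical helper that assumes `back` discards the current query and returns the one before it.

```python
class Stack:
    """Minimal LIFO stack backed by a Python list (illustrative sketch)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def empty(self):
        return not self._items


def go_back(history):
    """Hypothetical helper: drop the current query, return the previous one or None."""
    if history.empty():
        return None
    history.pop()  # discard the query currently on top
    return None if history.empty() else history.peek()


history = Stack()
history.push("algorithms")
history.push("programming")
print(go_back(history))  # algorithms
```

A plain list keeps push and pop at amortized constant time, which is all a browser-style history needs.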
|
16 | 19 | --- |
17 | 20 |
|
18 | 21 | ## 🗂️ Project Structure |
19 | 22 | ```txt |
20 | | -NFC-Search-Engine/ |
| 23 | +Mini-Search-Engine/ |
21 | 24 | │ |
22 | | -├── search_engine.py # Core SearchEngine class implementation |
23 | | -├── main.py # Main program (semester project runner) |
| 25 | +├── stack.py # Stack implementation (push, pop, peek, empty) |
| 26 | +├── index.py # Inverted Index implementation |
| 27 | +├── search.py # Search Engine logic |
| 28 | +├── main.py # Entry point for running the project |
24 | 29 | │ |
25 | | -├── examples/ # Example usage (optional demos) |
26 | | -│ └── demo.py |
| 30 | +├── documents/ # Folder containing sample text files |
| 31 | +│ ├── doc1.txt |
| 32 | +│ ├── doc2.txt |
| 33 | +│ └── ... |
27 | 34 | │ |
28 | | -└── README.md # Project documentation |
| 35 | +└── README.md # Project documentation |
29 | 36 | ``` |
30 | | - |
31 | | ---- |
32 | | - |
33 | 37 | ## ⚡ How It Works |
34 | | -1. Pages are added with a **keyword → link** mapping. |
35 | | -2. A **hash function** stores them in a fixed-size hash table. |
36 | | -3. The user can search with any query: |
37 | | - - If one best match → the page opens directly. |
38 | | - - If multiple matches → a numbered menu appears for choice. |
39 | | -4. Extra features: |
40 | | - - `update_page(keyword, new_link)` → update existing link. |
41 | | - - `delete_page(keyword, link)` → delete specific page link. |
| 38 | +- The program scans the `documents/` folder and builds an **inverted index**. |
| 39 | +- When the user searches, queries are **cleaned** (lowercased, punctuation removed, split into words). |
| 40 | +- Matching documents are **ranked by query word frequency**. |
| 41 | +- The query is **pushed onto the Stack (history)**. |
| 42 | +- If the user types `back`, the last query is **popped** and the previous one is shown again. |
| 43 | +- The user can open a result to see the **full content of the file**. |
42 | 44 |
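
The indexing and ranking steps above can be sketched in a few lines. This is an assumption-laden illustration rather than the project's real `index.py` or `search.py`: the function names `clean`, `build_index`, `search`, and `load_documents` are hypothetical, and the score is assumed to be the total number of times the query words occur in a document.

```python
import string
from collections import Counter, defaultdict
from pathlib import Path


def clean(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def build_index(docs):
    """Inverted index: word -> Counter of {doc_name: occurrences}."""
    index = defaultdict(Counter)
    for name, text in docs.items():
        for word in clean(text):
            index[word][name] += 1
    return index


def search(index, query):
    """Return (doc_name, score) pairs, highest score first."""
    scores = Counter()
    for word in clean(query):
        # .get avoids creating empty entries for unknown words
        for name, count in index.get(word, {}).items():
            scores[name] += count
    # Descending score, then name, so ties come out in a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


def load_documents(folder="documents"):
    """Read every .txt file in the folder (assumed layout, not called below)."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}


# In-memory sample data, so the sketch runs without a documents/ folder.
docs = {
    "a.txt": "Programming needs algorithms. Programming is fun!",
    "b.txt": "An algorithm helps programming.",
}
index = build_index(docs)
print(search(index, "Programming?"))  # [('a.txt', 2), ('b.txt', 1)]
```

Because the index maps each word straight to the documents that contain it, a query only touches the entries for its own words instead of rescanning every file.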
|
43 | 45 | --- |
44 | 46 |
|
45 | 47 | ## ▶️ Usage |
46 | 48 |
|
47 | | -### Run the main program: |
| 49 | +### Run the program: |
48 | 50 | ```bash |
49 | 51 | python main.py |
50 | 52 | ``` |
51 | | -## Example Search |
52 | | -```python |
53 | | -Please, Enter any query to search: nfc and ai |
54 | | -Multiple results found: |
55 | | -1. ai → https://en.wikipedia.org/wiki/Artificial_intelligence |
56 | | -2. nfc → https://www.nfciet.edu.pk/ |
57 | | -3. ai_department → https://www.nfciet.edu.pk/bs-artificial-intelligence/ |
| 53 | +### Example Session: |
58 | 54 | ``` |
| 55 | +Building index... |
| 56 | +Index built with 5 docs. |
59 | 57 |
|
60 | | -## 🏫 Academic Info |
61 | | -- 📖 **Course**: Data Structures & Algorithms (DSA) |
62 | | -- 🎓 **Semester**: 2nd Semester, BS Artificial Intelligence |
63 | | -- 🏛️ **University**: NFC IET Multan |
64 | | -- 👨💻 **Student**: Muawiya |
| 58 | +Enter search query, 'back', or 'quit': programming |
| 59 | +Searching: 'programming' |
| 60 | +Found 2 docs: |
| 61 | +1. doc3.txt | Score: 2 |
| 62 | +2. document1.txt | Score: 1 |
65 | 63 |
|
66 | | ---- |
67 | | -## 👥 Team Members |
68 | | -- 👨💻 **Muawiya** (Team Leader) |
69 | | -- 👨💻 M. Umar |
70 | | -- 👨💻 Hassan Khan |
| 64 | +Enter doc number to open, or 'next': 1 |
| 65 | +
|
| 66 | +--- doc3.txt --- |
| 67 | +An algorithm is a step-by-step procedure for solving problems. |
| 68 | +Algorithms are crucial in computer programming... |
| 69 | +------------------ |
71 | 70 |
|
| 71 | +Enter search query, 'back', or 'quit': back |
| 72 | +Back to: 'programming' |
| 73 | +Found 2 docs: |
| 74 | +1. doc3.txt | Score: 2 |
| 75 | +2. document1.txt | Score: 1 |
| 76 | +``` |
72 | 77 | --- |
73 | | -## 🚀 Future Improvements |
74 | | -- Add **ranking system** for results (frequency & relevance). |
75 | | -- Build a **GUI or Web-based interface**. |
76 | | -- Support **export/import** of stored links. |
| 78 | +## 🏫 Academic Info |
| 79 | + |
| 80 | ++ 📖 Course: Data Structures & Algorithms (DSA) |
| 81 | + |
| 82 | ++ 🎓 Semester: 2nd Semester, BS Artificial Intelligence |
| 83 | + |
| 84 | ++ 🏛️ University: NFC IET Multan |
| 85 | + |
| 86 | ++ 👨‍💻 Student: Muawiya Amir |
| 87 | + |
| 88 | +---- |
| 89 | +### 👥 Team Members |
| 90 | + |
| 91 | ++ 👨‍💻 Muawiya (Team Leader) |
| 92 | + |
| 93 | ++ 👨‍💻 M. Umar |
77 | 94 |
|
78 | 95 | --- |
| 96 | + |
| 97 | +## 🚀 Future Improvements |
| 98 | + |
| 99 | ++ Add ***synonym & fuzzy*** matching for queries |
| 100 | + |
| 101 | ++ Implement ***OR / NOT*** search operators |
| 102 | + |
| 103 | ++ Enhance ranking with ***TF-IDF*** instead of simple counts |
| 104 | + |
| 105 | ++ Build a ***GUI or Web-based*** interface |
| 106 | + |
| 107 | +------ |