Summary: How Search Works (Vector & Index Approach)
Document Representation (Vectorization)
Convert each document into a vector over a fixed vocabulary: each position stands for one vocabulary word and holds 1 if the document contains that word, 0 otherwise.
Example: "I love beach" → d1 = (1,0,1,0,0,0)
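A minimal Python sketch of this step, assuming a simple lowercase, whitespace-split tokenizer. The notes do not list the six vocabulary words, so `VOCAB` below is a hypothetical choice ordered so that "I love beach" reproduces d1 = (1,0,1,0,0,0). `vectorize` is likewise an illustrative name.

```python
# Hypothetical vocabulary: the six words are not given in the notes, so this
# order is assumed only so that "I love beach" maps to (1,0,1,0,0,0).
VOCAB = ["love", "music", "beach", "sun", "sea", "dance"]


def vectorize(text: str, vocab: list[str] = VOCAB) -> tuple[int, ...]:
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in text, else 0.

    Words outside the vocabulary (such as "I" here) are simply ignored.
    """
    words = set(text.lower().split())  # assumed tokenizer: lowercase + whitespace split
    return tuple(1 if term in words else 0 for term in vocab)


d1 = vectorize("I love beach")
print(d1)  # (1, 0, 1, 0, 0, 0)
```

Because this encoding is binary, repeating a word does not change the vector; a count-based variant would store how many times each word occurs instead.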
Inverted Index (Preprocessing)
Store a mapping of word → list of documents containing it.
Example:
love → [d1, d3]
beach → [d1, d2]
music → [d3]
Makes searching fast: with a hash map, each word lookup takes O(1) time on average, instead of scanning every document.
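A sketch of building such an index in Python. Documents d2 and d3 are hypothetical, chosen only so that the postings for "love", "beach" and "music" match the example; `build_inverted_index` is an illustrative name. A Python `dict` is a hash table, so each word lookup is O(1) on average.

```python
from collections import defaultdict

# Hypothetical corpus: d2 and d3 are invented so the postings match the example.
DOCS = {
    "d1": "I love beach",
    "d2": "sun sea beach",
    "d3": "I love music",
}


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the sorted list of document IDs that contain it."""
    index = defaultdict(set)  # word -> set of doc IDs (sets drop duplicate words)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Sorted lists make the postings deterministic and easy to read.
    return {word: sorted(ids) for word, ids in index.items()}


index = build_inverted_index(DOCS)
print(index["love"])   # ['d1', 'd3']
print(index["beach"])  # ['d1', 'd2']
print(index["music"])  # ['d3']
```

The index is built once, ahead of time; every later query reuses it.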
Query Handling
User types query → convert query to vector.
Use inverted index to fetch only documents containing query words.
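A sketch of the candidate-fetching step, assuming OR semantics: a document qualifies if it contains at least one query word. `INDEX` repeats the example postings, and `candidate_documents` is a hypothetical helper name.

```python
# Postings from the example above, as a plain word -> document-list mapping.
INDEX = {
    "love": ["d1", "d3"],
    "beach": ["d1", "d2"],
    "music": ["d3"],
}


def candidate_documents(query: str, index: dict[str, list[str]]) -> set[str]:
    """Union of the posting lists of every query word found in the index.

    Documents that share no word with the query are never touched.
    """
    candidates: set[str] = set()
    for word in query.lower().split():
        candidates.update(index.get(word, []))  # unknown words contribute nothing
    return candidates


print(sorted(candidate_documents("love music", INDEX)))  # ['d1', 'd3']
```

An engine that requires every query word to be present would intersect the posting lists instead of taking their union.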
Ranking (Similarity Calculation)
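SimSca (dot product): counts the words shared by the query and the document; favors long documents, which contain more words and so tend to match more query terms.
SimCos (cosine similarity): divides the dot product by the product of the two vector lengths, so a document is not rewarded merely for being long.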
Rank documents by similarity to the query.
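For vectors q and d, SimSca(q, d) = Σ q_i·d_i and SimCos(q, d) = SimSca(q, d) / (‖q‖·‖d‖), where ‖v‖ is the Euclidean length. The sketch below compares the two on hypothetical binary vectors over an assumed vocabulary (love, music, beach, sun, sea, dance): the dot product prefers a long document that contains every word, while cosine prefers the short, focused one.

```python
import math


def sim_sca(q: tuple[int, ...], d: tuple[int, ...]) -> float:
    """Scalar (dot) product; for binary vectors, the number of shared words."""
    return float(sum(qi * di for qi, di in zip(q, d)))


def sim_cos(q: tuple[int, ...], d: tuple[int, ...]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(x * x for x in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an all-zero vector shares nothing with anything
    return sim_sca(q, d) / (norm_q * norm_d)


# Hypothetical vectors over the assumed vocabulary (love, music, beach, sun, sea, dance).
query = (1, 1, 1, 0, 0, 0)      # "love music beach"
short_doc = (1, 0, 1, 0, 0, 0)  # short document: "love beach"
long_doc = (1, 1, 1, 1, 1, 1)   # long document containing every vocabulary word

for name, doc in [("short", short_doc), ("long", long_doc)]:
    print(name, sim_sca(query, doc), round(sim_cos(query, doc), 3))
# short 2.0 0.816
# long 3.0 0.707
```

SimSca ranks the long document first (3 > 2), while SimCos ranks the short one first (0.816 > 0.707), because the long document's extra words inflate its length without matching the query.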
Result
Return top-ranked documents to the user.
Efficient and scalable, because each search scores only the candidate documents instead of scanning the whole collection.
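Putting the steps together, a minimal end-to-end sketch: build the index once, fetch candidates for a query, score them with cosine similarity, and return the top k with `heapq.nlargest`. The corpus, the tokenizer, and the names `search` and `cosine` are assumptions for illustration. Binary vectors are represented as word sets, since for 0/1 vectors the dot product is the size of the intersection and each length is the square root of the word count.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical corpus (d2 and d3 are invented for illustration).
DOCS = {
    "d1": "I love beach",
    "d2": "sun sea beach",
    "d3": "I love music",
}


def tokens(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


# Preprocessing, done once: word -> set of IDs of documents containing it.
INDEX = defaultdict(set)
for doc_id, text in DOCS.items():
    for word in tokens(text):
        INDEX[word].add(doc_id)


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary vectors given as word sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def search(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, best first."""
    q = set(tokens(query))
    # Only documents sharing at least one word with the query are scored.
    candidates = set().union(*(INDEX.get(w, set()) for w in q))
    scored = [(doc_id, cosine(q, set(tokens(DOCS[doc_id]))))
              for doc_id in sorted(candidates)]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


print([(doc_id, round(score, 3)) for doc_id, score in search("love music")])
# [('d3', 0.816), ('d1', 0.408)]
```

Candidates are iterated in sorted order so that tied scores come out in a deterministic order; `heapq.nlargest` behaves like a stable descending sort, keeping the earlier item when scores are equal.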
💡 Bonus Note:
Similar ideas show up in how Google Search or chat-matching apps (like Omegle) store and retrieve data efficiently:
Key-value pairs for instant lookups (word → documents or user → partner)
Normalization / ranking for relevance