Commit f09d2cc

feat(datastructures, hashmap): design a hashmap)
1 parent 98bf909 commit f09d2cc

28 files changed, 461 additions and 133 deletions
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
# Jewels and Stones

You're given strings `jewels` representing the types of stones that are jewels, and `stones` representing the stones you have.
Each character in stones is a type of stone you have. You want to know how many of the stones you have are also jewels.

Letters are case-sensitive, so "a" is considered a different type of stone from "A".

## Examples

Example 1:

```text
Input: jewels = "aA", stones = "aAAbbbb"
Output: 3
```

Example 2:

```text
Input: jewels = "z", stones = "ZZ"
Output: 0
```

## Constraints

- 1 <= jewels.length, stones.length <= 50
- jewels and stones consist of only English letters.
- All the characters of jewels are unique.

## Topics

- Hash Table
- String

## Solution

The core intuition behind solving this problem is to treat it as a membership-counting task: we aren’t transforming
either string, we’re simply counting how many characters in `stones` belong to the set of jewel types in `jewels`, while
respecting case sensitivity. This maps naturally to a hash-based lookup because it lets us store all jewel types in a
structure that supports fast membership checks. In other words, we treat `jewels` as an allowlist of valid types and
`stones` as a stream of items to evaluate. As we scan through `stones`, we increment a counter whenever the current
character appears in the jewel set. Because comparisons are case-sensitive, only exact matches contribute to the final
count, which represents how many of your stones are jewels.

Using the intuition above, we implement the algorithm as follows:

1. Initialize a new set, `jewelSet`, from the characters of `jewels`.
2. Initialize a variable `count` to 0.
3. Iterate through each character `ch` in `stones`:
   - If `ch` exists in `jewelSet`:
     - Increment `count`.
4. After iterating through every character of `stones`, return `count`.

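The steps above can also be condensed into a single expression. The following is a minimal sketch rather than the implementation from this commit; the helper name `count_jewels` is hypothetical and used only for illustration.

```python
def count_jewels(jewels: str, stones: str) -> int:
    # Hypothetical helper for illustration; not part of the commit.
    jewel_set = set(jewels)
    # bool is a subclass of int, so each True adds 1 to the sum
    return sum(ch in jewel_set for ch in stones)


print(count_jewels("aA", "aAAbbbb"))  # 3
print(count_jewels("z", "ZZ"))  # 0
```

Both calls reproduce the outputs listed under Examples.
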
### Time complexity

The time complexity of the solution is O(m+n) because it first builds a set from the m characters in jewels, then scans the
n characters in stones once to count matches.

### Space complexity

The space complexity of the solution is O(m) because it stores up to m unique jewel characters in a set.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
from collections import Counter
from typing import Set


def num_jewels_in_stones_with_set(jewels: str, stones: str) -> int:
    # Store all jewel types for fast membership checking
    jewel_set: Set[str] = set(jewels)

    # Running total of stones that are jewels
    count = 0

    # Check each stone and increment count if it's a jewel
    for ch in stones:
        if ch in jewel_set:
            count += 1

    # Return the total number of jewels found in stones
    return count


def num_jewels_in_stones_with_dict(jewels: str, stones: str) -> int:
    # Count how many times each stone type occurs
    stone_counts: Counter[str] = Counter(stones)

    # Running total of stones that are jewels
    count = 0

    # For each jewel type, add the number of stones of that type
    for jewel in jewels:
        if jewel in stone_counts:
            count += stone_counts[jewel]

    # Return the total number of jewels found in stones
    return count
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
import unittest
from parameterized import parameterized
from algorithms.hash_table.jewels_and_stones import (
    num_jewels_in_stones_with_dict,
    num_jewels_in_stones_with_set,
)

JEWELS_AND_STONES_TEST_CASES = [
    ("pQ", "ppPQQq", 4),
    ("k", "kkkkK", 4),
    ("LMn", "lLmMNn", 3),
    ("cD", "ddddccccDD", 6),
    ("tRz", "RttZzr", 4),
]


class JewelsAndStonesTestCase(unittest.TestCase):
    @parameterized.expand(JEWELS_AND_STONES_TEST_CASES)
    def test_num_jewels_in_stones_with_set(
        self, jewels: str, stones: str, expected: int
    ):
        actual = num_jewels_in_stones_with_set(jewels, stones)
        self.assertEqual(actual, expected)

    @parameterized.expand(JEWELS_AND_STONES_TEST_CASES)
    def test_num_jewels_in_stones_with_dict(
        self, jewels: str, stones: str, expected: int
    ):
        actual = num_jewels_in_stones_with_dict(jewels, stones)
        self.assertEqual(actual, expected)


if __name__ == "__main__":
    unittest.main()
File renamed without changes.

cryptography/caeser_cipher/__init__.py renamed to algorithms/strings/caeser_cipher/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ def __init__(self, shift):
         self.alpha = ascii_uppercase
         self.new_alpha = self.alpha[shift:] + self.alpha[:shift]
 
-    def encode(self, plaintext):
+    def encode(self, plaintext: str):
         t = plaintext.maketrans(self.alpha, self.new_alpha)
         return plaintext.upper().translate(t)
 

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

datastructures/hashmap/README.md

Lines changed: 181 additions & 11 deletions
@@ -1,13 +1,183 @@
# HashMap Design

-Constraints and assumptions
-For simplicity, are the keys integers only?
-Yes
-For collision resolution, can we use chaining?
-Yes
-Do we have to worry about load factors?
-No
-Can we assume inputs are valid or do we have to validate them?
-Assume they're valid
-Can we assume this fits memory?
-Yes
Design a HashMap without using any built-in hash table libraries.

Implement the HashMap class:

- `HashMap()` initializes the object with an empty map.
- `void put(int key, int value)` inserts a (key, value) pair into the HashMap. If the key already exists in the map,
  update the corresponding value.
- `int get(int key)` returns the value to which the specified key is mapped, or -1 if this map contains no mapping for
  the key.
- `void remove(int key)` removes the key and its corresponding value if the map contains the mapping for the key.

## Example
15+
16+
Example 1:
17+
18+
```text
19+
Input
20+
["MyHashMap", "put", "put", "get", "get", "put", "get", "remove", "get"]
21+
[[], [1, 1], [2, 2], [1], [3], [2, 1], [2], [2], [2]]
22+
Output
23+
[null, null, null, 1, -1, null, 1, null, -1]
24+
25+
Explanation
26+
MyHashMap myHashMap = new MyHashMap();
27+
myHashMap.put(1, 1); // The map is now [[1,1]]
28+
myHashMap.put(2, 2); // The map is now [[1,1], [2,2]]
29+
myHashMap.get(1); // return 1, The map is now [[1,1], [2,2]]
30+
myHashMap.get(3); // return -1 (i.e., not found), The map is now [[1,1], [2,2]]
31+
myHashMap.put(2, 1); // The map is now [[1,1], [2,1]] (i.e., update the existing value)
32+
myHashMap.get(2); // return 1, The map is now [[1,1], [2,1]]
33+
myHashMap.remove(2); // remove the mapping for 2, The map is now [[1,1]]
34+
myHashMap.get(2); // return -1 (i.e., not found), The map is now [[1,1]]
35+
```
36+
## Constraints

- 0 <= key, value <= 10^6
- At most 10^4 calls will be made to put, get, and remove.

## Topics

- Array
- Hash Table
- Linked List
- Design
- Hash Function

## Solution
51+
52+
A hash map is a fundamental data structure found in various programming languages. Its key feature is facilitating fast
53+
access to a value associated with a given key. Designing an efficient hash map involves addressing two main challenges:
54+
55+
1. **Hash function design**: The hash function serves to map a key to a location in the storage space. A good hash
56+
function ensures that keys are evenly distributed across the storage space, preventing the clustering of keys in
57+
certain locations. This even distribution helps maintain efficient access to stored values.
58+
59+
2. **Collision handling**: Despite efforts to evenly distribute keys, collisions—where two distinct keys map to the same
60+
storage location—are inevitable due to the finite nature of the storage space compared to the potentially infinite
61+
key space. Effective collision-handling strategies are crucial to ensure data integrity and efficient retrieval. To
62+
deal with collisions, we can use methods like chaining, where we link multiple values together at that location, or
63+
open addressing, where we find another empty location for the key.
64+
### Step-by-step solution construction

The first step is to design a hash function using the modulo operator, which is particularly suitable for integer-type
keys. The modulo operator, denoted by %, returns the remainder of dividing one number by another. When selecting a modulo
base, it’s advisable to choose a prime number. A prime base shares no common factor with regular patterns in the keys
(for example, keys that are all multiples of 10), so hash codes spread more evenly across the storage space, which reduces
the likelihood of collisions (where two different keys hash to the same value).

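To make the effect of the base concrete, the sketch below counts how many distinct hash values a patterned key set occupies under a prime base and under a nearby composite one. The composite base 2070 and the multiples-of-10 key pattern are assumptions chosen only for this comparison.

```python
PRIME_BASE = 2069      # prime base used in the text
COMPOSITE_BASE = 2070  # assumed non-prime base, chosen only for comparison

# Assumed key pattern: 2070 keys that are all multiples of 10
keys = range(0, 20_700, 10)

# Collect the distinct hash values each base produces for these keys
prime_buckets = {key % PRIME_BASE for key in keys}
composite_buckets = {key % COMPOSITE_BASE for key in keys}

print(len(prime_buckets))      # 2069
print(len(composite_buckets))  # 207
```

Because 10 divides 2070, the composite base can only ever produce multiples of 10 for these keys, so roughly nine out of ten locations stay empty while the rest fill up.
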
Here’s the implementation of a hash function using a prime number, 2069, as the modulo base. This particular prime number
is likely chosen because it is relatively large, offering a wide range of possible hash codes and reducing the chance
of collisions.

```python
def calculate_hash(key):
    # Reduce the key modulo a prime base to get its storage index
    key_base = 2069
    return key % key_base


def main():
    # Example usage: 1 and 2070 are distinct keys that hash to the same value
    keys = [1, 2068, 2070]
    for i, key in enumerate(keys, start=1):
        hashed_value = calculate_hash(key)
        print(i, ".\tKey:", key)
        print("\tHashed value:", hashed_value)


main()
```

```text
1 .     Key: 1
        Hashed value: 1
2 .     Key: 2068
        Hashed value: 2068
3 .     Key: 2070
        Hashed value: 1
```

In the code provided above, a collision occurs: taken modulo the base value of 2069, the distinct keys 1 and 2070 both
yield the hash value 1.

Now, let’s look at a visual representation of hash collision:

![Solution Key Collision](./images/solutions/hash_map_design_key_collision.png)

In the scenario illustrated in the diagram above, two distinct keys are assigned to the same address, which results in
a collision. Therefore, the second step is to handle collisions by using a storage space where each element is indexed by
the output of the hash function. Each element is a container, called a bucket, designed to store all key-value pairs
whose keys are assigned the same hash value by the hash function.

Let’s look at the diagram below to visualize collision handling through the use of buckets:

![Solution Key Collision Buckets](./images/solutions/hash_map_design_key_collision_with_buckets.png)

Now, let’s design a Bucket for collision handling that supports three primary operations: Get, Update, and Remove. These
operations manage the key-value pairs within each bucket, accommodating cases where multiple keys hash to the same index.

- **Get(key)**: Searches the bucket for a key-value pair where the key matches the provided argument. If such a pair is
  found, the method returns the corresponding value. If the key does not exist within the bucket, the method returns
  -1.
- **Update(key, value)**: Looks for the specified key in the bucket. If the key is found, the method updates the existing
  key-value pair with the new value provided. If the key is not found, the method adds a new key-value pair to the bucket.
- **Remove(key)**: Searches the bucket for a key-value pair matching the specified key. If such a pair is found, the
  method removes it from the bucket.

Collision handling occurs implicitly within the Update function of the Bucket: it allows multiple key-value pairs with
the same hash value (i.e., the same bucket index) to coexist within the bucket.

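The three bucket operations described above could be sketched as follows. This is one possible shape under stated assumptions, not necessarily the class added in this commit: the bucket is assumed to be a plain Python list of `[key, value]` pairs, and the method names are illustrative.

```python
class Bucket:
    """Illustrative chaining bucket: a list of [key, value] pairs."""

    def __init__(self):
        self.pairs = []

    def get(self, key):
        # Linear scan; -1 signals that the key is absent
        for k, v in self.pairs:
            if k == key:
                return v
        return -1

    def update(self, key, value):
        # Overwrite the value if the key exists, otherwise append a new pair
        for pair in self.pairs:
            if pair[0] == key:
                pair[1] = value
                return
        self.pairs.append([key, value])

    def remove(self, key):
        # Delete the matching pair, if any, and stop scanning immediately
        for i, pair in enumerate(self.pairs):
            if pair[0] == key:
                del self.pairs[i]
                return


bucket = Bucket()
bucket.update(1, 10)
bucket.update(2070, 20)  # 2070 % 2069 == 1, so it would share a bucket with key 1
bucket.update(1, 11)     # existing key: value is overwritten
print(bucket.get(1), bucket.get(2070), bucket.get(5))  # 11 20 -1
bucket.remove(1)
print(bucket.get(1))  # -1
```

Storing the key alongside the value is what lets colliding keys coexist: each lookup compares keys inside the bucket rather than relying on the hash value alone.
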
Moving forward, the third step involves designing a hash map by utilizing the hash function and the Bucket designed earlier.

The core operation of a hash map is locating stored values by key. Each hash map method (Get, Put, and Remove) therefore
starts with the same two steps:

1. Applying the hash function to the given key to generate a hash key, which determines the address in the main storage
   and therefore the corresponding bucket.
2. Iterating through the bucket to check if the desired key-value pair exists.

![Solution 1](./images/solutions/hash_map_design_solution_1.png)
![Solution 2](./images/solutions/hash_map_design_solution_2.png)
![Solution 3](./images/solutions/hash_map_design_solution_3.png)
![Solution 4](./images/solutions/hash_map_design_solution_4.png)
![Solution 5](./images/solutions/hash_map_design_solution_5.png)
![Solution 6](./images/solutions/hash_map_design_solution_6.png)
![Solution 7](./images/solutions/hash_map_design_solution_7.png)
![Solution 8](./images/solutions/hash_map_design_solution_8.png)
![Solution 9](./images/solutions/hash_map_design_solution_9.png)
![Solution 10](./images/solutions/hash_map_design_solution_10.png)
![Solution 11](./images/solutions/hash_map_design_solution_11.png)

### Solution Summary

1. Choose a prime number for the key space size (preferably a large one).
2. Create an array and initialize it with empty buckets equal to the key space size.
3. Generate a hash key by taking the modulus of the input key with the key space size.
4. Implement the following functions:
   - Put(key, value): Inserts the value into the bucket at the computed hash key index, or updates it if the key is
     already present
   - Get(key): Searches for the key in the bucket and returns the associated value, or -1 if the key is absent
   - Remove(key): Deletes the key-value pair for the specified key from its bucket, if present

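Putting the summary together, a compact end-to-end sketch might look like the following. It is an illustration under assumptions, not the implementation from this commit: the class name `HashMapSketch` is hypothetical, keys are assumed to be non-negative integers, and each bucket is a plain list of `[key, value]` pairs.

```python
class HashMapSketch:
    """Illustrative chained hash map for non-negative integer keys."""

    def __init__(self, key_space=2069):
        # Steps 1 and 2: prime key space size, one empty bucket per slot
        self.key_space = key_space
        self.buckets = [[] for _ in range(key_space)]

    def _bucket(self, key):
        # Step 3: the hash key is the input key modulo the key space size
        return self.buckets[key % self.key_space]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already present: update in place
                return
        bucket.append([key, value])

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return -1  # no mapping for this key

    def remove(self, key):
        bucket = self._bucket(key)
        for i, pair in enumerate(bucket):
            if pair[0] == key:
                del bucket[i]
                return


hash_map = HashMapSketch()
hash_map.put(1, 1)
hash_map.put(2, 2)
print(hash_map.get(1))  # 1
print(hash_map.get(3))  # -1
hash_map.put(2, 1)
print(hash_map.get(2))  # 1
hash_map.remove(2)
print(hash_map.get(2))  # -1
```

The calls mirror the sequence in the Example section and produce the same return values.
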
### Time Complexity

Each method of the hash map has a time complexity of O(N/K), where N represents the total number of possible keys, and
K represents the key space size, which in our case is 2069.

In an ideal scenario with evenly distributed keys, the average size of each bucket is N/K, and each method scans a single
bucket. In the worst-case scenario, every key hashes to the same bucket, so a method may need to iterate through all N
entries, resulting in a time complexity of O(N) for each method.

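The gap between the average and worst case can be checked numerically. The sketch below is only an illustration: both workloads are assumptions (10^4 consecutive keys for the even case, and every multiple of 2069 up to 10^6 for the adversarial case), and `collections.Counter` is used purely to tally bucket sizes, not as part of the hash map itself.

```python
from collections import Counter

KEY_SPACE = 2069  # prime base used in the text

# Assumed even workload: 10^4 consecutive keys
even_sizes = Counter(key % KEY_SPACE for key in range(10_000))
print(len(even_sizes))           # 2069
print(max(even_sizes.values()))  # 5
print(min(even_sizes.values()))  # 4

# Assumed adversarial workload: every multiple of the base up to 10^6
worst_sizes = Counter(key % KEY_SPACE for key in range(0, 1_000_001, KEY_SPACE))
print(len(worst_sizes), worst_sizes[0])  # 1 484
```

With consecutive keys every bucket holds four or five entries, close to N/K. With keys that are all multiples of the base, all 484 entries pile into bucket 0 and every operation degrades to a scan of the whole data set.
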
### Space Complexity

The space complexity is O(K+M), where K denotes the key space size, and M represents the number of unique keys that have
been inserted into the hashmap.
