Commit 5181b3b

refactor(datastructures, lfu-cache): v2 of cache

1 parent 43afedd, commit 5181b3b

17 files changed: +390, -140 lines changed

datastructures/lfucache/README.md

Lines changed: 72 additions & 1 deletion
@@ -17,4 +17,75 @@ To determine the least frequently used key, a use counter is maintained for each
smallest use counter is the least frequently used key.

When a key is first inserted into the cache, its use counter is set to 1 (due to the put operation). The use counter
for a key in the cache is incremented whenever a get or put operation is called on it.

## Solution

The LFU cache algorithm tracks how often each key is accessed to determine which keys to remove when the cache is full.
It uses one hash map to store key-value pairs and another to group keys by their access frequency. Each group in this
frequency hash map contains nodes arranged in a doubly linked list. Additionally, it keeps track of the current minimum
frequency to quickly identify the least used keys. When the cache reaches its limit, the key with the lowest frequency
is removed first, specifically from the head of the corresponding linked list.

Each time a key is accessed, its frequency increases, and its position in the frequency hash map is updated, ensuring
that the least used keys are prioritized for removal. This is where the doubly linked list is helpful, as the node being
updated might be located somewhere in the middle of the list. Shifting the node to the next frequency level can be done
in constant time, making the update process efficient.
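
To make the constant-time claim concrete, here is a minimal, self-contained sketch of detaching a node from a doubly linked list (illustrative only; the repository's `DoublyLinkedList` class is not reproduced here):

```python
class Node:
    """A doubly linked list node holding a cached key and value."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


def detach(node):
    """Unlink `node` from its list in O(1) by rewiring its two neighbours."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None


# Build a three-node list a <-> b <-> c, then detach the middle node.
a, b, c = Node(1, "a"), Node(2, "b"), Node(3, "c")
a.next, b.prev = b, a
b.next, c.prev = c, b
detach(b)  # a and c are now linked directly
```

No traversal is needed: the node itself carries pointers to its neighbours, which is why moving a node between frequency lists runs in O(1).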

Let's discuss the algorithm of the LFU cache data structure in detail. We maintain two hash maps, `lookup` and `frequencyMap`,
and an integer, `minimum_frequency`, as follows:

- `lookup` keeps the key-node pairs.
  - The node contains three values: `key`, `value`, and `frequency`.
- `frequencyMap` maintains a doubly linked list for every frequency present in the data.
  - For example, all the keys that have been accessed only once reside in the doubly linked list stored at `frequencyMap[1]`,
    all the keys that have been accessed twice reside in the doubly linked list stored at `frequencyMap[2]`, and so on.
- `minimum_frequency` keeps the frequency of the least frequently used key.
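
As an illustration, this state might be laid out as follows in Python (a hypothetical skeleton whose field names mirror the description above rather than the repository's code; an `OrderedDict` stands in for each doubly linked list since it also offers O(1) insertion and deletion):

```python
from collections import OrderedDict, defaultdict


class LFUCacheState:
    """Skeleton showing only the bookkeeping fields of an LFU cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        # lookup: key -> (value, frequency) pairs
        self.lookup = {}
        # frequencyMap: frequency -> keys in insertion order (a stand-in for a
        # doubly linked list; the head is the least recently promoted key)
        self.frequency_map = defaultdict(OrderedDict)
        # frequency of the least frequently used key; 0 while the cache is empty
        self.minimum_frequency = 0


state = LFUCacheState(capacity=2)
```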

Apart from the required functions, i.e., `Get` and `Put`, we implement a helper function, `PromoteKey`, that helps us maintain
the order of the keys with respect to the frequency of their use. This function is implemented as follows:

- First, retrieve the node associated with the key.
- If the node's `frequency` is 0, the key is new. We simply increment its `frequency` and insert it at the tail of the
  linked list corresponding to frequency 1.
- Otherwise, detach the node from its corresponding linked list.
  - If the corresponding linked list becomes empty after detaching the node, and the node's `frequency` equals `minimum_frequency`,
    there's no key left with a frequency equal to `minimum_frequency`. Hence, increment `minimum_frequency`.
  - Increment the `frequency` of the key.
  - Insert the node at the tail of the linked list associated with the updated `frequency`.
    - Before inserting it, check whether that linked list exists. If it doesn't, create one.

After implementing `PromoteKey()`, the LFU cache functions are implemented as follows:

- `Get`: We check if the key exists in the cache.
  - If it doesn't, we return `None`.
  - Otherwise, we promote the key using the `PromoteKey()` function and return the value associated with the key.
- `Put`: We check if the key exists in the cache.
  - If it doesn't, we must add this (key, value) pair to our cache.
    - Before adding it, we check if the cache has already reached capacity. If it has, we remove the LFU key. To do that,
      we remove the head node of the linked list associated with the frequency equal to `minimum_frequency`.
  - If it does, we simply update the key with the new value.
- At the end of both operations, we adjust the frequency order of the key using `PromoteKey()`.
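
The steps above can be sketched end to end in Python. This is a minimal illustration of the described algorithm rather than the repository's implementation: an `OrderedDict` per frequency stands in for the doubly linked list, giving the same O(1) insert-at-tail, delete, and pop-head operations.

```python
from collections import OrderedDict, defaultdict


class LFUCacheSketch:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lookup = {}  # key -> [value, frequency]
        self.frequency_map = defaultdict(OrderedDict)  # frequency -> keys, head first
        self.minimum_frequency = 0

    def _promote_key(self, key):
        """Move `key` from its current frequency bucket up to the next one."""
        entry = self.lookup[key]
        frequency = entry[1]
        if frequency == 0:
            # A brand-new key always lands in bucket 1.
            self.minimum_frequency = 1
        else:
            del self.frequency_map[frequency][key]
            # If the old bucket emptied and held the minimum, bump the minimum.
            if not self.frequency_map[frequency] and self.minimum_frequency == frequency:
                self.minimum_frequency += 1
        entry[1] = frequency + 1
        self.frequency_map[entry[1]][key] = None  # insert at the tail

    def get(self, key):
        if key not in self.lookup:
            return None
        self._promote_key(key)
        return self.lookup[key][0]

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.lookup:
            self.lookup[key][0] = value
        else:
            if len(self.lookup) == self.capacity:
                # Evict the head of the minimum-frequency bucket: the LFU key,
                # and the least recently promoted one among ties.
                evicted, _ = self.frequency_map[self.minimum_frequency].popitem(last=False)
                del self.lookup[evicted]
            self.lookup[key] = [value, 0]  # frequency 0 marks a new key
        self._promote_key(key)


cache = LFUCacheSketch(2)
cache.put(1, "a")
cache.put(2, "b")
cache.get(1)       # key 1 now has frequency 2
cache.put(3, "c")  # cache is full: evicts key 2, the least frequently used
```

Note the eviction tiebreak: among keys sharing the minimum frequency, the head of the bucket is the one promoted least recently, matching the behavior described above.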

![Solution 1](./images/solutions/lfu_cache_solution_1.png)
![Solution 2](./images/solutions/lfu_cache_solution_2.png)
![Solution 3](./images/solutions/lfu_cache_solution_3.png)
![Solution 4](./images/solutions/lfu_cache_solution_4.png)
![Solution 5](./images/solutions/lfu_cache_solution_5.png)
![Solution 6](./images/solutions/lfu_cache_solution_6.png)
![Solution 7](./images/solutions/lfu_cache_solution_7.png)
![Solution 8](./images/solutions/lfu_cache_solution_8.png)
![Solution 9](./images/solutions/lfu_cache_solution_9.png)
![Solution 10](./images/solutions/lfu_cache_solution_10.png)
### Time Complexity

The time complexity of `PromoteKey()` is `O(1)` because the time taken to detach a node from a doubly linked list and
insert a node at the tail of a linked list is `O(1)`. The time complexity of both the `Put` and `Get` functions is `O(1)`
because they rely on `PromoteKey()` and a few other constant-time operations.

### Space Complexity

The space complexity of this algorithm is linear, `O(n)`, where `n` refers to the capacity of the data structure. This
is the space occupied by the hash maps.
Lines changed: 4 additions & 126 deletions

@@ -1,127 +1,5 @@
Removed: the previous single-file implementation.

```python
from collections import defaultdict
from typing import Any, Union, Dict

from datastructures.linked_lists.doubly_linked_list import DoublyLinkedList
from datastructures.linked_lists.doubly_linked_list.node import DoubleNode


class LfuCacheNode(DoubleNode):
    def __init__(self, data, key=None):
        super().__init__(data)
        self.key = key
        self.frequency = 1


class LFUCache:
    def __init__(self, capacity: int):
        """
        Initializes an instance of an LFUCache.
        @param capacity: Capacity of the cache
        @type capacity: int

        1. Dict self._lookup for retrieval of all nodes given a key: O(1) time to retrieve a node given a key.
        2. Each frequency has a DoublyLinkedList stored in self._frequency, where the key is the frequency and
           the value is a DoublyLinkedList instance.
        3. The minimum frequency across all nodes can be maintained in O(1) time, taking advantage of the fact
           that a frequency can only increment by 1, using the following 2 rules:
           i.  Whenever the DoublyLinkedList of the current minimum frequency becomes empty, increment
               min_frequency by 1.
           ii. Whenever we put in a new (key, value), the minimum frequency must be 1 (the new node).
        """
        self.capacity = capacity
        self._current_size = 0
        self._lookup = dict()
        self._frequency: Dict[int, DoublyLinkedList] = defaultdict(DoublyLinkedList)
        self._minimum_frequency = 0

    def __update(self, node: LfuCacheNode):
        """
        Helper function used in 2 cases:
        1. When get(key) is called.
        2. When put(key, value) is called and the key exists.

        In both cases no new node comes in; the node is visited one more time, so node.frequency changes
        and hence the place of this node changes.

        Logic:
        1. Pop the node from the 'old' DoublyLinkedList for frequency.
        2. Prepend the node to the 'new' DoublyLinkedList for frequency + 1.
        3. If the 'old' DoublyLinkedList is now empty and self._minimum_frequency equals frequency,
           update self._minimum_frequency to frequency + 1.

        Time Complexity: O(1)

        @param node: Node to update in the cache
        @type node: LfuCacheNode
        """
        frequency = node.frequency

        # pop the node from the 'old' DoublyLinkedList
        self._frequency[frequency].delete_node(node)

        if self._minimum_frequency == frequency and not self._frequency[frequency]:
            self._minimum_frequency += 1

        node.frequency += 1
        frequency = node.frequency

        # add to the 'new' DoublyLinkedList with the new frequency
        self._frequency[frequency].prepend(node)

    def get(self, key: int) -> Union[Any, None]:
        """
        Gets an item from the cache given the key.
        @param key: Key used to fetch data from the cache
        @return: Data mapped to the key, or None if the key is absent
        """
        if key not in self._lookup:
            return None

        node = self._lookup[key]
        data = node.data
        self.__update(node)
        return data

    def put(self, key: int, value: Any) -> None:
        """
        If the key is already present in self._lookup, we perform the same operations as get, and
        additionally update the node data to the new value.

        Otherwise, the operations below are performed:
        1. If the cache has reached capacity, pop the least frequently used item. Two facts:
           a. self._minimum_frequency is the minimum possible frequency in the cache.
           b. All nodes with the same frequency are stored in a DoublyLinkedList in recently-used order
              (always prepend to the head).
           Consequently, the tail of the DoublyLinkedList for self._minimum_frequency is the least
           recently used one; pop it.
        2. Add the new node to self._lookup.
        3. Add the new node to the DoublyLinkedList with a frequency of 1.
        4. Reset minimum_frequency to 1.

        @param key: Key to use for lookup
        @param value: Value to store in the cache
        @return: None
        """
        if self.capacity == 0:
            return None

        if key in self._lookup:
            node = self._lookup[key]
            self.__update(node)
            node.data = value
        else:
            if self._current_size == self.capacity:
                node = self._frequency[self._minimum_frequency].pop()
                self._lookup.pop(node.key)
                self._current_size -= 1

            # an LfuCacheNode, so the node carries the frequency counter that __update reads
            node = LfuCacheNode(data=value, key=key)
            self._lookup[key] = node
            # prepend so the newest node sits at the head; the tail stays the least recently used
            self._frequency[1].prepend(node)
            self._minimum_frequency = 1
            self._current_size += 1
```

Added: the new module contents, which re-export the split-out classes.

```python
from datastructures.lfucache.lfu_cache_node import LfuCacheNode
from datastructures.lfucache.lfu_cache import LFUCache
from datastructures.lfucache.lfu_cache_v2 import LFUCacheV2

__all__ = ["LFUCache", "LFUCacheV2", "LfuCacheNode"]
```
