Hash-Table-Implementation/description.md at main · copela99/Hash-Table-Implementation

Due: Thursday, November 18th @ 8:00p ET

This is not a team project. Do not copy someone else’s work.

Assignment Overview

Hash Tables are a very powerful data structure that are best known for their ability to insert, delete, and lookup in O(1) time. This allows them to be very powerful in storing data that needs to be accessed quickly. Other data structures we have explored, such as Linked Lists (O(n) lookup and deletion) and AVL Trees (log(n) lookup, insertion, and deletion) lack that O(1) ability accross the board.

A lot of you may already be familiar with the concept of hash tables, as they are implemented in Python as dictionaries and C++ as unordered maps.

In this project, you will be implementing a Hash Table from scratch in Python and applying it to an application problem.

Assignment Notes

The use of a Python dictionary or set results in a grade of 0 for the project!
In addition,the only python container/collection type you can use is the built in list class (no linked lists, queues, etc.)
We are going to have you use many of pythons built in "magic methods" in this project. A magic method is one that has the two underscores on the front and the back, such as __len__. In this project, these "magic methods" won't be doing much, they will call the other protected methods that you write! Seriously they should not be more than a few lines.
So, what are "protected methods"? Protected methods are methods prefaced with a single underscore, such as a function called "_insert". Protected methods are meant to only be called inside other functions in the class. This is Pythons way of implementing the C++ equivilant of "public" and "private" - protected methods meant to be treated as private!
Building on the above point, all attributes/functions that are protected (that is, leading with an underscore) should not be called outside of your class, which means they should not be accessed in your application problem!
Use of _hash(), _insert(), _delete(), _get(), and _grow() is STRICTLY FORBIDDEN in the application!!! This will result in a 10 point deduction.
Calling magic methods with the syntax table.__len__() instead of len( table), or the equivalent special syntax for other magic methods, will result in a 2 point deduction each time this is done up to 10 total points.
We have very small test cases for the _insert(), _get(), and _delete() functions. The purpose is to make sure you split the work between the magic and hidden methods appropriately. The majority of the testing will take place in the magic method implementations!
If you inspect _hash_1 and _hash_2 you will see that they depend on the size of the string. For the purposes of this assignment, treat these as taking O(1) (constant) time.
A few guarentees:
1. Capacity will not grow past ~1000
2. All keys will be of type string

Here is an table that shows how private methods and magic methods relate to each other:

Assignment Specifications

class HashNode:

DO NOT MODIFY the following attributes/functions

Attributes
- key: str: The key of the hash node (this is what is used in hashing)
- value: T: Value being held in the node. Note that this may be any type, such as a str, int, float, dict, or a more complex object.
- deleted: bool: Whether or not the node has been deleted.
__init__(self, key: str, value: T, deleted: bool = False) -> None
- Constructs a hash node.
- key: str: The key of the hash node.
- value: T: Value being held in the node.
- deleted: bool: Whether or not the node has been deleted. Defaults to false.
- Returns: None.
__str__(self) -> str and __repr__(self) -> str
- Represents the Node as a string.
- Returns: str representation of node
__eq__(self, other: HashNode) -> bool
- Compares to see if two hash nodes are equal
- other: HashNode: The HashNode we are comparing against
- Returns: bool stating whether or not they are equal
__iadd__(self, other: T) -> bool
- Adds to the value of the current HashNode
- other: T: The value we are adding to our current value
- Returns: None

class HashTable:

DO NOT MODIFY the following attributes/functions

Attributes (you may edit the values of attributes but do not remove them)
- capacity: int: Capacity of the hash table.
- size: int: Current number of nodes in the hash table.
- table: List: This is where the actual data for our hash table is stored
- prime_index: int: Current index of the prime numbers we are using in _hash_2()
primes
- This is a list of all the prime numbers, from 2 until 1000, used for _hash_2(). This is a class attribute, so it is accesed by HashTable.primes, NOT self.primes()!
__init__(self, capacity: int = 8) -> None
- Construct an empty hash table, with the capacity as specified in the input
- capacity: int:
- Returns: None.
__str__(self) -> str and __repr__(self) -> str
- Represents the HashTable as a string.
- Returns: str.
__eq__(self, other: HashTable) -> bool
- Checks if two HashTables are equal
- other: HashTable: the hashtable we are comparing against
- Returns: bool stating whether or not they are equal
_hash_1(self, key: str) -> int
- The first of the two hash functions used to turn a key into a bin number
- Assume this is O(1) time/space complexity
- key: str: key we are hashing
- Returns: int that is the bin number
_hash_2(self, key: str) -> int
- The second of the two hash functions used to turn a key into a bin number. This hash function acts as the tie breaker.
- Assume this is O(1) time/space complexity
- key: str: key we are hashing
- Returns: int that is the bin number

IMPLEMENT the following functions

__len__(self) -> int
- Getter for the size (that, is, the number of elements) in the HashTable
- Time Complexity: O(1)
- Space Complexity: O(1)
- Returns: int that is size of hash table
__setitem__(self, key: str, value: T) -> None
- Sets the value with an associated key in the HashTable
  - This should be a short, ~1 line function- the majority of the work should be done in the _insert() method!
- Time Complexity: O(1)*
- Space Complexity: O(1)*
- key: str: The key we are hashing
- value: T: The associated value we are storing
- Returns: None
__getitem__(self, key: str) -> T
- Looks up the value with an associated key in the HashTable
  - **If the key does not exist in the table, raise a KeyError **
  - This should be a short, ~3 line function- the majority of the work should be done in the _get() method!
- Time Complexity: O(1)*
- Space Complexity: O(1)
- key: str: The key we are seraching for the associated value of
- Returns: The value with an associated Key
__delitem__(self, key: str) -> None
- Deletes the value with an associated key in the HashTable
  - **If the key does not exist in the table, raise a KeyError **
  - This should be a short, ~3 line function- the majority of the work should be done in the _get() and _delete() methods!
- Time Complexity: O(1)*
- Space Complexity: O(1)
- key: str: The key we are deleting the associated value of
- Returns: None
__contains__(self, key: str) -> bool
- Determines if a node with the key denoted by the parameter exists in the table
  - This should be a short, ~3 line function- the majority of the work should be done in the _get() method!
- Time Complexity: O(1)*
- Space Complexity: O(1)
- key: str: The key we are checking to be a part of the hash table
- Returns: None
_hash(self, key: str, inserting: bool = False) -> int\
- Given a key string return an index in the hash table.
- Should implement probing with double hashing.\
  - If the key exists in the hash table, return the index of the existing HashNode
  - If the key does not exist in the hash table, return the index of the next available empty position in the hash table.
    - Collision resolution should implement double hashing with hash1 as the initial hash and hash2 as the step size
  - Note - There are 2 possibilities when hashing for an index:
    - When inserting a node into the hash table we want to insert into the next available bin. \
    - When performing a lookup/deletion in the hash table we want to continue until we either find the proper HashNode or until we reach a bin that has never held a value. This is to preserve the collison resolution methodology.
    - The inserting parameter should be used to differentiate between these two cases.
- Time Complexity: O(1)*
- Space Complexity: O(1)
- key: str: The key being used in our hash function
- inserting: bool: Whether or not we are doing an insertion. Important for the reasons described above.
- Returns: int that is the bin we hashed into
_insert(self, key: str, value: T) -> None
- Use the key and value parameters to add a HashNode to the hash table.\
  - If the key exists, overwrite the existing value
  - In the event that inserting causes the table to have a load factor of 0.5 or greater you must grow the table to double the existing capacity. This should use the _grow method.
- Time Complexity: O(1)*
- Space Complexity: O(1)*
- key: str: The key associated with the value we are storing
- value: T: The associated value we are storing
- Returns: None
_get(self, key: str) -> HashNode
- Find the HashNode with the given key in the hash table.\
  - If the element does not exist, return None
- Time Complexity: O(1)*
- Space Complexity: O(1)
- key: str: The key we looking up
- Returns: HashNode with the key we looked up
_delete(self, key: str) -> None
- Removes the HashNode with the given key from the hash table .
  - If the node is found assign its key and value to None, and set the deleted flag to True
- Time Complexity: O(1)*
- Space Complexity: O(1)
- key: str: The key of the Node we are looking to delete
- Returns: None
_grow(self) -> None
- Double the capacity of the existing hash table.
  - Do NOTrehash deleted HashNodes
  - Must update self.prime_index, the value of self.prime_index should be the index of the largest prime smaller than self.capacity in the HashTable.primes tuple.\
- Time Complexity: O(N)
- Space Complexity: O(N)
- Returns: None
update(self, pairs: List[Tuple[str, T]] = []) -> None
- Updates the hash table using an iterable of key value pairs
  - If the value already exists, update it, otherwise enter it into the table\
- Time Complexity: O(M)*, where M is length of pairs
- Space Complexity: O(M)
- pairs: List[Tuple[str, T]]: list of tuples (key, value) being updated
- Returns: None
keys(self) -> List[str]
- Makes a list that contains all of the keys in the table
  - Order does not matter!
- Time Complexity: O(N)*
- Space Complexity: O(N)
- Returns: List of the keys
values(self) -> List[T]
- Makes a list that contains all of the values in the table
  - Order does not matter!
- Time Complexity: O(N)*
- Space Complexity: O(N)
- Returns: List of the values
items(self) -> List[Tuple[str,T]]
- Makes a list that contains all of the key value pairs in the table
  - Order does not matter!
- Time Complexity: O(N)*
- Space Complexity: O(N)
- Returns: List of Tuples of the form (key, value)
clear(self) -> None
- Should clear the table of HashNodes completely, in essence a reset of the table
  - Should not modify capacity
  - Notice the O(1) space complexity - this must be done in place!
- Time Complexity: O(N)
- Space Complexity: O(1)
- Returns: None

*Expected Complexity

Application: Detecting Duplicate Network Packets

You work for a unicorn startup with a subscription service which purchases gourmet meals from the finest restaurants in the world, cuts them up into portions for your pets, flies them in by private jet, and conveniently delivers them straight to your door three times a day for only $25.99 per month.

Specifically, you work on the service that signs customers up for recurring billing when they create an account. Each time a new customer is signed up, the signup service sends a message to your service over the network, and will resend this message periodically until a confirmation is received to make sure no one gets access to this premium service without being charged.

One day, you get a message from your boss. The accounting department has determined that many customers are being billed twice or more each month. With your company quickly burning through its Series J funding round, and the customers unable to notice this amongst the charges for all of their other subscription services, everyone is quite pleased about your double billing feature and you are getting a bonus stock grant. Not having taken CSE 300, and looking forward to the upcoming IPO, this whole situation seems pretty good and you're pretty happy about it.

To refine this feature and boost the double-billing rate, you investigate and find that the confirmation message from the signup service to your service is often getting lost, resulting in your billing service receiving several messages to sign up the same customers for recurring charges. You think back to your time in 331 and get curious about how you might efficiently prevent this situation if you actually wanted to, and decide to write up a solution just for fun.

For your application problem, you will create a class called ExecuteOnlyOnce

Its constructor takes one parameter, max_time, whose purpose is explained below, you will need to modify this constructor to set up the class with whatever data structures you need.

When the method, handle_request is called with a timestamp, client ID and request ID, you must respond with

0 if the request has not been seen before from this client,

or otherwise respond with the number of times the request has been seen before from this client.

If the first time some message was seen is more than max_time time units in the past ( assume max_time and the time argument are in the same units), treat it as unseen and reset the count for the number of times it has been seen. This is because IDs might be reused eventually after enough time goes by.

Guarantees:

Requests will be sent to your function in order of timestamp. Your handle_request method will never be called on the same object with an earlier timestamp than what has already been passed to that method of that object.
If you receive two requests from the same client with the same ID within max_time of each other, they will always be repeated attempts for the same request. If two requests from the same client with the same ID show up more than max_time apart they will always be unique requests where the ID had to be reused.
Both the client_id and request_id are comprised of only alphanumeric characters.

Illustration

As the image shows, requests with a request ID and client ID are sent from the client, and these will be assigned a timestamp when received. Since the acknowledgment message may never be received by the client, in this case the signup service, the same request may be sent more than once and received by the server repeatedly. The server attempts to let the client know how many times the request was received so the client can adapt to the network conditions by retrying more or less aggressively.

Examples

Example 1

handler = Execute_only_once(max_time=5)
handler.handle_request(0, "A", "A") # Returns 0
handler.handle_request(1, "A", "A") # Returns 1
handler.handle_request(2, "A", "A") # Returns 2
handler.handle_request(12, "A", "A") # Returns 0, because max time exceeded since first request.

Example 2

handler = ExecuteOnlyOnce(max_time=4)
handler.handle_request(0, "A", "A") # Returns 0
handler.handle_request(1, "B", "B") # Returns 0
handler.handle_request(2, "A", "A") # Returns 1
handler.handle_request(3, "B", "B") # Returns 1
handler.handle_request(15, "B", "B") # Returns 0, due to max_time being exceeded since first request.

Example 3

handler = ExecuteOnlyOnce(max_time=6)
handler.handle_request(0, "A", "A") # Returns  0
handler.handle_request(0, "B", "A") # Returns  0
handler.handle_request(1, "A", "B") # Returns  0
handler.handle_request(2, "A", "B") # Returns  1
handler.handle_request(2, "A", "B") # Returns  2
handler.handle_request(3, "B", "A") # Returns  1
Returns 2, stops being the same request when more than max_time has gone by

handler.handle_request(6, "A", "A")
handler.handle_request(7, "A", "B") # Returns  3
handler.handle_request(13, "A", "A") # Returns 0

class ExecuteOnlyOnce:

__init__(self, max_time: int):
- Design your data structure here, and make sure to store max_time
handle_request(time: int, id: str) -> int:
- Return the number of times this request has been seen already
- If the first time the request was seen was more than max_time ago, treat it as unseen and treat this as the first time you have seen it.
- Time complexity: O(1)
- The total space complexity of the class should be O(n) where n is the number of times handle_request has been called.

REMEMBER THAT HASHTABLE VALUES CAN BE ANY TYPE
Use of _hash(), _insert(), _delete(), _get(), and _grow() is STRICTLY FORBIDDEN in the application!!!

Submission

Deliverables

Be sure to upload the following deliverables in a .zip folder to Mimir by 8pm ET on Thursday, November 18th.

Project4.zip
    |— Project4/
        |— README.xml      (for project feedback)
        |— __init__.py     (for proper Mimir testcase loading)
        |— hashtable.py    (contains your solution source code)

Grading

Tests (70)
- Coding Standard: __/5
- README.xml Validity Check: __/5
- hashtable: __/45
  - _hash: __/7
  - _insert: __/1
  - _get: __/1
  - _delete: __/1
  - __len__: __/1
  - __setitem__: __/4
  - __getitem__: __/4
  - __delitem__: __/4
  - __contains__: __/3
  - update: __/3
  - keys/values/items: __/6
  - clear: __/2
  - comprehensive: __/8
- ExecuteOnlyOnce: __/15
  - Application Basic: 5
  - Application Comprehensive: 10
Manual (30)
- Manual (30)
  - Time and space complexity points are all-or-nothing for each function. If you fail to meet time or space complexity in a given function, you do not receive manual points for that function.
  - Up to 3 points deduction for missing docstrings across the full project
- hashtable time/space: __/24
  - _hash: __/3
  - __len__: __/1
  - __setitem__: __/3
  - __getitem__: __/3
  - __delitem__: __/3
  - __contains__: __/2
  - _grow: __/3
  - update: __/2
  - keys/values/items: __/2
  - clear: __/2
- ExecuteOnlyOnce time/space: __/6

Appendix

Authors

Project developed by Alex Woodring, Joseph Pallipadan, and Zach Matson.

Adapted from the work of Brandon Field, Yash Vesikar, Ian Barber, and Max Huang.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assignment Overview

Assignment Notes

Assignment Specifications

class HashNode:

class HashTable:

Application: Detecting Duplicate Network Packets

Returns 2, stops being the same request when more than max_time has gone by

Submission

Deliverables

Grading

Appendix

Authors

FilesExpand file tree

description.md

Latest commit

History

description.md

File metadata and controls

Assignment Overview

Assignment Notes

Assignment Specifications

class HashNode:

class HashTable:

Application: Detecting Duplicate Network Packets

Returns 2, stops being the same request when more than max_time has gone by

Submission

Deliverables

Grading

Appendix

Authors