Hash Tables

DIY: Building a Custom HashSet from Scratch

Introduction

Welcome back! Now that we have covered how to implement a Hash Table, it’s time to dive into another essential data structure: the Hash Set. While both structures share similarities, the Hash Set is unique in that it only stores keys without associated values. Let’s explore how to implement a Hash Set from scratch in Java.

Key Differences Between Hash Table and Hash Set

Before we start, let's clarify the major differences between a Hash Table and a Hash Set:
Hash Table: Stores key-value pairs, allowing retrieval of values based on keys.
Hash Set: Stores only keys, ensuring each key is unique. It does not store or return values.

Step-by-Step Implementation

To implement a Hash Set, we will adapt our Hash Table implementation to ignore values and focus solely on keys.

1. Create the HashSet Class

First, we will create a HashSet class. This class will include an inner Node class to represent individual entries.
class HashSet:
class Node:
key

constructor Node(key):
this.key = key

buckets // Array of linked lists
bucketSize
size

constructor HashSet():
bucketSize = 4
initialize buckets with empty linked lists of size bucketSize
size = 0
Explanation:
Node Class: Represents each entry in the HashSet. Each node contains only a key since HashSets do not store values.
buckets: An array of linked lists to handle collisions.
bucketSize: The initial number of buckets (array size).
size: Tracks the number of elements in the HashSet.
Constructor: Initializes the bucket array with a default size and sets up empty linked lists for each bucket.

2. Implementing the Hash Function

We need a hash function to determine the bucket index for each key. This function will convert the key into a bucket index using its hash code.
function hashFunction(key):
hashCode = key.hashCode()
return absolute value of (hashCode % bucketSize)
Explanation:
hashFunction: Converts the key to a hash code using the key's hashCode method.
absolute value: Ensures the hash code is non-negative.
% bucketSize: Ensures the hash code fits within the bounds of our bucket array.

3. Adding Keys to the HashSet

Next, we implement the add method to insert keys into the HashSet. This method will ensure no duplicate keys are stored.
function add(key):
bucketIndex = hashFunction(key)
bucket = buckets[bucketIndex]

for node in bucket:
if node.key == key:
return // Key already exists

add new Node(key) to bucket
size++

if (size / bucketSize) > 2:
rehash()

Explanation:
bucketIndex: Determines the appropriate bucket for the key using the hash function.
bucket: Fetches the linked list at the calculated bucket index.
for loop: Checks if the key already exists in the bucket to avoid duplicates.
add new Node: Adds a new node with the key to the linked list if the key does not exist.
size++: Increments the size of the HashSet.
if condition: Checks the load factor (number of elements divided by the number of buckets). If it exceeds a threshold (2 in this case), the rehash method is called to resize the buckets array.

4. Checking for Key Existence

The contains method will check if a key is present in the HashSet.
function contains(key):
bucketIndex = hashFunction(key)
bucket = buckets[bucketIndex]

for node in bucket:
if node.key == key:
return true

return false

Explanation:
bucketIndex: Determines the bucket for the key using the hash function.
bucket: Fetches the linked list at the calculated bucket index.
for loop: Iterates over the linked list to check if the key exists.
return true: Returns true if the key is found.
return false: Returns false if the key is not found after iterating through the bucket.

5. Removing Keys from the HashSet

We also need a method to remove keys from the HashSet.
function remove(key):
bucketIndex = hashFunction(key)
bucket = buckets[bucketIndex]

targetNode = null
for node in bucket:
if node.key == key:
targetNode = node
break

if targetNode != null:
remove targetNode from bucket
size--

Explanation:
bucketIndex: Determines the bucket for the key using the hash function.
bucket: Fetches the linked list at the calculated bucket index.
for loop: Iterates over the linked list to find the node with the matching key.
targetNode: Stores the node to be removed if found.
remove targetNode: Removes the node from the linked list if it exists.
size--: Decrements the size of the HashSet.

6. Rehashing for Dynamic Size Management

To handle dynamic size management, we implement the rehash method.
function rehash():
oldBuckets = buckets
bucketSize *= 2
initialize new buckets with empty linked lists of size bucketSize

size = 0
for bucket in oldBuckets:
for node in bucket:
add(node.key)

Explanation:
oldBuckets: Stores the current buckets array.
bucketSize = 2: Doubles the size of the buckets array.
initialize new buckets: Creates a new array of linked lists with the updated size.
for loop: Initializes each index in the new bucket array with an empty linked list.
size = 0: Resets the size to 0 because we will re-add all elements.
nested for loop: Iterates over the old buckets and re-adds each key to the new buckets array using the add method, ensuring they are rehashed correctly.

Complete Pseudo Code

class HashSet:
class Node:
key

constructor Node(key):
this.key = key

buckets // Array of linked lists
bucketSize
size

constructor HashSet():
bucketSize = 4
initialize buckets with empty linked lists of size bucketSize
size = 0

function hashFunction(key):
hashCode = key.hashCode()
return absolute value of (hashCode % bucketSize)

function add(key):
bucketIndex = hashFunction(key)
bucket = buckets[bucketIndex]

for node in bucket:
if node.key == key:
return // Key already exists

add new Node(key) to bucket
size++

if (size / bucketSize) > 2:
rehash()

function contains(key):
bucketIndex = hashFunction(key)
bucket = buckets[bucketIndex]

for node in bucket:
if node.key == key:
return true

return false

function remove(key):
bucketIndex = hashFunction(key)
bucket = buckets[bucketIndex]

targetNode = null
for node in bucket:
if node.key == key:
targetNode = node
break

if targetNode != null:
remove targetNode from bucket
size--

function rehash():
oldBuckets = buckets
bucketSize *= 2
initialize new buckets with empty linked lists of size bucketSize

size = 0
for bucket in oldBuckets:
for node in bucket:
add(node.key)

Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.