Skip to content

rborale5/consistent-hashing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Consistent Hashing

A Go implementation of consistent hashing with bounded load balancing support.

Overview

Consistent hashing is a distributed hashing scheme that minimizes key remapping when the number of nodes changes. This implementation uses virtual nodes (replicas) to ensure even distribution of keys across hosts.

Features

  • Consistent hashing ring with configurable virtual nodes (replicas)
  • Bounded load balancing - prevents any single host from being overloaded
  • Thread-safe - all operations are protected by read-write locks
  • Fast hashing - uses BLAKE2b-SIMD for high-performance hashing
  • Dynamic cluster management - add and remove hosts at runtime

Installation

go get github.com/rborale5/consistent-hashing

Quick Start

package main

import (
    "fmt"
    consistent "github.com/rborale5/consistent-hashing"
)

func main() {
    // Create a new consistent hash ring
    c := consistent.New()

    // Add hosts to the ring
    c.Add("server-1")
    c.Add("server-2")
    c.Add("server-3")

    // Get the host for a key
    host, err := c.Get("my-key")
    if err != nil {
        panic(err)
    }
    fmt.Printf("Key 'my-key' maps to: %s\n", host)
}

API Reference

Creating a Ring

// Create with default replica factor (20)
c := consistent.New()

// Create with custom replica factor
c := consistent.NewWithConfig(50)

Managing Hosts

// Add a host
c.Add("server-1")

// Remove a host
removed := c.Remove("server-1") // returns true if host existed

// Get all hosts
hosts := c.GetHosts() // []string{"server-1", "server-2", ...}

// Get host count
count := c.Hosts() // int

Key Lookup

// Get host for a key (basic consistent hashing)
host, err := c.Get("user-123")

// Get host with bounded load (prevents overloading)
host, err := c.GetLeast("user-123")

Load Tracking (for Bounded Load)

// Increment load when starting a request
host, _ := c.GetLeast("request-key")
c.Inc(host)

// Decrement load when request completes
c.Done(host)

// Check current load
load, err := c.GetLoad("server-1")

// Get max allowed load (avgLoad * 1.25)
maxLoad := c.MaxLoad()

Bounded Load Consistent Hashing

The GetLeast method implements bounded load consistent hashing, which ensures no host receives more than avgLoad * 1.25 of requests. When a host is at capacity, the algorithm walks the ring to find the next available host.

This is useful for scenarios where you need to:

  • Prevent hot spots in your cluster
  • Ensure even distribution of active connections
  • Handle varying request costs
// Track load for bounded distribution
for _, request := range requests {
    host, _ := c.GetLeast(request.Key)
    c.Inc(host)
    
    go func(h string) {
        defer c.Done(h)
        processRequest(request)
    }(host)
}

How It Works

The Hash Ring

At the core is a conceptual ring spanning the full uint64 hash space (02^64 - 1). Every server and every key gets hashed into a position on this ring using BLAKE2b-SIMD.

                         0 / MAX
                          │
               S1-2 ──────┤──────── S1-0
              /            │            \
           S3-1            │            S2-0
            │         Hash Ring         │
           S3-0            │            S3-1  (replicas interleaved)
              \            │            /
               S2-1 ───────┴──────── S1-1

Virtual Nodes (Replicas)

Each physical server is placed at multiple positions on the ring — not just one. These are called virtual nodes or replicas.

for i := 0; i < int(c.replicaFactor); i++ {
    h := c.hash(fmt.Sprintf("%s%d", host, i))  // "server-1:0", "server-1:1", ...
    c.hosts[h] = host
    c.sortedSet = append(c.sortedSet, h)
}

With replicaFactor = 20 (default), 3 servers produce 60 virtual nodes spread evenly across the ring. This prevents any single server from owning a disproportionately large arc — a common problem when you have very few physical nodes.

Without virtual nodes (3 servers, poor distribution):
  ──[ S1 ]──────────────────────────[ S2 ]──────[ S3 ]──

With virtual nodes (each server has 3 replicas, even spread):
  ──[S1-0]──[S3-0]──[S2-0]──[S1-1]──[S3-1]──[S2-1]──[S1-2]──

Key Lookup: Get(key)

Locating the server for a key is a O(log n) binary search on sortedSet — a sorted []uint64 of all virtual node hashes.

Step 1: hash("user-42") → some uint64 value, e.g. 0x3F...

Step 2: binary search sortedSet[] for first position >= hash

Step 3: if past end of ring → wrap around to index 0 (ring property)

Step 4: return hosts[sortedSet[idx]]
idx := sort.Search(len(c.sortedSet), func(i int) bool {
    return c.sortedSet[i] >= key
})
if idx >= len(c.sortedSet) {
    idx = 0  // wrap around
}

The key always lands on the next clockwise virtual node, which maps back to a physical server.

Bounded Load: GetLeast(key)

Plain consistent hashing can still create hot spots if one server happens to receive a flood of popular keys. GetLeast solves this with the bounded load algorithm.

The maximum allowed load per host is:

maxLoad = ceil(avgLoad * 1.25)
avgLoad = (totalLoad + 1) / numHosts

The +1 prevents division-by-zero and ensures maxLoad ≥ 1 at all times.

GetLeast("user-42"):

1. Hash key → find target node (same as Get)
2. Is target.load < maxLoad?
   ├── YES → use this host, return it
   └── NO  → walk clockwise to next virtual node, repeat
for i := 0; i < len(c.sortedSet); i++ {
    host := c.hosts[c.sortedSet[(idx+i)%len(c.sortedSet)]]
    if c.loadMap[host].Load < c.maxLoad() {
        return host, nil
    }
}

If every host is at capacity (extreme edge case), it falls back to the originally-hashed node to preserve consistency.

Load Lifecycle

You must call Inc / Done to let the algorithm track active load. This is intentionally manual so it can wrap any unit of work — HTTP requests, DB queries, goroutines, etc.

                ┌──────────────────────────────────┐
                │                                  │
  GetLeast(key) → Inc(host) → [do work] → Done(host)
                │                                  │
                └──────────────────────────────────┘
                    load counter tracks in-flight work

Adding and Removing Hosts

Add: Hashes replicaFactor virtual nodes → inserts them into sortedSet → re-sorts. Only keys that now hash into the new server's arcs will move; everything else stays put.

Remove: Deletes all virtual nodes for that host → removes from sortedSet. Keys that were owned by the removed server get picked up by the next clockwise neighbor. No other keys are affected.

This is the core property of consistent hashing: O(K/N) keys remapped on average when a host is added or removed, rather than a full rehash.

Internal Data Structures

Field Type Purpose
hosts map[uint64]string Virtual node hash → server name
sortedSet []uint64 Sorted virtual node hashes for binary search
loadMap map[string]*Host Server name → {Name, Load} for bounded load
totalLoad int64 Sum of all active loads (for avgLoad calc)
replicaFactor uint16 Virtual nodes per physical host

Configuration

Parameter Default Description
replicaFactor 20 Number of virtual nodes per host

Higher replica factors provide better distribution but use more memory.

Running the Example

cd example
go run example.go

Running Tests

go test -v ./...

Benchmarks

go test -bench=. -benchmem

License

MIT License

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages