Skip to content
Open
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
0d83453
core implementation
aneubeck Aug 11, 2025
89f8ad4
Update README.md
aneubeck Aug 12, 2025
b03f0b7
Apply suggestion from @Copilot
aneubeck Aug 12, 2025
0dbf1c9
finish proof
aneubeck Aug 13, 2025
953d9bd
Merge branch 'aneubeck/sampling' of https://github.com/github/rust-ge…
aneubeck Aug 13, 2025
a5eb91e
Update README.md
aneubeck Aug 13, 2025
220624d
Update README.md
aneubeck Aug 13, 2025
fc69a9e
Update README.md
aneubeck Aug 13, 2025
90259e9
Update README.md
aneubeck Aug 13, 2025
8480ea3
Replace key with hasher traits
aneubeck Aug 13, 2025
0baaafc
Update lib.rs
aneubeck Aug 13, 2025
0dcb137
Update README.md
aneubeck Aug 13, 2025
d4b8410
Update crates/consistent-hashing/README.md
aneubeck Aug 15, 2025
0935ea0
Update crates/consistent-hashing/README.md
aneubeck Aug 15, 2025
99c69f3
Update crates/consistent-hashing/README.md
aneubeck Aug 15, 2025
496f539
Update crates/consistent-hashing/README.md
aneubeck Aug 15, 2025
23f3080
add benchmark
aneubeck Aug 15, 2025
5d52237
remove second vector
aneubeck Aug 15, 2025
f6e29f7
Update README.md
aneubeck Aug 15, 2025
d20f9b6
Update performance.rs
aneubeck Aug 15, 2025
1dde97c
make linter happy
aneubeck Aug 15, 2025
9171444
some more docu + better choose_k implementation
aneubeck May 18, 2026
20048d8
fix some tests
aneubeck May 18, 2026
e9fe3bd
fix remaining tests
aneubeck May 18, 2026
bec3c8f
clippy + fmt
aneubeck May 18, 2026
da37dd7
Update bounded_load.rs
aneubeck May 19, 2026
48e3b0b
Update bounded_load.rs
aneubeck May 19, 2026
25a42f8
Update bounded_load.rs
aneubeck May 19, 2026
04da223
Update README.md
aneubeck May 19, 2026
df712d9
Merge branch 'main' into aneubeck/sampling
aneubeck May 19, 2026
f3bbcd9
add a proper consistentnodemap implementation + documentation
aneubeck May 20, 2026
7deb027
Update README.md
aneubeck May 20, 2026
e54eaa2
Update README.md
aneubeck May 20, 2026
cad9115
Update README.md
aneubeck May 20, 2026
4e3dc65
clippy
aneubeck May 20, 2026
2dda8bd
Update README.md
aneubeck May 20, 2026
83174e2
rename crate
aneubeck May 20, 2026
bd4083f
upgrade dependencies
aneubeck May 20, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions crates/consistent-hashing/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
[package]
name = "consistent-hashing"
version = "0.1.0"
edition = "2021"
description = "Constant time consistent hashing algorithms."
repository = "https://github.com/github/rust-gems"
license = "MIT"
keywords = ["probabilistic", "algorithm", "consistent hashing", "jump hashing", "rendezvous hashing"]
categories = ["algorithms", "data-structures", "mathematics", "science"]

[lib]
crate-type = ["lib", "staticlib"]
bench = false

[dependencies]

[dev-dependencies]
99 changes: 99 additions & 0 deletions crates/consistent-hashing/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Consistent Hashing

Consistent hashing maps keys to a changing set of nodes (shards, servers) so that when nodes join or leave, only a small fraction of keys move. It is used in distributed caches, databases, object stores, and load balancers to achieve scalability and high availability with minimal data reshuffling.

Common algorithms
- [Consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing) (hash ring with virtual nodes)
- [Rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing)
- [Jump consistent hash](https://arxiv.org/pdf/1406.2294)
- [Maglev hashing](https://research.google/pubs/pub44824)
- [AnchorHash: A Scalable Consistent Hash](https://arxiv.org/abs/1812.09674)
- [DXHash](https://arxiv.org/abs/2107.07930)
- [JumpBackHash](https://arxiv.org/abs/2403.18682)

## Complexity summary

where `N` is the number of nodes and `R` is the number of replicas.

| Algorithm | Lookup per key | Node add/remove | Memory | Lookup with replication |
| | (no replication) | | | |
|-------------------------|---------------------|----------------------------------------|---------------------------|-------------------------------------|
| Hash ring (with vnodes) | O(log N): binary search over N points; O(1): with specialized structures | O(log N) | O(N) | O(log N + R): Take next R distinct successors |
| Rendezvous | O(N): max score | O(1) | O(N) node list | O(N log R): pick top R scores |
| Jump consistent hash | O(log(N)) expected | 0 | O(1) | O(R log N) |
| AnchorHash | O(1) expected | O(1) | O(N) | Not native |
| DXHash | O(1) expected | O(1) | O(N) | Not native |
| JumpBackHash | O(1) expected | 0 | O(1) | Not native |
| **ConsistentChooseK** | **O(1) expected** | **0** | **O(1)** | **O(R^2)**; **O(R log(R))**: using heap |
Comment thread
aneubeck marked this conversation as resolved.
Outdated

Replication of keys
- Hash ring: replicate by walking clockwise to the next R distinct nodes. Virtual nodes help spread replicas more evenly. Replicas are not independently distributed.
- Rendezvous hashing: replicate by selecting the top R nodes by score for the key. This naturally yields R distinct owners and supports weights.
- Jump consistent hash: the base function doesn't support replication. But the math can be easily modified to support consistent replication.
- JumpBackHash and variants: The trick of Jump consistent hash to support replication won't work here due to the additional state introduced.
- ConsistentChooseK: Faster and more memory efficient than all other solutions.

Why replication matters
- Tolerates node failures and maintenance without data unavailability.
- Distributes read/write load across multiple owners, reducing hotspots.
- Enables fast recovery and higher tail-latency resilience.

## ConsistentChooseK algorithm

The following functions summarize the core algorithmic innovation as a minimal Rust excerpt.
`n` is the number of nodes and `k` is the number of desired replica.
The chosen nodes are returned as distinct integers in the range `0..n`.

```
fn consistent_choose_k<Key>(key: Key, k: usize, n: usize) -> Vec<usize> {
(0..k).rev().scan(n, |n, k| Some(consistent_choose_max(key, k + 1, n))).collect()
}

fn consistent_choose_max<Key>(key: Key, k: usize, n: usize) -> usize {
(0..k).map(|k| consistent_hash(key, k, n - k) + k).max()
}

fn consistent_hash<Key>(key: Key, i: usize, n: usize) -> usize {
// compute the i-th independent consistent hash for `key` and `n` nodes.
}
```

`consistent_choose_k` makes `k` calls to `consistent_choose_max` which calls `consistent_hash` another `k` times.
In total, `consistent_hash` is called `k * (k+1) / 2` Utilizing a `O(1)` solution for `consistent_hash` leads to a `O(k^2)` runtime.
Comment thread
aneubeck marked this conversation as resolved.
Outdated
This runtime can be further improved by replacing the max operation with a heap where popped elements are updated according to the new arguments `n` and `k`.
With this optimization, the complexity reduces to `O(k log k)`.
With some probabilistic bucketing strategy, it should be possible to reduce the expected runtime to `O(k)`.
For small `k` neither optimization is probably improving the actual performance though.

The next section proves the correctness of this algorithm.

## N-Choose-R replication
Comment thread
aneubeck marked this conversation as resolved.
Outdated

We define the consistent `n-choose-k` replication as follows:

1. For a given number `n` of nodes, choose `k` distinct nodes `S`.
2. For a given `key` the chosen set of nodes must be uniformly chosen from all possible sets of size `k`.
3. When `n` increases by one, exactly one node in the chosen set will be changed.
4. and the node will be changed with probability `k/(n+1)`.

In the remainder of this section we prove that the `consistent_choose_k` algorithm satisfies those properties.

Let's define `M(k,n) = consistent_choose_max(_, k, n)` and `S(k, n) := consistent_choose_k(_, k, n)` as short-cuts for some arbitrary fixed `key`.
We assume that `consistent_hash(key, k, n)` computes `k` independent consistent hash functions.

Since `M(k, n) < n` and `S(k, n) = {M(k, n)} ∪ S(k - 1, M(k, n))` for `k > 1`, `S(k, n)` constructs a strictly monotonically decreasing sequence. The sequence outputs exactly `k` elements which therefore must all be distinct which proves property 1 for `k <= n`.

Properties 2, 3, and 4 can be proven via induction as follows.

`k = 1`: We expect that `consistent_hash` returns a single uniformly distributed node index which is consistent in `n`, i.e. changes the hash value with probability `1/(n+1)`, when `n` increments by one. In our implementation, we use an `O(1)` implementation of the jump-hash algorithm. For `k=1`, `consistent_choose_k(key, 1, n)` becomes a single function call to `consistent_choose_max(key, 1, n)` which in turn calls `consistent_hash(key, 0, n)`. I.e. `consistent_choose_k` inherits the all the desired properties from `consistent_hash` for `k=1` and all `n>=1`.

`k -> k+1`: `M(k+1, n+1) = M(k+1, n)` iff `M(k, n+1) < n` and `consistent_hash(_, k, n+1-k) < n - k`. The probability for this is `(n+1-k)/(n+1)` for the former by induction and `(n-k)/(n+1-k)` by the assumption that `consistent_hash` is a proper consistent hash function. Since both these probabilities are assumed to be independent, the probability that our initial value changes is `1 - (n+1-k)/(n+1) * (n-k)/(n+1-k) = 1 - (n-k)/(n+1) = (k+1)/(n+1)` proving property 4.

Property 3 is trivially satisfied if `S(k+1, n+1) = S(k+1, n)`. So, we focus on the case where `S(k+1, n+1) != S(k+1, n)`, which implies that `n ∈ S(k+1, n+1)` as largest element.
We know that `S(k+1, n) = {m} ∪ S(k, m)` for some `m` by definition and `S(k, n) = S(k, u) ∖ {v} ∪ {w}` by induction for some `u`, `v`, and `w`. Thus far we have `S(k+1, n+1) = {n} ∪ S(k, n) = {n} ∪ S(k, u) ∖ {v} ∪ {w}`.

If `u = m`, then `S(k+1, n) = {m} ∪ S(k, m) ∖ {v} ∪ {w}` and `S(k+1, n+1) = {n} ∪ S(k, n) = {n} ∪ S(k, m) ∖ {v} ∪ {w}` and the two differ exaclty in the elemetns `m` and `n` proving property 3.
Comment thread
aneubeck marked this conversation as resolved.
Outdated

If `u ≠ m`, then `consistent_hash(_, k, n) = m`, since that's the only way how the largest values in `S(k+1, n)` and `S(k, n)` can differ. In this case, `m ∉ S(k+1, n+1)`, since `n` (and not `m`) is the largest element of `S(k+1, n+1)`. Furthermore, `S(k, n) = S(k, m)`, since `consistent_hash(_, i, n) < m` for all `i < k` (otherwise there is a contradiction).
Putting it together leads to `S(k+1, n+1) = {n} ∪ S(k, m)` and `S(k+1, n) = {m} ∪ S(k, m)` which differ exactly in the elements `n` and `m` which concludes the proof.

Loading
Loading