Commit 170870e

remove slots mostly and update optimizations file
1 parent 34dad1a commit 170870e

2 files changed

Lines changed: 112 additions & 93 deletions

crates/hash-sorted-map/OPTIMIZATIONS.md

Lines changed: 102 additions & 16 deletions
@@ -4,8 +4,8 @@

 `HashSortedMap` is a Swiss-table-inspired hash map that uses **overflow
 chaining** (instead of open addressing), **SIMD group scanning** (NEON/SSE2),
-a **slot-hint fast path**, and an **optimized growth strategy**. It is generic
-over key type, value type, and hash builder.
+and an **optimized growth strategy**. It is generic over key type, value type,
+and hash builder.

 This document analyzes the design trade-offs versus
 [hashbrown](https://github.com/rust-lang/hashbrown) and records the
@@ -38,7 +38,6 @@ experimental results that guided the current design.
 │ • Overflow chaining (linked groups) │
 │ • 8-byte groups with NEON/SSE2/scalar SIMD scan │
 │ • EMPTY / FULL tag states only (insertion-only, no deletion) │
-│ • Slot-hint fast path │
 └──────────────────────────────────────────────────────────────────┘
 ```

@@ -106,17 +105,32 @@ the overflow path.
 SIMD version** by pessimizing NEON code generation. Removed from the SIMD
 implementation, kept in the scalar version.

-### 7. Slot Hint Fast Path (Unique to HashSortedMap)
+### 7. Slot Hint Fast Path ⚠️ Removed from Lookup Paths

-HashSortedMap checks a preferred slot before scanning the group:
+Originally, HashSortedMap checked a preferred slot before scanning the group:
 ```rust
 let hint = slot_hint(hash); // 3 bits from hash → slot index
 if ctrl[hint] == EMPTY { /* direct insert */ }
 if ctrl[hint] == tag && keys[hint] == key { /* direct hit */ }
 ```

-hashbrown does **not** have this optimization — it always does a full SIMD
-group scan. The reason why the performance is different is probably due to the different overflow strategies and the different load factors.
+**Experimental finding**: This scalar check **hurts performance** on random
+workloads. The branch predictor cannot help because random keys map to random
+slots, making the hint check a 50/50 branch that pollutes the branch
+predictor. SIMD-only scanning (match_tag + match_empty) is uniformly fast
+regardless of key distribution.
+
+**Results of removing slot_hint from different paths:**
+- `find_or_insertion_slot` (entry API): **−25% latency** on merge benchmark
+- `get_hashed`: **−4.4%** improvement (SIMD scan is faster than branch+scalar)
+- `insert_hashed`: **+7%** regression on presized insert (the hint genuinely
+  helps when inserting into a mostly-empty group), but accepted for code
+  simplicity since the merge workload matters more
+
+**Current state**: slot_hint is **only** used in `insert_for_grow()`, where
+the map is guaranteed sparse after a resize (groups are mostly empty, so the
+hint slot is very likely free). For all other paths, SIMD-only scanning is
+used.

 ### 8. Overflow Reserve Sizing ✅ Validated

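The slot-hint hunk above contrasts a scalar check of one preferred slot with a scan of the whole group. As a rough illustration of what a branch-free group scan can look like, here is a minimal scalar (SWAR) sketch over an 8-byte control group. It is not the crate's `group_ops` implementation: the `EMPTY = 0` encoding, the `u64` mask layout, and all function bodies are assumptions made for this example. Only the names `match_tag`, `match_empty`, and `next_match` are borrowed from the diff.

```rust
/// Number of control bytes per group (the document mentions 8-byte groups).
const GROUP: usize = 8;
/// Assumed encoding of an empty slot; the real `CTRL_EMPTY` value is not shown in the diff.
const EMPTY: u8 = 0;

const LO: u64 = 0x7f7f_7f7f_7f7f_7f7f;
const HI: u64 = 0x8080_8080_8080_8080;

/// Returns a mask whose bit 7 of byte `i` is set iff byte `i` of `x` is zero.
/// `(b & 0x7f) + 0x7f` sets bit 7 exactly when the low 7 bits are non-zero and
/// never carries into the next byte, so the result has no false positives.
fn zero_bytes(x: u64) -> u64 {
    !(((x & LO) + LO) | x) & HI
}

/// Compares all 8 control bytes against `tag` at once.
fn match_tag(ctrl: &[u8; GROUP], tag: u8) -> u64 {
    let word = u64::from_le_bytes(*ctrl);
    let pattern = u64::from_ne_bytes([tag; GROUP]);
    // XOR turns every matching byte into zero.
    zero_bytes(word ^ pattern)
}

/// Finds all empty slots in the group.
fn match_empty(ctrl: &[u8; GROUP]) -> u64 {
    match_tag(ctrl, EMPTY)
}

/// Pops the lowest matching slot index from `mask`, if any.
fn next_match(mask: &mut u64) -> Option<usize> {
    if *mask == 0 {
        return None;
    }
    let slot = (mask.trailing_zeros() / 8) as usize;
    *mask &= *mask - 1; // clear the lowest set bit
    Some(slot)
}
```

In this sketch every slot is compared with a handful of arithmetic operations and no data-dependent branch per slot, which is consistent with the finding that the scan costs roughly the same regardless of where a key lands in the group.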
@@ -159,13 +173,85 @@ entropy in both halves. Also changed trigram generation to use

 ## Summary of Impact

-| Change | Effect on insert time |
-|----------------------------|------------------------------|
-| Capacity sizing fix | **−50%** (biggest win) |
-| Optimized growth path | **−10%** on growth scenarios |
-| SIMD group scanning | **−5%** |
-| Branch hints (scalar only) | **−2–6%** |
-| IdentityHasher fix | Enabled fair comparison |
+| Change | Effect |
+|---------------------------------|-------------------------------------|
+| Capacity sizing fix | **−50%** insert time (biggest win) |
+| Optimized growth path | **2× faster** growth than hashbrown |
+| SIMD group scanning | **−5%** insert time |
+| Slot hint removal (entry/get) | **−25%** merge latency |
+| Branch hints (scalar only) | **−2–6%** |
+| IdentityHasher fix | Enabled fair comparison |
+
+---

-The current HashSortedMap **matches hashbrown+FxHash** on pre-sized inserts,
-**beats all hashbrown variants** on overwrites, and has **2× faster growth**.
+## Benchmark Results (Apple M-series, aarch64 NEON)
+
+### Insert (1000 trigrams, pre-sized)
+
+| Implementation | Time (µs) | vs hashbrown |
+|----------------------|-----------|--------------|
+| FoldHashMap | 2.44 | −11% |
+| FxHashMap | 2.61 | −5% |
+| hashbrown+Identity | 2.63 | baseline |
+| hashbrown::HashMap | 2.74 | +4% |
+| std::HashMap+FNV | 3.18 | +21% |
+| AHashMap | 3.38 | +29% |
+| **HashSortedMap** | **3.46** | **+32%** |
+| std::HashMap | 8.65 | +229% |
+
+### Reinsert (1000 trigrams, all keys exist)
+
+| Implementation | Time (µs) |
+|----------------------|-----------|
+| hashbrown+Identity | 2.50 |
+| **HashSortedMap** | **2.70** |
+
+### Growth (128 → 1000 trigrams, 3 resize rounds)
+
+| Implementation | Time (µs) |
+|----------------------|-----------|
+| **HashSortedMap** | **5.35** |
+| hashbrown+Identity | 10.12 |
+
+### Count (4000 trigrams, mixed insert/update)
+
+| Implementation | Time (µs) |
+|----------------------------------|-----------|
+| hashbrown+Identity entry() | 4.89 |
+| **HashSortedMap entry().or_default()** | **5.44** |
+| **HashSortedMap get_or_default** | **5.48** |
+
+### Iteration (1000 trigrams)
+
+| Implementation | Time (ns) |
+|-------------------------------|-----------|
+| **HashSortedMap iter()** | **794** |
+| **HashSortedMap into_iter()** | **998** |
+| hashbrown+Identity iter() | 1,067 |
+| hashbrown+Identity into_iter()| 1,060 |
+
+### Sort (100K trigrams)
+
+| Implementation | Time (µs) |
+|-----------------------------|-----------|
+| **HashSortedMap sort_by_hash** | **706** |
+| Vec::sort_unstable | 984 |
+
+### Merge (100 maps × 100K keys each → sorted output)
+
+| Implementation | Time (ms) | vs HSM merge+sort |
+|-----------------------------------|-----------|--------------------|
+| hashbrown merge presized | 30.4 | −46% |
+| **HashSortedMap merge presized** | **37.3** | **−33%** |
+| **HashSortedMap merge (no sort)** | **44.0** | **−21%** |
+| hashbrown merge | 45.4 | −19% |
+| **HashSortedMap merge + sort** | **55.9** | **baseline** |
+| hashbrown merge + Vec sort | 58.7 | +5% |
+| k-way merge sorted vecs | 445 | +696% |
+
+**Key takeaways:**
+- HashSortedMap has **2× faster growth** than hashbrown
+- **25% faster iteration** than hashbrown (dense group layout)
+- **sort_by_hash is 28% faster** than Vec::sort_unstable (data is partially sorted by group)
+- **merge + sort is 5% faster** than hashbrown merge + Vec sort (the primary use case)
+- Pre-sized insert is 32% slower than hashbrown (trade-off for sort/merge efficiency)
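The percentages in the takeaways can be re-derived from the reported timings. The helpers below are a minimal sketch for doing that; `pct_change` and `speedup` are illustrative names, not part of the crate or its benchmark harness.

```rust
/// Relative change of `value` against `baseline`, in percent.
/// Negative means `value` is faster (smaller time) than the baseline.
fn pct_change(value: f64, baseline: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}

/// How many times faster `fast` is than `slow`, given two timings.
fn speedup(slow: f64, fast: f64) -> f64 {
    slow / fast
}

/// Prints the figures behind the key takeaways, using timings copied from the tables.
fn print_takeaways() {
    println!("growth speedup:     {:.2}x", speedup(10.12, 5.35));
    println!("iteration:          {:.1}%", pct_change(794.0, 1067.0));
    println!("sort_by_hash:       {:.1}%", pct_change(706.0, 984.0));
    println!("merge + sort:       {:.1}%", pct_change(55.9, 58.7));
    println!("pre-sized insert:   {:.1}%", pct_change(3.46, 2.63));
}
```

Applied to the pre-sized insert table, the same formula suggests that the FoldHashMap and FxHashMap entries (−11%, −5%) were computed against the 2.74 µs `hashbrown::HashMap` row. Against the 2.63 µs `hashbrown+Identity` row that is marked as the baseline, they come out near −7% and −1%, while the remaining rows agree with the 2.63 µs baseline.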

crates/hash-sorted-map/src/hash_sorted_map.rs

Lines changed: 10 additions & 77 deletions
@@ -215,26 +215,11 @@ impl<K: Hash + Eq, V, S: BuildHasher> HashSortedMap<K, V, S> {

     fn insert_hashed(&mut self, hash: u64, key: K, value: V) -> Option<V> {
         let tag = tag(hash);
-        let hint = slot_hint(hash);
         let mut gi = self.container.group_index(hash);
         loop {
             let group = &mut self.container.groups[gi];
-            // Fast path: check preferred slot.
-            let c = group.ctrl[hint];
-            if c == CTRL_EMPTY {
-                group.ctrl[hint] = tag;
-                group.keys[hint] = MaybeUninit::new(key);
-                group.values[hint] = MaybeUninit::new(value);
-                self.container.len += 1;
-                return None;
-            }
-            if c == tag && unsafe { group.keys[hint].assume_init_ref() } == &key {
-                let old = std::mem::replace(unsafe { group.values[hint].assume_init_mut() }, value);
-                return Some(old);
-            }
-            // Slow path: SIMD scan group for tag match.
+            // SIMD scan group for tag match.
             let mut tag_mask = group_ops::match_tag(&group.ctrl, tag);
-            tag_mask = group_ops::clear_slot(tag_mask, hint);
             while let Some(i) = group_ops::next_match(&mut tag_mask) {
                 if unsafe { group.keys[i].assume_init_ref() } == &key {
                     let old =
@@ -267,9 +252,9 @@ impl<K: Hash + Eq, V, S: BuildHasher> HashSortedMap<K, V, S> {
                 self.container.num_groups += 1;
                 self.container.groups[gi].overflow = new_gi as u32;
                 let group = &mut self.container.groups[new_gi];
-                group.ctrl[hint] = tag;
-                group.keys[hint] = MaybeUninit::new(key);
-                group.values[hint] = MaybeUninit::new(value);
+                group.ctrl[0] = tag;
+                group.keys[0] = MaybeUninit::new(key);
+                group.values[0] = MaybeUninit::new(value);
                 self.container.len += 1;
                 return None;
             }
@@ -282,31 +267,20 @@ impl<K: Hash + Eq, V, S: BuildHasher> HashSortedMap<K, V, S> {
         Q: Eq + ?Sized,
     {
         let tag = tag(hash);
-        let hint = slot_hint(hash);
         let mut gi = self.container.group_index(hash);

         loop {
             let group = &self.container.groups[gi];
-
-            // Fast path: preferred slot.
-            let c = group.ctrl[hint];
-            if c == tag && unsafe { group.keys[hint].assume_init_ref() }.borrow() == key {
-                return Some(unsafe { group.values[hint].assume_init_ref() });
-            }
-
-            // Slow path: SIMD scan group.
+            // SIMD scan group for tag match.
             let mut tag_mask = group_ops::match_tag(&group.ctrl, tag);
-            tag_mask = group_ops::clear_slot(tag_mask, hint);
             while let Some(i) = group_ops::next_match(&mut tag_mask) {
                 if unsafe { group.keys[i].assume_init_ref() }.borrow() == key {
                     return Some(unsafe { group.values[i].assume_init_ref() });
                 }
             }
-
             if group_ops::match_empty(&group.ctrl) != 0 {
                 return None;
             }
-
             if group.overflow == NO_OVERFLOW {
                 return None;
             }
@@ -334,7 +308,6 @@ impl<K: Hash + Eq, V, S: BuildHasher> HashSortedMap<K, V, S> {
                     return FindResult::Found(group.values[i].as_mut_ptr());
                 }
             }
-
             // Check for empty slot in this group.
             let empty_mask = group_ops::match_empty(&group.ctrl);
             if empty_mask != 0 {
@@ -344,7 +317,6 @@ impl<K: Hash + Eq, V, S: BuildHasher> HashSortedMap<K, V, S> {
                     slot: i,
                 });
             }
-
             // Group full — follow or report end of chain.
             if group.overflow == NO_OVERFLOW {
                 return FindResult::Vacant(Insertion::NeedsOverflow {
@@ -626,7 +598,7 @@ impl<'a, K: Hash + Eq, V, S: BuildHasher> VacantEntry<'a, K, V, S> {
                     // `entry()` and now (we hold the only `&mut self`).
                     (*tail).overflow = new_gi as u32;
                 }
-                (new_group, slot_hint(hash))
+                (new_group, 0)
             }
         };

@@ -644,57 +616,18 @@ impl<'a, K: Hash + Eq, V, S: BuildHasher> VacantEntry<'a, K, V, S> {
 }

 /// Cold path: the chain was full, the table is at capacity, and we need to
-/// grow before inserting. Re-walks via the slow path after grow.
-///
-/// With clustered hash functions (e.g. identity hashing), the new primary
-/// group may still be full after grow, so we handle `NeedsOverflow` by
-/// allocating an overflow group.
+/// grow before inserting. Grows the map, then re-walks via `entry()` to find
+/// the new insertion slot.
 #[cold]
 #[inline(never)]
 fn insert_after_grow<K: Hash + Eq, V, S: BuildHasher>(
     map: &mut HashSortedMap<K, V, S>,
-    hash: u64,
+    _hash: u64,
     key: K,
     value: V,
 ) -> &mut V {
     map.grow();
-    let tag = tag(hash);
-    match map.find_or_insertion_slot(hash, &key) {
-        FindResult::Vacant(Insertion::Empty { group, slot }) => {
-            // SAFETY: `group` points into `map.container.groups` and is valid for `'a`.
-            unsafe {
-                let g = &mut *group;
-                g.ctrl[slot] = tag;
-                g.keys[slot] = MaybeUninit::new(key);
-                g.values[slot] = MaybeUninit::new(value);
-                map.container.len += 1;
-                g.values[slot].assume_init_mut()
-            }
-        }
-        FindResult::Vacant(Insertion::NeedsOverflow { tail }) => {
-            // Primary group chain is full even after grow (possible with
-            // clustered identity hashes). Allocate an overflow group.
-            debug_assert!(
-                (map.container.num_groups as usize) < map.container.groups.len(),
-                "overflow pool exhausted right after grow"
-            );
-            let new_gi = map.container.num_groups as usize;
-            map.container.num_groups += 1;
-            unsafe {
-                (*tail).overflow = new_gi as u32;
-            }
-            let slot = slot_hint(hash);
-            let group = &mut map.container.groups[new_gi];
-            group.ctrl[slot] = tag;
-            group.keys[slot] = MaybeUninit::new(key);
-            group.values[slot] = MaybeUninit::new(value);
-            map.container.len += 1;
-            unsafe { group.values[slot].assume_init_mut() }
-        }
-        FindResult::Found(_) => {
-            unreachable!("key was not in the table before grow")
-        }
-    }
+    map.entry(key).or_insert(value)
 }

 // No custom Drop needed for HashSortedMap — dropping `container` handles entries.
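
The last hunk replaces a hand-written re-walk (empty slot, overflow allocation, unreachable "found" case) with a single call through the entry API. The shape of that simplification can be tried out against the standard library's `HashMap`. The sketch below is only an analogue: `reserve` stands in for the crate's `grow()`, and the function is a hypothetical free-standing helper, not code from `hash_sorted_map.rs`.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Analogue of the simplified cold path, written against std's HashMap.
/// Grow first, then let `entry()` locate the slot instead of duplicating
/// the insertion logic here.
fn insert_after_grow<K: Hash + Eq, V>(map: &mut HashMap<K, V>, key: K, value: V) -> &mut V {
    // Stand-in for `map.grow()`: request room for at least as many
    // additional entries as the map currently holds.
    map.reserve(map.len().max(1));
    // `or_insert` returns a mutable reference to the value, whether it was
    // just inserted or already present.
    map.entry(key).or_insert(value)
}
```

The trade-off is one extra hash and lookup on a path that is already marked `#[cold]`, in exchange for keeping all slot-selection logic in one place.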
