Name	Name	Last commit message	Last commit date
parent directory ..
benchmarks	benchmarks
src	src
Cargo.toml	Cargo.toml
LICENSE	LICENSE
OPTIMIZATIONS.md	OPTIMIZATIONS.md
README.md	README.md

hash-sorted-map

A hash map whose groups are ordered by hash prefix, enabling efficient sorted-order iteration and linear-time merging of two maps.

Motivation

In a search index, each document produces a term map (term → frequency). At index time, term maps from many documents must be merged into a single posting list, and the result is serialized in hash-key order so that lookups can use a skip-list approach, leveraging the hash ordering to efficiently jump to the right region of the serialized data.

A conventional hash map stores entries in arbitrary order, so merging two maps requires collecting, sorting, and reshuffling all entries — an expensive step that dominates indexing time for large term maps typical of code search, where documents contain massive numbers of tokens.

HashSortedMap avoids this by organizing its groups by hash prefix. Iterating through the groups in order yields entries sorted by their hashed keys, which means:

Merging two maps is a single linear scan (like merge-sort's merge step).
Serialization in hash-key order requires no extra sorting or copying.

Design

HashSortedMap<K, V, S> is a Swiss-table-inspired hash map that uses:

Overflow chaining instead of open addressing — groups that fill up link to overflow groups rather than probing into neighbours.
Contiguous packing — occupied slots are always packed from position 0 with no gaps, enabling a single leading_zeros() to find the next free slot.
SIMD group scanning — uses NEON on aarch64, SSE2 on x86_64, and a scalar fallback elsewhere to scan 8–16 control bytes in parallel.
AoS group layout — each group stores its control bytes, keys, and values together, keeping a single insert's data within 1–2 cache lines.
Optimized growth — during resize, elements are re-inserted without duplicate checking and copied via raw pointers.
Generic key/value/hasher — keys need only Eq (Ord to sort). Customize hashing with the single-method [SortingHash] trait; any standard S: BuildHasher works out of the box via a blanket impl, and Borrow<Q>-based lookups are supported.

Benchmark results

Latest local Criterion snapshot from this repository's target/criterion outputs (lower is better):

Hardware used for this snapshot:

CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Architecture: x86_64
Topology: 1 socket, 1 core, 2 threads
CPU frequency range: 800 MHz to 2800 MHz
Memory: 7.8 GiB RAM

Scenario	HashSortedMap	Comparison	Result
Insert 1000 trigrams (pre-sized)	9.40 µs	`std::HashMap+FoldHash`: 14.55 µs	~35% faster
Grow from capacity 128	27.50 µs	`std::HashMap+Identity`: 26.66 µs	~3% slower
Count 4000 trigrams (`entry().or_default()`)	16.15 µs	`std::HashMap+Identity` `entry()`: 15.49 µs	~4% slower
Iterate 1000 trigrams (`iter()`)	3.02 µs	`std::HashMap+Identity` `iter()`: 3.04 µs	~1% faster
Sort 100000 trigrams by hash	1.66 ms	`Vec::sort_unstable`: 2.20 ms	~24% faster
Merge 100 sorted maps + final sort	152.34 ms	`std::HashMap+Identity` merge + vec sort: 193.37 ms	~21% faster

Note: std::collections::HashMap is hashbrown under the hood, so the benchmark drives std::collections::HashMap directly with the same custom BuildHashers that were previously passed to hashbrown.

Key takeaways:

Pre-sized inserts, sorting, and merge+sort remain the strongest paths.
Iteration is now roughly on par with std::HashMap+Identity.
Growth and count/update workloads are currently slightly slower than std::HashMap+Identity in this run.

Running

cargo bench --bench hashmap_insert

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

hash-sorted-map

Motivation

Design

Benchmark results

Running

Uh oh!

FilesExpand file tree

hash-sorted-map

Directory actions

More options

Directory actions

More options

Latest commit

History

hash-sorted-map

Folders and files

parent directory

README.md

hash-sorted-map

Motivation

Design

Benchmark results

Running