Commit 90ace87

Account for input hash vector in construction memory estimate
estimate_num_bytes_for_construction returned max(map_bytes, search_bytes), but the input hash vector stays alive across both phases. The partitioned external builder pushes the per-partition hashes (num_keys * sizeof(hash_type) bytes) into in_memory_partitions, yet adds only the build estimate to its running 'bytes' counter. As a result, the bytes < config.ram gate underestimated true residency by num_keys * sizeof(hash_type) per partition.

For hash128 (16B/key) on top of the search-phase peak (~25B/key), peak per-partition residency is ~41B/key while the gate assumed ~25B/key, so config.ram=2GiB actually allowed ~3.2GB resident. Adding the input hashes outside the max makes config.ram match observed RSS.
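A quick check of the arithmetic in the message: the sketch below recomputes the per-key footprint the gate assumed versus what is actually resident, and the resulting overshoot for a given budget. The constants are the approximate figures quoted above, not values read from the library, and the function names are hypothetical.

```cpp
// Approximate per-key figures quoted in the commit message; these are
// illustrative constants, not values read from the library.
constexpr double kSearchPeakBytesPerKey = 25.0;  // search-phase peak, ~25 B/key
constexpr double kInputHashBytesPerKey = 16.0;   // one hash128 per key

// What the old gate assumed: only the phase maximum.
constexpr double assumed_bytes_per_key() { return kSearchPeakBytesPerKey; }

// What is actually resident: phase maximum plus the input hashes.
constexpr double actual_bytes_per_key() {
    return kSearchPeakBytesPerKey + kInputHashBytesPerKey;
}

// If keys are admitted until the assumed footprint reaches ram_bytes,
// the real footprint is scaled by actual / assumed.
constexpr double actual_residency(double ram_bytes) {
    return ram_bytes * actual_bytes_per_key() / assumed_bytes_per_key();
}
```

For a 2 GiB budget this idealized ratio gives 2 × 41/25 ≈ 3.28 GiB, the same ballpark as the ~3.2GB figure in the message.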
1 parent e04a192 commit 90ace87

1 file changed

Lines changed: 7 additions & 1 deletion

include/builders/internal_memory_builder_single_phf.hpp

@@ -214,7 +214,13 @@ struct internal_memory_builder_single_phf {
                                       + num_keys * sizeof(uint64_t)  // hashes
                                       + table_size / 8;              // bitmap taken
 
-        return std::max<uint64_t>(num_bytes_for_map, num_bytes_for_search);
+        // The input hash vector lives across both the map and search phases,
+        // so its footprint must be added outside the max, not folded into it.
+        uint64_t num_bytes_for_input_hashes =
+            num_keys * sizeof(typename hasher_type::hash_type);
+
+        return std::max<uint64_t>(num_bytes_for_map, num_bytes_for_search) +
+               num_bytes_for_input_hashes;
     }
 
 private:
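To make the effect of the change on a bytes < config.ram style gate concrete, here is a minimal sketch, assuming a simplified gate that checks the budget before admitting each equally sized partition. hash128, estimate_old, estimate_new and partitions_admitted are hypothetical stand-ins, not the library's types or functions, and the real builder's control flow may differ.

```cpp
#include <algorithm>
#include <cstdint>

using std::uint64_t;

// Hypothetical 128-bit hash type: two 64-bit words, 16 bytes per key.
struct hash128 {
    uint64_t first;
    uint64_t second;
};

// Old rule: only the larger of the two phase peaks is counted.
constexpr uint64_t estimate_old(uint64_t map_bytes, uint64_t search_bytes) {
    return std::max<uint64_t>(map_bytes, search_bytes);
}

// Corrected rule (mirrors the diff): the input hashes are resident during
// both phases, so they are added on top of the phase maximum.
template <typename HashType>
constexpr uint64_t estimate_new(uint64_t num_keys, uint64_t map_bytes,
                                uint64_t search_bytes) {
    return std::max<uint64_t>(map_bytes, search_bytes) +
           num_keys * sizeof(HashType);
}

// Hypothetical gate modelled on "bytes < config.ram": keep admitting
// partitions while the running estimate is below the budget.
// per_partition_estimate must be > 0, otherwise the loop never ends.
constexpr uint64_t partitions_admitted(uint64_t ram,
                                       uint64_t per_partition_estimate) {
    uint64_t bytes = 0;
    uint64_t admitted = 0;
    while (bytes < ram) {  // budget checked before each partition is added
        bytes += per_partition_estimate;
        ++admitted;
    }
    return admitted;
}
```

With a 100000-byte budget and 1000-key partitions (map 20000 bytes, search 25000 bytes), the old estimate (25000 bytes each) admits 4 partitions whose real footprint is 4 × 41000 = 164000 bytes, while the corrected estimate (41000 bytes each) admits 3. In this simplified model both can still exceed the budget by up to one partition because the check runs before the addition, but the corrected estimate no longer hides the input hashes.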
