Skip to content

Commit 4e91729

Browse files
authored
fix(ustring): protect against idempotent rehash upon collision (#5090)
Some fuzzing stumbled across a very interesting edge case in the rehashing logic of ustring internals: If the initial hash location of a newly ustring-ized string is already occupied, it rehashes. But we never considered what happens if it ends up rehashing to the same slot! And in fact, that can happen if the true hash is 0 (I'm not sure if it can/does for any other value). And we may have thought that the only string that would ever hash to 0 is the empty string (which is 0 hash by design, and doesn't need to rehash because the table is seeded with that entry already in its correct slot). So anyway, if there was some other string that happened to hash to 0, it would get caught in an infinite loop of rehashing but never finding a free slot, because it keeps rehashing to the very same occupied slot. And would you believe that there IS such a string, and it is this one: "\t\n\n\t\301\350`O\217c85?\202\200\251r|}\304\351\2517\337K'\217C\240" (unprintable chars expressed as escaped octal values) The crux of this patch is to notice when the old and new hash bit pattern has not changed, and bump it by 7 (a prime number, so it should not land on the same location again until every slot in the bin has been exhausted). Solution entirely from my own human brain, in case you're wondering. Signed-off-by: Larry Gritz <lg@larrygritz.com>
1 parent 9c7b98c commit 4e91729

File tree

2 files changed

+27
-4
lines changed

2 files changed

+27
-4
lines changed

src/libutil/ustring.cpp

Lines changed: 19 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -522,7 +522,7 @@ ustring::make_unique(string_view strref)
522522
size_t bin = rm.lock_bin(hash);
523523

524524
hash_t orighash = hash;
525-
size_t binmask = orighash & (~rm.nobin_mask());
525+
size_t binbits = orighash & (~rm.nobin_mask());
526526
size_t num_rehashes = 0;
527527

528528
while (1) {
@@ -546,16 +546,31 @@ ustring::make_unique(string_view strref)
546546
break;
547547
}
548548
// Rehash, but keep the bin bits identical so we always rehash into
549-
// the same (locked) bin.
550-
hash = (hash & binmask)
551-
| (farmhash::Fingerprint(hash) & rm.nobin_mask());
549+
// the same (locked) bin. But watch out for rehashing that returns the
550+
// identical non-bin part as before -- that will enter an infinite
551+
// loop if we're not careful!
552+
hash_t old_nonbin_bits = hash & rm.nobin_mask();
553+
hash_t new_nonbin_bits = farmhash::Fingerprint(hash) & rm.nobin_mask();
554+
if (OIIO_UNLIKELY(old_nonbin_bits == new_nonbin_bits)) {
555+
new_nonbin_bits = (new_nonbin_bits + 7) & rm.nobin_mask();
556+
# ifndef NDEBUG
557+
std::string s = Strutil::escape_chars(strref);
558+
print(stderr, "IDEMPOTENT RE-HASH! |{}|\n", s);
559+
for (auto c : s)
560+
print(stderr, c > 0 ? "{:c}" : "\\{:03o}",
561+
static_cast<unsigned char>(c));
562+
print(stderr, "\n");
563+
# endif
564+
}
565+
hash = binbits | new_nonbin_bits;
552566
++num_rehashes;
553567
// Strutil::print("COLLISION \"{}\" {:08x} vs \"{}\"\n",
554568
// strref, orighash, rev->second);
555569
{
556570
std::lock_guard<std::mutex> lock(collision_mutex);
557571
all_hash_collisions.emplace_back(rev->second, rev->first);
558572
}
573+
OIIO_ASSERT(num_rehashes < 100000); // Something is very wrong
559574
}
560575
rm.unlock_bin(bin);
561576

src/libutil/ustring_test.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -173,6 +173,14 @@ test_ustring()
173173
Strutil::print("Test print default initialized ustring: '{}'\n", uninit);
174174
OIIO_CHECK_EQUAL(Strutil::format("{}", empty),
175175
Strutil::format("{}", uninit));
176+
177+
// Regression test: This specific string hashes to 0! This once caused a
178+
// lot of trouble, because it couldn't rehash properly (every rehash also
179+
// ended up at 0). When broken, this made an infinite loop inside
180+
// ustring::make_unique().
181+
const char* hash0
182+
= "\t\n\n\t\301\350`O\217c85?\202\200\251r|}\304\351\2517\337K'\217C\240";
183+
OIIO_CHECK_EQUAL(ustring(hash0).hash(), 13226272983328805811ULL);
176184
}
177185

178186

0 commit comments

Comments
 (0)