Use independent seeds in makeSizedByteStrings#7735
Conversation
`makeSizedByteStrings` used the same `H.Seed` for every element, so each generated ByteString was a prefix of the same deterministic byte sequence. Use `unfoldr (Just . Seed.split)` to produce a stream of independent SplitMix seeds instead, giving uncorrelated content across sizes.
|
/benchmark nofib |
|
Click here to check the status of your benchmark. |
zliu41
left a comment
There was a problem hiding this comment.
Use independent seeds
Looks reasonable, but are you sure this is the reason for the "FIXME: this is terrible"? Who added the FIXME?
This has nothing to do with |
|
Comparing benchmark results of 'nofib' on '0468c1c57d' (base) and '5a6c9917d0' (PR) Results table
|
Unisay
left a comment
There was a problem hiding this comment.
Two suggestions, neither blocking:
-
Could
seedsbe a top-level binding?unfoldr (Just . Seed.split)is enough of a head-scratcher that I'd give it a name, and the same stream is useful for the next sized-list helper:-- | Infinite stream of independent seeds derived from a root seed. splitSeeds :: H.Seed -> [H.Seed] splitSeeds = unfoldr (Just . Seed.split)
-
The same bug is in
makeSizedTextStrings(line 108) andmakeSizedUtf8ByteStrings(line 118). Bothfmapthe same seed across all sizes, so string anddecodeUtf8benchmarks get correlated inputs too. WithsplitSeedsextracted, each fix is one line:makeSizedTextStrings :: H.Seed -> [Integer] -> [Text] makeSizedTextStrings seed sizes = zipWith makeSizedTextString (splitSeeds seed) (fmap fromInteger sizes) makeSizedUtf8ByteStrings :: H.Seed -> [Integer] -> [ByteString] makeSizedUtf8ByteStrings seed sizes = zipWith makeSizedUtf8ByteString (splitSeeds seed) (fmap fromInteger sizes)
If you'd rather keep this PR scoped, I'm happy to do these in a follow-up.
`makeSizedTextStrings` and `makeSizedUtf8ByteStrings` had the same bug as `makeSizedByteStrings`: they fmap'd a single seed across all sizes, producing correlated string and decodeUtf8 benchmark inputs. Extract the seed stream into a named top-level binding `splitSeeds` and reuse it across all three sized-list helpers.
That was me. It was a quick implementation and I meant to go back and improve it some time. This seems to do the job though. |
@Unisay Please do that! |
kwxm
left a comment
There was a problem hiding this comment.
Sorry for the long delay in reviewing this! It seems to do what's required. I don't think the original version was actually problematic, but it certainly wasn't ideal, and this should make it less easy to get things wrong in the future. I ran costing benchmarks for some of the bytestring functions with both this branch and the version in master and the results weren't siginficantly different.
The generators for the costing benchmarks have grown over time and they're a bit repetitive and unsystematic. There a long-standing issue to rework them all and we should do that when we get the chance. This improves things for the time being though.
Summary
Fixes the
-- FIXME: this is terribleinplutus-core/cost-model/budgeting-bench/Generators.hs.makeSizedByteStringspassed the sameH.Seedto every call ofmakeSizedByteString, so each generated ByteString was a prefix of the same deterministic byte sequence — smaller ones were literal prefixes of larger ones. This produced correlated inputs for benchmarks that depend on byte content (equality, hashing, string ops).