Update PhiloxRNG.jl to v1.1.1 and fix counter overflow with UInt64 #717

Open
nhz2 wants to merge 3 commits into JuliaGPU:master from nhz2:nz/rng-update

nhz2 commented Apr 28, 2026

Fixes #713
This PR uses the following scheme to avoid RNG stream collisions.

Philox4x32-10 takes a 128-bit counter (ctr0, ctr1) and a 64-bit key. The scheme is:

  • key = the 64-bit seed (identifies the RNG instance).
  • ctr1 = a 64-bit per-launch counter, incremented host-side after each kernel that consumes the RNG.
  • ctr0 = gid + nthreads * localcounter, a 64-bit counter that varies inside the kernel: gid separates threads, and localcounter increments with each Philox call made by that thread. Since only the total number of Philox calls per launch (nthreads × calls per thread) has to fit in 64 bits, this allows more than 2^32 threads with fewer Philox calls per thread, or more than 2^32 Philox calls per thread with fewer threads, without a collision.
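The ctr0 arithmetic above can be checked with a small sketch. `ctr0`, `ctr0_32`, and `all_distinct` are hypothetical helpers written for illustration, not functions from GPUArrays or PhiloxRNG.jl. `ctr0_32` evaluates the same formula in 32-bit arithmetic, only to show the kind of wraparound that moving the counter to UInt64 avoids; it is not claimed to be the exact code this PR replaces.

```julia
# Hypothetical helper: the intra-kernel counter for one Philox call.
# gid is the 0-based global thread index (gid < nthreads), and
# localcounter is how many Philox calls this thread has already made.
ctr0(gid::UInt64, nthreads::UInt64, localcounter::UInt64) =
    gid + nthreads * localcounter

# The same formula in 32-bit arithmetic, for comparison only.
# Julia integer arithmetic wraps silently on overflow.
ctr0_32(gid::UInt32, nthreads::UInt32, localcounter::UInt32) =
    gid + nthreads * localcounter

# Brute-force check on a small launch: every (gid, localcounter) pair maps
# to a distinct ctr0, because gid < nthreads acts as the low "digit" of a
# base-nthreads number whose high digit is localcounter.
function all_distinct(nthreads::UInt64, ncalls::UInt64)
    seen = Set{UInt64}()
    for gid in UInt64(0):nthreads-1, lc in UInt64(0):ncalls-1
        push!(seen, ctr0(gid, nthreads, lc))
    end
    return length(seen) == nthreads * ncalls
end

nt = UInt32(2)^20   # 1_048_576 threads
lc = UInt32(2)^12   # localcounter = 4096
ctr0_32(UInt32(5), nt, lc) == UInt32(5)   # true: wrapped onto thread 5's first counter
ctr0(UInt64(5), UInt64(nt), UInt64(lc))   # 0x0000000100000005, no wrap
```

With 64-bit counters the product only overflows once a single launch makes more than 2^64 Philox calls in total, which is why the thread count and the per-thread call count can trade off against each other freely.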

ctr1 is also randomized when seeding from a RandomDevice, which makes collisions between separately seeded RNG instances less likely.
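A minimal end-to-end sketch of the scheme, assuming a plain CPU implementation. The Philox4x32-10 rounds follow the Random123 reference (multipliers 0xD2511F53 and 0xCD9E8D57, key increments 0x9E3779B9 and 0xBB67AE85). `PhiloxState`, `draw`, `advance!`, and `splitu64` are hypothetical names, not the GPUArrays or PhiloxRNG.jl API; splitting each 64-bit value into 32-bit words low word first, and starting ctr1 at zero for an explicit seed, are assumptions that the real package may not share.

```julia
using Random

# Philox4x32 constants (Random123 reference values).
const M0 = 0xD2511F53
const M1 = 0xCD9E8D57
const W0 = 0x9E3779B9
const W1 = 0xBB67AE85

# 32x32 -> 64-bit multiply, returned as (high word, low word).
@inline function mulhilo(a::UInt32, b::UInt32)
    p = UInt64(a) * UInt64(b)
    return ((p >> 32) % UInt32, p % UInt32)
end

# Ten Philox rounds; the key is bumped by the Weyl constants between rounds.
function philox4x32_10(c::NTuple{4,UInt32}, k::NTuple{2,UInt32})
    for _ in 1:10
        hi0, lo0 = mulhilo(M0, c[1])
        hi1, lo1 = mulhilo(M1, c[3])
        c = (hi1 ⊻ c[2] ⊻ k[1], lo1, hi0 ⊻ c[4] ⊻ k[2], lo0)
        k = (k[1] + W0, k[2] + W1)   # UInt32 addition wraps
    end
    return c
end

# Split a UInt64 into (low word, high word); assumed word order.
splitu64(x::UInt64) = (x % UInt32, (x >> 32) % UInt32)

# Hypothetical host-side RNG state.
mutable struct PhiloxState
    key::UInt64    # the 64-bit seed
    ctr1::UInt64   # per-launch counter
end

# Seeding from a RandomDevice randomizes ctr1 as well as the key.
PhiloxState() = PhiloxState(rand(RandomDevice(), UInt64), rand(RandomDevice(), UInt64))
# Explicit seed: ctr1 starts at zero (an assumption).
PhiloxState(seed::Integer) = PhiloxState(UInt64(seed), zero(UInt64))

# What one thread would compute for its localcounter-th Philox call.
function draw(s::PhiloxState, gid::UInt64, nthreads::UInt64, localcounter::UInt64)
    ctr0 = gid + nthreads * localcounter
    c = (splitu64(ctr0)..., splitu64(s.ctr1)...)
    return philox4x32_10(c, splitu64(s.key))
end

# Called on the host after each kernel that consumed the RNG.
advance!(s::PhiloxState) = (s.ctr1 += one(UInt64); s)

s = PhiloxState(1234)
a = draw(s, UInt64(0), UInt64(256), UInt64(0))
advance!(s)
b = draw(s, UInt64(0), UInt64(256), UInt64(0))   # same thread and call, next launch
```

Because Philox is a counter-based generator, the device side needs no per-thread state beyond localcounter; everything that distinguishes one stream from another lives in the (ctr0, ctr1, key) triple.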

CC @maleadt

nhz2 marked this pull request as ready for review April 28, 2026 19:24

Review comment on src/host/random.jl, lines +62 to +64:
```julia
@inline function u01(::Type{F}, u::UInt64)::F where F
    fma(F(u), F(2)^Int32(-64), F(2)^Int32(-65))
end
```

Member

Isn't this strictly more expensive than the bit-pattern approach that was there before?

Author

I think so, but it has a much larger dynamic range.
See the comment about this in https://github.com/DEShawResearch/random123/blob/9545ff6413f258be2f04c1d319d99aaef7521150/include/Random123/u01fixedpt.h#L43-L52
I also don't see a big impact on the benchmarks (https://github.com/medyan-dev/PhiloxRNG.jl#gpu--nvidia-geforce-rtx-3080-nsvalue-n--100000000): the RTX 3080 still comes close to saturating its memory bandwidth. Does this show up as a problem on other devices?
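To make the dynamic-range point concrete, here is a small comparison. `u01` is the function quoted in the review; `u01_bits` is a typical bit-pattern conversion written as an assumption for comparison, not necessarily the code this PR replaced. It keeps the top 52 bits of `u` as the mantissa of a Float64 in [1, 2) and subtracts 1.

```julia
# The fma-based conversion quoted in the review: scale u by 2^-64 and
# offset by half a step (2^-65) in a single fused multiply-add.
@inline function u01(::Type{F}, u::UInt64)::F where F
    fma(F(u), F(2)^Int32(-64), F(2)^Int32(-65))
end

# Assumed bit-pattern conversion for comparison: drop the low 12 bits,
# place the remaining 52 bits in the mantissa of a Float64 in [1, 2),
# then subtract 1 to land in [0, 1).
@inline u01_bits(u::UInt64) =
    reinterpret(Float64, (u >> 12) | 0x3ff0000000000000) - 1.0

u01(Float64, UInt64(0))     # 2^-65 ≈ 2.7e-20, never exactly zero
u01(Float64, UInt64(1))     # 3 * 2^-65, distinct from the value for u = 0
u01_bits(UInt64(0))         # 0.0
u01_bits(UInt64(1))         # 0.0, the low 12 bits are discarded
u01_bits(UInt64(1) << 12)   # 2^-52, the smallest nonzero output
```

The bit-pattern version is cheaper (a shift, an OR, and a subtraction) but has a uniform step of 2^-52, so small values collapse to 0.0. The fma version keeps the full resolution of the float type near zero, which matters for transforms such as `log(u)` in Box-Muller style normal sampling.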


Development

Successfully merging this pull request may close these issues.

Wrap around in rand_batched_kernel! can lead to repeating patterns
