I'm delighted to see the parallel branch run at 100fps on my 5800X3D. It's quite fun, and also fun being able to quickly vibe code experiments off of this code base.
GPT-5.2 tells me that a Barnes-Hut GPU implementation should be pretty practical. It says:
- Compute Morton code per particle (based on normalized position).
- Radix sort particles by Morton code.
- Build a binary tree over the sorted array using “longest common prefix” logic (Karras 2012 style).
- Compute internal node AABBs + mass + COM bottom-up.
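To make those steps concrete for myself, here is a rough CPU-side Python sketch of the building blocks (2D, 32-bit codes). It is not from this code base, and every name in it (`expand_bits_2d`, `morton_2d`, `morton_order`, `common_prefix_length`, `merge_children`) is made up for illustration. `sorted` stands in for the GPU radix sort, and the index tie-break for duplicate codes is my reading of the Karras-style prefix function, so treat it as an assumption.

```python
def expand_bits_2d(v: int) -> int:
    """Spread the low 16 bits of v so a zero bit sits between each pair."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton_2d(x: float, y: float) -> int:
    """32-bit Morton code; x and y are assumed normalized to [0, 1)."""
    xi = min(max(int(x * 65536.0), 0), 65535)
    yi = min(max(int(y * 65536.0), 0), 65535)
    return (expand_bits_2d(yi) << 1) | expand_bits_2d(xi)


def morton_order(points):
    """Return (particle indices sorted by code, sorted codes).

    sorted() is a stand-in for the radix sort a GPU version would use.
    """
    codes = [morton_2d(x, y) for x, y in points]
    order = sorted(range(len(points)), key=codes.__getitem__)
    return order, [codes[i] for i in order]


def common_prefix_length(codes, i: int, j: int) -> int:
    """Number of leading bits shared by codes[i] and codes[j].

    Returns -1 when j is out of range. Identical codes fall back to
    comparing the indices so that every pair still gets a distinct split.
    """
    if j < 0 or j >= len(codes):
        return -1
    diff = codes[i] ^ codes[j]
    if diff == 0:
        return 32 + (32 - (i ^ j).bit_length())
    return 32 - diff.bit_length()


def merge_children(a, b):
    """Combine two child summaries into the parent's summary.

    Each summary is (min_xy, max_xy, mass, com_xy); total mass must be > 0.
    """
    (amin, amax, am, ac), (bmin, bmax, bm, bc) = a, b
    mass = am + bm
    lo = (min(amin[0], bmin[0]), min(amin[1], bmin[1]))
    hi = (max(amax[0], bmax[0]), max(amax[1], bmax[1]))
    com = ((ac[0] * am + bc[0] * bm) / mass, (ac[1] * am + bc[1] * bm) / mass)
    return lo, hi, mass, com
```

On a GPU, each of these would presumably become a per-particle or per-node kernel, with `merge_children` applied bottom-up once both children of a node are ready.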
Why this is nice:
- No dynamic allocation, no atomics-heavy “insert particle into tree” chaos.
- Deterministic structure, fast on GPU.
- Works for 2D or 3D.
The Barnes–Hut acceptance test uses each node’s bounding-box size and its distance from the particle, same as with a quadtree/octree.
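As I understand it, that acceptance test boils down to something like the sketch below. The function name, the argument layout, and the default `theta=0.5` are my own assumptions; taking the longer side of the AABB as the node "size" is one common choice, not necessarily what a real implementation would use.

```python
def accept_node(px, py, com_x, com_y, box_min, box_max, theta=0.5):
    """Return True if the node is far enough away to be treated as one mass.

    Uses the usual opening criterion size / distance < theta, written with
    squared quantities to avoid a square root. theta=0.5 is an assumed default.
    """
    size = max(box_max[0] - box_min[0], box_max[1] - box_min[1])
    dx = com_x - px
    dy = com_y - py
    dist_sq = dx * dx + dy * dy
    return size * size < theta * theta * dist_sq
```

A node that fails the test gets opened and its two children are tested in turn; one that passes contributes a single force term from its total mass at its center of mass.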
On the surface it sounds reasonable. I'll play around with it in my spare time, but I was curious if you had given this any thought and if you think this would be an efficient way to go about it.
Overall (especially in 2026), the quickest way to know whether it will be efficient is probably to just build it and measure its performance...