| 1 | +# Cache Patterns Benchmark |
| 2 | + |
| 3 | +This crate demonstrates the performance impact of different data layouts on CPU cache utilization through a particle physics simulation. |
| 4 | + |
| 5 | +## Initial Assumption |
| 6 | + |
| 7 | +**Hypothesis**: Data layout significantly impacts CPU cache behavior. Specifically, organizing data as a Structure of Arrays (SoA) should show measurably better cache performance than Array of Structures (AoS) when operations only access a subset of fields. |
| 8 | + |
| 9 | +This benchmark is designed to validate this hypothesis using CodSpeed's walltime instrument, which provides hardware performance counters including cache hit/miss rates, memory bandwidth, and IPC (instructions per cycle). |
| 10 | + |
| 11 | +## The Problem: Array of Structures (AoS) vs Structure of Arrays (SoA) |
| 12 | + |
| 13 | +### Array of Structures (AoS) - Cache Unfriendly |
| 14 | +```rust |
| 15 | +struct Particle { |
| 16 | +    position: Vec3, // 12 bytes |
| 17 | +    velocity: Vec3, // 12 bytes |
| 18 | +    mass: f32,      // 4 bytes |
| 19 | +} // = 28 bytes per particle (no padding when Vec3 is three f32s with 4-byte alignment) |
| 20 | + |
| 21 | +particles: Vec<Particle> |
| 22 | +``` |
| 23 | + |
| 24 | +**Memory layout**: `[pos0, vel0, mass0, pos1, vel1, mass1, pos2, vel2, mass2, ...]` |
| 25 | + |
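| 26 | +When an operation touches only some fields, every cache line it loads still carries the fields it skips. For example, `apply_gravity` reads and writes only velocities, yet each 28-byte particle it pulls in also brings 16 bytes of position and mass data, wasting bandwidth and cache space. |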
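The per-particle footprint can be checked directly with `std::mem::size_of`. The sketch below assumes `Vec3` is three tightly packed `f32` values; the crate's real `Vec3` is not shown here, and a SIMD-aligned vector type would give different numbers. `layout_report` and `useful_fraction` are hypothetical helper names used only for illustration.

```rust
use std::mem::{align_of, size_of};

// Assumption: Vec3 is three tightly packed f32 values (12 bytes, 4-byte alignment).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Particle {
    position: Vec3,
    velocity: Vec3,
    mass: f32,
}

/// Hypothetical helper: (size, alignment) of one AoS element, in bytes.
fn layout_report() -> (usize, usize) {
    (size_of::<Particle>(), align_of::<Particle>())
}

/// Hypothetical helper: fraction of each loaded `Particle` that an operation
/// actually uses, given how many bytes it touches per particle.
fn useful_fraction(bytes_touched: usize) -> f64 {
    bytes_touched as f64 / size_of::<Particle>() as f64
}
```

Under these assumptions, an operation that touches only the 12-byte velocity uses `useful_fraction(12)`, a little under half, of every particle it loads in the AoS layout.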
| 27 | + |
| 28 | +### Structure of Arrays (SoA) - Cache Friendly |
| 29 | +```rust |
| 30 | +struct ParticleSystem { |
| 31 | +    positions: Vec<Vec3>, |
| 32 | +    velocities: Vec<Vec3>, |
| 33 | +    masses: Vec<f32>, |
| 34 | +} |
| 35 | +``` |
| 36 | + |
| 37 | +**Memory layout**: |
| 38 | +- `positions: [pos0, pos1, pos2, ...]` |
| 39 | +- `velocities: [vel0, vel1, vel2, ...]` |
| 40 | +- `masses: [mass0, mass1, mass2, ...]` |
| 41 | + |
| 42 | +When we update positions, only the `positions` and `velocities` arrays are traversed, so every byte of each loaded cache line is useful data and `masses` is never brought into the cache. |
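To make the relationship between the two layouts concrete, here is a minimal sketch of converting an AoS slice into the SoA form. `from_particles` and `len` are hypothetical helpers, not necessarily part of this crate, and `Vec3` is assumed to be a plain three-`f32` struct.

```rust
// Assumption: Vec3 is a plain struct of three f32 values.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Particle {
    position: Vec3,
    velocity: Vec3,
    mass: f32,
}

#[derive(Debug, Default, PartialEq)]
struct ParticleSystem {
    positions: Vec<Vec3>,
    velocities: Vec<Vec3>,
    masses: Vec<f32>,
}

impl ParticleSystem {
    /// Hypothetical helper: split an AoS slice into three parallel arrays.
    /// Index `i` in each array refers to the same particle.
    fn from_particles(particles: &[Particle]) -> Self {
        Self {
            positions: particles.iter().map(|p| p.position).collect(),
            velocities: particles.iter().map(|p| p.velocity).collect(),
            masses: particles.iter().map(|p| p.mass).collect(),
        }
    }

    /// Number of particles stored (all three arrays have this length).
    fn len(&self) -> usize {
        self.masses.len()
    }
}
```

Each field ends up in its own contiguous allocation, which is what lets a single-field pass stream through memory without skipping over unrelated data.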
| 43 | + |
| 44 | +## Expected Performance Characteristics |
| 45 | + |
| 46 | +### AoS (Cache Unfriendly) |
| 47 | +- Higher L1/L2/L3 cache miss rates |
| 48 | +- Lower effective memory bandwidth (part of every fetched cache line goes unused) |
| 49 | +- More stalls waiting for memory |
| 50 | + |
| 51 | +### SoA (Cache Friendly) |
| 52 | +- Lower cache miss rates (better spatial locality) |
| 53 | +- Higher effective memory bandwidth |
| 54 | +- Better prefetcher efficiency |
| 55 | + |
| 56 | +## Running the Benchmarks |
| 57 | + |
| 58 | +```bash |
| 59 | +# Run with standard benchmarking |
| 60 | +cargo bench |
| 61 | + |
| 62 | +# Run with CodSpeed profiling to see cache counters |
| 63 | +# (requires CodSpeed setup with walltime instrument) |
| 64 | +codspeed run cargo bench |
| 65 | +``` |
| 66 | + |
| 67 | +## What to Look For in CodSpeed Profiling |
| 68 | + |
| 69 | +When comparing AoS vs SoA versions with CodSpeed's walltime instrument, you should see: |
| 70 | + |
| 71 | +1. **Cache Misses**: SoA should show significantly fewer L1/L2/L3 cache misses |
| 72 | +2. **Memory Operations**: Better cache line utilization in the SoA version |
| 73 | +3. **Instructions Per Cycle (IPC)**: Higher IPC in SoA due to fewer memory stalls |
| 74 | +4. **Wall Time**: SoA should be faster, especially with larger datasets |
| 75 | + |
| 76 | +## Benchmark Operations |
| 77 | + |
| 78 | +Each version implements three operations: |
| 79 | + |
| 80 | +1. **update_positions**: `position = position + velocity * dt` |
| 81 | + - Tests spatial locality when accessing two fields (position and velocity) |
| 82 | + |
| 83 | +2. **compute_kinetic_energy**: `sum(0.5 * mass * velocity²)` |
| 84 | + - Tests cache behavior when skipping position data |
| 85 | + |
| 86 | +3. **apply_gravity**: `velocity = velocity + gravity * dt` |
| 87 | + - Tests cache behavior when accessing only one field |
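The three operations above can be sketched for both layouts as follows. This is an illustrative version under stated assumptions, not necessarily the crate's actual code: `Vec3` is assumed to be three `f32` values, and `add_scaled`, `length_squared`, and the `_aos`/`_soa` function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    /// Returns `self + other * s`.
    fn add_scaled(self, other: Vec3, s: f32) -> Vec3 {
        Vec3 {
            x: self.x + other.x * s,
            y: self.y + other.y * s,
            z: self.z + other.z * s,
        }
    }

    fn length_squared(self) -> f32 {
        self.x * self.x + self.y * self.y + self.z * self.z
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Particle {
    position: Vec3,
    velocity: Vec3,
    mass: f32,
}

struct ParticleSystem {
    positions: Vec<Vec3>,
    velocities: Vec<Vec3>,
    masses: Vec<f32>,
}

// AoS: every pass walks whole particles, whether or not all fields are needed.
fn update_positions_aos(particles: &mut [Particle], dt: f32) {
    for p in particles.iter_mut() {
        p.position = p.position.add_scaled(p.velocity, dt);
    }
}

fn kinetic_energy_aos(particles: &[Particle]) -> f32 {
    particles
        .iter()
        .map(|p| 0.5 * p.mass * p.velocity.length_squared())
        .sum::<f32>()
}

fn apply_gravity_aos(particles: &mut [Particle], gravity: Vec3, dt: f32) {
    for p in particles.iter_mut() {
        p.velocity = p.velocity.add_scaled(gravity, dt);
    }
}

// SoA: each pass walks only the arrays it actually needs.
fn update_positions_soa(sys: &mut ParticleSystem, dt: f32) {
    for (pos, vel) in sys.positions.iter_mut().zip(&sys.velocities) {
        *pos = pos.add_scaled(*vel, dt);
    }
}

fn kinetic_energy_soa(sys: &ParticleSystem) -> f32 {
    sys.masses
        .iter()
        .zip(&sys.velocities)
        .map(|(&m, v)| 0.5 * m * v.length_squared())
        .sum::<f32>()
}

fn apply_gravity_soa(sys: &mut ParticleSystem, gravity: Vec3, dt: f32) {
    for vel in sys.velocities.iter_mut() {
        *vel = vel.add_scaled(gravity, dt);
    }
}
```

Both variants compute the same results; only the memory access pattern differs, which is the property the benchmark is meant to isolate.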
| 88 | + |
| 89 | +## Dataset Sizes |
| 90 | + |
| 91 | +- **Small**: 1,000 particles (~28 KB in either layout, at 12 + 12 + 4 = 28 bytes per particle with no padding) |
| 92 | +- **Medium**: 10,000 particles (~280 KB in either layout) |
| 93 | +- **Large**: 100,000 particles (~2.8 MB in either layout) |
| 94 | + |
| 95 | +Different sizes stress different cache levels (L1/L2/L3). |
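As a rough way to relate a particle count to a cache level, the working set can be computed from the field sizes. The sketch assumes `Vec3` is three `f32` values with no padding; the function names are hypothetical, and actual cache capacities vary by CPU.

```rust
use std::mem::size_of;

// Assumption: Vec3 is three tightly packed f32 values (12 bytes).
#[allow(dead_code)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

#[allow(dead_code)]
struct Particle {
    position: Vec3,
    velocity: Vec3,
    mass: f32,
}

/// Total bytes occupied by `n` particles in the AoS layout.
fn aos_bytes(n: usize) -> usize {
    n * size_of::<Particle>()
}

/// Total bytes occupied by `n` particles across the three SoA arrays.
fn soa_bytes(n: usize) -> usize {
    n * (2 * size_of::<Vec3>() + size_of::<f32>())
}

/// Bytes a velocity-only pass (such as `apply_gravity`) streams in the SoA layout.
fn soa_velocity_bytes(n: usize) -> usize {
    n * size_of::<Vec3>()
}
```

With these assumptions the total footprint is the same in both layouts; what changes is the working set of a single pass. A velocity-only pass over the SoA layout touches `soa_velocity_bytes(n)`, while the same pass over AoS has to walk all `aos_bytes(n)`.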