Commit ffbd2e3

Initial commit: Cache patterns benchmark example
Add particle simulation demonstrating Array of Structures (AoS) vs Structure of Arrays (SoA) for cache-friendly data layouts.

- Implement AoS and SoA particle systems
- Add benchmarks for position updates, kinetic energy, and gravity
- Include documentation on cache-friendly patterns
- Pin Rust toolchain to 1.83.0
0 parents  commit ffbd2e3

8 files changed

Lines changed: 355 additions & 0 deletions

.gitignore

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Rust build artifacts
/target/
Cargo.lock

# IDE files
.vscode/
.idea/
*.swp
*.swo
*~

# OS files
.DS_Store
Thumbs.db

Cargo.toml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
[package]
name = "cache-patterns"
version = "0.1.0"
edition = "2021"

[dependencies]

[dev-dependencies]
divan = "0.1"

[[bench]]
name = "particle_simulation"
harness = false

README.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
# Cache Patterns Benchmark

This crate demonstrates the performance impact of different data layouts on CPU cache utilization through a particle physics simulation.

## Initial Assumption

**Hypothesis**: Data layout significantly impacts CPU cache behavior. Specifically, organizing data as a Structure of Arrays (SoA) should show measurably better cache performance than an Array of Structures (AoS) when operations access only a subset of fields.

This benchmark is designed to validate this hypothesis using CodSpeed's walltime instrument, which provides hardware performance counters including cache hit/miss rates, memory bandwidth, and IPC (instructions per cycle).

## The Problem: Array of Structures (AoS) vs Structure of Arrays (SoA)

### Array of Structures (AoS) - Cache Unfriendly
```rust
struct Particle {
    position: Vec3, // 12 bytes
    velocity: Vec3, // 12 bytes
    mass: f32,      // 4 bytes
} // = 28 bytes per particle (no padding: every field is 4-byte aligned)

particles: Vec<Particle>
```

**Memory layout**: `[pos0, vel0, mass0, pos1, vel1, mass1, pos2, vel2, mass2, ...]`

When an operation needs only some of the fields, every cache line it loads also carries fields it never uses: updating positions drags along mass, and applying gravity drags along position and mass. That wastes bandwidth and cache space.
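
The per-particle footprint, and the share of it each operation actually needs, can be checked with `std::mem::size_of`. This is a minimal sketch, not part of the crate: `Vec3` and `Particle` are redeclared locally so it stands alone, and `useful_fraction` is a hypothetical helper introduced only for illustration.

```rust
use std::mem::size_of;

// Local stand-ins for the crate's types, redeclared so the sketch is self-contained.
#[allow(dead_code)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

#[allow(dead_code)]
struct Particle {
    position: Vec3,
    velocity: Vec3,
    mass: f32,
}

/// Hypothetical helper: fraction of one AoS particle's bytes that an
/// operation touching `bytes_used` bytes per particle actually needs.
fn useful_fraction(bytes_used: usize) -> f64 {
    bytes_used as f64 / size_of::<Particle>() as f64
}

fn main() {
    let vec3 = size_of::<Vec3>();
    let mass = size_of::<f32>();
    println!("size_of::<Particle>() = {} bytes", size_of::<Particle>());
    println!(
        "update_positions (position + velocity): {:.0}% useful",
        100.0 * useful_fraction(2 * vec3)
    );
    println!(
        "compute_kinetic_energy (velocity + mass): {:.0}% useful",
        100.0 * useful_fraction(vec3 + mass)
    );
    println!(
        "apply_gravity (velocity only): {:.0}% useful",
        100.0 * useful_fraction(vec3)
    );
}
```

By this estimate the AoS penalty should be largest for `apply_gravity` and smallest for `update_positions`, which already uses most of each struct.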

### Structure of Arrays (SoA) - Cache Friendly
```rust
struct ParticleSystem {
    positions: Vec<Vec3>,
    velocities: Vec<Vec3>,
    masses: Vec<f32>,
}
```

**Memory layout**:
- `positions: [pos0, pos1, pos2, ...]`
- `velocities: [vel0, vel1, vel2, ...]`
- `masses: [mass0, mass1, mass2, ...]`

When we update positions, every byte in the cache lines loaded from `positions` and `velocities` is useful data, maximizing cache efficiency.
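
To put a rough number on the difference, the sketch below counts how many cache lines a single `apply_gravity` pass has to cover in each layout. It rests on assumptions not stated in this README: 64-byte cache lines, allocations that start on a line boundary, and no prefetcher effects. The function names are hypothetical.

```rust
/// Assumed cache line size; 64 bytes is common on x86-64 but not universal.
const CACHE_LINE_BYTES: usize = 64;

/// Number of cache lines covered by `bytes` contiguous bytes that start on a
/// cache-line boundary (rounds up).
fn cache_lines(bytes: usize) -> usize {
    (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES
}

/// AoS: a pass over all particles walks the whole 28-byte structs,
/// whichever fields the operation uses.
fn aos_lines(particles: usize) -> usize {
    cache_lines(particles * 28)
}

/// SoA: apply_gravity walks only the velocity array (12 bytes per particle).
fn soa_gravity_lines(particles: usize) -> usize {
    cache_lines(particles * 12)
}

fn main() {
    for n in [1_000, 10_000, 100_000] {
        println!(
            "{n:>7} particles: AoS {} lines, SoA {} lines",
            aos_lines(n),
            soa_gravity_lines(n)
        );
    }
}
```

Under these assumptions the SoA pass covers a little under half as many lines as the AoS pass (12 of every 28 bytes).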

## Expected Performance Characteristics

### AoS (Cache Unfriendly)
- Higher L1/L2/L3 cache miss rates
- Lower useful memory bandwidth (part of every fetched cache line goes unused)
- More stalls waiting for memory

### SoA (Cache Friendly)
- Lower cache miss rates (better spatial locality)
- Higher effective memory bandwidth
- Better prefetcher efficiency

## Running the Benchmarks

```bash
# Run with standard benchmarking
cargo bench

# Run with CodSpeed profiling to see cache counters
# (requires CodSpeed setup with walltime instrument)
codspeed run cargo bench
```

## What to Look For in CodSpeed Profiling

When comparing AoS vs SoA versions with CodSpeed's walltime instrument, you should see:

1. **Cache Misses**: SoA should show significantly fewer L1/L2/L3 cache misses
2. **Memory Operations**: Better cache line utilization in the SoA version
3. **Instructions Per Cycle (IPC)**: Higher IPC in SoA due to fewer memory stalls
4. **Wall Time**: SoA should be faster, especially with larger datasets

## Benchmark Operations

Each version implements three operations:

1. **update_positions**: `position = position + velocity * dt`
   - Tests spatial locality when accessing two fields (two separate arrays in SoA)

2. **compute_kinetic_energy**: `sum(0.5 * mass * velocity²)`
   - Tests cache behavior when skipping position data

3. **apply_gravity**: `velocity = velocity + gravity * dt`
   - Tests cache behavior when accessing only one field
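
As a worked example of the three formulas on a single particle, here is a small sketch. It uses plain `[f32; 3]` arrays instead of the crate's `Vec3`, and the free functions are illustrative stand-ins rather than the crate's methods.

```rust
type V3 = [f32; 3];

/// position = position + velocity * dt
fn update_position(p: V3, v: V3, dt: f32) -> V3 {
    [p[0] + v[0] * dt, p[1] + v[1] * dt, p[2] + v[2] * dt]
}

/// 0.5 * mass * |velocity|^2 for one particle
fn kinetic_energy(mass: f32, v: V3) -> f32 {
    0.5 * mass * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
}

/// velocity = velocity + gravity * dt
fn apply_gravity(v: V3, g: V3, dt: f32) -> V3 {
    [v[0] + g[0] * dt, v[1] + g[1] * dt, v[2] + g[2] * dt]
}

fn main() {
    // A particle at the origin moving with velocity (1, 2, 3) for half a second.
    println!("{:?}", update_position([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 0.5));
    // Mass 2 with |v|^2 = 1 + 4 + 4 = 9 gives 0.5 * 2 * 9.
    println!("{}", kinetic_energy(2.0, [1.0, 2.0, 2.0]));
    // A particle at rest under a downward pull of 10 units for half a second.
    println!("{:?}", apply_gravity([0.0, 0.0, 0.0], [0.0, -10.0, 0.0], 0.5));
}
```

The benchmarks apply the same arithmetic to every particle; only the memory layout the loop walks over differs.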

## Dataset Sizes

- **Small**: 1,000 particles (~28 KB in either layout)
- **Medium**: 10,000 particles (~280 KB in either layout)
- **Large**: 100,000 particles (~2.8 MB in either layout)

Both layouts store the same 28 bytes per particle; they differ in how much of that data an operation has to pull through the cache. Different sizes stress different cache levels (L1/L2/L3).
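
Which cache level a dataset lands in depends on the machine. The sketch below classifies the three sizes against assumed capacities (32 KiB L1d, 1 MiB L2, 32 MiB L3); these numbers are placeholders to swap for your own CPU's, and `smallest_fitting_level` is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical cache capacities in bytes; real values vary by CPU.
const L1D_BYTES: usize = 32 * 1024;
const L2_BYTES: usize = 1024 * 1024;
const L3_BYTES: usize = 32 * 1024 * 1024;

/// Bytes per particle in either layout: two Vec3 (12 bytes each) plus one f32.
const BYTES_PER_PARTICLE: usize = 12 + 12 + 4;

#[derive(Debug, PartialEq)]
enum Level {
    L1,
    L2,
    L3,
    Memory,
}

/// Smallest assumed cache level that can hold `bytes` entirely.
fn smallest_fitting_level(bytes: usize) -> Level {
    if bytes <= L1D_BYTES {
        Level::L1
    } else if bytes <= L2_BYTES {
        Level::L2
    } else if bytes <= L3_BYTES {
        Level::L3
    } else {
        Level::Memory
    }
}

fn main() {
    for n in [1_000usize, 10_000, 100_000] {
        let bytes = n * BYTES_PER_PARTICLE;
        println!("{n} particles: {bytes} bytes -> {:?}", smallest_fitting_level(bytes));
    }
}
```

This considers the full dataset; an SoA operation that touches a single array has a smaller working set and may fit one level lower.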

benches/particle_simulation.rs

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
use cache_patterns::{aos, soa, Vec3};

fn main() {
    divan::main();
}

// ============================================================================
// Array of Structures (AoS) - Cache Unfriendly
// ============================================================================

#[divan::bench(args = [1_000, 10_000, 100_000])]
fn aos_update_positions(bencher: divan::Bencher, count: usize) {
    bencher
        .with_inputs(|| aos::ParticleSystem::new(count))
        .bench_values(|mut system| {
            system.update_positions(0.016);
        });
}

#[divan::bench(args = [1_000, 10_000, 100_000])]
fn aos_kinetic_energy(bencher: divan::Bencher, count: usize) {
    bencher
        .with_inputs(|| aos::ParticleSystem::new(count))
        .bench_values(|system| {
            divan::black_box(system.compute_kinetic_energy());
        });
}

#[divan::bench(args = [1_000, 10_000, 100_000])]
fn aos_apply_gravity(bencher: divan::Bencher, count: usize) {
    bencher
        .with_inputs(|| aos::ParticleSystem::new(count))
        .bench_values(|mut system| {
            system.apply_gravity(Vec3::new(0.0, -9.81, 0.0), 0.016);
        });
}

// ============================================================================
// Structure of Arrays - Cache Friendly
// ============================================================================

#[divan::bench(args = [1_000, 10_000, 100_000])]
fn soa_update_positions(bencher: divan::Bencher, count: usize) {
    bencher
        .with_inputs(|| soa::ParticleSystem::new(count))
        .bench_values(|mut system| {
            system.update_positions(0.016);
        });
}

#[divan::bench(args = [1_000, 10_000, 100_000])]
fn soa_kinetic_energy(bencher: divan::Bencher, count: usize) {
    bencher
        .with_inputs(|| soa::ParticleSystem::new(count))
        .bench_values(|system| {
            divan::black_box(system.compute_kinetic_energy());
        });
}

#[divan::bench(args = [1_000, 10_000, 100_000])]
fn soa_apply_gravity(bencher: divan::Bencher, count: usize) {
    bencher
        .with_inputs(|| soa::ParticleSystem::new(count))
        .bench_values(|mut system| {
            system.apply_gravity(Vec3::new(0.0, -9.81, 0.0), 0.016);
        });
}

rust-toolchain.toml

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
[toolchain]
channel = "1.92.0"

src/aos.rs

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
//! Array of Structures (AoS) - Cache Unfriendly
//! Every pass pulls whole `Particle` structs through the cache, including
//! fields the operation never uses, leading to poor cache utilization

use crate::Vec3;

#[derive(Clone, Debug)]
pub struct Particle {
    pub position: Vec3,
    pub velocity: Vec3,
    pub mass: f32,
}

impl Particle {
    pub fn new(position: Vec3, velocity: Vec3, mass: f32) -> Self {
        Self {
            position,
            velocity,
            mass,
        }
    }
}

pub struct ParticleSystem {
    pub particles: Vec<Particle>,
}

impl ParticleSystem {
    pub fn new(count: usize) -> Self {
        let mut particles = Vec::with_capacity(count);
        for i in 0..count {
            let fi = i as f32;
            particles.push(Particle::new(
                Vec3::new(fi, fi * 2.0, fi * 3.0),
                Vec3::new(fi * 0.1, fi * 0.2, fi * 0.3),
                1.0 + fi * 0.01,
            ));
        }
        Self { particles }
    }

    /// Update particle positions based on velocity
    /// Poor cache behavior: we load the entire Particle struct (28 bytes) but only need
    /// position (12 bytes) and velocity (12 bytes)
    pub fn update_positions(&mut self, dt: f32) {
        for particle in &mut self.particles {
            particle.position = particle.position.add(&particle.velocity.scale(dt));
        }
    }

    /// Compute total kinetic energy
    /// Poor cache behavior: we access velocity and mass, skipping position data
    pub fn compute_kinetic_energy(&self) -> f32 {
        let mut total = 0.0;
        for particle in &self.particles {
            let v2 = particle.velocity.x * particle.velocity.x
                + particle.velocity.y * particle.velocity.y
                + particle.velocity.z * particle.velocity.z;
            total += 0.5 * particle.mass * v2;
        }
        total
    }

    /// Apply gravity to all particles
    /// Poor cache behavior: we only need to modify velocity, but load entire struct
    pub fn apply_gravity(&mut self, gravity: Vec3, dt: f32) {
        for particle in &mut self.particles {
            particle.velocity = particle.velocity.add(&gravity.scale(dt));
        }
    }
}

src/lib.rs

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
//! Particle simulation demonstrating cache-friendly vs cache-unfriendly data layouts
pub mod aos;
pub mod soa;

#[derive(Clone, Copy, Debug)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    pub fn new(x: f32, y: f32, z: f32) -> Self {
        Self { x, y, z }
    }

    pub fn add(&self, other: &Vec3) -> Vec3 {
        Vec3 {
            x: self.x + other.x,
            y: self.y + other.y,
            z: self.z + other.z,
        }
    }

    pub fn scale(&self, factor: f32) -> Vec3 {
        Vec3 {
            x: self.x * factor,
            y: self.y * factor,
            z: self.z * factor,
        }
    }
}

src/soa.rs

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
//! Structure of Arrays - Cache Friendly
//! Data is organized so that accessing positions only touches position data,
//! leading to excellent cache utilization

use crate::Vec3;

pub struct ParticleSystem {
    pub positions: Vec<Vec3>,
    pub velocities: Vec<Vec3>,
    pub masses: Vec<f32>,
}

impl ParticleSystem {
    pub fn new(count: usize) -> Self {
        let mut positions = Vec::with_capacity(count);
        let mut velocities = Vec::with_capacity(count);
        let mut masses = Vec::with_capacity(count);

        for i in 0..count {
            let fi = i as f32;
            positions.push(Vec3::new(fi, fi * 2.0, fi * 3.0));
            velocities.push(Vec3::new(fi * 0.1, fi * 0.2, fi * 0.3));
            masses.push(1.0 + fi * 0.01);
        }

        Self {
            positions,
            velocities,
            masses,
        }
    }

    /// Update particle positions based on velocity
    /// Excellent cache behavior: positions and velocities are contiguous,
    /// all data in cache lines is useful
    pub fn update_positions(&mut self, dt: f32) {
        for i in 0..self.positions.len() {
            self.positions[i] = self.positions[i].add(&self.velocities[i].scale(dt));
        }
    }

    /// Compute total kinetic energy
    /// Good cache behavior: sequential access to velocities and masses
    pub fn compute_kinetic_energy(&self) -> f32 {
        let mut total = 0.0;
        for i in 0..self.velocities.len() {
            let v = &self.velocities[i];
            let v2 = v.x * v.x + v.y * v.y + v.z * v.z;
            total += 0.5 * self.masses[i] * v2;
        }
        total
    }

    /// Apply gravity to all particles
    /// Excellent cache behavior: only touching velocity array
    pub fn apply_gravity(&mut self, gravity: Vec3, dt: f32) {
        for velocity in &mut self.velocities {
            *velocity = velocity.add(&gravity.scale(dt));
        }
    }
}
