Commit 2cbfd84

bench: add op_overload kernel — operator overloading at native speed
Adds a microbenchmark that exercises the operator-overload path: a value-type `Vec3` struct with `impl Add<Vec3>`, invoked 20M times in a tight loop (10M iterations, two `+` calls per iteration). It shows whether SSA lowering's `try_operator_trait_dispatch`, the inliner's intrinsic-Call admission, and LLVM's mem2reg combine to give native arithmetic speed for user-defined operator overloads.

Linux/Mac result: 20M overloaded `+` calls in ~20-70ms (~1-3.5ns per call). The post-opt LLVM IR shows the entire 10M-iteration loop reduced to `extractvalue`/`fadd`/`insertvalue` chains with zero residual function calls: `Vec3.add` is fully inlined into main, then mem2reg promotes the struct field accesses to SSA registers.

This benchmark is the closest thing in the suite to asking "would a devirtualization pass help us?" For this case the answer is no: operator overloads on concrete types are already resolved at SSA lowering and fully inlined.
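The per-call figure comes from dividing the measured wall time by the 20M calls (10M iterations, two `+` calls each). A minimal Rust sketch of that arithmetic follows; the helper name `ns_per_call` is hypothetical and not part of the bench runner. With the quoted 20-70ms range, the upper end works out to 3.5ns.

```rust
/// Converts a total wall time in milliseconds into nanoseconds per call.
/// `ns_per_call` is a hypothetical helper, not part of the bench runner.
fn ns_per_call(total_ms: f64, calls: u64) -> f64 {
    // 1 ms = 1_000_000 ns
    total_ms * 1_000_000.0 / calls as f64
}

fn main() {
    // 10M loop iterations, two overloaded `+` calls per iteration.
    let calls: u64 = 2 * 10_000_000;
    for total_ms in [20.0, 70.0] {
        println!("{total_ms} ms -> {:.1} ns per call", ns_per_call(total_ms, calls));
    }
}
```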
1 parent a1ab9fb commit 2cbfd84

2 files changed

Lines changed: 31 additions & 0 deletions

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+import prelude
+
+struct Vec3 {
+    x: f64,
+    y: f64,
+    z: f64
+}
+
+impl Add<Vec3> for Vec3 {
+    def add(self, other: Vec3): Vec3 {
+        return Vec3 { x: self.x + other.x, y: self.y + other.y, z: self.z + other.z }
+    }
+}
+
+def main(): i64 {
+    let a = Vec3 { x: 1.0, y: 2.0, z: 3.0 }
+    let b = Vec3 { x: 4.0, y: 5.0, z: 6.0 }
+    let mut acc = Vec3 { x: 0.0, y: 0.0, z: 0.0 }
+    let mut i: i64 = 0
+    while i < 10000000 {
+        acc = acc + a
+        acc = acc + b
+        i = i + 1
+    }
+    return (acc.x + acc.y + acc.z) as i64
+}
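For comparison, the same kernel can be sketched in Rust with `std::ops::Add`. This is an analog written for illustration, not the benchmark the commit adds: `run_kernel` and the timing in `main` are assumptions, and it makes no claim about how the project's compiler lowers the original.

```rust
use std::ops::Add;
use std::time::Instant;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

// User-defined `+` on a concrete value type, mirroring `impl Add<Vec3> for Vec3`.
impl Add for Vec3 {
    type Output = Vec3;

    fn add(self, other: Vec3) -> Vec3 {
        Vec3 { x: self.x + other.x, y: self.y + other.y, z: self.z + other.z }
    }
}

/// Runs `iters` loop iterations with two overloaded `+` calls each.
/// Hypothetical name; the original kernel hard-codes 10_000_000 in `main`.
fn run_kernel(iters: i64) -> i64 {
    let a = Vec3 { x: 1.0, y: 2.0, z: 3.0 };
    let b = Vec3 { x: 4.0, y: 5.0, z: 6.0 };
    let mut acc = Vec3 { x: 0.0, y: 0.0, z: 0.0 };
    let mut i: i64 = 0;
    while i < iters {
        acc = acc + a;
        acc = acc + b;
        i += 1;
    }
    (acc.x + acc.y + acc.z) as i64
}

fn main() {
    let start = Instant::now();
    let result = run_kernel(10_000_000);
    println!("result = {result}, elapsed = {:?}", start.elapsed());
}
```

Every intermediate value is an integer well below 2^53, so the `f64` accumulation is exact and the final cast loses nothing.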

crates/zynml/examples/bench_runner.rs

Lines changed: 5 additions & 0 deletions
@@ -190,6 +190,11 @@ const KERNELS: &[(&str, &str)] = &[
     ("bench_fib", "Int(102334155)"),
     ("bench_inlined_call", "Int(100000000)"),
     ("bench_free_function_call", "Int(100000000)"),
+    // diagnostic-only: kept out of the CI publish surface but used for
+    // tracing operator-overload lowering. Expected: (a + b) * 10M with
+    // a=(1,2,3), b=(4,5,6) → acc = (50000000, 70000000, 90000000)
+    // → sum 210000000.
+    ("bench_op_overload", "Int(210000000)"),
 ];

 /// Each target produces one [`TargetResult`] per kernel.
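The expected value in the new table entry can be checked in closed form: each iteration adds `a` and then `b`, so after 10M iterations the accumulator is `(a + b) * 10M` per component. The sketch below does that arithmetic. `expected_sum` and `render` are hypothetical helpers, and the `Int(...)` string only mirrors the shape seen in `KERNELS`; how the runner actually compares results is not shown in this diff.

```rust
/// Closed-form result of `iters` rounds of `acc = acc + a; acc = acc + b`,
/// summed over the three components and truncated to i64.
/// Hypothetical helper, not part of bench_runner.rs.
fn expected_sum(a: [f64; 3], b: [f64; 3], iters: i64) -> i64 {
    let n = iters as f64;
    let total: f64 = (0..3).map(|k| (a[k] + b[k]) * n).sum();
    total as i64
}

/// Renders a value in the `Int(...)` shape used by the KERNELS entries.
/// Assumption: the runner compares against a string of this form.
fn render(value: i64) -> String {
    format!("Int({value})")
}

fn main() {
    let got = render(expected_sum([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 10_000_000));
    // (5 + 7 + 9) * 10M = 210M
    assert_eq!(got, "Int(210000000)");
    println!("{got}");
}
```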