
Commit a1ab9fb

perf(inline): allow inline-safe intrinsics in callee bodies
`classify` rejected any callee carrying an internal `Call`, even direct calls into an inline-safe `Intrinsic` (`Sqrt`, `Fabs`, `Fma`, `Sin`, `Cos`, `Pow`, `Log`, `Exp`, …). The infrastructure to detect these was already in place (`is_inline_safe_intrinsic` lists them and `callee_contains_inline_safe_intrinsic` checks for them), but the helpers were only wired into stats attribution, never into the classifier itself.

The compounding effect on `bench_nbody_ref`: `advance(...)` calls `sqrt(d2)` inside the j-loop. The classifier saw the sqrt call, returned `Unsupported`, and `main`'s 10 M-iteration `advance(...)` call site never inlined. The j-loop body stayed opaque to LICM, auto-vectorize, and the rest of the HIR fixed point; LLVM only saw a hot call boundary.

Bench impact (macOS aarch64, `--no-cache`):

  mandelbrot: ~410 ms → 315 ms (-23 %)
  nbody:      ~1720 ms → 1581 ms (-8 %)
  nbody_ref:  ~575 ms → 566 ms (within noise; LLVM was already inlining advance via its own IPA)
  fib / inlined / free_function_call: unchanged within noise

Mirrored in `classify_recursive` (the relaxed classifier used by the depth-1 recursive inliner) so callees admitting sqrt-style intrinsics also become recursive-inline candidates.
1 parent d3e32a1 commit a1ab9fb

1 file changed

Lines changed: 23 additions & 0 deletions


crates/compiler/src/inline.rs

@@ -451,6 +451,13 @@ fn classify_recursive(callee: &HirFunction, self_id: HirId) -> CalleeClass {
                     // Direct self-call — fine. Becomes the depth-1
                     // residual after inlining.
                 }
+                HirInstruction::Call {
+                    callee: HirCallable::Intrinsic(i),
+                    ..
+                } if is_inline_safe_intrinsic(*i) => {
+                    // Inline-safe leaf intrinsics (Sqrt, Fabs, Fma,
+                    // …) — same rule as the standard classifier.
+                }
                 HirInstruction::Call { .. }
                 | HirInstruction::IndirectCall { .. }
                 | HirInstruction::Atomic { .. }
@@ -748,12 +755,28 @@ fn classify(callee: &HirFunction) -> CalleeClass {
     }

     // Universal per-instruction safety check across every block.
+    // The `Call` arm permits one specific shape: direct calls into
+    // an inline-safe `Intrinsic` (`Sqrt`, `Fabs`, `Fma`, `Sin`,
+    // `Cos`, `Pow`, …) — these are leaf hardware-or-libm ops, not
+    // user functions, and admitting them is what lets the inliner
+    // take callees like `advance` from `bench_nbody_ref` (which
+    // calls `sqrt` in the hot loop). Without this admission,
+    // `main`'s 10 M-iteration `advance(...)` call site never
+    // inlines, the j-loop stays opaque to LICM / vectorisation,
+    // and Cranelift / LLVM eat the call-frame overhead 10 M times.
+    // Every other `Call` shape (free function, indirect, trait
+    // method) still bails because the callee body would need its
+    // own inlining or vtable resolution.
     let mut total_insts = 0usize;
     let mut callee_has_alloca = false;
     for block in callee.blocks.values() {
         total_insts += block.instructions.len();
         for inst in &block.instructions {
             match inst {
+                HirInstruction::Call {
+                    callee: HirCallable::Intrinsic(i),
+                    ..
+                } if is_inline_safe_intrinsic(*i) => {}
                 HirInstruction::Call { .. }
                 | HirInstruction::IndirectCall { .. }
                 | HirInstruction::Atomic { .. }
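
The change relies on match-arm ordering: a guarded arm placed before the catch-all `Call { .. }` arm admits the one permitted shape, and every other call still falls through to the rejecting arm. Below is a minimal, self-contained sketch of that pattern. It is not the compiler's code: `Inst`, `Callable`, the `Intrinsic::Unlisted` variant, the `u32` function ids, and the `CalleeClass::Inlinable` variant are hypothetical stand-ins, and the real `is_inline_safe_intrinsic` covers more intrinsics than the three listed here.

```rust
// Hypothetical, simplified stand-ins for the HIR types named in the diff.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Intrinsic {
    Sqrt,
    Fabs,
    Fma,
    Unlisted, // placeholder for any intrinsic that is not inline-safe
}

#[derive(Debug, PartialEq)]
enum Callable {
    Intrinsic(Intrinsic),
    Function(u32), // hypothetical function id
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Inst {
    Call { callee: Callable },
    IndirectCall { target: u32 },
    Add,
}

#[derive(Debug, PartialEq)]
enum CalleeClass {
    Inlinable, // hypothetical name for the accepting result
    Unsupported,
}

// Assumed subset of the real allow-list.
fn is_inline_safe_intrinsic(i: Intrinsic) -> bool {
    matches!(i, Intrinsic::Sqrt | Intrinsic::Fabs | Intrinsic::Fma)
}

// Standard classifier: the guarded arm must come before the catch-all.
fn classify(body: &[Inst]) -> CalleeClass {
    for inst in body {
        match inst {
            Inst::Call { callee: Callable::Intrinsic(i) }
                if is_inline_safe_intrinsic(*i) => {}
            Inst::Call { .. } | Inst::IndirectCall { .. } => {
                return CalleeClass::Unsupported;
            }
            _ => {}
        }
    }
    CalleeClass::Inlinable
}

// Relaxed classifier: additionally tolerates a direct self-call.
fn classify_recursive(body: &[Inst], self_id: u32) -> CalleeClass {
    for inst in body {
        match inst {
            Inst::Call { callee: Callable::Function(id) } if *id == self_id => {}
            Inst::Call { callee: Callable::Intrinsic(i) }
                if is_inline_safe_intrinsic(*i) => {}
            Inst::Call { .. } | Inst::IndirectCall { .. } => {
                return CalleeClass::Unsupported;
            }
            _ => {}
        }
    }
    CalleeClass::Inlinable
}

fn main() {
    // A body shaped like `advance`: arithmetic plus a sqrt intrinsic call.
    let advance_like = [
        Inst::Add,
        Inst::Call { callee: Callable::Intrinsic(Intrinsic::Sqrt) },
    ];
    assert_eq!(classify(&advance_like), CalleeClass::Inlinable);

    // A call to an ordinary function is still rejected.
    let calls_user_fn = [Inst::Call { callee: Callable::Function(3) }];
    assert_eq!(classify(&calls_user_fn), CalleeClass::Unsupported);

    // The relaxed classifier accepts a self-call next to an intrinsic.
    let recursive = [
        Inst::Call { callee: Callable::Function(7) },
        Inst::Call { callee: Callable::Intrinsic(Intrinsic::Fabs) },
    ];
    assert_eq!(classify_recursive(&recursive, 7), CalleeClass::Inlinable);
}
```

If the guarded arm were placed after `Inst::Call { .. }`, the catch-all would match first and the intrinsic call would still be rejected, which is why both hunks insert the new arm directly above the existing `Call` arm.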
