
Accumulate CPU half-precision sums in float32 #3488

Open

sofinvalery wants to merge 1 commit into ml-explore:main from sofinvalery:sum-float-accum

Conversation

@sofinvalery

Proposed changes

Accumulate CPU float16 and bfloat16 sum reductions in float32, while preserving the output dtype. This fixes precision loss in ops that use sum().

Fixes #3326.
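To motivate the change, here is a minimal, self-contained illustration of why accumulator width matters. It is plain C++, not MLX code, and uses float vs. double as a stand-in for float16 vs. float32, since standard C++ lacks a portable half type; the effect is analogous (float16's 11-bit significand makes a sum of ones stall at 2048, and float32 shows the same stagnation at larger magnitudes).

```cpp
#include <cassert>
#include <cmath>

// A narrow accumulator loses low-order bits once the running sum dwarfs each
// addend; a wider accumulator absorbs the rounding error.
float sum_with_float_acc(int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) acc += 0.1f;  // rounding error compounds
  return acc;
}

double sum_with_double_acc(int n) {
  double acc = 0.0;
  for (int i = 0; i < n; ++i) acc += 0.1f;  // wide accumulator, same inputs
  return acc;
}
```

With ten million additions of 0.1f, the double accumulator lands within a fraction of a unit of the true 1,000,000, while the float accumulator drifts off by tens of thousands.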

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@sofinvalery
Author

Made it inline. Feels cleaner.

Collaborator

@zcbenz left a comment


Allocating a new array would carry a heavy performance penalty; the correct way would be to refactor strided_reduce/contiguous_reduce to accumulate in float32 rather than the output type.
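The suggested refactor might look roughly like the following sketch. The signature is simplified and hypothetical, not MLX's actual contiguous_reduce; the point is that the loop carries the running value in a separate accumulator type and narrows to the output type only once, with no temporary array.

```cpp
#include <cassert>

// Sketch: accumulate in AccT (e.g. float for a float16 input), converting
// each element as it is read, then narrow back to T at the end.
template <typename T, typename AccT, typename Op>
T contiguous_reduce(const T* x, int size, Op op, AccT init) {
  AccT acc = init;
  for (int i = 0; i < size; ++i) {
    acc = op(acc, static_cast<AccT>(x[i]));  // widen element-by-element
  }
  return static_cast<T>(acc);  // narrow once to the output dtype
}
```

Calling it with AccT = float for a half-precision input keeps every intermediate sum in full precision while the output dtype is preserved.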

@sofinvalery
Author

Refactored strided_reduce/contiguous_reduce to support accumulation in a separate type.
float16/bfloat16 sums now accumulate in float32 without allocating a temp array.
Also added tests for reductions across full, contiguous, and strided axes.

```diff
 for (int i = 0; i < out.size(); i++, out_ptr++, in_ptr += reduction_size) {
-  *out_ptr = init;
-  contiguous_reduce(in_ptr, out_ptr, reduction_size, Op{}, init);
+  *out_ptr = contiguous_reduce(in_ptr, reduction_size, Op{}, init, init);
```
Collaborator


Is it supposed to be:

Suggested change:

```diff
-*out_ptr = contiguous_reduce(in_ptr, reduction_size, Op{}, init, init);
+*out_ptr = contiguous_reduce(in_ptr, reduction_size, Op{}, *out_ptr, init);
```

```cpp
constexpr int N = std::min(simd::max_size<T>, simd::max_size<U>);
simd::Simd<U, N> accumulator_v(init);
while (size >= N) {
  accumulator_v = op(accumulator_v, simd::Simd<U, N>(simd::load<T, N>(x)));
```
Collaborator


When building for macOS 14 it seems that we can't simply convert Simd<float16> to Simd<float>:

```
/Users/runner/actions-runner/_work/mlx/mlx/mlx/backend/cpu/simd/accelerate_simd.h:63:40: error: no matching function for call to 'convert'
   63 |   Simd<T, N>(Simd<U, N> other) : value(asd::convert<scalar_t>(other.value)) {}
      |                                        ^~~~~~~~~~~~~~~~~~~~~~
/Users/runner/actions-runner/_work/mlx/mlx/mlx/backend/cpu/reduce.cpp:113:39: note: in instantiation of function template specialization 'mlx::core::simd::Simd<float, 8>::Simd<__fp16>' requested here
  113 |     accumulator_v = op(accumulator_v, simd::Simd<U, N>(simd::load<T, N>(x)));
      |                                       ^
```
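One portable fallback when a vendor-level vector convert overload is missing for a source/destination pair such as __fp16 to float is a lane-by-lane scalar cast. The sketch below is hypothetical: Vec and convert_lanes are stand-ins, not MLX's real Simd class or its conversion path, and a scalar loop gives up vectorized conversion speed in exchange for compiling everywhere.

```cpp
#include <cstddef>

// Minimal stand-in for a fixed-width SIMD wrapper.
template <typename T, int N>
struct Vec {
  T lanes[N];
};

// Convert each lane with an ordinary scalar cast; works for any T -> U pair
// the compiler can cast, regardless of platform intrinsics.
template <typename U, typename T, int N>
Vec<U, N> convert_lanes(const Vec<T, N>& x) {
  Vec<U, N> out;
  for (int i = 0; i < N; ++i) {
    out.lanes[i] = static_cast<U>(x.lanes[i]);  // scalar cast per lane
  }
  return out;
}
```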

```diff
     const std::vector<int>& axes) {
   if (rtype == Reduce::And) {
-    reduction_op<InT, bool, AndReduce>(in, out, axes, true);
+    reduction_op<InT, bool, AndReduce, bool>(in, out, axes, true);
```
Collaborator


I prefer omitting the optional AccT where it matches the output type, so it is obvious when the code is accumulating in a different type.
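The preference can be sketched with a simplified, hypothetical signature (sum_reduce here is not MLX's reduction_op): the accumulator type defaults to the output type, so spelling it out at a call site is itself the signal that accumulation happens in a wider type.

```cpp
#include <cassert>

// AccT defaults to OutT; an explicit third template argument at a call site
// visibly marks accumulation in a different type.
template <typename InT, typename OutT, typename AccT = OutT>
OutT sum_reduce(const InT* x, int n) {
  AccT acc{};
  for (int i = 0; i < n; ++i) acc += static_cast<AccT>(x[i]);
  return static_cast<OutT>(acc);
}
```

At call sites, `sum_reduce<float, float>(...)` reads as ordinary same-type accumulation, while `sum_reduce<float, float, double>(...)` flags the wider accumulator at a glance.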



Development

Successfully merging this pull request may close these issues.

"test random uniform" fails for CPU backend
