Add contrib intdiv: fast integer division by invariant scalars using multiplication#2875
Add contrib intdiv: fast integer division by invariant scalars using multiplication#2875abhishek-iitmadras wants to merge 2 commits into
Conversation
jan-wassenberg
left a comment
There was a problem hiding this comment.
A bunch of comments :) Some influence others, so please read them all before addressing any.
| * | ||
| * We split the work into two steps: | ||
| * 1) Precompute parameters from the scalar divisor (multiplier + shifts). | ||
| * DivisorParams{U,S}<T> ComputeDivisorParams(T divisor); |
There was a problem hiding this comment.
Can we reuse any of the existing logic from base.h Divisor[64]?
There was a problem hiding this comment.
OK, good to use pre-exist API so:
-
now reuse the existing Divisor64 128-bit division primitive instead of duplicating it in our intdiv-inl.h
-
But to work correct we need to do chnage in base.h is to make Divisor64::Div128 public in the HWY_HAVE_DIV128 implementation [Currently it is private]. Then only intdiv-inl.h can call: hwy::Divisor64::Div128(high, divisor).
-
Also keep the full DivisorParamsU / DivisorParamsS logic in intdiv-inl.h because Divisor / Divisor64 do not cover signed params, 8/16-bit widening params, pow2 metadata, divisor storage for remainder, or floor-division correction. But the 128-bit high-half division primitive should definitely be shared.
| }; | ||
|
|
||
| template <> | ||
| struct MulType<uint8_t> { |
There was a problem hiding this comment.
We can use existing functionality: base.h also has
template <>
struct Relations<uint8_t> {
using Unsigned = uint8_t;
using Signed = int8_t;
using Wide = uint16_t;
};
etc, so we could use MulType = Relations::Wide.
There was a problem hiding this comment.
I tried but it partially applicable but not a direct replacement as mapping diverge for 32-bit and 64-bit
| return HWY_NAMESPACE::ComputeDivisorParams<T>(d); | ||
| } | ||
|
|
||
| template <typename T, HWY_IF_T_SIZE(T, 1), HWY_IF_SIGNED(T)> |
There was a problem hiding this comment.
Rather than SFINAE for T size 1..8, we can HWY_IF_T_SIZE_ONE_OF(T, (1 << 1) | (1 << 2) | (1 << 4) | (1 << 8)), or better yet, just static_assert IsPow2(sizeof(T)) within one function.
There was a problem hiding this comment.
ok got it , now collapse repeated size-based SFINAE overloads into single signed and unsigned overload using static_assert for supported integer sizes
| return HWY_NAMESPACE::ComputeDivisorParams<T>(d); | ||
| } | ||
|
|
||
| template <class D, class V = VecD<D>, typename T = TFromD_<D>, HWY_IF_UNSIGNED_D(D)> |
There was a problem hiding this comment.
Here also we could static_assert(IsUnsigned()) inside the function, given that you have a DivisorParamsU argument.
There was a problem hiding this comment.
same here also
| if constexpr (sizeof(T) <= 4) { | ||
| return static_cast<T>(Random32(&rng)); | ||
| } else { | ||
| const uint64_t hi = Random32(&rng); |
There was a problem hiding this comment.
We do have a Random64().
There was a problem hiding this comment.
Thanks
Done , i have used Random64() now
| } | ||
|
|
||
| template <typename T> | ||
| bool IsPow2(T x) { |
There was a problem hiding this comment.
Already defined in intdiv-inl.h?
There was a problem hiding this comment.
DOne, Removed from here
9b1a2f6 to
d728618
Compare
|
|
||
| template <> | ||
| struct MultiplierType<uint32_t> { | ||
| using type = uint32_t; |
There was a problem hiding this comment.
We can use using type = If<(sizeof(T) < 4), Relations::Wide, T>.
There was a problem hiding this comment.
As per your suggestion , i agree with reusing the existing Highway type relation, but i don't think If<(sizeof(T) < 4), MakeWide<T>, T> is safe here.
SO Looking at hwy/base.h, Relations<int64_t> has no Wide member (only Unsigned/Signed/Float/Narrow), so MakeWide<int64_t> is ill-formed.
SO And if i am not wrong then the main issue is that the type arguments to If still have to be well-formed before selection.
SO For int64_t, MakeWide<int64_t> is ill-formed because Relations<int64_t> does not define Wide. so even though sizeof(int64_t) < 4 is false, the expression still fails during substitution.
The partial specialization keeps the selection lazy:
template <typename T, bool kNeedsWiden = (sizeof(T) < 4)>
struct MultiplierType {
using type = T;
};
template <typename T>
struct MultiplierType<T, true> {
using type = MakeWide<T>;
};
Now this still reuses Highway's MakeWide<T> / Relations<T>::Wide for 8- and 16-bit types, but avoids instantiating MakeWide<T> for 32- and 64-bit types. That is also intentional because the 32/64-bit paths use same-width multiplier storage and have separate MulHigh / Div128 handling.
Correct me if i am wrong
d728618 to
44c3e62
Compare
44c3e62 to
964a47d
Compare
Signed-off-by: Abhishek Kumar <abhishek.r.kumar@fujitsu.com>
964a47d to
78b390e
Compare
|
Benchmarking script : https://gist.github.com/abhishek-iitmadras/169f995df2bf9b1e7827c712528af0c2 Machine Detail : https://instances.vantage.sh/aws/ec2/c7g.metal?currency=USD (SVE_256 Machine) Benchmarks were run on Arm SVE-256. The divisor is supplied via Google Benchmark arguments ( Measured sizes:
Runtime divisor sets:
Result :Table below reports All result numbers are in ns
Raw benchmark logs
|
|
little detail about benchmarking script : https://gist.github.com/abhishek-iitmadras/169f995df2bf9b1e7827c712528af0c2
|
|
|
||
| constexpr int kShift = static_cast<int>(sizeof(T) * 8); | ||
|
|
||
| #if defined(HWY_HAVE_ORDEREDDEMOTE2TO) |
There was a problem hiding this comment.
What's the issue here? I'd think/hope this is always available. What are the circumstances when it isn't? Can we remove the #if?
There was a problem hiding this comment.
What's the issue here? I'd think/hope this is always available. What are the circumstances when it isn't? Can we remove the
#if?
Thanks , you are right
I confused myself — #if is to false everywhere and every build has been silently going through the Combine + DemoteTo fallback path.
As per the docs, OrderedDemote2To is available whenever HWY_TARGET != HWY_SCALAR.
But i already did if constexpr (D::kPrivateLanes < 2) already routes HWY_SCALAR (and any other single-lane tag) to ScalarDivPerLane, so by the time we reach this line we're guaranteed to be on a target where OrderedDemote2To exists.
So the answer to "when isn't it available?" is: only on HWY_SCALAR, which can't reach this code path anyway.
SO i am removing the #if and calling OrderedDemote2To unconditionally in both the signed and unsigned IntDiv branches.
I guess numbers should improve slightly since they'll now actually hit the optimized demote path.
Thanks again to point out this
jan-wassenberg
left a comment
There was a problem hiding this comment.
I'll be out from Sat to May26, so let's try to land this first and address the #if and branch later.
|
Sorry , @jan-wassenberg |
This change adds a contrib module implementing fast integer division by invariant (loop-constant) divisors using multiplication and shifts, following Granlund & Montgomery, “Division by Invariant Integers Using Multiplication” (PLDI 1994).
Supports all scalar lane widths and signs:
This contrib module provides general-purpose, cross-architecture implementation of division by invariant scalars using multiplication, suitable for vectorized code built on Highway. It mirrors the GM(Algo) scheme and is conceptually similar to the integer SIMD division intrinsics used in NumPy’s npyv_intdiv, but expressed purely in Highway’s portable SIMD API.
Benchmarking Script : https://gist.github.com/abhishek-iitmadras/169f995df2bf9b1e7827c712528af0c2