Skip to content

Fix windows on Neon#1317

Merged
JohanMabille merged 28 commits intoxtensor-stack:masterfrom
AntoinePrv:fix-win-arm-bis
Apr 27, 2026
Merged

Fix windows on Neon#1317
JohanMabille merged 28 commits intoxtensor-stack:masterfrom
AntoinePrv:fix-win-arm-bis

Conversation

@AntoinePrv
Copy link
Copy Markdown
Contributor

@AntoinePrv AntoinePrv commented Apr 23, 2026

Basically Windows ARM64 was never built. This is a first.

This removed all the wrapper macros and dispatcher hierarchy. The manual wrapper are a bit longer but at least are transparent for the reader, easier to search, and portable (also apache/arrow#49756 is reporting the previous implentation may not nicely inline).

The wrapper also now start with x_ because MSVC does not allow ::intrinsic so we needed to disambiguate names.

Beyond that, there was other small fixes where MSVC was not happy. I had to move some code around as well to reused other functions.

Close #1288 #1285 #1250 #612 #611

@AntoinePrv AntoinePrv marked this pull request as draft April 23, 2026 17:22
@AntoinePrv AntoinePrv force-pushed the fix-win-arm-bis branch 21 times, most recently from 5d0d6d8 to 9952436 Compare April 27, 2026 07:17
@AntoinePrv AntoinePrv marked this pull request as ready for review April 27, 2026 12:54
@JohanMabille JohanMabille merged commit c1e6636 into xtensor-stack:master Apr 27, 2026
79 checks passed
@AntoinePrv AntoinePrv deleted the fix-win-arm-bis branch April 27, 2026 15:34
@serge-sans-paille
Copy link
Copy Markdown
Contributor

@JohanMabille : I'm quite unhappy with that merge: the patch stack is not clean at all, which leads to a commit message that's full of noise.
It's quite a lot of changes and I would rather had time to review it properly ;-)
I'll do a post-merge review this evening.

@JohanMabille
Copy link
Copy Markdown
Member

Hey, sorry for that, I was a little too enthusiastic about having NEON working on Windows, and the possibility to release and move to C++17. We can still amend the commit message and make it cleaner; happy to commit any change according to your review before releasing.

@serge-sans-paille
Copy link
Copy Markdown
Contributor

Hey, sorry for that, I was a little too enthusiastic about having NEON working on Windows, and the possibility to release and move to C++17. We can still amend the commit message and make it cleaner; happy to commit any change according to your review before releasing.

I can understand this, I'm very happy with that update too. Between this, the support for s390x and the recent improvements, we're heading toward something very nice. And once C++17 makes the cut, it will be

\o\ |o| /o/

Copy link
Copy Markdown
Contributor

@serge-sans-paille serge-sans-paille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done my review as I would have done it if we were earlier in the process. Feel free to apply some or all of the comments.

uses: ilammy/msvc-dev-cmd@v1
with:
arch: amd64
arch: arm64
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have this change in a seperate commit (in the same patch stack, that's fine)

template <class R, class T, std::enable_if_t<std::is_same<R, uint8_t>::value && std::is_same<T, int64_t>::value, int> = 0>
XSIMD_INLINE uint8x16_t x_vreinterpretq(int64x2_t a) noexcept { return vreinterpretq_u8_s64(a); }
template <class R, class T, std::enable_if_t<std::is_same<R, uint8_t>::value && std::is_same<T, float>::value, int> = 0>
XSIMD_INLINE uint8x16_t x_vreinterpretq(float32x4_t a) noexcept { return vreinterpretq_u8_f32(a); }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would have been simpler to write (and faster to compile) through class specialization.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had in mind a single function with if constexpr, and backported it to C++14.
This way we do not have to change the call sites.

wrap::rotate_left_f32<N % 4>)
};
return dispatcher.apply(register_type(a), register_type(a));
return wrap::x_rotate_left<N, project_num_t<T>>(register_type(a), register_type(a));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to that PR, but I'm not a big fan of than function name project_num_t. I do understand why it's there though. Maybe map_to_sized_type_t ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it not the best name, especially since the pattern shows up a lot.
It's a mathematical projection, in that project_num_t<project_num_t<T>> is project_num_t<T>.
map_to_sized_type_t is good though.

{
XSIMD_INLINE uint64_t vmaxvq_u64(uint64x2_t a) noexcept
// TODO(c++17): Make a single function with if constexpr switch
template <class T, std::enable_if_t<std::is_same<T, uint8_t>::value, int> = 0>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this T parameter + the enable_if ? it looks like you could just write

XSIMD_INLINE uint8_t x_vmaxvq(uint8x16_t a) noexcept { return vmaxvq_u8(a); }

oh, of course that's becuase of windows that does not have strong types for each uint8x16_t etc type.`

I think there should be a big comment on top of the file to explain this.

Comment thread test/test_utils.hpp
std::string to_string_full_precision(T value)
{
// TODO(C++17): use std::to_chars
char buf[64];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this 64 be std::numeric_limits<T>::max_digits10 + 1/*sign, unsure of that one*/ +1 /*null*/ ?

Comment thread test/test_utils.hpp
// TODO(C++17): use std::to_chars
char buf[64];
std::snprintf(
buf, sizeof(buf),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::size(buf)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants