Fix windows on Neon#1317
Conversation
5d0d6d8 to
9952436
Compare
9952436 to
893cd31
Compare
6fc9157 to
1d044b4
Compare
This reverts commit 15745ae.
|
@JohanMabille : I'm quite unhappy with that merge: the patch stack is not clean at all, which leads to a commit message that's full of noise. |
|
Hey, sorry for that, I was a little too enthusiastic about having NEON working on Windows, and the possibility to release and move to C++17. We can still amend the commit message and make it cleaner; happy to commit any change according to your review before releasing. |
I can understand this, I'm very happy with that update too. Between this, the support for s390x and the recent improvements, we're heading toward something very nice. And once C++17 makes the cut, it will be \o\ |o| /o/ |
serge-sans-paille
left a comment
There was a problem hiding this comment.
I've done my review as I would have done it if we were earlier in the process. Feel free to apply some or all of the comments.
| uses: ilammy/msvc-dev-cmd@v1 | ||
| with: | ||
| arch: amd64 | ||
| arch: arm64 |
There was a problem hiding this comment.
I'd rather have this change in a seperate commit (in the same patch stack, that's fine)
| template <class R, class T, std::enable_if_t<std::is_same<R, uint8_t>::value && std::is_same<T, int64_t>::value, int> = 0> | ||
| XSIMD_INLINE uint8x16_t x_vreinterpretq(int64x2_t a) noexcept { return vreinterpretq_u8_s64(a); } | ||
| template <class R, class T, std::enable_if_t<std::is_same<R, uint8_t>::value && std::is_same<T, float>::value, int> = 0> | ||
| XSIMD_INLINE uint8x16_t x_vreinterpretq(float32x4_t a) noexcept { return vreinterpretq_u8_f32(a); } |
There was a problem hiding this comment.
this would have been simpler to write (and faster to compile) through class specialization.
There was a problem hiding this comment.
I had in mind a single function with if constexpr, and backported it to C++14.
This way we do not have to change the call sites.
| wrap::rotate_left_f32<N % 4>) | ||
| }; | ||
| return dispatcher.apply(register_type(a), register_type(a)); | ||
| return wrap::x_rotate_left<N, project_num_t<T>>(register_type(a), register_type(a)); |
There was a problem hiding this comment.
not related to that PR, but I'm not a big fan of than function name project_num_t. I do understand why it's there though. Maybe map_to_sized_type_t ?
There was a problem hiding this comment.
I agree it not the best name, especially since the pattern shows up a lot.
It's a mathematical projection, in that project_num_t<project_num_t<T>> is project_num_t<T>.
map_to_sized_type_t is good though.
| { | ||
| XSIMD_INLINE uint64_t vmaxvq_u64(uint64x2_t a) noexcept | ||
| // TODO(c++17): Make a single function with if constexpr switch | ||
| template <class T, std::enable_if_t<std::is_same<T, uint8_t>::value, int> = 0> |
There was a problem hiding this comment.
why do you need this T parameter + the enable_if ? it looks like you could just write
XSIMD_INLINE uint8_t x_vmaxvq(uint8x16_t a) noexcept { return vmaxvq_u8(a); }
oh, of course that's becuase of windows that does not have strong types for each uint8x16_t etc type.`
I think there should be a big comment on top of the file to explain this.
| std::string to_string_full_precision(T value) | ||
| { | ||
| // TODO(C++17): use std::to_chars | ||
| char buf[64]; |
There was a problem hiding this comment.
shouldn't this 64 be std::numeric_limits<T>::max_digits10 + 1/*sign, unsure of that one*/ +1 /*null*/ ?
| // TODO(C++17): use std::to_chars | ||
| char buf[64]; | ||
| std::snprintf( | ||
| buf, sizeof(buf), |
There was a problem hiding this comment.
nit: std::size(buf)
Basically Windows ARM64 was never built. This is a first.
This removed all the wrapper macros and dispatcher hierarchy. The manual wrapper are a bit longer but at least are transparent for the reader, easier to search, and portable (also apache/arrow#49756 is reporting the previous implentation may not nicely inline).
The wrapper also now start with
x_because MSVC does not allow::intrinsicso we needed to disambiguate names.Beyond that, there was other small fixes where MSVC was not happy. I had to move some code around as well to reused other functions.
Close #1288 #1285 #1250 #612 #611