native-cpu: verify NEON kernels on aarch64, add linuxArm64 test path #786
Merged
Add a shared `nativeTest` source set so the NativeKn matmul parity tests run on both linuxX64 and linuxArm64 (same suite, two codegens), and a custom linuxArm64Test Exec task that runs the cross-built test binary under the Kotlin/Native-bundled qemu-aarch64 (overridable via -PskainetQemu / -PskainetAarch64Sysroot / -PskainetAarch64Cc).

Extend coverage from q5k/q6k to all NEON kernels: fp32, q4k (dotprod), q5k, q6k, q8_0, plus the provider registry test. All 23 pass on linuxArm64 under QEMU and on the physical SL2610 (Cortex-A55: asimddp + fphp present, i8mm absent, matching -march=armv8.2-a+fp16+dotprod). objdump confirms genuine SIMD (udot/sdot/fmla), not the scalar fallback.

Clear the BOARD-VERIFY-PENDING banner in skainet_simd.h -> AARCH64-VERIFIED, and fix a pre-existing config-cache violation in the crossArm64 onlyIf closures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
michalharakal added a commit that referenced this pull request Jul 5, 2026
…nt 2.07× Bumps VERSION_NAME 0.33.0 -> 0.34.0. Bundles the develop changes since 0.33.0: the new skainet-data-source module (URI-backed sources, HF auth, raw format parsers, suspend data pipeline DSL) + dataset operation views and richer batches (#784/#785), the bf16-native DSL -> StableHLO export path and the pluggable per-phase/per-target compile-optimization seam (#788/#791), NEON K-quant matmul perf (block-outer order + fused Q8 int8 dot, 2.07x Q4_K on Cortex-A55) with aarch64 board verification (#786/#787), LayerNorm f32 normalization + rank-0 tensor-type emission fixes, macOS host build fix (#789), Code of Conduct (#790), and the offline markup-antora docs image (#781). Minor bump (not patch): new published module skainet-data-source; all data-api additions are default-bearing (no source-incompatible changes). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>