
native-cpu: verify NEON kernels on aarch64, add linuxArm64 test path #786

Merged
michalharakal merged 1 commit into develop from feature/native-cpu-arm64-neon-verify on Jul 2, 2026


@michalharakal

Contributor

Add a shared nativeTest source set so the NativeKn matmul parity tests run on both linuxX64 and linuxArm64 (same suite, two codegens). Also add a custom linuxArm64Test Exec task that runs the cross-built test binary under the qemu-aarch64 bundled with Kotlin/Native. The emulator, sysroot, and C compiler can be overridden with -PskainetQemu, -PskainetAarch64Sysroot, and -PskainetAarch64Cc respectively.
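A parity test runs the SIMD kernel and a scalar reference on the same inputs and compares the results within a tolerance, because vector accumulation changes the order of floating-point additions. The C sketch below shows that idea for an fp32 dot product, the inner step of a matmul. dot_f32_ref, dot_f32_kernel, and parity_ok are hypothetical names. They are not SKaiNET's kernels or its Kotlin NativeKn test API. On aarch64 the NEON path uses vfmaq_f32, which compiles to fmla. Other targets use the scalar loop.

```c
#include <stddef.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: the ground truth the SIMD kernel is compared against. */
float dot_f32_ref(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) acc += a[i] * b[i];
    return acc;
}

/* Hypothetical kernel under test: four-lane fused multiply-add on aarch64
   (vfmaq_f32 compiles to fmla), plain scalar loop on every other target. */
float dot_f32_kernel(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float acc = 0.0f;
#if defined(__aarch64__) && defined(__ARM_NEON)
    float32x4_t vacc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        vacc = vfmaq_f32(vacc, vld1q_f32(a + i), vld1q_f32(b + i));
    acc = vaddvq_f32(vacc); /* horizontal add of the four lanes */
#endif
    /* Remaining elements (all of them when NEON is not available). */
    for (; i < n; i++) acc += a[i] * b[i];
    return acc;
}

/* The vector path sums in a different order, so parity means "close enough".
   The check passes if the difference is within abs_tol or within rel_tol * |want|. */
int parity_ok(float got, float want, float abs_tol, float rel_tol) {
    float diff = got > want ? got - want : want - got;
    float mag = want < 0.0f ? -want : want;
    return diff <= abs_tol || diff <= rel_tol * mag;
}
```

A mixed tolerance like this judges results near zero by abs_tol and large results by rel_tol. A purely relative check would reject tiny rounding differences on near-zero outputs.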

Extend coverage from q5k/q6k to all NEON kernels: fp32, q4k (dotprod), q5k, q6k, and q8_0, plus the provider registry test. All 23 tests pass on linuxArm64, both under QEMU and on physical SL2610 hardware (Cortex-A55: asimddp and fphp present, i8mm absent, matching -march=armv8.2-a+fp16+dotprod). objdump confirms that the binaries contain genuine SIMD instructions (udot/sdot/fmla) rather than the scalar fallback.
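The Cortex-A55 features listed above (asimddp, fphp, i8mm) are Linux /proc/cpuinfo flags, and each one corresponds to a HWCAP bit that getauxval exposes. The following minimal sketch assumes Linux on aarch64. It checks those bits at runtime alongside the compile-time __ARM_FEATURE_DOTPROD macro, which -march=...+dotprod defines. detect_cpu_features and built_with_dotprod are hypothetical helpers, not part of skainet_simd.h. On other targets every flag reports 0.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Runtime view of the CPU features the NEON kernels care about. */
struct cpu_features {
    int asimddp; /* SDOT/UDOT dot-product instructions (Armv8.2 dotprod) */
    int fphp;    /* half-precision floating-point arithmetic */
    int i8mm;    /* int8 matrix multiply (SMMLA/UMMLA) */
};

struct cpu_features detect_cpu_features(void) {
    struct cpu_features f = {0, 0, 0};
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    f.asimddp = (hwcap & HWCAP_ASIMDDP) != 0;
    f.fphp = (hwcap & HWCAP_FPHP) != 0;
#if defined(HWCAP2_I8MM)
    /* i8mm is reported in the second hwcap word; older kernel headers lack the macro. */
    f.i8mm = (getauxval(AT_HWCAP2) & HWCAP2_I8MM) != 0;
#endif
#endif
    return f;
}

/* Compile-time view: did -march enable dotprod for this translation unit? */
int built_with_dotprod(void) {
#if defined(__ARM_FEATURE_DOTPROD)
    return 1;
#else
    return 0;
#endif
}
```

If built_with_dotprod() returns 1 on a core where asimddp is 0, the first sdot/udot would typically fault with SIGILL. This is why the -march string must match what the board actually reports.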

Replace the BOARD-VERIFY-PENDING banner in skainet_simd.h with AARCH64-VERIFIED, and fix a pre-existing configuration-cache violation in the crossArm64 onlyIf closures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
michalharakal merged commit a3a22aa into develop on Jul 2, 2026
6 of 7 checks passed
michalharakal deleted the feature/native-cpu-arm64-neon-verify branch on Jul 2, 2026, 21:11
michalharakal added a commit that referenced this pull request on Jul 5, 2026
…nt 2.07×

Bumps VERSION_NAME 0.33.0 -> 0.34.0. Bundles the develop changes since 0.33.0:
the new skainet-data-source module (URI-backed sources, HF auth, raw format
parsers, suspend data pipeline DSL) + dataset operation views and richer
batches (#784/#785), the bf16-native DSL -> StableHLO export path and the
pluggable per-phase/per-target compile-optimization seam (#788/#791), NEON
K-quant matmul performance work (block-outer order + fused Q8 int8 dot; a 2.07x
Q4_K speedup on Cortex-A55) with aarch64 board verification (#786/#787), LayerNorm f32
normalization + rank-0 tensor-type emission fixes, macOS host build fix
(#789), Code of Conduct (#790), and the offline markup-antora docs image (#781).

Minor bump (not patch): new published module skainet-data-source; all data-api
additions are default-bearing (no source-incompatible changes).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
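These release notes attribute part of the Q4_K gain to a fused Q8 int8 dot, and the PR reports sdot/udot in the disassembly. The following sketch of a Q8_0 row dot product shows roughly what an sdot-based int8 dot looks like. It assumes a ggml-style Q8_0 layout: blocks of 32 int8 values with one scale per block, stored here as float rather than fp16 for simplicity. block_q8_0, block_dot_i8, and dot_q8_0 are hypothetical names, and the sketch does not reproduce SKaiNET's block-outer loop order. On aarch64 builds with +dotprod, each vdotq_s32 (SDOT) multiplies 16 int8 pairs into four int32 lanes. Other targets use the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

#define QK8_0 32 /* values per block (assumed ggml-style layout) */

/* Hypothetical Q8_0 block: one scale plus 32 quantized values. */
typedef struct {
    float d;          /* per-block scale (ggml stores fp16; float here for simplicity) */
    int8_t qs[QK8_0]; /* quantized values */
} block_q8_0;

/* Integer dot product of one 32-value block, kept entirely in int32. */
static int32_t block_dot_i8(const int8_t *x, const int8_t *y) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    int32x4_t acc = vdupq_n_s32(0);
    acc = vdotq_s32(acc, vld1q_s8(x), vld1q_s8(y));           /* SDOT, values 0..15 */
    acc = vdotq_s32(acc, vld1q_s8(x + 16), vld1q_s8(y + 16)); /* SDOT, values 16..31 */
    return vaddvq_s32(acc);
#else
    int32_t acc = 0;
    for (int i = 0; i < QK8_0; i++) acc += (int32_t)x[i] * (int32_t)y[i];
    return acc;
#endif
}

/* Dot product of two rows of n values (n must be a multiple of QK8_0).
   Each block's int32 sum is converted and scaled exactly once, by d_x * d_y. */
float dot_q8_0(const block_q8_0 *x, const block_q8_0 *y, size_t n) {
    assert(n % QK8_0 == 0);
    float sum = 0.0f;
    for (size_t b = 0; b < n / QK8_0; b++)
        sum += x[b].d * y[b].d * (float)block_dot_i8(x[b].qs, y[b].qs);
    return sum;
}
```

The int8 products stay in integer SIMD registers, and the float scale multiply runs once per 32 values instead of once per value. That is the general reason fusing the int8 dot tends to pay off on cores like the A55.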
