Add ASIMD widening/narrowing, FP vec, and LD1/ST1 multi-reg instructions by dop-amin · Pull Request #433 · slothy-optimizer/slothy

dop-amin · 2026-04-02T14:11:01Z

- Add new instruction classes to aarch64_neon.py: uaddl/uaddw/saddl family, rshrn/sqxtun/sqrshrun, urhadd, fmla/fmls/fmul/fadd/fsub vec and by-element forms, faddp vec/scalar, vmovi, Stp_Q, subs_imm, sub_shifted, sxtw, vdup_lane, Vrev, mov_vtov_s, vins_d_from_v, q_ld1_2/q_ld1_4 families (as independent classes, not Ldr_Q subclasses)
- Add timing (execution units, inverse throughput, latency) for all new instructions in cortex_a55, cortex_a72_frontend, neoverse_n1_experimental
- Update is_vector_load to include q_ld1_2/q_ld1_4 families
- Add test coverage in instructions.s
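
To illustrate why is_vector_load has to be updated when the q_ld1_2/q_ld1_4 families are independent classes rather than Ldr_Q subclasses, here is a minimal sketch. All class definitions below are toy stand-ins written for this example, and the tuple-based predicate is an assumption about one way such a check could look, not the code in aarch64_neon.py.

```python
# Toy stand-ins for illustration only; these are NOT SLOTHY's real classes.
class Instruction:
    pass


class Ldr_Q(Instruction):
    """Stand-in for a single-register 128-bit vector load."""


class q_ld1_2(Instruction):
    """Stand-in for a two-register LD1 load, deliberately not an Ldr_Q subclass."""


class q_ld1_4(Instruction):
    """Stand-in for a four-register LD1 load, also an independent class."""


class vadd(Instruction):
    """Stand-in for a vector instruction that is not a load."""


# Hypothetical list of every class family that counts as a vector load.
VECTOR_LOAD_CLASSES = (Ldr_Q, q_ld1_2, q_ld1_4)


def is_vector_load_narrow(inst):
    """A check that only knows about Ldr_Q and its subclasses."""
    return isinstance(inst, Ldr_Q)


def is_vector_load(inst):
    """A check that lists the independent LD1 families explicitly."""
    return isinstance(inst, VECTOR_LOAD_CLASSES)


if __name__ == "__main__":
    for inst in (Ldr_Q(), q_ld1_2(), q_ld1_4(), vadd()):
        print(type(inst).__name__, is_vector_load_narrow(inst), is_vector_load(inst))
```

Because the new families do not inherit from Ldr_Q in this sketch, the narrow isinstance check silently skips them, which is the kind of gap the is_vector_load update is meant to close.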

* Add fast mul->mla forwarding (accumulate_latency=1): vmul/vmul_lane result forwarding to the accumulate operand of vmla/vmla_lane/vmls/vmls_lane
* Add fast mla->mla forwarding (accumulate_latency=1): chained vmla/vmla_lane/vmls/vmls_lane accumulate operand forwarding
* Add fast mull->mlal forwarding (accumulate_latency=1): Vmull result forwarding to the accumulate operand of Vmlal subclasses
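
The forwarding rules above can be pictured with a small toy latency function. This is a hedged sketch, not SLOTHY's implementation: the `Inst` record, the mnemonic sets, and `DEFAULT_LATENCY = 4` are hypothetical values chosen for illustration. Only `accumulate_latency=1` comes from the commit message. The sketch also assumes that the fast path applies only when the producer's result is consumed solely as the accumulator.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction record used only in this sketch."""
    mnemonic: str
    dest: str
    sources: Tuple[str, ...]           # multiply operands
    accumulator: Optional[str] = None  # accumulate operand, if any


# Hypothetical groupings: producers whose result may be forwarded,
# and consumers that have an accumulate operand.
PRODUCERS = {"mul", "mla", "mls"}
ACCUMULATORS = {"mla", "mls"}

DEFAULT_LATENCY = 4     # made-up value for illustration
ACCUMULATE_LATENCY = 1  # value named in the commit message


def latency(producer: Inst, consumer: Inst) -> int:
    """Return the modelled latency from producer to a dependent consumer."""
    fast_path = (
        producer.mnemonic in PRODUCERS
        and consumer.mnemonic in ACCUMULATORS
        and consumer.accumulator == producer.dest
        # Assumption: no fast path if the result also feeds a multiply operand.
        and producer.dest not in consumer.sources
    )
    return ACCUMULATE_LATENCY if fast_path else DEFAULT_LATENCY


if __name__ == "__main__":
    mul = Inst("mul", "v0", ("v1", "v2"))
    mla_acc = Inst("mla", "v0", ("v3", "v4"), accumulator="v0")
    mla_src = Inst("mla", "v5", ("v0", "v4"), accumulator="v5")
    print(latency(mul, mla_acc))  # result used as accumulator
    print(latency(mul, mla_src))  # result used as multiply operand
```

In this toy model a chained mla->mla sequence on the same accumulator register also takes the short latency, mirroring the second bullet above.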

- Add new instruction classes to aarch64_neon.py: uaddl/uaddw/saddl family, rshrn/sqxtun/sqrshrun, urhadd, fmla/fmls/fmul/fadd/fsub vec and by-element forms, faddp vec/scalar, vmovi, Stp_Q, subs_imm, sub_shifted, sxtw, vdup_lane, Vrev, mov_vtov_s, vins_d_from_v, q_ld1_2/q_ld1_4 families (as independent classes, not Ldr_Q subclasses)
- Add timing (execution units, inverse throughput, latency) for all new instructions in cortex_a55, cortex_a72_frontend, neoverse_n1_experimental
- Update is_vector_load to include q_ld1_2/q_ld1_4 families
- Add test coverage in instructions.s
- Add subs_wform to cortex_a72_frontend (latency=1, ITP=1, units=INT)
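
As a rough picture of what a per-instruction timing entry holds, the sketch below stores the three quantities named above (execution units, inverse throughput, latency) in a small table. The `Timing` dataclass, the `TIMINGS` dictionary, and both helper functions are hypothetical and do not reflect how the cortex_a72_frontend model is actually organised. Only the subs_wform values (latency=1, ITP=1, units=INT) are taken from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Timing:
    """Hypothetical container for one instruction class's timing data."""
    units: Tuple[str, ...]   # execution units the instruction may issue to
    inverse_throughput: int  # cycles between issues on one unit (ITP)
    latency: int             # cycles until the result is available


# Hypothetical table; the single entry uses the values from the commit message.
TIMINGS = {
    "subs_wform": Timing(units=("INT",), inverse_throughput=1, latency=1),
}


def lookup_timing(name: str) -> Timing:
    """Return the timing entry for an instruction class, or fail loudly."""
    try:
        return TIMINGS[name]
    except KeyError:
        raise KeyError(f"no timing entry for instruction class {name!r}") from None


def missing_timings(names: Iterable[str]) -> List[str]:
    """List instruction classes that have no timing entry yet."""
    return [name for name in names if name not in TIMINGS]


if __name__ == "__main__":
    print(lookup_timing("subs_wform"))
    print(missing_timings(["subs_wform", "q_ld1_2"]))
```

A completeness check like `missing_timings` is one way to confirm that every newly added instruction class has an entry in each target model.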

dop-amin force-pushed the extend-aarch64-model branch 6 times, most recently from 9cbbe84 to c39d2c9 Compare April 3, 2026 15:59

dop-amin added 2 commits April 16, 2026 20:40

dop-amin force-pushed the extend-aarch64-model branch from c39d2c9 to d05c706 Compare April 16, 2026 18:45

dop-amin wants to merge 2 commits into main from extend-aarch64-model

dop-amin commented Apr 2, 2026


1 participant
