Add ASIMD widening/narrowing, FP vec, and LD1/ST1 multi-reg instructions #433

Draft
dop-amin wants to merge 2 commits into main from extend-aarch64-model

dop-amin (Collaborator) commented Apr 2, 2026

  • Add new instruction classes to aarch64_neon.py: uaddl/uaddw/saddl family, rshrn/sqxtun/sqrshrun, urhadd, fmla/fmls/fmul/fadd/fsub vec and by-element forms, faddp vec/scalar, vmovi, Stp_Q, subs_imm, sub_shifted, sxtw, vdup_lane, Vrev, mov_vtov_s, vins_d_from_v, q_ld1_2/q_ld1_4 families (as independent classes, not Ldr_Q subclasses)
  • Add timing (execution units, inverse throughput, latency) for all new instructions in cortex_a55, cortex_a72_frontend, neoverse_n1_experimental
  • Update is_vector_load to include q_ld1_2/q_ld1_4 families
  • Add test coverage in instructions.s
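For readers unfamiliar with how such a model is laid out, the timing and `is_vector_load` items above can be pictured roughly as follows. This is only a minimal, self-contained sketch: the `Unit`, `Timing`, `TIMINGS` and `VECTOR_LOADS` names are hypothetical and are not the structures used in `aarch64_neon.py` or the per-core models, and all numbers except the `subs_wform` entry (which the commit message gives for cortex_a72_frontend) are placeholders.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Unit(Enum):
    # Hypothetical execution-unit names, chosen for illustration only.
    INT = auto()
    VEC = auto()
    LOAD = auto()


@dataclass(frozen=True)
class Timing:
    units: tuple             # execution units the instruction may issue on
    inverse_throughput: int  # cycles before the unit accepts the next instruction
    latency: int             # cycles until the result is available


# Per-instruction timing table for one core model.
TIMINGS = {
    "uaddl": Timing((Unit.VEC,), 1, 3),      # placeholder numbers
    "q_ld1_2": Timing((Unit.LOAD,), 2, 4),   # placeholder numbers
    "subs_wform": Timing((Unit.INT,), 1, 1), # latency=1, ITP=1, units=INT
}

# Assumed set of vector-load names; the q_ld1_2/q_ld1_4 families are listed
# explicitly because they are independent classes, not Ldr_Q subclasses.
VECTOR_LOADS = {"ldr_q", "q_ld1_2", "q_ld1_4"}


def is_vector_load(name: str) -> bool:
    """Return True if the named instruction is treated as a vector load."""
    return name in VECTOR_LOADS
```

Keeping the three timing attributes together per instruction makes it easy to check that every new class has an entry in each core model.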

dop-amin force-pushed the extend-aarch64-model branch 6 times, most recently from 9cbbe84 to c39d2c9 (April 3, 2026 15:59)
* Add fast mul->mla forwarding (accumulate_latency=1): vmul/vmul_lane
  result forwarding to the accumulate operand of vmla/vmla_lane/vmls/vmls_lane
* Add fast mla->mla forwarding (accumulate_latency=1): chained
  vmla/vmla_lane/vmls/vmls_lane accumulate operand forwarding
* Add fast mull->mlal forwarding (accumulate_latency=1): Vmull result
  forwarding to the accumulate operand of Vmlal subclasses
- Add new instruction classes to aarch64_neon.py: uaddl/uaddw/saddl
  family, rshrn/sqxtun/sqrshrun, urhadd, fmla/fmls/fmul/fadd/fsub vec
  and by-element forms, faddp vec/scalar, vmovi, Stp_Q, subs_imm,
  sub_shifted, sxtw, vdup_lane, Vrev, mov_vtov_s, vins_d_from_v,
  q_ld1_2/q_ld1_4 families (as independent classes, not Ldr_Q subclasses)
- Add timing (execution units, inverse throughput, latency) for all new
  instructions in cortex_a55, cortex_a72_frontend, neoverse_n1_experimental
- Update is_vector_load to include q_ld1_2/q_ld1_4 families
- Add test coverage in instructions.s
- Add subs_wform to cortex_a72_frontend (latency=1, ITP=1, units=INT)
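The forwarding rules in the first commit message can be illustrated with a small lookup. This is a hedged sketch, not the model's actual latency function: the `latency` helper, the `"acc"`/`"mul"` operand labels, and `DEFAULT_LATENCY` are assumptions made for illustration, and `"Vmlal"` stands in for the Vmlal subclasses, whose names are not listed here. Only `accumulate_latency=1` and the producer/consumer pairings come from the commit message.

```python
MUL_LIKE = {"vmul", "vmul_lane"}
MLA_LIKE = {"vmla", "vmla_lane", "vmls", "vmls_lane"}
MULL_LIKE = {"Vmull"}
MLAL_LIKE = {"Vmlal"}  # stand-in for the Vmlal subclasses

ACCUMULATE_LATENCY = 1  # accumulate_latency=1 from the commit message
DEFAULT_LATENCY = 4     # placeholder for the regular result latency


def latency(producer: str, consumer: str, operand: str) -> int:
    """Latency from producer's result to one operand of consumer.

    operand is "acc" when the result feeds the accumulate operand and
    "mul" when it feeds a multiplicand (hypothetical labels).
    """
    if operand == "acc":
        # mul->mla and chained mla->mla forwarding
        if consumer in MLA_LIKE and producer in (MUL_LIKE | MLA_LIKE):
            return ACCUMULATE_LATENCY
        # mull->mlal forwarding
        if consumer in MLAL_LIKE and producer in MULL_LIKE:
            return ACCUMULATE_LATENCY
    return DEFAULT_LATENCY
```

In this sketch the fast path applies only when the forwarded value is the accumulator; a result consumed as a multiplicand still pays the regular latency.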
dop-amin force-pushed the extend-aarch64-model branch from c39d2c9 to d05c706 (April 16, 2026 18:45)