AArch64: Missing instructions: asr, csel_xzr_ne, cmp_xzr #444

Open
willieyz wants to merge 3 commits into main from aarch64-sup-asr-csel-cmp

@willieyz
Collaborator

@willieyz willieyz commented May 10, 2026

asr x11, x12, x7
asr x11, x12, #7
csel x11, x10, xzr, eq
csel x11, x10, xzr, ne
csel x11, x10, xzr, lt
csel x11, x10, xzr, gt
cmp x3, xzr

This PR adds support for the instructions above to the AArch64 architecture model and to the a55 and a72 uArch models.

asr(register) and asr(immediate)

According to the A64 Base Instruction Descriptions, pages C6-1820 and
C6-1822, these two instructions are aliases:
asr (register) ---> ASRV
asr (immediate) ---> SBFM
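
As an illustration of the alias relationship above, here is a minimal Python sketch of the 64-bit semantics. The function names are hypothetical and are not part of this repository. It assumes the architectural behaviour that `asr Xd, Xn, #shift` is encoded as `sbfm Xd, Xn, #shift, #63`, and that `asrv` uses only the shift register's value modulo 64.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def asr_immediate(xn: int, shift: int) -> int:
    """Model asr Xd, Xn, #shift (alias of sbfm Xd, Xn, #shift, #63)."""
    if not 0 <= shift <= 63:
        raise ValueError("shift must be in 0..63 for the 64-bit form")
    # Python's >> on a negative int is an arithmetic shift.
    return (to_signed64(xn) >> shift) & MASK64


def asr_register(xn: int, xm: int) -> int:
    """Model asr Xd, Xn, Xm (alias of asrv): only Xm mod 64 is used."""
    return asr_immediate(xn, xm % 64)
```

For example, `asr_immediate(0x8000000000000000, 7)` replicates the sign bit into the top seven positions, which is the behaviour both encodings share.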

  • a55 SWOG ASRV(page: 19/48)

    • latency: 1
    • Inverse throughput: 2/1 = 2
    • ExecutionUnit: SCALAR (ALU0, ALU1)
  • a55 SWOG SBFM (page: 21/48)

    • latency: 2
    • Inverse throughput: 2/2 = 1
    • ExecutionUnit: SCALAR (ALU0, ALU1)
  • a72 SWOG ASRV (page: 9/42)

    • latency: 1
    • Inverse throughput: 2/2 = 1
    • ExecutionUnit: INT (INT0, INT1)
  • a72 SWOG SBFM (page: 12/42)

    • latency: 1
    • Inverse throughput: 2/2 = 1
    • ExecutionUnit: INT (INT0, INT1)
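
The four entries above can be collected into a small lookup table. This is only a sketch: the `Timing` class and `timing_for` helper are hypothetical names, not the uArch model API touched by this PR. Reading each quotient as (number of listed execution units) / (throughput figure) is an assumption inferred from the numbers above.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Timing:
    latency: int
    units: tuple      # execution units listed in the PR description
    throughput: int   # denominator of the quotient in the PR description

    @property
    def inverse_throughput(self) -> Fraction:
        # Assumed reading: number of listed units divided by throughput.
        return Fraction(len(self.units), self.throughput)


# Values copied from the PR description.
ASR_TIMINGS = {
    ("a55", "ASRV"): Timing(1, ("ALU0", "ALU1"), 1),
    ("a55", "SBFM"): Timing(2, ("ALU0", "ALU1"), 2),
    ("a72", "ASRV"): Timing(1, ("INT0", "INT1"), 2),
    ("a72", "SBFM"): Timing(1, ("INT0", "INT1"), 2),
}

ALIASES = {"asr (register)": "ASRV", "asr (immediate)": "SBFM"}


def timing_for(core: str, alias: str) -> Timing:
    """Resolve an alias to its underlying instruction, then look up timing."""
    return ASR_TIMINGS[(core, ALIASES[alias])]
```

Under these assumptions, `timing_for("a55", "asr (register)")` is the only entry with an inverse throughput of 2, and `timing_for("a55", "asr (immediate)")` is the only one with a latency of 2.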

csel_xzr_ne

This pattern is a variant of csel that uses the zero register xzr as an operand.
This PR reuses the existing csel uArch model definition for it.

  • a55 SWOG CSEL(page: 18/48)

    • latency: 1
    • Inverse throughput: 2/2 = 1
    • ExecutionUnit: SCALAR (ALU0, ALU1)
  • a72 SWOG CSEL(page: 8/42)

    • latency: 1
    • Inverse throughput: 2/2 = 1
    • ExecutionUnit: INT (INT0, INT1)
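
To make the xzr variant concrete, the following Python sketch models `csel Xd, Xn, xzr, cond` for the four condition codes listed at the top of this description. The names are hypothetical. The condition definitions follow the architectural NZCV rules (eq: Z set, ne: Z clear, lt: N != V, gt: Z clear and N == V). Reading xzr always yields zero, so the instruction behaves like an ordinary csel whose second source is 0.

```python
MASK64 = (1 << 64) - 1

# Condition codes used in this PR, evaluated on the NZCV flags.
CONDITIONS = {
    "eq": lambda n, z, c, v: z == 1,
    "ne": lambda n, z, c, v: z == 0,
    "lt": lambda n, z, c, v: n != v,
    "gt": lambda n, z, c, v: z == 0 and n == v,
}


def csel_xzr(xn: int, cond: str, n: int, z: int, c: int, v: int) -> int:
    """Model csel Xd, Xn, xzr, cond: Xn if the condition holds, else 0."""
    return (xn & MASK64) if CONDITIONS[cond](n, z, c, v) else 0
```

Since only the value of one source operand differs from a plain csel, reusing the same latency and execution-unit data is a natural choice.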

cmp_xzr

This pattern is a variant of cmp that uses the zero register xzr as an operand.

This PR reuses the existing cmp uArch model definition for it. Since cmp
is an alias of SUBS, the SUBS entries are used as the reference for
modelling this instruction.

  • a55 SWOG SUBS(page: 18/48)

    • latency: 2
    • Inverse throughput: 2/2 = 1
    • ExecutionUnit: SCALAR (ALU0, ALU1)
  • a72 SWOG SUBS(page: 8/48)

    • latency: 2
    • Inverse throughput: 1/1 = 1
    • ExecutionUnit: MINT(M)
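
As a sketch of why SUBS is the right reference, the Python below models the flag result of `cmp Xn, xzr`, which is `subs xzr, Xn, xzr` with the result discarded. The function names are hypothetical. It assumes the architectural definition of subtraction as `Xn + NOT(Xm) + 1`.

```python
MASK64 = (1 << 64) - 1


def subs_flags(xn: int, xm: int) -> tuple:
    """Return (N, Z, C, V) for subs xzr, Xn, Xm on 64-bit operands."""
    xn &= MASK64
    xm &= MASK64
    # AArch64 subtracts by computing Xn + NOT(Xm) + 1.
    unsigned_sum = xn + (~xm & MASK64) + 1
    result = unsigned_sum & MASK64
    n = result >> 63
    z = int(result == 0)
    c = int(unsigned_sum > MASK64)  # carry out, i.e. no borrow
    # Signed overflow: operand signs differ and the result sign differs from Xn.
    v = ((xn ^ xm) & (xn ^ result)) >> 63
    return n, z, c, v


def cmp_xzr(xn: int) -> tuple:
    """Model cmp Xn, xzr: compare against the zero register."""
    return subs_flags(xn, 0)
```

Comparing against zero can never borrow or overflow, so C is always 1 and V is always 0. Only N and Z carry information, which is what the eq, ne, lt and gt conditions of a following csel consume.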

willieyz added 2 commits May 12, 2026 00:11
This commit adds support for the asr (register) and asr (immediate)
instructions to the a55 and a72 models.
According to the A64 Base Instruction Descriptions, pages C6-1820 and
C6-1822, these two instructions are aliases:
asr (register)  ---> ASRV
asr (immediate) ---> SBFM

- a55 SWOG ASRV(page: 19/48)
  - latency: 1
  - Inverse throughput: 2/1 = 2
  - ExecutionUnit: SCALAR (ALU0, ALU1)

- a55 SWOG SBFM (page: 21/48)
  - latency: 2
  - Inverse throughput: 2/2 = 1
  - ExecutionUnit: SCALAR (ALU0, ALU1)

- a72 SWOG ASRV (page: 9/42)
  - latency: 1
  - Inverse throughput: 2/2 = 1
  - ExecutionUnit: INT (INT0, INT1)

- a72 SWOG SBFM (page: 12/42)
  - latency: 1
  - Inverse throughput: 2/2 = 1
  - ExecutionUnit: INT (INT0, INT1)

- This commit also corrects the latency of the existing asr_wform: it
  should be 2 instead of 1 (taken from SBFM, since asr (immediate) is an
  alias of SBFM).

Signed-off-by: willieyz <willie.zhao@chelpis.com>
This commit adds support for the csel_xzr_ne instruction to the
A55 and A72 uArch models.

This pattern is a variant of csel that uses the zero register xzr as an operand.
This commit reuses the existing csel uArch model definition for it.

- a55 SWOG CSEL(page: 18/48)
  - latency: 1
  - Inverse throughput: 2/2 = 1
  - ExecutionUnit: SCALAR (ALU0, ALU1)

- a72 SWOG CSEL(page: 8/42)
  - latency: 1
  - Inverse throughput: 2/2 = 1
  - ExecutionUnit: INT (INT0, INT1)

Signed-off-by: willieyz <willie.zhao@chelpis.com>
@willieyz willieyz force-pushed the aarch64-sup-asr-csel-cmp branch 2 times, most recently from 172c4ce to 6a6aac3 Compare May 11, 2026 16:42
This commit adds support for the `cmp_xzr` instruction to the A55 and A72
uArch models.
This pattern is a variant of cmp that uses the zero register xzr as an operand.

This commit reuses the existing cmp uArch model definition for it.
Since cmp (shifted register) is an alias of SUBS (shifted register)
(see page C6-1953 of the A64 Base Instruction Descriptions), the SUBS
entries are used as the reference for modelling this instruction.

- a55 SWOG SUBS(page: 18/48)
  - latency: 2
  - Inverse throughput: 2/2 = 1
  - ExecutionUnit: SCALAR (ALU0, ALU1)

- a72 SWOG SUBS(page: 8/48)
  - latency: 2
  - Inverse throughput: 1/1 = 1
  - ExecutionUnit: MINT(M)

Signed-off-by: willieyz <willie.zhao@chelpis.com>
@willieyz willieyz force-pushed the aarch64-sup-asr-csel-cmp branch from 6a6aac3 to ee10676 Compare May 11, 2026 16:56
@willieyz willieyz requested a review from mkannwischer May 11, 2026 17:31
@willieyz willieyz marked this pull request as ready for review May 11, 2026 17:31
@dop-amin
Collaborator

dop-amin commented May 21, 2026

Hi @willieyz,

Thanks for the changes!
Almost everything looks fine to me, one tiny nit:
The commit message for ee10676 says

- a72 SWOG SUBS(page: 8/48)
  - latency: 2
  - Inverse throughput: 1/1 = 1
  - ExecutionUnit: MINT(M)

the PR says:

a72 SWOG SUBS(page: 8/48)
    latency: 1
    Inverse throughput: 2/2 = 1
    ExecutionUnit: INT (INT0, INT1)

I think the former should be correct and is also what the code reflects. Could you, for the sake of consistency, update the description if you agree with me on this one?

Amin

@willieyz
Collaborator Author

Hello @dop-amin,

Thank you very much for the review, I really appreciate it.

Yes, I totally agree with you. The PR description was inconsistent and should align with the commit message.
I apologize for the mistake; I've updated the PR description accordingly. Thanks again!

Willie

Development

Successfully merging this pull request may close these issues.

AArch64: Add support for asr and csel, cmp involving xzr
