optimise dot using thread throttling for NEOVERSE V1 #5242

Merged
martin-frbg merged 2 commits into OpenMathLib:develop from abhishek-iitmadras:abhishekk_dot
Apr 23, 2025
@abhishek-iitmadras
Contributor

@abhishek-iitmadras abhishek-iitmadras commented Apr 22, 2025

Optimise the dot kernel using thread throttling, which gives up to a 2x to 3x performance boost.

SDOT: 10000 < SIZE <= 100000L

image

SDOT: 100000 < SIZE <= 1000000L

image

DDOT: 10000 < SIZE <= 100000L

image

DDOT: 100000 < SIZE <= 1000000L

image
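
For readers unfamiliar with the term, thread throttling here means capping the number of threads by problem size, so that short vectors do not pay more in threading overhead than they gain. The following is only a minimal sketch of that idea, not the code from this patch: the function name `dot_thread_count`, the `DOT_*` thresholds, and the per-range thread caps are all hypothetical values chosen for illustration (the thresholds loosely mirror the size ranges benchmarked above).

```c
/* Hypothetical size thresholds, for illustration only.
   The values used in the actual patch are not shown in this thread. */
#define DOT_SMALL_N   10000L
#define DOT_MEDIUM_N  100000L
#define DOT_LARGE_N   1000000L

/* Return the number of threads to use for a dot product of length n.
   The result is at least 1 and never exceeds max_threads. */
static int dot_thread_count(long n, int max_threads)
{
    int cap;

    if (max_threads < 1)
        max_threads = 1;

    if (n <= DOT_SMALL_N)
        cap = 1;            /* too small to amortise threading overhead */
    else if (n <= DOT_MEDIUM_N)
        cap = 4;            /* hypothetical cap for mid-sized vectors */
    else if (n <= DOT_LARGE_N)
        cap = 16;           /* hypothetical cap for large vectors */
    else
        cap = max_threads;  /* very large vectors: use everything available */

    return cap < max_threads ? cap : max_threads;
}
```

A caller would pass the vector length and the number of threads the library was configured with, then dispatch the kernel on the returned count. Suitable thresholds depend on the core and memory system, so they would need to be tuned by benchmarking on the target machine.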

@abhishek-iitmadras
Contributor Author

Please help with review
cc @martin-frbg @annop-w

@annop-w
Contributor

annop-w commented Apr 23, 2025

LGTM 👍

@abhishek-iitmadras
Contributor Author

Is the OpenBLAS pipeline on Azure a private or a public repo (https://dev.azure.com/martinkroeker/martinkroeker)?

@martin-frbg
Collaborator

Public, but the only NeoverseV1 job we currently have is an AWS c7g instance managed through Cirun.io (and sponsored by Quansight)

@martin-frbg martin-frbg added this to the 0.3.30 milestone Apr 23, 2025
@martin-frbg martin-frbg merged commit 70dff3b into OpenMathLib:develop Apr 23, 2025
83 of 86 checks passed
@abhishek-iitmadras abhishek-iitmadras deleted the abhishekk_dot branch April 24, 2025 05:28