Add thread throttling profile for DGEMV on NEOVERSEV1 by shubhamsvc · Pull Request #5175 · OpenMathLib/OpenBLAS

shubhamsvc · 2025-03-12T07:10:54Z

This PR introduces thread throttling for DGEMV on Neoverse V1.

Benchmark results for matrix sizes in the range [2, 1024], measured on an AWS Graviton3 processor:

- dgemv_n: Geometric mean speedup of 2.2x

- dgemv_t: Geometric mean speedup of 2.7x
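
For reference, a geometric mean speedup over n benchmarked sizes is conventionally computed as shown here. This assumes each per-size speedup s_i is the baseline run time divided by the throttled run time; the PR does not spell out how its figures were measured, so the symbols are illustrative.

```latex
\bar{s}_{\mathrm{geo}} = \left( \prod_{i=1}^{n} s_i \right)^{1/n},
\qquad
s_i = \frac{t_i^{\mathrm{baseline}}}{t_i^{\mathrm{throttled}}}
```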

shubhamsvc · 2025-03-13T05:53:46Z

Please help with review
cc @martin-frbg @annop-w @michalowski-arm

annop-w · 2025-03-13T09:29:47Z


+//thread throttling for dgemv
+#if defined(DYNAMIC_ARCH) || defined(NEOVERSEV1)
+static inline int get_dgemv_optimal_nthreads_neoversev1(BLASLONG MN, int ncpu) {


Instead of defining a new function, I think it is cleaner to just use get_gemv_optimal_nthreads_<uarch>.
Inside get_gemv_optimal_nthreads_<uarch>, we can #ifdef DOUBLE.

Yes, please keep this inside the existing function.

Done as suggested

annop-w · 2025-03-13T09:31:37Z

@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
 #endif
 #if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)


Suggested change

-#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
+#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(BFLOAT16)

annop-w · 2025-03-13T09:32:59Z

 #if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
  return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
+#elif defined(NEOVERSEV1) && !defined(COMPLEX)  && defined(DOUBLE) && !defined(BFLOAT16)
+  return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);


We can remove.

Removed as suggested

annop-w · 2025-03-13T09:33:33Z

+  return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
 #elif defined(NEOVERSEV2) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
  return get_gemv_optimal_nthreads_neoversev2(MN, ncpu);
 #elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)


Suggested change

-#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
+#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(BFLOAT16)

Done as suggested

annop-w · 2025-03-13T09:33:48Z

+  if (strcmp(gotoblas_corename(), "neoversev1") == 0) {
+    return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
+  }
+


We can remove.

Removed as suggested
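
A rough sketch of the dispatch these review threads converge on: once `!defined(DOUBLE)` is dropped from the guards, single and double precision share one compile-time branch and one run-time `strcmp` branch, and no separate DGEMV call is needed. This is an illustration under stated assumptions, not the merged code. `sketch_gemv_nthreads`, `sketch_corename`, the stubbed `gotoblas_corename`, and the placeholder profile values are all hypothetical stand-ins so the example compiles on its own.

```c
#include <string.h>

typedef long BLASLONG;                 /* stand-in for the OpenBLAS integer type */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#define DYNAMIC_ARCH                   /* pretend this is a multi-target build */

/* Hypothetical stub: in OpenBLAS, gotoblas_corename() reports the core
 * selected at run time. Here it just returns a settable string. */
static const char *sketch_corename = "neoversev1";
static const char *gotoblas_corename(void) { return sketch_corename; }

/* Placeholder profile: the cutoff and cap are invented for illustration. */
static inline int get_gemv_optimal_nthreads_neoversev1(BLASLONG MN, int ncpu) {
  return MN < 10000L ? 1 : MIN(ncpu, 24);
}

/* Hypothetical dispatcher; ncpu is passed in to keep the sketch standalone. */
static inline int sketch_gemv_nthreads(BLASLONG MN, int ncpu) {
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(BFLOAT16)
  /* Fixed-target build: the profile is chosen at compile time. */
  return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(BFLOAT16)
  /* Multi-target build: the profile is chosen from the detected core name. */
  if (strcmp(gotoblas_corename(), "neoversev1") == 0) {
    return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
  }
#endif
  /* No profile matched: use every available thread. */
  return ncpu;
}
```

Because neither guard mentions `DOUBLE`, the precision-specific cutoffs can live entirely inside the per-core function.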

annop-w · 2025-03-13T09:34:52Z

+: MN < 435600L    ? MIN(ncpu, 24)        
+: MN < 810000L    ? MIN(ncpu, 32)        
+: MN < 1050625   ? MIN(ncpu, 40)        
+: ncpu;                    


Could we please move this inside get_gemv_optimal_nthreads_neoversev1 and use #ifdef DOUBLE?
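
A minimal sketch of what this request could look like: a single `get_gemv_optimal_nthreads_neoversev1` whose body switches on `DOUBLE`. Only the three upper DGEMV cutoffs (435600, 810000, 1050625) and their thread caps are taken from the quoted diff. The lowest tier and the entire single-precision branch are hypothetical placeholders, since the excerpt does not show them. `BLASLONG` and `MIN` are redefined here only so the sketch compiles on its own.

```c
typedef long BLASLONG;                 /* stand-in for the OpenBLAS integer type */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Defined here to select the DGEMV profile; a real build sets this itself. */
#define DOUBLE

static inline int get_gemv_optimal_nthreads_neoversev1(BLASLONG MN, int ncpu) {
#ifdef DOUBLE
  /* DGEMV profile. The first tier is a hypothetical placeholder standing in
   * for the lower tiers that the quoted diff does not show. */
  return MN < 10000L   ? 1
       : MN < 435600L  ? MIN(ncpu, 24)   /* from the quoted diff */
       : MN < 810000L  ? MIN(ncpu, 32)   /* from the quoted diff */
       : MN < 1050625L ? MIN(ncpu, 40)   /* from the quoted diff */
       : ncpu;                           /* large problems: no throttling */
#else
  /* Single-precision profile: values invented purely for illustration. */
  return MN < 20000L ? 1 : ncpu;
#endif
}
```

With `DOUBLE` defined, a call such as `get_gemv_optimal_nthreads_neoversev1(800000, 64)` lands in the third tier and is capped at 32 threads, while the same call on an 8-core machine returns 8. Assuming `MN` is the product of the matrix dimensions, the three quoted cutoffs equal 660^2, 900^2, and 1025^2, so for square matrices they correspond to those side lengths.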

annop-w · 2025-03-18T10:20:03Z

@shubhamsvc Thank you for the changes. LGTM.

shubhamsvc force-pushed the dgemv_thread_throttling branch from e311258 to d7a2b6b Compare March 13, 2025 05:58

annop-w suggested changes Mar 13, 2025

shubham-fujitsu added 3 commits March 18, 2025 13:14

Add thread throttling profile for DGEMV on NEOVERSEV1

b6cb5ec

Add thread throttling for dynamic arch neoversev1

189dbbc

Simplified thread throttling function in gemv

8e289ec

shubhamsvc force-pushed the dgemv_thread_throttling branch from d7a2b6b to 8e289ec Compare March 18, 2025 07:54

martin-frbg added this to the 0.3.30 milestone Mar 25, 2025

martin-frbg merged commit 4e3afa7 into OpenMathLib:develop Mar 25, 2025
86 checks passed
