Optimize gemv_n_sve_v1x3 kernel #5292
pg00 = svand_z(SV_TRUE(), pg0, pg00);
pg01 = svand_z(SV_TRUE(), pg0, pg01);
pg02 = svand_z(SV_TRUE(), pg0, pg02);
svbool_t pg_tail = SV_WHILE(i, m);
Would it be better to pre-calculate this predicate outside the loop?
It is reused again below.
Since we are calculating the predicate for the tail elements, it depends on the value of i. If we move it outside the loop, we have to calculate it for (0, m % sve_size), but that can sometimes go wrong, since we want the range (i, m) rather than one starting from 0. What are your thoughts on this?
I don't see why (0, m % sve_size) wouldn't work, since we increment i by sve_size in the main loop. Please also see https://github.com/OpenMathLib/OpenBLAS/pull/5089/files#diff-d0b63f332b08eef9b57a1eec785ff43afc468108c60f237b0c4e9401df08b510R68
Yes, it will work; it just gives the predicate for the index range (0, m % sve_size) rather than the actual range (i, m). Sure, I will make this change. Thanks.
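To make the point in this thread concrete, here is a minimal scalar sketch of why the two predicates select the same lanes. It does not use real SVE intrinsics: `VL`, `model_whilelt`, and `tail_predicates_match` are hypothetical names introduced only for illustration, and the fixed lane count is an assumption (real SVE hardware reports its vector length at run time).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed lane count, for illustration only. */
#define VL 4

/* Scalar model of a "while less-than" predicate:
   lane k is active when base + k < limit. */
static void model_whilelt(size_t base, size_t limit, bool pg[VL]) {
    for (size_t k = 0; k < VL; k++)
        pg[k] = (base + k < limit);
}

/* Returns true when the tail predicate built from (i, m) inside the loop
   selects the same lanes as one hoisted out of the loop and built from
   (0, m % VL). Here i is where a main loop stepping by VL would stop. */
static bool tail_predicates_match(size_t m) {
    size_t i = 0;
    while (i + VL <= m)      /* main loop over full vectors */
        i += VL;
    assert(i == m - m % VL); /* i always ends on a multiple of VL */

    bool in_loop[VL], hoisted[VL];
    model_whilelt(i, m, in_loop);
    model_whilelt(0, m % VL, hoisted);

    for (size_t k = 0; k < VL; k++)
        if (in_loop[k] != hoisted[k])
            return false;
    return true;
}
```

Because the main loop advances i in steps of the vector length, m - i equals m % VL at the tail, so both forms activate exactly the first m % VL lanes. The hoisted form only fixes the lane mask; loads and stores in the tail still need the base offset i.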
CodSpeed Performance Report
Merging #5292 will improve performances by 10.54%
- Calculate predicate outside the loop
- Divide matrix in blocks of 3
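As a rough illustration of the second item, the sketch below shows one plausible reading of "blocks of 3": a non-transposed GEMV that consumes three columns of A per pass over y. This is a plain scalar model, not the PR's SVE kernel; the function name `gemv_n_blocked3`, its signature, and the column-major layout with leading dimension `lda` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model: y += alpha * A * x, with A stored column-major
   as an m-by-n matrix with leading dimension lda. */
static void gemv_n_blocked3(size_t m, size_t n, double alpha,
                            const double *a, size_t lda,
                            const double *x, double *y) {
    assert(lda >= m);
    size_t j = 0;

    /* Main part: three columns per pass, so each y[i] is loaded and
       stored once for every three columns instead of once per column. */
    for (; j + 3 <= n; j += 3) {
        const double *a0 = a + j * lda;
        const double *a1 = a0 + lda;
        const double *a2 = a1 + lda;
        const double x0 = alpha * x[j];
        const double x1 = alpha * x[j + 1];
        const double x2 = alpha * x[j + 2];
        for (size_t i = 0; i < m; i++)
            y[i] += a0[i] * x0 + a1[i] * x1 + a2[i] * x2;
    }

    /* Remainder: the last one or two columns, handled one at a time. */
    for (; j < n; j++) {
        const double *aj = a + j * lda;
        const double xj = alpha * x[j];
        for (size_t i = 0; i < m; i++)
            y[i] += aj[i] * xj;
    }
}
```

In a vectorized version, the inner loop over i would presumably process one vector of rows at a time, with a predicated tail for the last m % sve_size rows.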
1ed7eb6 to 8279e68
LGTM. @martin-frbg, any further comments?
Figure: x-axis: M = N; y-axis: GFLOPS (timing)