[SM6.10][Exec][Bugfix] Fix OuterProduct/AccumulateToDescriptor Smoke Tests for Thread Matrices #8387
V-FEXrt wants to merge 2 commits into microsoft:main
Device, DxcSupport, std::move(Op),
[NumElements, Params, FillValue](LPCSTR Name, std::vector<BYTE> &Data,
                                 st::ShaderOp *) {
  VERIFY_IS_TRUE(fillInputBuffer(Name, Data, Params.CompType, NumElements,
If the layout isn't RowMajor, this needs to run a ConvertLinearAlgebraMatrix conversion.
<edit: the comment here was in the wrong place>
The title is a little misleading: it is AccumulateToDescriptor on thread-scope matrices that requires OuterProductOptimal layouts, not thread matrices in general.
@anupamachandra yep, my bad. I threw the first draft of this together a bit too quickly. I'm working on updating it now. Thanks for pointing that out!
SS << " -DUSE=" << static_cast<int>(Params.Use);
SS << " -DSCOPE=" << static_cast<int>(Params.Scope);
SS << " -DSTRIDE=" << Params.strideBytes();
SS << " -DSTRIDE=" << Params.rowStride();
The stride is a problem for group shared load and store. Per the spec, the stride for group shared memory is a count of elements, so it should be N or M for group shared.
These calls need to be fixed:
__builtin_LinAlg_MatrixLoadFromMemory(
Mat, GsData, OFFSET, STRIDE, LAYOUT);
__builtin_LinAlg_MatrixStoreToMemory(
Mat, GsData, OFFSET, STRIDE, LAYOUT);
Also, the group shared offset is set to 0 in the test, so it's okay here, but I guess the offset for group shared is also a count of elements?
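To make the units question in this comment concrete, here is a minimal C++ sketch. It is not code from the PR: `MatrixDims` and all three helper names are hypothetical, and it assumes a dense row-major M x N matrix with no padding. It only illustrates that a byte stride and an element-count stride differ by a factor of the element size, so passing one where the other is expected skips too far or not far enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a dense matrix; not a type from the test harness.
struct MatrixDims {
  uint32_t Rows;     // M
  uint32_t Cols;     // N
  uint32_t ElemSize; // bytes per element, e.g. 2 for a 16-bit float
};

// Row-major stride in bytes: distance between the starts of consecutive rows.
inline uint32_t rowStrideBytes(const MatrixDims &D) {
  return D.Cols * D.ElemSize;
}

// Row-major stride in elements: the unit the comment says group shared expects.
inline uint32_t rowStrideElements(const MatrixDims &D) { return D.Cols; }

// Convert a byte stride into an element stride, rejecting strides that are
// not a whole number of elements.
inline uint32_t strideBytesToElements(uint32_t StrideBytes, uint32_t ElemSize) {
  assert(ElemSize != 0 && StrideBytes % ElemSize == 0);
  return StrideBytes / ElemSize;
}
```

For an 8 x 16 matrix of 2-byte elements, the byte stride is 32 while the element stride is 16. If the group shared offset is also counted in elements, as the comment guesses, the same division would apply to it.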
// flatten the 2D index into a 1D index then scale by element size
// Always store row-major and work it out in the test runner
uint coordToByteOffset(uint2 coord) {
  return (coord.y * N_DIM + coord.x) * ELEM_SIZE;
This is not related to this PR, but I guess coordToByteOffset should be this instead?
return (coord.x * N_DIM + coord.y) * ELEM_SIZE;
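Which of the two formulas is row-major depends on what `coord.x` and `coord.y` denote, and the thread does not say. The C++ sketch below mirrors the HLSL helper under both conventions. The `Coord` struct and both function names are hypothetical, and it assumes `N_DIM` is the number of columns. It does not settle which convention the shader actually uses.

```cpp
#include <cstdint>

// Hypothetical stand-in for HLSL's uint2.
struct Coord {
  uint32_t x;
  uint32_t y;
};

// Row-major byte offset when x is the column and y is the row
// (the formula currently in the shader).
constexpr uint32_t byteOffsetXIsColumn(Coord C, uint32_t NDim,
                                       uint32_t ElemSize) {
  return (C.y * NDim + C.x) * ElemSize;
}

// Row-major byte offset when x is the row and y is the column
// (the formula suggested in the review).
constexpr uint32_t byteOffsetXIsRow(Coord C, uint32_t NDim,
                                    uint32_t ElemSize) {
  return (C.x * NDim + C.y) * ElemSize;
}

// The element at row 1, column 2 of a matrix with 4 columns and 2-byte
// elements sits at (1 * 4 + 2) * 2 = 12 bytes under either convention,
// provided the coordinate is built to match that convention.
static_assert(byteOffsetXIsColumn(Coord{2, 1}, 4, 2) == 12, "x is the column");
static_assert(byteOffsetXIsRow(Coord{1, 2}, 4, 2) == 12, "x is the row");
```

Feeding the same `Coord` to both functions gives transposed offsets, so the choice matters only once the meaning of `x` and `y` is pinned down.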
Fixes #8386