[SYCL][Matrix] Add get-coord API and general query example by dkhaldi · Pull Request #7964 · intel/llvm

dkhaldi · 2023-01-09T21:47:31Z

Remove the general query from the TODO list, as an example has been added to the llvm-test-suite ([SYCL][Matrix] Add a more general query example llvm-test-suite#1492)
Add the get_coord API and remove it from the TODO list
Remove the forward-looking local memory API, as it is no longer relevant

bader

A few fixes: markdown linter issues and one typo.

JackAKirk · 2023-01-11T11:55:44Z

Looks pretty good to me. For the Query interface, I think it would be good to try to get some community feedback if possible. I suppose the argument for the general query is that with several backends, it could be easier to ask the API for the set of valid combinations rather than search for the documentation. This is fair. Although I think we should still make an effort to make documentation of supported sizes/types for different backends as accessible and clear as possible; so that people are not forced to use the general query when they may prefer just looking at the docs.

At the moment the documentation for supported types is in e.g. sycl_ext_intel_matrix doc. For the Nvidia case a current slight problem is that we don't actually have any Nvidia only features at the moment, so it is a bit of a misnomer to have a e.g. sycl_ext_cuda_matrix doc similar to what I have here (https://github.com/intel/llvm/pull/6968/files) which currently only lists the currently supported values of sycl_ext_oneapi_matrix APIs in the ext_oneapi_cuda backend. In the future even if we do add the cuda only matrix features there can be other backends that encounter the situation where they need to document supported values of sycl_ext_oneapi_matrix APIs in that backend but don't have a backend specific matrix features extension.

I thought there could be two better options.

a) I can rename sycl_ext_oneapi_matrix_cuda.asciidoc to sycl_ext_oneapi_matrix_cuda_supported_vals.asciidoc or similar, and remove all the DPC++ extension boilerplate from that doc, leaving just the supported value information. Then move the "Supported Combinations Per Hardware" section in sycl_ext_intel_matrix.asciidoc to a similar file such as sycl_ext_oneapi_matrix_intel_supported_vals.asciidoc.

OR

b) we just move the "Supported Combinations Per Hardware" section for all backends to the main sycl_ext_oneapi_matrix.asciidoc doc and I just delete this file completely: https://github.com/intel/llvm/blob/e50a2f5f97acb12db1de78c9ad739b931c77b03f/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_cuda_matrix.asciidoc.

What do you think? cc @gmlueck also.

gmlueck · 2023-01-11T18:32:44Z

> we just move the "Supported Combinations Per Hardware" section for all backends to the main sycl_ext_oneapi_matrix.asciidoc doc

If we want to document all the matrix constraints for each device, I think it probably makes sense to document them all in a single table (i.e. in the same document). For now, this could be a non-normative appendix in the main "sycl_ext_oneapi_matrix.asciidoc" document. If the matrix API is eventually adopted into the core SYCL language (and the extension goes away), we will need to find some other place to list these constraints, but we can worry about this later.

JackAKirk · 2023-01-12T10:03:01Z

Sounds good to me.

dkhaldi · 2023-01-12T15:39:10Z

@JackAKirk

> At the moment the documentation for supported types is in e.g. sycl_ext_intel_matrix doc. For the Nvidia case a current slight problem is that we don't actually have any Nvidia only features at the moment, so it is a bit of a misnomer to have a e.g. sycl_ext_cuda_matrix doc similar to what I have here (https://github.com/intel/llvm/pull/6968/files) which currently only lists the currently supported values of sycl_ext_oneapi_matrix APIs in the ext_oneapi_cuda backend. In the future even if we do add the cuda only matrix features there can be other backends that encounter the situation where they need to document supported values of sycl_ext_oneapi_matrix APIs in that backend but don't have a backend specific matrix features extension.

You should document what your implementation supports, not what Nvidia hardware supports. In the joint matrix code in the CUDA backend, only very specific combinations are allowed; this is what should be documented and returned by this query. It is worth mentioning that what we specify in the documentation and the query is not what the hardware supports (note that the XMX sizes are disclosed information). We document what the implementation can do in an optimal way. You can refer to them as logical sizes rather than hardware sizes. In all cases, performance kernels should care about the maximum load they can do at a time, not about the matrix hardware mad instruction. They can then reuse that load in an optimal way and feed it to the mad instruction.

A specific use case appears in one of our performance kernels: an SG should do more than one DPAS instruction to get optimal results. In most cases, especially when matrix sizes are large, the optimal MxN size is 32x64 on PVC. So instead of the user having to fully unroll a 32x64 loop and then create multiple joint_matrix_mad operations, the implementation can provide such a combination and expose it in both the documentation and the query. In this case, the user has only one iteration in the SG to worry about.
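To make the logical-versus-hardware size point concrete, here is a minimal sketch in plain C++ (no SYCL). The `TileShape` struct, the `mad_ops_per_logical_tile` helper, and the 8x16 hardware shape are assumptions made purely for illustration; they are not part of the joint_matrix API and are not taken from any hardware table. The sketch only counts how many hardware-sized multiply-add operations one 32x64 logical tile would stand for per K step, assuming the logical shape divides evenly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical M x N shape of an accumulator tile (illustration only).
struct TileShape {
  std::size_t rows;
  std::size_t cols;
};

// Number of hardware-sized multiply-add operations needed to cover one
// logical tile for a single K step. Assumes the logical shape is an exact
// multiple of the hardware shape in both dimensions.
constexpr std::size_t mad_ops_per_logical_tile(TileShape logical,
                                               TileShape hw) {
  return (logical.rows / hw.rows) * (logical.cols / hw.cols);
}

// With an assumed 8x16 hardware shape, a 32x64 logical tile covers
// 4 * 4 = 16 hardware-sized operations.
static_assert(mad_ops_per_logical_tile({32, 64}, {8, 16}) == 16,
              "4 x 4 sub-tiles");
```

Under these assumptions, a user working with the hardware shape would write the 16 operations (or the loop that generates them) by hand, whereas a logical 32x64 combination lets the implementation do that internally.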

> I thought there could be two better options.

Having all the combinations per backend (AMX, XMX8, XMX16, different SM versions for Nvidia) in the main document is fine, especially since the query interface is in the main document. The combinations will complement the query API, so the user knows what to expect when using the query interface.

JackAKirk · 2023-01-13T13:49:02Z

> You should document what your implementation is supporting, not what Nvidia hardware supports. In the joint matrix code in the CUDA backend, there are very specific combinations that are allowed, this is what should be documented and returned by this query.

Yes the table here, https://github.com/intel/llvm/blob/e50a2f5f97acb12db1de78c9ad739b931c77b03f/sycl/doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_cuda_matrix.asciidoc#valid-joint_matrix-types-and-shapes, is up to date with what the implementation is supporting. I can add it to the Appendix following what you did here in a subsequent PR. Or if you prefer to add it directly in this PR, feel free.

> It is worth mentioning that what we specify in the documentation and the query is not what the hardware supports (note that the XMX sizes are disclosed information). We document what the implementation can do in an optimal way. You can refer to them as logical sizes rather than hardware sizes. In all cases, performance kernels should care about the maximum load it can do at a time not about the matrix hardware mad instruction. Then, reuse that in an optimal way and feed it to mad instruction.

> A specific use case appears in one of our performance kernels: a SG should do more than one DPAS instruction to get optimal results. In most cases, especially when matrix sizes are large, the optimal size MxN is 32x64 on PVC, so instead of the user having to fully unroll 32x64 loop and then create multiple joint_matrix_mad operations, the implementation can provide such combination, document it in the document and in the query. In this case, the user will have one iteration in the SG to worry about.

I see what you mean by logical vs hardware sizes. The initial sycl-blas commit on joint_matrix also has some relevance to your point I think: https://github.com/codeplaysoftware/sycl-blas . BTW the initial sycl-dnn joint_matrix accelerated commit will follow shortly (it is quite a bit larger so review takes a while).

JackAKirk · 2023-02-10T10:52:47Z

+
+While this document presents the core API that unifies Intel AMX,
+Intel XMX, and Nvidia Tensor Cores, the implementations support
+slightly different versions of the API. For this reason, we introduce


The situation is different now from what this paragraph states; because this document is specifically for the unified matrix interfaces which are portable. So I think it is best to replace this paragraph completely with e.g. the standard template for feature macro versioning.

@JackAKirk, the standard template for feature macro versioning is not for experimental features.
Once this moves out of experimental and becomes supported, this whole section "Matrix API versions" and SYCL_EXT_ONEAPI_MATRIX_VERSION macro will be removed. We won't need to keep the legacy API and tests. Right now, we only keep them to ensure current users have something working while we guide them through all these changes until we have something final (hopefully in this PR).

Do you suggest I remove this now?

I just think that anyone reading this now as documentation on joint_matrix will be thrown by this (wrong) statement

"the implementations support
slightly different versions of the API"

And since this does seem to be the main place that people will arrive at for joint_matrix documentation currently, it makes sense to address this now.

I will remove it, especially since we have already made value 4 the default.

JackAKirk · 2023-02-10T11:25:49Z


-IMPORTANT: Matrix layout defaulting to `layout::dynamic` applies only to matrix with `use::accumulator`
+IMPORTANT: Matrix layout defaulting to `layout::dynamic` applies only
+to matrix with `use::accumulator`


Suggested change

to matrix with `use::accumulator`

to `joint_matrix` with `use::accumulator`

JackAKirk · 2023-02-10T11:28:08Z

-#### Use
-Specifying the usage of the matrix: matrix left (A), matrix right (B) or accumulator +(C)+ is required by backend implementations to reason about the layout of the matrix in registers.
+==== Use
+Specifying the usage of the matrix: matrix left (A), matrix right (B)


matrix left and matrix right are not defined (or equivalently A / B aren't defined).

JackAKirk · 2023-07-03T13:24:47Z

+} // namespace sycl::ext::oneapi::experimental::matrix
+```
+This function copies `Rows x Cols` elements of type `T` from joint
+matrix `src` to joint matrix `dest`. The two matrcies must have the


Suggested change

matrix `src` to joint matrix `dest`. The two matrcies must have the

matrix `src` to joint matrix `dest`. The two matrices must have the

JackAKirk · 2023-07-03T13:37:08Z

+  the user whether a specific combination is valid or not. This takes
+  place when the user specifies all template parameters.
+
+- Default values: this provides a default shape if the user does not


Can you make the implementation choose default values with portability in mind? E.g., return a value for XMX that matches the default value for AMX (if I remember correctly, there is a unique case satisfying this?).

There is currently no such case that satisfies XMX of DG2 and XMX of PVC. But it can be added. Currently the default is the max.
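The two query modes quoted above (validation and default values) and the portability question can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `Shape`, `is_supported`, `default_max`, and `portable_default` are hypothetical names, not the extension's query interface, and reading "the max" as the largest M * N is an assumption as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one supported M x N x K combination.
struct Shape {
  std::size_t m, n, k;
  bool operator==(const Shape& o) const {
    return m == o.m && n == o.n && k == o.k;
  }
};

// Validation: is a fully specified combination in the device's list?
bool is_supported(const std::vector<Shape>& combos, const Shape& s) {
  return std::find(combos.begin(), combos.end(), s) != combos.end();
}

// Default as "the max", interpreted here (an assumption) as the largest
// M * N. Returns false if the list is empty.
bool default_max(const std::vector<Shape>& combos, Shape& out) {
  if (combos.empty()) return false;
  out = *std::max_element(combos.begin(), combos.end(),
                          [](const Shape& a, const Shape& b) {
                            return a.m * a.n < b.m * b.n;
                          });
  return true;
}

// Portable default: the largest combination supported by both devices.
// Returns false when the two lists have nothing in common.
bool portable_default(const std::vector<Shape>& dev_a,
                      const std::vector<Shape>& dev_b, Shape& out) {
  std::vector<Shape> common;
  for (const Shape& s : dev_a)
    if (is_supported(dev_b, s)) common.push_back(s);
  return default_max(common, out);
}
```

With made-up lists such as `{{8, 16, 16}, {4, 16, 16}}` for one device and `{{8, 8, 16}, {4, 16, 16}}` for another, `default_max` picks a different shape per device, while `portable_default` settles on the shared `{4, 16, 16}`. When the lists share nothing, it returns false, which corresponds to the "no such case" situation described in the reply.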

bader

sycl/ReleaseNotes.md changes look good to me.

gmlueck · 2023-07-31T20:51:25Z

Hi @dkhaldi. I just wanted to let you know that I have 4 unresolved comments above. I'm not pushing to resolve them faster, but I wanted to make sure you weren't waiting for me to do something. Some of the comments are hidden and you need to click "Load more" to see them. They are:

- Two broken links
- One table formatting problem
- An unresolved issue with the TF32 overload of joint_matrix_load

dkhaldi · 2023-08-02T19:06:33Z

Hi @gmlueck, I just fixed the 4 unresolved comments and added clarifications for joint_matrix_copy.
Do they look good to you now?

dkhaldi · 2023-08-28T15:41:53Z

@intel/llvm-gatekeepers can you please merge?

- Remove the general query from TODO list as an example is added to the llvm-test-suite - Add get coord API and remove it from TODO list - Remove the local memory future API looking as it is no more relevant

9628bd0

bader reviewed Jan 9, 2023

dkhaldi added 4 commits January 9, 2023 14:13

add an other distribution example

39875df

add revision history

e42ef4a

Bader comments

8bb98c1

better wording

48386d6

dkhaldi marked this pull request as ready for review January 10, 2023 16:41

dkhaldi requested a review from a team as a code owner January 10, 2023 16:41

dkhaldi requested review from JackAKirk and gmlueck January 10, 2023 16:44

gmlueck reviewed Jan 11, 2023

Incorporate Greg comments and other improvements, specifically:

1e85155

- Put all combinations in appendix - move get_coord to the main document - Correct the example by converting USM pointers to multi_ptr

gmlueck reviewed Jan 19, 2023

dkhaldi added 5 commits January 30, 2023 10:38

Update the specification document to follow the formal template

6f91525

add tf32 type and conversion function

cdcab5a

correct the matrix types in the appendix

04e18fe

correct the matrix types in the appendix

9403a38

remove _t from the types

ddb87f1

gmlueck mentioned this pull request Feb 2, 2023

[SYCL][Matrix] Add initial get_coord API #7851

Merged

dkhaldi added 2 commits February 4, 2023 12:34

Specify in Status that joint matrix is an optional kernel feature

8a8e0a9

Move the iteration-style EWOps to the Intel extension and introduce joint_matrix_apply map function

7e610aa

JackAKirk reviewed Feb 10, 2023

Address Jack's comments

509056c

JackAKirk reviewed Jul 3, 2023

dkhaldi added 2 commits July 28, 2023 07:56

address Greg, Jack, and Alexey comments

08fd2db

Clarify use of must when referring to the query interface

d7d0a70

bader approved these changes Jul 31, 2023

dkhaldi added 2 commits August 2, 2023 11:49

Address Greg's comments: fix 2 broken lines, const multi_ptr, line wrap

bf8e00c

Add clarifications about joint_matrix_copy

84af291

gmlueck reviewed Aug 2, 2023

gmlueck reviewed Aug 2, 2023

dkhaldi added 2 commits August 7, 2023 09:11

Add non const overload to tf32 load as implicit conversion for multi_ptr is not supported

2c2af7d

minor clarification

e8bde89

dkhaldi mentioned this pull request Aug 16, 2023

[SYCL][matrix] Update the query interface with the latest joint matrix approved syntax #10847

Closed

fix width of query table

a7f92ce

gmlueck reviewed Aug 25, 2023

dkhaldi and others added 2 commits August 25, 2023 12:47

fix the width for the right table

789b593

Avoid line breaks in table by using source block

ee28250

gmlueck approved these changes Aug 25, 2023

dkhaldi added 2 commits August 28, 2023 08:25

add the conflicted file first in order to resolve the conflict

2d80d16

Merge branch 'intel:sycl' into get-coord-doc

901252b

dm-vodopyanov merged commit 38ac212 into intel:sycl Aug 28, 2023

This was referenced Aug 29, 2023

[SYCL][Matrix spec] keep deletion of assign op and copy ctor but change signature of joint_matrix_mad #11007

Merged

[SYCL][Matrix spec] Remove deleted folder that came back by accident in a merge #11015

Merged

yubingex007-a11y mentioned this pull request Sep 19, 2023

[SYCL][Matrix] syntax changes as preparation before moving joint matrix from experimental namespace #11215

Merged
