
## Sample Purpose

- This sample demonstrates various techniques to perform a large matrix multiplcation where the matrix elements contain 8-bit integer data.
+ This sample demonstrates various techniques to perform a large matrix multiplication where the matrix elements contain 8-bit integer data.
The sample includes many different implementations:

1. The "naive" implementation is a very simple implementation.
@@ -40,13 +40,13 @@ This sample will optionally use the following OpenCL extensions:
| :--| :-:| :--|
| `-p <index>` | 0 | Specify the index of the OpenCL platform to execute the sample on.
| `-d <index>` | 0 | Specify the index of the OpenCL device in the platform to execute on the sample on.
- | `--file <string>` | `matrix_kernels_bf16.cl` | Specify the name of the file with the OpenCL kernel source.
+ | `--file <string>` | `matrix_kernels_i8.cl` | Specify the name of the file with the OpenCL kernel source.
| `--options <string>` | None | Specify optional program build options.
| `--matrixsize <int>` | 512 | Specify the dimensions of the matrix.
| `--iterations <int>` | 16 | Specify the number of iterations for performance testing.
| `--validate` | n/a | Validate results for correctness.
| `--zero` | n/a | Initialize all matrices to zero.
- | `--identity` | n/a | Initialize all matrices to to one.
+ | `--identity` | n/a | Initialize all matrices to one.
| `--fixed` | n/a | Initialize all matrices to values computed from the matrix row and column.
| `--emulate` | n/a | Do not use specialized matrix multiplication extensions.
| `--wallclock` | n/a | Measure performance using wallclock time instead of event profiling.
@@ -57,4 +57,4 @@ This sample will optionally use the following OpenCL extensions:

By default, the source matrices are populated with random data.
When validating results, it is recommended to use either "fixed" or "identity" data.
- For best performance, use "zero" data".
+ For best performance, use "zero" data.