Commit 500e328

Merge pull request #5697 from martin-frbg/ext_doc
Update documentation of BLAS extensions
2 parents f6d5eb7 + 496af0d commit 500e328

1 file changed

Lines changed: 13 additions & 1 deletion

docs/extensions.md

@@ -13,7 +13,9 @@ This page documents those non-standard APIs.
 | ?omatcopy | s,d,c,z | out-of-place transposition/copying |
 | ?geadd | s,d,c,z | ATLAS-like matrix add `B = α*A+β*B` |
 | ?gemmt | s,d,c,z | `gemm` but only a triangular part updated |
-
+| cblas_?gemm_batch | s,d,c,z,b | `gemm` with several groups of input data
+|
+| cblas_?gemm_batch_strided | s,d,c,z,b | `gemm` with groups of data stored at fixed offsets in the input arrays
 
 ## bfloat16 functionality
 
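The `cblas_?gemm_batch_strided` row added in this hunk describes `gemm` applied to groups of data stored at fixed offsets in the input arrays. As a rough illustration of that layout only, the sketch below is a plain C reference loop. The function name `ref_sgemm_batch_strided` and its parameter list are hypothetical and are not the OpenBLAS signature. It also assumes row-major storage, no transposition, and strides counted in elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine, NOT the OpenBLAS API.
 * Computes C_i = alpha * A_i * B_i + beta * C_i for i in [0, batch),
 * where matrix i of each operand starts `stride` elements after
 * matrix i-1 in the same array. Row-major, no transposition. */
void ref_sgemm_batch_strided(size_t m, size_t n, size_t k,
                             float alpha,
                             const float *a, size_t stride_a,
                             const float *b, size_t stride_b,
                             float beta,
                             float *c, size_t stride_c,
                             size_t batch)
{
    /* Output matrices must not overlap each other. */
    assert(batch <= 1 || stride_c >= m * n);

    for (size_t i = 0; i < batch; ++i) {
        const float *ai = a + i * stride_a; /* m x k */
        const float *bi = b + i * stride_b; /* k x n */
        float *ci = c + i * stride_c;       /* m x n */

        for (size_t row = 0; row < m; ++row) {
            for (size_t col = 0; col < n; ++col) {
                float acc = 0.0f;
                for (size_t p = 0; p < k; ++p)
                    acc += ai[row * k + p] * bi[p * n + col];
                ci[row * n + col] = alpha * acc + beta * ci[row * n + col];
            }
        }
    }
}
```

The point of the strided form is that a single base pointer per operand is enough: matrix `i` is found by pointer arithmetic, so the caller does not need to build arrays of pointers.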
@@ -26,6 +28,15 @@ BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was c
 * `float cblas_sbdot` computes the dot product of two bfloat16 arrays
 * `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
 * `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
+* `void cblas_bgemv` performs the matrix-vector operations of GEMV with the input matrix, X vector and result as bfloat16
+* `void cblas_bgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16 and the output being bfloat16 as well
+
+## half-precision float or fp16 functionality
+
+BLAS-like and conversion functions for `hfloat16` (available when OpenBLAS was compiled with `BUILD_HFLOAT16=1`):
+
+* `void cblas_shgemm` performs the matrix-matrix operations of GEMM with both input arrays containing hfloat16
+
 
 ## Utility functions
 
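The bfloat16 entries in this hunk differ in whether the result is a `float` or stays in bfloat16. To make the data format concrete, here is a minimal sketch in plain C. It relies on the fact that bfloat16 is the upper 16 bits of an IEEE 754 single-precision value. The names `bf16_bits`, `float_to_bf16_trunc`, `bf16_to_float` and `ref_bf16_dot` are hypothetical and are not OpenBLAS functions. The conversion here truncates, and the library's own conversion routines may round differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_bits; /* hypothetical storage type for this sketch */

/* Keep the sign, the 8 exponent bits and the top 7 mantissa bits
 * (truncation; an assumption, real libraries may round instead). */
bf16_bits float_to_bf16_trunc(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return (bf16_bits)(u >> 16);
}

/* Widening is exact: the dropped mantissa bits become zero. */
float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Dot product of two bfloat16 arrays accumulated and returned as float,
 * mirroring the "bfloat16 in, float out" shape described for cblas_sbdot. */
float ref_bf16_dot(size_t n, const bf16_bits *x, const bf16_bits *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return acc;
}
```

A routine whose output is also bfloat16 would narrow the accumulated value again at the end, which is where the extra precision loss of the all-bfloat16 variants comes from.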
@@ -36,4 +47,5 @@ BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was c
 * `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
 * `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
 to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.
+* `openblas_set_thread_callback_function` overrides the default multithreading backend with the provided argument
 
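Since `openblas_get_config()` is documented above as returning something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`, a caller may want to test for one build option. The helper below is a sketch under the assumption that options are separated by single spaces, as in that example. The name `config_has_option` is hypothetical and is not part of OpenBLAS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `option` appears as a whole space-separated token
 * in `config`. Assumes the space-separated layout shown in the
 * documentation example; the real string format may differ. */
bool config_has_option(const char *config, const char *option)
{
    assert(config != NULL && option != NULL);

    size_t len = strlen(option);
    if (len == 0)
        return false;

    const char *p = config;
    while (*p != '\0') {
        while (*p == ' ')           /* skip separators */
            ++p;
        const char *end = p;
        while (*end != '\0' && *end != ' ')
            ++end;                  /* find the end of this token */
        if ((size_t)(end - p) == len && strncmp(p, option, len) == 0)
            return true;
        p = end;
    }
    return false;
}
```

In a program linked against OpenBLAS, the first argument would be the string returned by `openblas_get_config()`, for example `config_has_option(openblas_get_config(), "DYNAMIC_ARCH")`. Matching whole tokens avoids false positives such as `AFFINITY` matching inside `NO_AFFINITY`.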