Merged
14 changes: 13 additions & 1 deletion docs/extensions.md
@@ -13,7 +13,9 @@ This page documents those non-standard APIs.
| ?omatcopy | s,d,c,z | out-of-place transposition/copying |
| ?geadd | s,d,c,z | ATLAS-like matrix add `B = α*A+β*B` |
| ?gemmt | s,d,c,z | `gemm` but only a triangular part updated |
| cblas_?gemm_batch | s,d,c,z,b | `gemm` with several groups of input data |
| cblas_?gemm_batch_strided | s,d,c,z,b | `gemm` with groups of data stored at fixed offsets in the input arrays |

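To illustrate what "groups of data stored at fixed offsets in the input arrays" means, here is a minimal plain-C reference sketch. It does not call OpenBLAS; the function name `ref_sgemm_batch_strided`, its parameter list, and the row-major, non-transposed layout are assumptions made for illustration and may not match the actual `cblas_?gemm_batch_strided` prototype.

```c
#include <stddef.h>

/* Hypothetical reference helper, not part of OpenBLAS.
 * Computes C_i = alpha * A_i * B_i + beta * C_i for i = 0 .. batch-1,
 * where matrix i of each operand starts `stride_*` elements after
 * matrix i-1 in the same array. Row-major storage, no transposition. */
static void ref_sgemm_batch_strided(size_t m, size_t n, size_t k,
                                    float alpha,
                                    const float *a, size_t lda, size_t stride_a,
                                    const float *b, size_t ldb, size_t stride_b,
                                    float beta,
                                    float *c, size_t ldc, size_t stride_c,
                                    size_t batch)
{
    for (size_t i = 0; i < batch; ++i) {
        /* Locate the i-th matrices via the fixed offsets. */
        const float *ai = a + i * stride_a;
        const float *bi = b + i * stride_b;
        float *ci = c + i * stride_c;

        for (size_t r = 0; r < m; ++r) {
            for (size_t col = 0; col < n; ++col) {
                float acc = 0.0f;
                for (size_t p = 0; p < k; ++p)
                    acc += ai[r * lda + p] * bi[p * ldb + col];
                ci[r * ldc + col] = alpha * acc + beta * ci[r * ldc + col];
            }
        }
    }
}
```

With two 2x2 matrices packed back to back, all three strides would be 4 and `batch` would be 2. The non-strided batch variant differs in that each group carries its own dimensions and scalars rather than sharing one set.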
## bfloat16 functionality

@@ -26,6 +28,15 @@ BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was c
* `float cblas_sbdot` computes the dot product of two bfloat16 arrays
* `void cblas_sbgemv` performs the matrix-vector operations of GEMV with the input matrix and X vector as bfloat16
* `void cblas_sbgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16
* `void cblas_bgemv` performs the matrix-vector operations of GEMV with the input matrix, X vector and result as bfloat16
* `void cblas_bgemm` performs the matrix-matrix operations of GEMM with both input arrays containing bfloat16 and the output being bfloat16 as well

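As a rough picture of what these routines operate on, the sketch below treats a bfloat16 value as the upper 16 bits of an IEEE 754 single-precision float and computes a dot product with float accumulation, in the spirit of `cblas_sbdot`. The names `bf16_bits`, `float_to_bf16`, `bf16_to_float` and `ref_sbdot` are hypothetical, and the round-to-nearest-even conversion is an assumption; OpenBLAS's own conversion functions may round or accumulate differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_bits; /* hypothetical storage type for one bfloat16 */

/* Convert float to bfloat16 bits using round-to-nearest-even (assumed). */
static bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u)      /* NaN: keep it a NaN */
        return (bf16_bits)((u >> 16) | 0x0040u);
    u += 0x7FFFu + ((u >> 16) & 1u);          /* round to nearest, ties to even */
    return (bf16_bits)(u >> 16);
}

/* Widen bfloat16 bits to float by appending 16 zero bits (exact). */
static float bf16_to_float(bf16_bits h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Reference dot product: bfloat16 inputs, float accumulator and result. */
static float ref_sbdot(size_t n, const bf16_bits *x, const bf16_bits *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += bf16_to_float(x[i]) * bf16_to_float(y[i]);
    return acc;
}
```

Because bfloat16 keeps the full float exponent but only 8 significant bits, widening is lossless while narrowing discards precision, which is why the result type matters when choosing between the `sb` and `b` variants listed above.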
## half-precision float or fp16 functionality

BLAS-like and conversion functions for `hfloat16` (available when OpenBLAS was compiled with `BUILD_HFLOAT16=1`):

* `void cblas_shgemm` performs the matrix-matrix operations of GEMM with both input arrays containing hfloat16

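For readers unfamiliar with the storage format, the following sketch widens a 16-bit half-precision bit pattern to a float. It assumes `hfloat16` uses the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits), which this page does not state; `half_to_float` is a hypothetical helper, not an OpenBLAS function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen IEEE 754 binary16 bits to float (exact). */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                ++shift;
            }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        /* Normal number: rebias exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Unlike bfloat16, half precision trades exponent range for fraction bits, so its largest finite value is 65504 and inputs may need scaling before a GEMM.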

## Utility functions

@@ -36,4 +47,5 @@ BLAS-like and conversion functions for `bfloat16` (available when OpenBLAS was c
* `char * openblas_get_config()` returns the options OpenBLAS was built with, something like `NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell`
* `int openblas_set_affinity(int thread_index, size_t cpusetsize, cpu_set_t *cpuset)` sets the CPU affinity mask of the given thread
to the provided cpuset. Only available on Linux, with semantics identical to `pthread_setaffinity_np`.
* `openblas_set_thread_callback_function` replaces the default multithreading backend with the callback function passed as its argument

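Since `openblas_get_config()` returns a space-separated string, a caller may want to test for one build option. The sketch below shows one way to do that in plain C; `config_has_option` is a hypothetical helper, not part of OpenBLAS, and it assumes options are separated by single spaces as in the sample string above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `option` occurs in `config` as a whole
 * space-separated token, 0 otherwise. Substring hits such as "LAPACKE"
 * inside "NO_LAPACKE" are rejected. */
static int config_has_option(const char *config, const char *option)
{
    size_t len = strlen(option);
    if (len == 0)
        return 0;

    const char *p = config;
    while ((p = strstr(p, option)) != NULL) {
        int starts_token = (p == config) || (p[-1] == ' ');
        int ends_token   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        ++p; /* keep scanning after a partial match */
    }
    return 0;
}
```

In a program linked against OpenBLAS, the first argument would be the string returned by `openblas_get_config()`, for example `config_has_option(openblas_get_config(), "DYNAMIC_ARCH")`.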