LibCEED has a single user facing API for creating and using the libCEED objects ({ref}CeedVector, {ref}CeedBasis, etc.).
Different Ceed backends are selected by instantiating a different {ref}Ceed object to create the other libCEED objects, in a bridge pattern.
At runtime, the user can select the different backend implementations to target different hardware, such as CPUs or GPUs.
When designing new features, developers should place the function declarations for the user facing API in the header /include/ceed/ceed.h.
The basic implementation of these functions should typically be placed in /interface/*.c files.
The interface should pass any computationally expensive or hardware specific operations to a backend implementation.
A new method for the associated libCEED object can be added in /include/ceed-impl.h, with a corresponding CEED_FTABLE_ENTRY in /interface/ceed.c to allow backends to set their own implementations of this method.
Then in the creation of the backend specific implementation of the object, typically found in /backends/[impl]/ceed-[impl]-[object].c, the developer creates the backend implementation of the specific method and calls {c:func}CeedSetBackendFunction to set this implementation of the method for the backend.
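The dispatch described above can be pictured with a small standalone model. This is only a sketch under stated assumptions: `ToyVector`, `ToySetBackendFunction`, `ToyVectorSetValue`, and `ToyVectorSetValue_Backend` are hypothetical names that mimic the roles of a libCEED object, {c:func}CeedSetBackendFunction, an interface function, and a backend implementation; they are not the library's actual types or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for a libCEED object with one overridable method
typedef struct ToyVector_private *ToyVector;
struct ToyVector_private {
  size_t  length;
  double *array;
  // Backend-provided implementation; NULL means "use the interface default"
  int (*SetValue)(ToyVector vec, double value);
};

// Hypothetical registration helper playing the role of CeedSetBackendFunction
int ToySetBackendFunction(ToyVector vec, const char *func_name, int (*f)(ToyVector, double)) {
  assert(vec && func_name);
  if (strcmp(func_name, "SetValue") != 0) return 1;  // unknown method name
  vec->SetValue = f;
  return 0;
}

// Interface function: dispatches to the backend if it set an implementation
int ToyVectorSetValue(ToyVector vec, double value) {
  assert(vec && vec->array);
  if (vec->SetValue) return vec->SetValue(vec, value);
  for (size_t i = 0; i < vec->length; i++) vec->array[i] = value;  // basic CPU default
  return 0;
}

// Counter that makes the dispatch observable in this sketch
static int toy_backend_calls = 0;

// Example backend implementation of the method
int ToyVectorSetValue_Backend(ToyVector vec, double value) {
  toy_backend_calls++;
  for (size_t i = 0; i < vec->length; i++) vec->array[i] = value;
  return 0;
}
```

In this model the interface function owns the default behavior, and a backend only replaces the entries it wants to specialize.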
Any supplemental functions intended to be used in the interface or by the backends may be added to the backend API in the header /include/ceed/backend.h.
The basic implementation of these functions should also be placed in /interface/*.c files.
LibCEED generally follows a "CPU first" implementation strategy when adding new functionality to the user facing API.
If there are no performance specific considerations, it is generally recommended to include a basic CPU default implementation in /interface/*.c.
Any new functions must be well documented and tested.
Once the user facing API and the default implementation are in place and verified correct via tests, then the developer can focus on hardware specific implementations (AVX, CUDA, HIP, etc.) as necessary.
A Ceed backend is not required to implement all libCEED objects or {ref}CeedOperator methods.
There are three mechanisms by which a Ceed backend can inherit implementations from another Ceed backend.
- Delegation - Developers may use {c:func}CeedSetDelegate to set a general delegate {ref}Ceed object. This delegate {ref}Ceed will provide the implementation of any libCEED objects that the parent backend does not implement. For example, the /cpu/self/xsmm/serial backend implements the CeedTensorContract object itself but delegates all other functionality to the /cpu/self/opt/serial backend.
- Object delegation - Developers may use {c:func}CeedSetObjectDelegate to set a delegate {ref}Ceed object for a specific libCEED object. This delegate {ref}Ceed will only provide the implementation of that specific libCEED object for the parent backend. Object delegation has higher precedence than delegation.
- Operator fallback - Developers may use {c:func}CeedSetOperatorFallbackCeed to set a {ref}Ceed object to provide any unimplemented {ref}CeedOperator methods that support preconditioning, such as {c:func}CeedOperatorLinearAssemble. The parent backend must implement the basic {ref}CeedOperator functionality. Like the delegates above, this fallback {ref}Ceed object should be created and set in the backend CeedInit function. In order to use operator fallback, the parent backend and fallback backend must use compatible E-vector and Q-vector layouts. For example, /gpu/cuda/gen falls back to /gpu/cuda/ref for missing {ref}CeedOperator preconditioning support methods. If an unimplemented method is called, then the parent /gpu/cuda/gen {ref}Ceed object uses its fallback /gpu/cuda/ref {ref}Ceed object to create a clone of the {ref}CeedOperator. This clone {ref}CeedOperator is then used for the unimplemented preconditioning support methods.
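The precedence between delegation and object delegation can be sketched with a toy lookup. Everything here is an assumption made for illustration: `ToyCeed`, `ToyFindName`, and `ToyResolve` are hypothetical, and the resolution order (own implementation, then object delegate, then general delegate) is a simplified reading of the description above rather than the library's internal code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_OBJECTS 4

// Hypothetical model of a backend context; not the libCEED Ceed struct
typedef struct ToyCeed_private *ToyCeed;
struct ToyCeed_private {
  const char *resource;                             // backend resource name
  const char *implemented[TOY_MAX_OBJECTS];         // objects this backend implements
  const char *obj_delegate_names[TOY_MAX_OBJECTS];  // objects with a dedicated delegate
  ToyCeed     obj_delegates[TOY_MAX_OBJECTS];       // delegate for each name above
  ToyCeed     delegate;                             // general delegate, may be NULL
};

// Return the position of name in names, or -1 if it is absent
static int ToyFindName(const char *const *names, const char *name) {
  for (int i = 0; i < TOY_MAX_OBJECTS; i++) {
    if (names[i] && !strcmp(names[i], name)) return i;
  }
  return -1;
}

// Return the context that provides object_name, or NULL if none does
ToyCeed ToyResolve(ToyCeed ceed, const char *object_name) {
  assert(object_name);
  while (ceed) {
    int index;

    if (ToyFindName(ceed->implemented, object_name) >= 0) return ceed;
    // Object delegation takes precedence over the general delegate
    index = ToyFindName(ceed->obj_delegate_names, object_name);
    ceed  = index >= 0 ? ceed->obj_delegates[index] : ceed->delegate;
  }
  return NULL;
}
```

With a parent that implements only its tensor contraction, every other request walks to the general delegate unless a more specific object delegate was registered for that object.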
There are 4 general 'families' of backend implementations. As internal data layouts are specific to backend families, it is generally not possible to delegate between backend families.
The basic CPU backend with the simplest implementation is /cpu/self/ref/serial.
This backend contains the basic implementations of most objects that other backends rely upon.
Most of the other CPU backends only update the {ref}CeedOperator and CeedTensorContract objects.
The /cpu/self/ref/blocked and /cpu/self/opt/* backends delegate to the /cpu/self/ref/serial backend.
The /cpu/self/ref/blocked backend updates the {ref}CeedOperator to use an E-vector and Q-vector ordering in which data for 8 elements are interlaced to provide better vectorization.
The /cpu/self/opt/* backends update the {ref}CeedOperator to apply the action of the operator in 1 or 8 element batches, depending on whether the blocking strategy is used.
This significantly reduces the memory required to use these backends.
The /cpu/self/avx/* and /cpu/self/xsmm/* backends delegate to the corresponding /cpu/self/opt/* backends.
These backends update the CeedTensorContract objects using AVX intrinsics and libXSMM functions, respectively.
The /cpu/self/memcheck/* backends delegate to the /cpu/self/ref/* backends.
These backends replace many of the implementations with methods that include more verification checks and a memory management model that more closely matches the memory management for GPU backends.
These backends rely upon the Valgrind Memcheck tool and Valgrind headers.
The CUDA, HIP, and SYCL backend families all follow similar designs. The CUDA and HIP backends are very similar, with minor differences. While the SYCL backend was based upon the CUDA and HIP backends, there are more internal differences to accommodate OpenCL and Intel hardware.
The /gpu/*/ref backends provide basic functionality.
In these backends, the operator is applied in multiple separate kernel launches, following the libCEED operator decomposition.
First, {ref}CeedElemRestriction kernels map from the L-vectors to E-vectors, then {ref}CeedBasis kernels map from the E-vectors to Q-vectors, and then the {ref}CeedQFunction kernel provides the action of the user quadrature point function.
Finally, the transpose {ref}CeedBasis and {ref}CeedElemRestriction kernels are applied to go back to the E-vectors and then the L-vectors.
These kernels apply to all points across all elements in order to maximize the amount of work each kernel launch has.
Some of these kernels are compiled at runtime via NVRTC, HIPRTC, or OpenCL RTC.
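The sequence of stages in this decomposition can be illustrated on the CPU with a tiny 1D mass operator. This is a sketch only, not backend code: `ToyOperatorApply` and the `TOY_*` constants are hypothetical, the mesh (two linear elements covering [0, 2]) and the 2-point Gauss rule are assumptions chosen to keep the example short, and each loop stands in for what would be a separate kernel launch.

```c
#include <assert.h>

enum { TOY_NUM_NODES = 3, TOY_NUM_ELEM = 2, TOY_ELEM_SIZE = 2, TOY_NUM_QPTS = 2 };

// Hypothetical sketch: v = E^T B^T D B E u for a 1D mass operator on two linear elements
void ToyOperatorApply(const double u[TOY_NUM_NODES], double v[TOY_NUM_NODES]) {
  const int    offsets[TOY_NUM_ELEM * TOY_ELEM_SIZE] = {0, 1, 1, 2};  // element-to-node map
  const double g = 0.5773502691896257;                                // Gauss point 1/sqrt(3) on [-1, 1]
  const double interp[TOY_NUM_QPTS][TOY_ELEM_SIZE] = {{(1 + g) / 2, (1 - g) / 2}, {(1 - g) / 2, (1 + g) / 2}};
  const double q_data = 1.0 * 0.5;  // quadrature weight (1) times element Jacobian (h/2 with h = 1)
  double       e_u[TOY_NUM_ELEM * TOY_ELEM_SIZE], e_v[TOY_NUM_ELEM * TOY_ELEM_SIZE];
  double       q_u[TOY_NUM_ELEM * TOY_NUM_QPTS], q_v[TOY_NUM_ELEM * TOY_NUM_QPTS];

  assert(u && v);
  // 1. Element restriction: L-vector -> E-vector
  for (int k = 0; k < TOY_NUM_ELEM; k++) {
    for (int i = 0; i < TOY_ELEM_SIZE; i++) e_u[i + k * TOY_ELEM_SIZE] = u[offsets[i + k * TOY_ELEM_SIZE]];
  }
  // 2. Basis: E-vector -> Q-vector
  for (int k = 0; k < TOY_NUM_ELEM; k++) {
    for (int q = 0; q < TOY_NUM_QPTS; q++) {
      q_u[q + k * TOY_NUM_QPTS] = 0.0;
      for (int i = 0; i < TOY_ELEM_SIZE; i++) q_u[q + k * TOY_NUM_QPTS] += interp[q][i] * e_u[i + k * TOY_ELEM_SIZE];
    }
  }
  // 3. QFunction: pointwise action at every quadrature point
  for (int p = 0; p < TOY_NUM_ELEM * TOY_NUM_QPTS; p++) q_v[p] = q_data * q_u[p];
  // 4. Basis transpose: Q-vector -> E-vector
  for (int k = 0; k < TOY_NUM_ELEM; k++) {
    for (int i = 0; i < TOY_ELEM_SIZE; i++) {
      e_v[i + k * TOY_ELEM_SIZE] = 0.0;
      for (int q = 0; q < TOY_NUM_QPTS; q++) e_v[i + k * TOY_ELEM_SIZE] += interp[q][i] * q_v[q + k * TOY_NUM_QPTS];
    }
  }
  // 5. Element restriction transpose: E-vector -> L-vector (summing shared nodes)
  for (int n = 0; n < TOY_NUM_NODES; n++) v[n] = 0.0;
  for (int k = 0; k < TOY_NUM_ELEM; k++) {
    for (int i = 0; i < TOY_ELEM_SIZE; i++) v[offsets[i + k * TOY_ELEM_SIZE]] += e_v[i + k * TOY_ELEM_SIZE];
  }
}
```

Each numbered loop touches all elements before the next stage begins, which mirrors the idea of giving every launch as much work as possible.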
The /gpu/*/shared backends delegate to the corresponding /gpu/*/ref backends.
These backends use shared memory to improve performance for the {ref}CeedBasis kernels.
All other libCEED objects are delegated to /gpu/*/ref.
These kernels are compiled at runtime via NVRTC, HIPRTC, or OpenCL RTC.
The /gpu/*/gen backends delegate to the corresponding /gpu/*/shared backends.
These backends write a single comprehensive kernel to apply the action of the {ref}CeedOperator, significantly improving performance by eliminating intermediate data structures and reducing the total number of kernel launches required.
This kernel is compiled at runtime via NVRTC, HIPRTC, or OpenCL RTC.
The /gpu/*/magma backends delegate to the corresponding /gpu/cuda/ref and /gpu/hip/ref backends.
These backends provide better performance for {ref}CeedBasis kernels but do not have the improvements from the /gpu/*/gen backends for {ref}CeedOperator.
Ceed backends are free to use any E-vector and Q-vector data layout (including never fully forming these vectors) so long as the backend passes the t5** series tests and all examples.
There are several common layouts for L-vectors, E-vectors, and Q-vectors, detailed below:
- L-vector layouts
  - L-vectors described by a standard {ref}CeedElemRestriction have a layout described by the offsets array and comp_stride parameter. Data for node i, component j, element k can be found in the L-vector at index offsets[i + k*elem_size] + j*comp_stride.
  - L-vectors described by a strided {ref}CeedElemRestriction have a layout described by the strides array. Data for node i, component j, element k can be found in the L-vector at index i*strides[0] + j*strides[1] + k*strides[2].
- E-vector layouts
  - If possible, backends should use {c:func}CeedElemRestrictionSetELayout() to use the t2** tests. If the backend uses a strided E-vector layout, then the data for node i, component j, element k in the E-vector is given by i*layout[0] + j*layout[1] + k*layout[2].
  - Backends may choose to use a non-strided E-vector layout; however, the t2** tests will not function correctly in this case and these tests will need to be marked as allowable failures for this backend in the test suite.
- Q-vector layouts
  - When the size of a {ref}CeedQFunction field is greater than 1, data for quadrature point i component j can be found in the Q-vector at index i + Q*j, where Q is the total number of quadrature points in the Q-vector. Backends are free to provide the quadrature points in any order.
  - When the {ref}CeedQFunction field has emode CEED_EVAL_GRAD, data for quadrature point i, component j, derivative k can be found in the Q-vector at index i + Q*j + Q*num_comp*k.
  - Backend developers must take special care to ensure that the data in the Q-vectors for a field with emode CEED_EVAL_NONE is properly ordered when the backend uses different layouts for E-vectors and Q-vectors.
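The index formulas listed above translate directly into small helper functions. The helpers below are hypothetical (`ToyLVectorIndexStandard`, `ToyIndexStrided`, `ToyQVectorIndex` do not exist in libCEED) and use plain `int` where the library would use its own integer type; they only restate the formulas from the list.

```c
#include <assert.h>

// L-vector index for a standard restriction: offsets[i + k*elem_size] + j*comp_stride
int ToyLVectorIndexStandard(const int *offsets, int elem_size, int comp_stride, int i, int j, int k) {
  assert(offsets && elem_size > 0);
  return offsets[i + k * elem_size] + j * comp_stride;
}

// Strided index, usable for a strided L-vector (strides) or a strided E-vector (layout)
int ToyIndexStrided(const int strides[3], int i, int j, int k) {
  assert(strides);
  return i * strides[0] + j * strides[1] + k * strides[2];
}

// Q-vector index for quadrature point i, component j, derivative k (k = 0 when there is no gradient)
int ToyQVectorIndex(int Q, int num_comp, int i, int j, int k) {
  assert(Q > 0 && num_comp > 0);
  return i + Q * j + Q * num_comp * k;
}
```

For example, with `offsets = {0, 1, 1, 2}`, `elem_size = 2`, and `comp_stride = 3`, node 1 of element 1 for component 1 maps to L-vector index `offsets[3] + 3 = 5`.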
Backend implementations are expected to separately track 'owned' and 'borrowed' memory locations. Backends are responsible for freeing 'owned' memory; 'borrowed' memory is set by the user and backends only have read/write access to 'borrowed' memory. For any given precision and memory type, a backend should only have 'owned' or 'borrowed' memory, not both.
Backends are responsible for tracking which memory locations contain valid data.
If the user calls {c:func}CeedVectorTakeArray on the only memory location that contains valid data, then the {ref}CeedVector is left in an invalid state.
To repair an invalid state, the user must set valid data by calling {c:func}CeedVectorSetValue, {c:func}CeedVectorSetArray, or {c:func}CeedVectorGetArrayWrite.
Some checks for consistency and data validity with {ref}CeedVector array access are performed at the interface level.
All backends may assume that array access will conform to these guidelines:
- Borrowed memory
  - {ref}CeedVector access to borrowed memory is set with {c:func}CeedVectorSetArray with copy_mode = CEED_USE_POINTER and revoked with {c:func}CeedVectorTakeArray. The user must first call {c:func}CeedVectorSetArray with copy_mode = CEED_USE_POINTER for the appropriate precision and memory type before calling {c:func}CeedVectorTakeArray.
  - {c:func}CeedVectorTakeArray cannot be called on a vector in an invalid state.
- Owned memory
  - Owned memory can be allocated by calling {c:func}CeedVectorSetValue or by calling {c:func}CeedVectorSetArray with copy_mode = CEED_COPY_VALUES.
  - Owned memory can be set by calling {c:func}CeedVectorSetArray with copy_mode = CEED_OWN_POINTER.
  - Owned memory can also be allocated by calling {c:func}CeedVectorGetArrayWrite. The user is responsible for manually setting the contents of the array in this case.
- Data validity
  - Internal synchronization and user calls to {c:func}CeedVectorSync cannot be made on a vector in an invalid state.
  - Calls to {c:func}CeedVectorGetArray and {c:func}CeedVectorGetArrayRead cannot be made on a vector in an invalid state.
  - Calls to {c:func}CeedVectorSetArray and {c:func}CeedVectorSetValue can be made on a vector in an invalid state.
  - Calls to {c:func}CeedVectorGetArrayWrite can be made on a vector in an invalid state. Data synchronization is not required for the memory location returned by {c:func}CeedVectorGetArrayWrite. The caller should assume that all data at the memory location returned by {c:func}CeedVectorGetArrayWrite is invalid.
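One way to picture these rules is a toy vector that tracks owned memory, borrowed memory, and validity for a single precision and memory type. This is a simplified sketch, not a backend implementation: `ToyVector`, `ToyCopyMode`, and the `ToyVector*` functions are hypothetical, and the nonzero return value merely stands in for an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_COPY_VALUES, TOY_USE_POINTER, TOY_OWN_POINTER } ToyCopyMode;

// Hypothetical vector for one precision and memory type
typedef struct {
  size_t  length;
  double *owned;     // freed by the backend
  double *borrowed;  // set by the user, never freed by the backend
  bool    is_valid;  // true when some memory location holds valid data
} ToyVector;

// Set the array; afterwards the vector holds either owned or borrowed memory, not both
int ToyVectorSetArray(ToyVector *vec, ToyCopyMode copy_mode, double *array) {
  assert(vec && array);
  switch (copy_mode) {
    case TOY_COPY_VALUES:
      if (!vec->owned) vec->owned = malloc(vec->length * sizeof(double));
      if (!vec->owned) return 1;
      memcpy(vec->owned, array, vec->length * sizeof(double));
      vec->borrowed = NULL;
      break;
    case TOY_OWN_POINTER:
      free(vec->owned);
      vec->owned    = array;
      vec->borrowed = NULL;
      break;
    case TOY_USE_POINTER:
      free(vec->owned);
      vec->owned    = NULL;
      vec->borrowed = array;
      break;
  }
  vec->is_valid = true;  // allowed even if the vector was in an invalid state
  return 0;
}

// Revoke access to borrowed memory; fails without borrowed memory or in an invalid state
int ToyVectorTakeArray(ToyVector *vec, double **array) {
  assert(vec && array);
  if (!vec->is_valid || !vec->borrowed) return 1;
  *array        = vec->borrowed;
  vec->borrowed = NULL;
  vec->is_valid = false;  // the only location with valid data just left the vector
  return 0;
}

// Read access is refused while the vector is in an invalid state
int ToyVectorGetArrayRead(const ToyVector *vec, const double **array) {
  assert(vec && array);
  if (!vec->is_valid) return 1;
  *array = vec->borrowed ? vec->borrowed : vec->owned;
  return 0;
}

// The backend frees only the memory it owns
void ToyVectorDestroy(ToyVector *vec) {
  assert(vec);
  free(vec->owned);
  vec->owned    = NULL;
  vec->is_valid = false;
}
```

In this model, taking the borrowed array leaves the vector invalid until the user sets data again, and taking an array from a vector that only holds owned memory is an error.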
Backends often manipulate tensors of dimension greater than 2. It is awkward to pass fully-specified multi-dimensional arrays using C99 and certain operations will flatten/reshape the tensors for computational convenience. We frequently use comments to document shapes using a lexicographic ordering. For example, the comment
// u has shape [dim, num_comp, Q, num_elem]
means that it can be traversed as
for (d = 0; d < dim; d++) {
  for (c = 0; c < num_comp; c++) {
    for (q = 0; q < Q; q++) {
      for (e = 0; e < num_elem; e++) {
        u[((d*num_comp + c)*Q + q)*num_elem + e] = ...
      }
    }
  }
}
This ordering is sometimes referred to as row-major or C-style. Note that flattening such as
// u has shape [dim, num_comp, Q*num_elem]
and
// u has shape [dim*num_comp, Q, num_elem]
are purely implicit; one just indexes the same array using the appropriate convention.
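The claim that these flattenings are implicit can be checked with a short sketch. The helper names (`ToyIndex4`, `ToyIndex3`, `ToyViewsAgree`) are hypothetical; the code only verifies that the 4D index and the two flattened views address the same entry of the same array.

```c
#include <assert.h>

// Flat index for u with shape [dim, num_comp, Q, num_elem]
int ToyIndex4(int num_comp, int Q, int num_elem, int d, int c, int q, int e) {
  assert(num_comp > 0 && Q > 0 && num_elem > 0);
  return ((d * num_comp + c) * Q + q) * num_elem + e;
}

// Flat index for a three-index view with shape [n0, n1, n2]
int ToyIndex3(int n1, int n2, int a, int b, int c) {
  assert(n1 > 0 && n2 > 0);
  return (a * n1 + b) * n2 + c;
}

// Return 1 if both flattened views agree with the 4D view for every entry, 0 otherwise
int ToyViewsAgree(int dim, int num_comp, int Q, int num_elem) {
  for (int d = 0; d < dim; d++) {
    for (int c = 0; c < num_comp; c++) {
      for (int q = 0; q < Q; q++) {
        for (int e = 0; e < num_elem; e++) {
          const int full = ToyIndex4(num_comp, Q, num_elem, d, c, q, e);
          // View as [dim, num_comp, Q*num_elem]
          const int merged_qe = ToyIndex3(num_comp, Q * num_elem, d, c, q * num_elem + e);
          // View as [dim*num_comp, Q, num_elem]
          const int merged_dc = ToyIndex3(Q, num_elem, d * num_comp + c, q, e);

          if (full != merged_qe || full != merged_dc) return 0;
        }
      }
    }
  }
  return 1;
}
```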
QFunction arguments can be assumed to have restrict semantics.
That is, each input and output array must reside in distinct memory without overlap.
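As an illustration of what this assumption permits, the sketch below marks the field pointers with `restrict` inside a QFunction-style kernel. `ToyScaleQFunction` is hypothetical, and its signature (a context pointer, the number of quadrature points, arrays of input and output field pointers) is only modeled on the general shape of a user quadrature point function, using plain `int` and `double`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical QFunction-style kernel: v = q_data * u at every quadrature point
int ToyScaleQFunction(void *ctx, const int Q, const double *const *in, double *const *out) {
  // restrict tells the compiler these arrays do not overlap, so the loop can be vectorized safely
  const double *restrict u      = in[0];
  const double *restrict q_data = in[1];
  double *restrict       v      = out[0];

  assert(u && q_data && v);
  (void)ctx;  // unused in this sketch
  for (int i = 0; i < Q; i++) v[i] = q_data[i] * u[i];
  return 0;
}
```

Passing the same array as both an input and an output would violate the assumption and make the behavior of such a kernel undefined.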
Please check your code for style issues by running
make format
In addition to these automatically enforced style rules, libCEED generally follows these code style conventions:
- Variable names: snake_case
- Struct members: snake_case
- Function and method names: PascalCase or language specific style
- Type names: PascalCase or language specific style
- Constant names: CAPS_SNAKE_CASE or language specific style
In general, variable and function names should avoid abbreviations and err on the side of verbosity to improve readability.
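A short sketch of how these naming conventions look together in C follows. All identifiers (`TOY_MAX_NUM_FIELDS`, `ToyOperatorFieldCounts`, `ToyOperatorGetTotalNumFields`) are hypothetical and chosen only to show the casing and the preference for descriptive names.

```c
#include <assert.h>
#include <stddef.h>

// Constant name: CAPS_SNAKE_CASE
#define TOY_MAX_NUM_FIELDS 16

// Type name: PascalCase; struct members: snake_case
typedef struct {
  int num_input_fields;
  int num_output_fields;
} ToyOperatorFieldCounts;

// Function name: PascalCase; variable names: snake_case and spelled out rather than abbreviated
int ToyOperatorGetTotalNumFields(const ToyOperatorFieldCounts *field_counts, int *total_num_fields) {
  assert(field_counts && total_num_fields);
  *total_num_fields = field_counts->num_input_fields + field_counts->num_output_fields;
  return *total_num_fields <= TOY_MAX_NUM_FIELDS ? 0 : 1;
}
```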
Also, documentation files should have one sentence per line to help make git diffs clearer and less disruptive.
Single line if statements are acceptable, but if clang-format forces it to be on two lines or if there are any else if or else blocks, each block must be enclosed in brackets.
This is enforced automatically for all enabled backends via make format, with the exception of header files in the include/ceed/jit-source directory.
These files cannot be automatically formatted by clang-tidy, as they cannot be compiled as standalone sources.
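The rule for if statements can be illustrated with a small function. `ToyClamp` is a hypothetical example, not library code; it shows one acceptable single line if statement and one chain where every block needs brackets because else if and else blocks are present.

```c
#include <assert.h>

// Clamp value to the closed interval [lower, upper]
int ToyClamp(int value, int lower, int upper) {
  assert(lower <= upper);
  // Acceptable: a single line if statement with no else branch
  if (lower == upper) return lower;
  // Required: every block is bracketed once an else if or else block is present
  if (value < lower) {
    return lower;
  } else if (value > upper) {
    return upper;
  } else {
    return value;
  }
}
```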
All functions in the libCEED library should be prefixed by Ceed and generally take a Ceed object as their first argument.
If a function takes, for example, a CeedOperator as its first argument, then it should be prefixed with CeedOperator.
Functions should adhere mostly to the PETSc function style, specifically:
- All local variables of a particular type (for example, CeedInt) should be listed on the same line if possible; otherwise, they should be listed on adjacent lines. For example,
// Correct
CeedInt a, b, c;
CeedInt *d, *e;
CeedInt **f;
// Incorrect
CeedInt a, b, c, *d, *e, **f;
- Local variables should be initialized in their declaration when possible.
- Nearly all functions should have a return type of int and return a CeedErrorType to allow for error checking.
- All functions must start with a single blank line after the local variable declarations.
- All libCEED function calls must have their return value checked for errors using the CeedCall() or CeedCallBackend() macro. This should be wrapped around the function in question.
- In libCEED functions, variables must be declared at the beginning of the code block (C90 style), never mixed in with code. However, when variables are only used in a limited scope, it is encouraged to declare them in that scope.
- Do not put a blank line immediately before return CEED_ERROR_SUCCESS;.
- All libCEED functions must use Doxygen comment blocks before their definition (not declaration). The block should begin with /** and end with **/, each on their own line. The block should be indented by two spaces and should contain an @brief tag and description, a newline, a line stating whether the function is collective, a newline, @param tags for each parameter, a newline, and a @return line formatted exactly as in the example below. All parameter lines in the Doxygen block should be formatted such that parameter names and descriptions are aligned. There should be exactly one space between @param[dir] (where dir is in, out, or in,out) and the parameter name for the closest pair, as well as between the parameter name and description. For example:
/**
  @brief Initialize a `Ceed` context to use the specified resource.

  Note: Prefixing the resource with "help:" (e.g. "help:/cpu/self") will result in @ref CeedInit() printing the current libCEED version number and a list of current available backend resources to `stderr`.

  @param[in]  resource Resource to use, e.g., "/cpu/self"
  @param[out] ceed     The library context

  @return An error code: 0 - success, otherwise - failure

  @ref User

  @sa CeedRegister() CeedDestroy()
**/
int CeedInit(const char *resource, Ceed *ceed) {
- Function declarations should include parameter names, which must exactly match those in the function definition.
- External functions, i.e. those used in tests or examples, must have their declarations prefixed with CEED_EXTERN. All other functions should have their declarations prefixed with CEED_INTERN. Function definitions should have neither.
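Several of the function style rules above can be seen together in a standalone sketch. Everything in it is hypothetical: `ToyCall` is a simplified stand-in for the role played by CeedCall() (return early on a nonzero error code), the `TOY_ERROR_*` constants and `ToyArray*` functions are invented for illustration, and the wording of the collective line is an assumption.

```c
#include <stddef.h>

#define TOY_ERROR_SUCCESS 0
#define TOY_ERROR_ARGUMENT 1

// Hypothetical stand-in for an error checking macro: return early on a nonzero error code
#define ToyCall(...)                              \
  do {                                            \
    int ierr_ = __VA_ARGS__;                      \
    if (ierr_ != TOY_ERROR_SUCCESS) return ierr_; \
  } while (0)

/**
  @brief Compute the sum of the entries of an array.

  Not collective across MPI processes.

  @param[in]  length Number of entries in the array
  @param[in]  values Array of entries to sum
  @param[out] sum    Sum of the entries

  @return An error code: 0 - success, otherwise - failure
**/
int ToyArraySum(size_t length, const double *values, double *sum) {
  double partial_sum = 0.0;

  if (!values || !sum) return TOY_ERROR_ARGUMENT;
  for (size_t i = 0; i < length; i++) partial_sum += values[i];
  *sum = partial_sum;
  return TOY_ERROR_SUCCESS;
}

/**
  @brief Compute the mean of the entries of an array.

  Not collective across MPI processes.

  @param[in]  length Number of entries in the array, must be nonzero
  @param[in]  values Array of entries to average
  @param[out] mean   Mean of the entries

  @return An error code: 0 - success, otherwise - failure
**/
int ToyArrayMean(size_t length, const double *values, double *mean) {
  double sum = 0.0;

  if (length == 0 || !mean) return TOY_ERROR_ARGUMENT;
  ToyCall(ToyArraySum(length, values, &sum));
  *mean = sum / (double)length;
  return TOY_ERROR_SUCCESS;
}
```

Local variables are declared and initialized first, a single blank line follows them, every call that can fail is wrapped in the checking macro, and no blank line precedes the final return.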
Please check your code for common issues by running
make tidy
which uses the clang-tidy utility included in recent releases of Clang.
This tool is much slower than actual compilation (make -j8 parallelism helps).
To run on a single file, use
make interface/ceed.c.tidy
for example.
All issues reported by make tidy should be fixed.
Header inclusion for source files should follow the principle of 'include what you use' rather than relying upon transitive #include to define all symbols.
Every symbol that is used in the source file foo.c should be defined in foo.c, foo.h, or in a header file #included in one of these two locations.
Please check your code by running the tool include-what-you-use to see recommendations for changes to your source.
Most issues reported by include-what-you-use should be fixed; however, this rule is flexible to account for differences in header file organization in external libraries.
If you have include-what-you-use installed in a sibling directory to libCEED or set the environment variable IWYU_CC, then you can use the makefile target make iwyu.
Header files should be listed in alphabetical order, with installed headers preceding local headers and ceed headers being listed first.
The ceed-f64.h and ceed-f32.h headers should only be included in ceed.h.
#include <ceed.h>
#include <ceed/backend.h>
#include <stdbool.h>
#include <string.h>
#include "ceed-avx.h"
There are two main types of CI for libCEED - GitHub Actions and GitLab Runners.
The GitHub Actions focus on CPU based tests without any external dependencies, for the C, Fortran, Rust, Python, and Julia interfaces.
The GitHub Actions can be updated as required by editing the appropriate .yml file in .github/workflows.
The GitLab Runners focus on GPU based tests and external dependencies.
The GitLab Runners can be updated as required by editing .gitlab-ci.yml and updating the dependencies installed on Noether.
The following dependencies are installed on Noether:
- MAGMA: /projects/MAGMA and /projects/hipMAGMA
- PETSc: /projects/petsc
- deal.II: /projects/dealii
When managing the installed dependencies, use the phypid account.
The following dependencies are built on the fly and cached:
- LIBXSMM
- Nek5000
- MFEM