- The benchmarks provide code that exercises microcontroller performance.
- Various efficiency aspects are emphasized such as integral and floating-point calculations, looping, branching, etc.
- Each benchmark is implemented as a single callable function to be called from a scheduled task in the multitasking scheduler configuration.
- Every benchmark file can also be compiled separately as a standalone C++14, 17, 20, 23 and beyond project.
- A benchmark digital I/O pin is toggled high/low at the beginning/end of the benchmark run, allowing real-time measurement with an oscilloscope.
- The benchmarks provide scalable, portable means for identifying the performance class of the microcontroller.
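
The pin-toggling idea in the list above can be sketched as follows. This is a minimal illustration, not the project's actual code: the type `benchmark_port_pin`, the functions `run_example_benchmark()` and `run_with_pin_toggle()`, and the simulated pin state are all hypothetical stand-ins for a target-specific MCAL port pin and a real benchmark.

```cpp
#include <cstdint>

// Hypothetical stand-in for a target-specific MCAL port pin.
// On real hardware these functions would write to a port register.
struct benchmark_port_pin
{
  static bool          is_high;    // current simulated pin level
  static std::uint32_t edge_count; // number of level changes so far

  static void set_pin_high() { is_high = true;  ++edge_count; }
  static void set_pin_low () { is_high = false; ++edge_count; }
};

bool          benchmark_port_pin::is_high    = false;
std::uint32_t benchmark_port_pin::edge_count = 0U;

// Hypothetical benchmark body: a small integer loop with a known result.
bool run_example_benchmark()
{
  std::uint32_t sum = 0U;

  for(std::uint32_t i = 1U; i <= 100U; ++i) { sum += i; }

  return (sum == 5050U);
}

// Drive the pin high, run the benchmark once, then drive the pin low.
// The high-time of the pin is what an oscilloscope would measure.
template<typename PortPinType>
bool run_with_pin_toggle(bool (*benchmark)())
{
  PortPinType::set_pin_high();
  const bool result_is_ok = benchmark();
  PortPinType::set_pin_low();

  return result_is_ok;
}
```

A scheduled task would call something like `run_with_pin_toggle<benchmark_port_pin>(run_example_benchmark)` once per cycle. Passing the pin as a template parameter keeps the wrapper free of target-specific details.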
Executing the benchmarks is straightforward. Select the desired benchmark and
activate its corresponding flag in
`app_benchmark.h`.
In particular, `#define` the flag `APP_BENCHMARK_TYPE` to be one of the pre-defined benchmark types.
This is typically done by un-commenting one of the relevant lines around
line 34 of that file.
Compile the reference application and run it on the target.
The benchmark timing is reflected on the microcontroller's corresponding
benchmark port pin (the definition of which can be found in its target-specific MCAL).
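
The flag mechanism described above might look roughly like the following sketch. The numeric values given to the type macros, the single `run()` entry point, and the choice of CRC-32/MPEG-2 parameters are assumptions made for illustration only. The real definitions are in `app_benchmark.h` and the individual benchmark files.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed numeric identifiers for two benchmark types (illustrative values).
#define APP_BENCHMARK_TYPE_NONE 0
#define APP_BENCHMARK_TYPE_CRC  1

// Activate exactly one benchmark by leaving a single line un-commented.
//#define APP_BENCHMARK_TYPE APP_BENCHMARK_TYPE_NONE
#define APP_BENCHMARK_TYPE   APP_BENCHMARK_TYPE_CRC

namespace app { namespace benchmark {

#if (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_NONE)

// Empty benchmark: merely returns true.
inline bool run() { return true; }

#elif (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_CRC)

// Bitwise CRC-32/MPEG-2 (polynomial 0x04C11DB7, initial value 0xFFFFFFFF,
// no reflection, no final XOR), chosen here purely for illustration.
inline std::uint32_t crc32_mpeg2(const std::uint8_t* data, std::size_t count)
{
  std::uint32_t crc = UINT32_C(0xFFFFFFFF);

  for(std::size_t i = 0U; i < count; ++i)
  {
    crc ^= (static_cast<std::uint32_t>(data[i]) << 24U);

    for(int bit = 0; bit < 8; ++bit)
    {
      const bool msb_is_set = ((crc & UINT32_C(0x80000000)) != 0U);

      crc <<= 1U;

      if(msb_is_set) { crc ^= UINT32_C(0x04C11DB7); }
    }
  }

  return crc;
}

// Check the CRC of the ASCII digits "123456789" against its catalogued value.
inline bool run()
{
  const std::uint8_t message[9U] =
  {
    0x31U, 0x32U, 0x33U, 0x34U, 0x35U, 0x36U, 0x37U, 0x38U, 0x39U
  };

  return (crc32_mpeg2(message, sizeof(message)) == UINT32_C(0x0376E6E7));
}

#else
#error "APP_BENCHMARK_TYPE is not set to a known benchmark type"
#endif

} } // namespace app::benchmark
```

Because the selection happens in the preprocessor, only the chosen benchmark is compiled into the image, which matters on small targets with limited program memory.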
Individual benchmarks can also be run standalone on any C++ platform.
In the following short link
to godbolt, for instance, we have adapted the
`APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL` benchmark for standalone use.
The `main()` subroutine in the benchmark source files is activated
with the compiler definition `APP_BENCHMARK_STANDALONE_MAIN`.
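
A minimal sketch of this standalone pattern is shown below. The function `app::benchmark::run_example()` and its small floating-point check are hypothetical placeholders. Only the macro name `APP_BENCHMARK_STANDALONE_MAIN` comes from the text above.

```cpp
#include <cmath>
#include <cstdlib>

namespace app { namespace benchmark {

// Hypothetical benchmark function: returns true when the computed
// value agrees with a known control value.
bool run_example()
{
  const double result = std::sqrt(2.0) * std::sqrt(2.0);

  return (std::fabs(result - 2.0) < 1.0E-12);
}

} } // namespace app::benchmark

#if defined(APP_BENCHMARK_STANDALONE_MAIN)
// Compiled only when the macro is defined, for example with
// -DAPP_BENCHMARK_STANDALONE_MAIN on a GCC or Clang command line.
int main()
{
  return (app::benchmark::run_example() ? EXIT_SUCCESS : EXIT_FAILURE);
}
#endif
```

With the macro undefined, the file contributes only the benchmark function and can be linked into the reference application. With the macro defined, the same file builds as a complete program whose exit code reports the benchmark result.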
- `#define APP_BENCHMARK_TYPE_NONE` is an empty benchmark with merely a Boolean function call returning `true`.
- `#define APP_BENCHMARK_TYPE_COMPLEX` computes a floating-point complex-valued trigonometric sine function using the `extended_complex::complex` template class.
- `#define APP_BENCHMARK_TYPE_CRC` calculates a $32$-bit, byte-oriented CRC result described in Sect. 6.1 of the book.
- `#define APP_BENCHMARK_TYPE_FAST_MATH` calculates reduced, time-optimized floating-point elementary transcendental functions.
- `#define APP_BENCHMARK_TYPE_FILTER` calculates an integral FIR filter sampling result.
- `#define APP_BENCHMARK_TYPE_FIXED_POINT` calculates the first derivative of an elementary function using the self-written `fixed_point` template class in Chap. 13 of the book.
- `#define APP_BENCHMARK_TYPE_FLOAT` implements the floating-point examples detailed in Sect. 12.4 of the book.
- `#define APP_BENCHMARK_TYPE_WIDE_INTEGER` performs $256$-bit unsigned big integer calculations using the `uintwide_t` class.
- `#define APP_BENCHMARK_TYPE_PI_SPIGOT` performs a pi calculation using a template-based spigot algorithm with calculation steps divided among the slices of the idle task.
- `#define APP_BENCHMARK_TYPE_PI_SPIGOT_SINGLE` does the same pi calculation as above, implemented as a single function call.
- `#define APP_BENCHMARK_TYPE_HASH` computes a $160$-bit hash checksum of a $3$-byte character-based message.
- `#define APP_BENCHMARK_TYPE_WIDE_DECIMAL` computes a $100$ decimal digit square root using the `decwide_t` template class.
- `#define APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL` computes the numerical floating-point result of a Bessel function using a recursive trapezoid integration routine.
- `#define APP_BENCHMARK_TYPE_PI_AGM` computes $53$ decimal digits of pi (or optionally $101$ decimal digits of pi) using a Gauss AGM method with the `decwide_t` template class having a so-called limb type of `std::uint16_t`.
- `#define APP_BENCHMARK_TYPE_BOOST_MATH_CBRT_TGAMMA` uses Boost.Math to compute the cube root of various Gamma function values.
- `#define APP_BENCHMARK_TYPE_BOOST_MATH_CYL_BESSEL_J` also uses Boost.Math to calculate cylindrical Bessel functions of small, non-integer order.
- `#define APP_BENCHMARK_TYPE_CNL_SCALED_INTEGER` brings a small subset of the CNL Library onto the metal by exercising various elementary quadratic calculations with the fixed-point representations of `cnl::scaled_integer`.
- `#define APP_BENCHMARK_TYPE_SOFT_DOUBLE_H2F1` calculates an ${\approx}~{15}$ decimal digit hypergeometric function value using a classic iterative rational approximation scheme. This calculation is also included as an example in the soft_double project.
- `#define APP_BENCHMARK_TYPE_BOOST_MULTIPRECISION_CBRT` uses Boost.Multiprecision in combination with Boost.Math to compute $101$ decimal digits of a cube root function.
- `#define APP_BENCHMARK_TYPE_HASH_SHA256` computes a $256$-bit hash checksum of a short $3$-byte character-based message.
- `#define APP_BENCHMARK_TYPE_ECC_GENERIC_ECC` provides an intuitive view of elliptic-curve algebra, depicting a well-known $256$-bit cryptographic key-gen/sign/verify method. This benchmark is actually too lengthy to run on most of our embedded targets (other than BBB or RPI-zero), and adaptations of OS/watchdog are required in order to run this benchmark on the metal.
- `#define APP_BENCHMARK_TYPE_NON_STD_DECIMAL` carries out a $64$-bit decimal-floating-point calculation of the exponential function using the contemporary cppalliance/decimal library. This benchmark does not, at the moment, run on the AVR target and requires a larger microcontroller such as one of the $32$-bit ARM(R) devices.
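
To give a flavor of one list entry, the following sketch evaluates an integer-order Bessel function with a recursive trapezoid rule, in the spirit of `APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL`. It assumes the standard integral representation $J_{n}(x) = \frac{1}{\pi}\int_{0}^{\pi}\cos(n\theta - x\sin\theta)\,d\theta$. The function names, the tolerance, and the iteration limit are illustrative choices and are not taken from the benchmark source.

```cpp
#include <cmath>
#include <cstdint>

// Recursive trapezoid rule: start with one panel on [a, b] and repeatedly
// halve the step, adding only the new midpoints at each refinement.
template<typename RealFunction>
double integral_trapezoid(double a, double b, double tol, RealFunction f)
{
  double        h      = b - a;
  double        result = (f(a) + f(b)) * (h / 2.0);
  std::uint32_t n      = 1U; // number of new midpoints in the next refinement

  for(int k = 0; k < 20; ++k)
  {
    h /= 2.0;

    double sum = 0.0;

    for(std::uint32_t j = 0U; j < n; ++j)
    {
      sum += f(a + (h * static_cast<double>((2U * j) + 1U)));
    }

    const double previous = result;

    result = (previous / 2.0) + (h * sum);

    n *= 2U;

    // Require a few refinements before trusting the convergence check.
    if((k > 1) && (std::fabs(result - previous) < tol)) { break; }
  }

  return result;
}

// Integer-order Bessel function J_n(x) from its integral representation.
double bessel_j_integer_order(std::uint32_t n, double x)
{
  const double pi = 3.14159265358979323846;

  const auto integrand =
    [n, x](double t)
    {
      return std::cos((static_cast<double>(n) * t) - (x * std::sin(t)));
    };

  return integral_trapezoid(0.0, pi, 1.0E-10, integrand) / pi;
}
```

For example, `bessel_j_integer_order(0U, 1.0)` approximates $J_{0}(1) \approx 0.7652$. Each refinement reuses the previous estimate, so function evaluations are never repeated.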
Most of the benchmarks run on each supported target system. Experience with runs on the individual target systems reveals a wide range of microcontroller performance classes.
Consider, for instance,
`app_benchmark_pi_agm.cpp`,
which exercises the benchmark of type `APP_BENCHMARK_TYPE_PI_AGM`.
This benchmark computes decimal digits of pi using a Gauss AGM method with the `decwide_t`
template class.
A very wide range of microcontroller performance classes is shown in the following table.
The benchmark used is this Gauss AGM pi calculation.

| Target | runtime | relative |
|---|---|---|
| `am6254_soc` | 0.37 | 1.0 |
| `am335x` | 1.5 | 4.1 |
| `esp32p4_riscv_soc` | 2.5 | 6.8 |
| `stm32f446` | 5.1 | 14 |
| `rpi_pico2_rp2350` | 6.3 | 17 |
| `wch_ch32v307` | 8.0 | 22 |
| `xtensa_esp32_s3` | 9.1 | 25 |
| `bl602_sifive_e24_riscv` | 10 | 27 |
| `rpi_pico_rp2040` | 19 | 51 |
| `avr` | 410 | 1100 |
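
The relative column appears to be each runtime divided by the `am6254_soc` runtime, rounded to two significant digits. The short sketch below reproduces that arithmetic under this assumption. The helper names are hypothetical.

```cpp
#include <cmath>

// Round a positive value to two significant digits, as in the table.
double round_to_two_significant_digits(double value)
{
  const double exponent = std::floor(std::log10(value));
  const double scale    = std::pow(10.0, 1.0 - exponent);

  return std::round(value * scale) / scale;
}

// Runtime of a target relative to the fastest (reference) target.
double relative_runtime(double runtime, double reference_runtime)
{
  return round_to_two_significant_digits(runtime / reference_runtime);
}
```

For instance, `relative_runtime(1.5, 0.37)` gives the value listed for `am335x`, and `relative_runtime(410.0, 0.37)` gives the value listed for `avr`.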

The table shows strikingly different performance classes,
ranging from `am6254_soc`, the fastest target with a runtime of $0.37$,
to `avr`, the slowest with a runtime of $410$ (a factor of about $1100$).
All runtimes quoted here are in the units of the table's runtime column.
The `stm32f446` board performs the calculation in
the middle of these extreme performance classes,
with a result of $5.1$.

The `wch_ch32v307` board boasts a quite respectable
time of $8.0$. The `bl602_sifive_e24_riscv` has similar performance,
running the benchmark in about $10$.

Running on only one core (core0) of the `xtensa_esp32_s3` board results in
a runtime of $9.1$. The `esp32p4_riscv_soc` with a dual RISC-V core
architecture is significantly faster, coming in at $2.5$.

Using only one core (core1) on the `rpi_pico_rp2040` board results in a calculation
time of $19$. The `rpi_pico2_rp2350`
with dual ARM(R) Cortex(R) M33 cores definitively improves on this
(still using only core1). It has a calculation time of $6.3$.