Real-Time-C++ - Benchmarks

Implementation details

The benchmarks provide code that exercises microcontroller performance.
Various aspects of efficiency are emphasized, such as integral and floating-point calculations, looping and branching.
Each benchmark is implemented as a single callable function, intended to be called from a scheduled task in the multitasking scheduler configuration.
Every benchmark file can also be compiled separately as a standalone C++14, 17, 20, 23 (and beyond) project.
A benchmark digital I/O pin is toggled high/low at the beginning/end of the benchmark run, which allows real-time measurement with an oscilloscope.
The benchmarks provide a scalable, portable means of identifying the performance class of the microcontroller.
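
The structure described above can be sketched roughly as follows. This is a minimal illustration only: the names `benchmark_port_type`, `run_workload` and `run_benchmark` are hypothetical, and the port pin is simulated with static variables, whereas the real project toggles a target-specific MCAL port pin.

```cpp
#include <array>
#include <cstdint>
#include <numeric>

namespace sketch {

// Hypothetical stand-in for a target-specific benchmark port pin.
// On real hardware these functions would write to a GPIO register.
struct benchmark_port_type
{
  static auto state() -> bool& { static bool pin_state { false }; return pin_state; }
  static auto toggle_count() -> int& { static int count { 0 }; return count; }

  static auto set_pin_high() -> void { state() = true;  ++toggle_count(); }
  static auto set_pin_low () -> void { state() = false; ++toggle_count(); }
};

// The workload: a small integral calculation with a known result.
inline auto run_workload() -> bool
{
  const std::array<std::uint32_t, 8U> data { { 1U, 2U, 3U, 4U, 5U, 6U, 7U, 8U } };

  const std::uint32_t sum =
    std::accumulate(data.cbegin(), data.cend(), std::uint32_t { 0U });

  return (sum == std::uint32_t { 36U });
}

// A single callable benchmark function. The pin is high for the
// duration of the workload, so the pulse width seen on an oscilloscope
// corresponds to the runtime of the workload.
inline auto run_benchmark() -> bool
{
  benchmark_port_type::set_pin_high();

  const bool result_is_ok = run_workload();

  benchmark_port_type::set_pin_low();

  return result_is_ok;
}

} // namespace sketch
```

A scheduled task would call `sketch::run_benchmark()` once per cycle and could react to a `false` return value, for example by halting or signalling an error.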

Executing the benchmarks

Executing the benchmarks is straightforward. Select the desired benchmark and activate its corresponding flag in app_benchmark.h. In particular, #define the flag APP_BENCHMARK_TYPE to be one of the pre-defined benchmark types. This is typically done by un-commenting one of the relevant lines, found around line 34 of app_benchmark.h. Compile the reference application and run it on the target. The benchmark timing will be reflected on the microcontroller's corresponding benchmark port pin (the definition of which can be found in its target-specific MCAL).
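
The selection mechanism can be pictured with the following sketch. It is an assumption-laden illustration, not a copy of app_benchmark.h: the numeric values of the type flags, the small subset of types shown, and the helper names `selected_type` and `run_selected` are all hypothetical.

```cpp
// Hypothetical numeric identifiers for a few benchmark types.
// The real header defines its own values and many more types.
#define APP_BENCHMARK_TYPE_NONE    0
#define APP_BENCHMARK_TYPE_CRC     2
#define APP_BENCHMARK_TYPE_FLOAT   6

// Exactly one of the following lines is left un-commented.
//#define APP_BENCHMARK_TYPE   APP_BENCHMARK_TYPE_NONE
#define APP_BENCHMARK_TYPE   APP_BENCHMARK_TYPE_CRC
//#define APP_BENCHMARK_TYPE   APP_BENCHMARK_TYPE_FLOAT

namespace sketch_select {

// Report which benchmark type has been selected at compile time.
constexpr auto selected_type() -> int { return APP_BENCHMARK_TYPE; }

// Only the code belonging to the selected benchmark is compiled.
// The function bodies here are placeholders, not real benchmarks.
#if (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_NONE)
inline auto run_selected() -> bool { return true; }
#elif (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_CRC)
inline auto run_selected() -> bool { return true; }
#elif (APP_BENCHMARK_TYPE == APP_BENCHMARK_TYPE_FLOAT)
inline auto run_selected() -> bool { return true; }
#else
#error "APP_BENCHMARK_TYPE is not one of the types known to this sketch"
#endif

} // namespace sketch_select
```

Selecting the benchmark with the preprocessor means that unused benchmarks contribute no code or data to the final image, which matters on small targets.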

Individual benchmarks can also be run standalone on any C++ platform. In a short link to godbolt, for instance, the APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL benchmark has been adapted for standalone use. The main() function in the benchmark source files is activated with the compiler definition APP_BENCHMARK_STANDALONE_MAIN.
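
A standalone build along these lines might look like the sketch below. Only the macro name APP_BENCHMARK_STANDALONE_MAIN is taken from the text above; the namespace, the function `run_float_check` and its trivial workload are hypothetical placeholders.

```cpp
#include <cmath>

namespace sketch_standalone {

// Hypothetical benchmark body: check that sqrt(2) squared is 2
// to within a small tolerance.
inline auto run_float_check() -> bool
{
  const double x = std::sqrt(2.0);

  return (std::fabs((x * x) - 2.0) < 1.0E-12);
}

} // namespace sketch_standalone

// The main() function exists only when the file is compiled with
// the definition APP_BENCHMARK_STANDALONE_MAIN. In the embedded
// application, the scheduler calls the benchmark function instead.
#if defined(APP_BENCHMARK_STANDALONE_MAIN)
auto main() -> int
{
  const bool result_is_ok = sketch_standalone::run_float_check();

  return (result_is_ok ? 0 : -1);
}
#endif
```

With GCC, for example, such a file could be built standalone using `g++ -std=c++14 -DAPP_BENCHMARK_STANDALONE_MAIN file.cpp`, where `file.cpp` is a placeholder name.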

Individual benchmarks

app_benchmark_none.cpp via #define APP_BENCHMARK_TYPE_NONE is an empty benchmark with merely a Boolean function call returning true.
app_benchmark_complex.cpp via #define APP_BENCHMARK_TYPE_COMPLEX computes a floating-point complex-valued trigonometric sine function using the extended_complex::complex template class.
app_benchmark_crc.cpp via #define APP_BENCHMARK_TYPE_CRC calculates a $32$-bit, byte-oriented CRC result described in Sect. 6.1 of the book.
app_benchmark_fast_math.cpp via #define APP_BENCHMARK_TYPE_FAST_MATH calculates reduced, time-optimized floating-point elementary transcendental functions.
app_benchmark_filter.cpp via #define APP_BENCHMARK_TYPE_FILTER calculates an integral FIR filter sampling result.
app_benchmark_fixed_point.cpp via #define APP_BENCHMARK_TYPE_FIXED_POINT calculates the first derivative of an elementary function using the self-written fixed_point template class in Chap. 13 of the book.
app_benchmark_float.cpp via #define APP_BENCHMARK_TYPE_FLOAT implements the floating-point examples detailed in Sect. 12.4 of the book.
app_benchmark_wide_integer.cpp via #define APP_BENCHMARK_TYPE_WIDE_INTEGER performs $256$-bit unsigned big integer calculations using the uintwide_t class.
app_benchmark_pi_spigot.cpp via #define APP_BENCHMARK_TYPE_PI_SPIGOT performs a pi calculation using a template-based spigot algorithm with calculation steps divided among the slices of the idle task.
app_benchmark_pi_spigot_single.cpp via #define APP_BENCHMARK_TYPE_PI_SPIGOT_SINGLE does the same pi calculation as above implemented as a single function call.
app_benchmark_hash.cpp via #define APP_BENCHMARK_TYPE_HASH computes a $160$-bit hash checksum of a $3$-byte character-based message.
app_benchmark_wide_decimal.cpp via #define APP_BENCHMARK_TYPE_WIDE_DECIMAL computes a $100$ decimal digit square root using the decwide_t template class.
app_benchmark_trapezoid_integral.cpp via #define APP_BENCHMARK_TYPE_TRAPEZOID_INTEGRAL computes the numerical floating-point result of a Bessel function using a recursive trapezoid integration routine.
app_benchmark_pi_agm.cpp via #define APP_BENCHMARK_TYPE_PI_AGM computes $53$ decimal digits of pi (or optionally $101$ decimal digits of pi) using a Gauss AGM method with the decwide_t template class having a so-called limb type of std::uint16_t.
app_benchmark_boost_math_cbrt_tgamma.cpp via #define APP_BENCHMARK_TYPE_BOOST_MATH_CBRT_TGAMMA uses Boost.Math to compute the cube root of various Gamma function values.
app_benchmark_boost_math_cyl_bessel_j.cpp via #define APP_BENCHMARK_TYPE_BOOST_MATH_CYL_BESSEL_J also uses Boost.Math to calculate cylindrical Bessel functions of small, non-integer order.
app_benchmark_cnl_scaled_integer.cpp via #define APP_BENCHMARK_TYPE_CNL_SCALED_INTEGER brings a small subset of the CNL Library onto the metal by exercising various elementary quadratic calculations with the fixed-point representations of cnl::scaled_integer.
app_benchmark_soft_double_h2f1.cpp via #define APP_BENCHMARK_TYPE_SOFT_DOUBLE_H2F1 calculates an ${\approx}~{15}$ decimal digit hypergeometric function value using a classic iterative rational approximation scheme. This calculation is also included as an example in the soft_double project.
app_benchmark_boost_multiprecision_cbrt.cpp via #define APP_BENCHMARK_TYPE_BOOST_MULTIPRECISION_CBRT uses Boost.Multiprecision in combination with Boost.Math to compute $101$ decimal digits of a cube root function.
app_benchmark_hash_sha256.cpp via #define APP_BENCHMARK_TYPE_HASH_SHA256 computes a $256$-bit hash checksum of a short $3$-byte character-based message.
app_benchmark_ecc_generic_ecc.cpp via #define APP_BENCHMARK_TYPE_ECC_GENERIC_ECC provides an intuitive view of elliptic-curve algebra, depicting a well-known $256$-bit cryptographic key-gen/sign/verify method. This benchmark is actually too lengthy to run on most of our embedded targets (other than BBB or RPI-zero), and adaptations of the OS/watchdog are required in order to run this benchmark on the metal.
app_benchmark_non_std_decimal.cpp via #define APP_BENCHMARK_TYPE_NON_STD_DECIMAL carries out a $64$-bit decimal-floating-point calculation of the exponential function using the contemporary Boost.Decimal library. This benchmark does not, at the moment, run on the AVR target, but requires a larger microcontroller such as one of the $32$-bit ARM(R) devices.
app_benchmark_std_big_int.cpp via #define APP_BENCHMARK_TYPE_STD_BIG_INT benchmarks big multiprecision integers performing $2048{\times}2048$ bit multiplication with a $4096$-bit result. It uses the eisenwave/std-big-int library. This benchmark does not, at the moment, run on the AVR target, but requires a larger microcontroller such as one of the $32$-bit ARM(R) devices.
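
As an illustration of one entry in the list above, the idea behind the trapezoid-integral benchmark can be sketched as follows. This is not the project's code: the namespace, function names, tolerance and iteration limit are assumptions. It uses the standard integral representation $J_n(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin t - nt)\,dt$ together with a trapezoid rule that is refined by repeatedly halving the step size.

```cpp
#include <cmath>
#include <cstdint>

namespace sketch_trapezoid {

// Trapezoid integration of f over [a, b], refined by halving the
// step size until two successive estimates agree to within tol.
template<typename real_type, typename function_type>
auto integral(const real_type a,
              const real_type b,
              const real_type tol,
              function_type f) -> real_type
{
  using std::fabs;

  std::uint_fast32_t n { 1U };   // Number of intervals in the current estimate.
  real_type h = b - a;           // Current step size.
  real_type result = (f(a) + f(b)) * (h / 2);

  for(unsigned k = 0U; k < 16U; ++k)
  {
    h /= 2;

    // Sum the integrand at the new midpoints only.
    real_type sum { 0 };

    for(std::uint_fast32_t j = 1U; j <= n; ++j)
    {
      sum += f(a + (static_cast<real_type>((j * 2U) - 1U) * h));
    }

    const real_type previous = result;

    result = (previous / 2) + (h * sum);

    n *= 2U;

    // Require a few refinements before testing for convergence.
    if((k > 1U) && (fabs(result - previous) < fabs(result * tol)))
    {
      break;
    }
  }

  return result;
}

// Cylindrical Bessel function of integer order n via its integral representation.
inline auto cyl_bessel_j(const unsigned n, const double x) -> double
{
  const double pi = 3.14159265358979323846;

  const auto integrand =
    [n, x](const double t) -> double
    {
      return std::cos((x * std::sin(t)) - (static_cast<double>(n) * t));
    };

  return integral(0.0, pi, 1.0E-9, integrand) / pi;
}

} // namespace sketch_trapezoid
```

Because the integrand is smooth and periodic, the trapezoid rule converges very quickly here, so only a handful of refinement steps are needed for small arguments.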

Performance classes

Most of the benchmarks run on each supported target system. Experience with runs on the individual target systems reveals a wide range of microcontroller performance classes.

Consider, for instance, app_benchmark_pi_agm.cpp which exercises the benchmark of type APP_BENCHMARK_TYPE_PI_AGM. This benchmark computes ${\sim}50{\ldots}100$ decimal digits of the mathematical constant $\pi$ using a Gauss AGM method with help from the decwide_t template class.
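
To give a feeling for the algorithm, here is a minimal sketch of a Gauss AGM iteration for $\pi$ in built-in `double` precision. It assumes the common Gauss-Legendre (Brent-Salamin) formulation, which may differ in detail from the variant used in app_benchmark_pi_agm.cpp. It is also limited to roughly $15$ decimal digits, whereas the benchmark itself uses decwide_t to reach far more. The names `sketch_agm` and `pi_gauss_agm` are hypothetical.

```cpp
#include <cmath>

namespace sketch_agm {

// Gauss-Legendre AGM iteration for pi in double precision.
// Each iteration roughly doubles the number of correct digits,
// so a few iterations exhaust the precision of double.
inline auto pi_gauss_agm(const unsigned iterations = 4U) -> double
{
  double a { 1.0 };
  double b { 1.0 / std::sqrt(2.0) };
  double t { 0.25 };
  double p { 1.0 };

  for(unsigned i = 0U; i < iterations; ++i)
  {
    // Arithmetic mean and geometric mean of the previous pair.
    const double a_next = (a + b) / 2.0;

    b = std::sqrt(a * b);

    // Accumulate the correction term.
    const double delta = a - a_next;

    t -= (p * delta) * delta;

    a = a_next;

    p *= 2.0;
  }

  const double s = a + b;

  return (s * s) / (4.0 * t);
}

} // namespace sketch_agm
```

The quadratic convergence is what makes the method attractive for multiprecision work: reaching about $100$ decimal digits needs only a few more iterations, with the cost dominated by the big-number multiplications and square roots.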

A very wide range of microcontroller performance classes is shown in the following table. The benchmark used is a ${\sim}100$ decimal digit AGM $\pi$ calculation.

| Target | runtime $[{\text{ms}}]$ | relative |
| --- | --- | --- |
| `am6254_soc` | 0.37 | 1.0 |
| `am335x` | 1.5 | 4.1 |
| `esp32p4_riscv_soc` | 2.5 | 6.8 |
| `stm32f446` | 5.1 | 14 |
| `rpi_pico2_rp2350` | 6.3 | 17 |
| `wch_ch32v307` | 8.0 | 22 |
| `xtensa_esp32_s3` | 9.1 | 25 |
| `bl602_sifive_e24_riscv` | 10 | 27 |
| `rpi_pico_rp2040` | 19 | 51 |
| `avr` | 410 | 1100 |
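
The relative column appears to be each runtime divided by the fastest runtime in the table ($0.37~{\text{ms}}$ on `am6254_soc`), rounded to two significant figures. The symbols $t_{\text{target}}$ and $t_{\text{ref}}$ below are introduced here for illustration only:

$$
\text{relative} = \frac{t_{\text{target}}}{t_{\text{ref}}}, \qquad
\frac{1.5}{0.37} \approx 4.1, \qquad
\frac{5.1}{0.37} \approx 14, \qquad
\frac{410}{0.37} \approx 1100
$$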

There are strikingly different performance classes for the $8$-bit MICROCHIP(R) AVR controller of the ARDUINO and the $32$-bit ARM(R) Cortex(R) A8 controller of the BeagleBone Black Edition, Rev. C. The $\pi$ calculation requires approximately $410~{\text{ms}}$ and $1.5~{\text{ms}}$, respectively, on these two microcontroller systems.

The $64$-bit ARM(R) v8-a (i.e., Cortex(R) A53) performs the calculation in $0.37~{\text{ms}}$. This benchmark runs on one single A53 core of the PocketBeagle2 board and there are $3$ additional identical A53 cores (and $2$ smaller ones) simply waiting in idle loops.

The $32$-bit ARM(R) Cortex(R) M4F controller on the stm32f446 board performs the calculation in the middle of these extreme performance classes, with a result of $5.1~{\text{ms}}$.

The $32$-bit RISC-V controller (having a novel open-source core) on the wch_ch32v307 board boasts a quite respectable time of $8.0~{\text{ms}}$. A different $32$-bit RISC-V controller on target bl602_sifive_e24_riscv has similar performance, running the benchmark in about $10~{\text{ms}}$.

Running on only one core (core0) of the $32$-bit controller of the xtensa_esp32_s3 board results in a runtime of $9.1~{\text{ms}}$ for the calculation. The next-generation esp32p4_riscv_soc, with a dual RISC-V core architecture, is significantly faster, coming in at $2.5~{\text{ms}}$ (running the benchmark on one core).

Using only one core (core1) on the $32$-bit ARM(R) Cortex(R) M0+ controller of the rpi_pico_rp2040 board results in a calculation time of $19~{\text{ms}}$. The next-generation rpi_pico2_rp2350, with dual ARM(R) Cortex(R) M33 cores, clearly improves on this (still using only core1). It has a calculation time of $6.3~{\text{ms}}$, which is about $3$ times faster than its predecessor.
