Is it possible to run a benchmark comparing the performance of the native binding (builtin `NATIVE_RULE`) with that of the new binding API (`b2::jam::jam_binder`)? I think so, and here's how I did it.
The idea is to call a rule implemented both ways from a Jam loop and measure the time taken to run the same number of iterations. I used the following Jam file (`legacy.jam`) to benchmark the native binding:
```jam
NATIVE_RULE benchmark : timed ;
IMPORT benchmark : timed : : benchmark.timed ;
while true { benchmark.timed ; }
```
and the following `Jamroot` to benchmark the API binding (this, of course, requires the build system, unlike `NATIVE_RULE`):
```jam
import benchmark ;
while true { benchmark.timed ; }
```
The probe
For the measurements I used the following probe:
```cpp
#include "startup.h"
#include "output.h"

#include <cstddef>
#include <cstdlib>
#include <ctime>

/*
 * Report collected data on stderr every 1<<shift tick() calls.
 * Times are CPU time as returned by clock(), in milliseconds.
 */
template <unsigned shift>
struct WatchedCounter
{
    static_assert(shift < 32, "shift too large");
    // size_t(1): shifting a plain int by 31 would overflow it.
    static constexpr size_t mask = size_t(1) << shift;
    bool last_masked = false;
    size_t counter = 0;
    clock_t t0;
    size_t exit_count;

    /*
     * Exit program after num reports.
     */
    WatchedCounter(size_t num = 1) : t0(clock()), exit_count(num << shift) {}

    void tick()
    {
        // Bit `shift` of counter flips every 1<<shift calls.
        if (bool(++counter & mask) != last_masked)
        {
            last_masked = !last_masked;
            clock_t t1 = clock();
            double dur = 1000.0 * (t1 - t0) / CLOCKS_PER_SEC;
            // counter is a size_t: cast it to match the format specifier.
            err_printf("%lu\t%.3f\n", (unsigned long)counter, dur);
        }
        if (counter == exit_count) b2::clean_exit(EXIT_SUCCESS);
    }
};
```
Native binding implementation
The Jam `timed` rule is implemented by the function
```cpp
LIST * native_timed(FRAME * frame, int flags)
{
    static WatchedCounter<13> wc(2); // report every 8192 calls, exit after 2 reports
    wc.tick();
    return L0; // empty list
}
```
whose binding is done by the following function:
```cpp
/*
 * Legacy binding style.
 */
void init_benchmark()
{
    // char const * args[] = { "any", "*", 0 }; // only used to check the call syntax
    char const * * args = nullptr; // don't care about the arguments
    declare_native_rule(
        "benchmark", // module
        "timed",     // rule
        args,
        native_timed,
        1            // version
    );
}
```
It is called at the end of `load_builtins` in `builtins.cpp`.
New binding API implementation
I added the following header, `mod_bind_benchmark.h`:
```cpp
namespace b2 {

// void timed_no_args(); // alternate version
value_ref timed_no_args();

/*
 * New binding style.
 */
struct benchmark_module : b2::bind::module_<benchmark_module>
{
    const char * module_name = "benchmark";

    template <class Binder>
    void def(Binder & binder)
    {
        binder.def(&b2::timed_no_args, "timed");
        binder.loaded();
    }
};

} // namespace b2
```
It is included and used in `bindjam.cpp`, and the function that implements `timed` lives in that same source file, along with everything else:
```cpp
/*
 * New binding style.
 */
namespace b2 {

// NOTE: returning void seems slower
// void timed_no_args()
value_ref timed_no_args()
{
    static WatchedCounter<13> wc(2); // report every 8192 calls, exit after 2 reports
    wc.tick();
    return value_ref(); // empty value
}

} // namespace b2
```
Results
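The key parameters are the `shift` template argument and the number of reports passed to the `WatchedCounter` constructor (13 and 2 in the code above, i.e. one report every 8192 calls and exit after the second). Each output line gives the call count and the elapsed CPU time in milliseconds. Here's what I get on my poor laptop (b2 release build, gcc 10.3.1), averaged over 10 runs: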
```
> b2 -flegacy.jam
8192    15.786
16384   30.463
> b2
8192    25.695
16384   52.523
```
If I pass an argument to the rule instead, using the following binding-API implementation of the `timed` rule
```cpp
value_ref timed_any_args(list_cref args)
{
    (void)args; // the argument is accepted but ignored
    static WatchedCounter<13> wc(2);
    wc.tick();
    return value_ref();
}
```
and passing an argument (`qwerty` in both cases) from the Jam files, I get (averaged over 10 runs):
```
> b2 -flegacy.jam
8192    25.876
16384   43.933
> b2
8192    28.125
16384   57.759
```
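Well, it seems that tons of templates don't make the code faster (they certainly don't make it simpler or more readable): without arguments the native binding is about 1.6 to 1.7 times faster. Still, the binding API seems to scale better when passing arguments to rules: adding one argument slows it down by about 10%, versus roughly 45% to 65% for the native binding.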
Of course, this data is insufficient to draw any conclusions.