Commit 6780d4b

Make everything 4% faster by skipping empty tasks [NFC] (#8571)
When a visitor is the original
```cpp
void visitFoo(Foo* curr) {}
```
(that is, empty), and the doVisit is also unchanged,
```cpp
static void doVisitFoo(Self* self, Foo* curr) {
  self->visitFoo(curr);
}
```
(that is, it just calls the visitor), then we do not need to queue such tasks for execution at all. Measurements show a 2.5%-5% speedup, average 4%.
1 parent 6c70e2c commit 6780d4b

1 file changed

+36
-2
lines changed

src/wasm-traversal.h

Lines changed: 36 additions & 2 deletions
@@ -36,7 +36,10 @@ namespace wasm {
 
 // A generic visitor, defaulting to doing nothing on each visit
 
-template<typename SubType, typename ReturnType = void> struct Visitor {
+template<typename SubType, typename ReturnType_ = void> struct Visitor {
+  // Capture the parameter in something we can access later.
+  using ReturnType = ReturnType_;
+
   // Expression visitors
 #define DELEGATE(CLASS_TO_VISIT) \
   ReturnType visit##CLASS_TO_VISIT(CLASS_TO_VISIT* curr) { \
@@ -351,9 +354,40 @@ struct PostWalker : public Walker<SubType, VisitorType> {
 
 #define DELEGATE_ID curr->_id
 
+// Don't push empty tasks, that is, functions that we just push to the
+// stack, pop, and then nothing happens when we call the empty function. The
+// default visitFoo() in Visitor is empty, and the static doVisitFoo() in
+// Walker just calls it, so if neither have been changed, we know that
+// nothing will run.
+//
+// Note that we check Visitor<..> and not VisitorType. Only Visitor is the
+// actual top type we know has empty visitors, while VisitorType could be
+// anything.
+//
+// Unfortunately we must avoid this in gcc 11 and earlier, as they error on
+// these function pointers not being constexpr. Remove the constexpr there.
+// Note that even if this ends up being a runtime check, it should be faster
+// than pushing empty tasks, as the check is much faster than the push/pop/
+// call, and a large number of our calls (most, perhaps) are not overridden.
+#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ <= 11
+#define DELEGATE_START(id) \
+  if (&SubType::visit##id != \
+        &Visitor<SubType, typename SubType::ReturnType>::visit##id || \
+      &SubType::doVisit##id != &Walker<SubType, VisitorType>::doVisit##id) { \
+    self->pushTask(SubType::doVisit##id, currp); \
+  } \
+  [[maybe_unused]] auto* cast = curr->cast<id>();
+#else
 #define DELEGATE_START(id) \
-  self->pushTask(SubType::doVisit##id, currp); \
+  if constexpr (&SubType::visit##id != \
+                  &Visitor<SubType, \
+                           typename SubType::ReturnType>::visit##id || \
+                &SubType::doVisit##id != \
+                  &Walker<SubType, VisitorType>::doVisit##id) { \
+    self->pushTask(SubType::doVisit##id, currp); \
+  } \
   [[maybe_unused]] auto* cast = curr->cast<id>();
+#endif
 
 #define DELEGATE_GET_FIELD(id, field) cast->field
 
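
The check in `DELEGATE_START` can be tried in isolation. The following is a minimal sketch of the same idea, not the Binaryen code: `MiniVisitor`, `MiniWalker`, `scanFoo`/`scanBar`, `FooCounter`, `NoOpWalker`, `queuedAfterScan`, and the `MAYBE_CONSTEXPR` macro are all hypothetical names invented for illustration. It assumes a C++17 compiler and that the visit methods are non-virtual, as in the CRTP design above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirror the commit's guard: gcc 11 and earlier fall back to a runtime check.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ <= 11
#define MAYBE_CONSTEXPR
#else
#define MAYBE_CONSTEXPR constexpr
#endif

// Hypothetical stand-ins for expression classes.
struct Expr {};
struct Foo : Expr {};
struct Bar : Expr {};

// CRTP visitor whose default visit methods are empty.
template<typename SubType> struct MiniVisitor {
  void visitFoo(Foo*) {}
  void visitBar(Bar*) {}
};

// CRTP walker that queues a task only when running it could do something.
template<typename SubType> struct MiniWalker : MiniVisitor<SubType> {
  using TaskFunc = void (*)(SubType*, Expr*);
  struct Task {
    TaskFunc func;
    Expr* curr;
  };
  std::vector<Task> stack;

  // Default doVisit functions: they only forward to the visitor.
  static void doVisitFoo(SubType* self, Expr* curr) {
    self->visitFoo(static_cast<Foo*>(curr));
  }
  static void doVisitBar(SubType* self, Expr* curr) {
    self->visitBar(static_cast<Bar*>(curr));
  }

  // Push only if SubType replaced visitFoo or doVisitFoo.
  void scanFoo(Foo* curr) {
    if MAYBE_CONSTEXPR (&SubType::visitFoo != &MiniVisitor<SubType>::visitFoo ||
                        &SubType::doVisitFoo !=
                          &MiniWalker<SubType>::doVisitFoo) {
      stack.push_back({SubType::doVisitFoo, curr});
    }
  }
  void scanBar(Bar* curr) {
    if MAYBE_CONSTEXPR (&SubType::visitBar != &MiniVisitor<SubType>::visitBar ||
                        &SubType::doVisitBar !=
                          &MiniWalker<SubType>::doVisitBar) {
      stack.push_back({SubType::doVisitBar, curr});
    }
  }

  // Pop and execute every queued task.
  void run() {
    while (!stack.empty()) {
      Task task = stack.back();
      stack.pop_back();
      task.func(static_cast<SubType*>(this), task.curr);
    }
  }
};

// Overrides visitFoo only, so only Foo tasks should be queued.
struct FooCounter : MiniWalker<FooCounter> {
  int foos = 0;
  void visitFoo(Foo*) { ++foos; }
};

// Overrides nothing, so no task should ever be queued.
struct NoOpWalker : MiniWalker<NoOpWalker> {};

// Number of tasks queued after scanning one Foo and one Bar.
template<typename W> std::size_t queuedAfterScan() {
  W walker;
  Foo foo;
  Bar bar;
  walker.scanFoo(&foo);
  walker.scanBar(&bar);
  return walker.stack.size();
}
```

In this sketch `queuedAfterScan<FooCounter>()` queues a single task (the `Foo` one) and `queuedAfterScan<NoOpWalker>()` queues none. The comparison works because `&SubType::visitFoo` names the base-class member when `SubType` does not declare its own `visitFoo`, so the two pointers compare equal exactly in the non-overridden case.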