f6f9689 in #3 uses std::copy(), which is helpful for two reasons:
- It assumes that the memory regions don't overlap, so we avoid the performance penalty from aliasing
- It usually resolves to a vectorized (AVX) version of memcpy()
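For illustration only, here is a minimal sketch of the kind of call being discussed. The helper name copy_values is hypothetical and is not taken from f6f9689. Because double is trivially copyable, the standard library is free to lower this std::copy() call to a single bulk memory copy, though whether it does so depends on the implementation and compiler flags.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not from the commit): copy src into a fresh buffer.
// The destination is a separate allocation, so it cannot begin inside the
// source range, which is the precondition std::copy() places on its output.
inline std::vector<double> copy_values(const std::vector<double>& src) {
    std::vector<double> dst(src.size());
    // For trivially copyable element types, implementations may turn this
    // into one bulk copy instead of an element-by-element loop.
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}
```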
We could probably improve performance across the board by using aligned memory everywhere and then annotating our loops (in C++ or Fortran) as such, e.g., with the aligned clause of OpenMP's simd directive.
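A rough sketch of what that could look like in C++, not a tested proposal for this codebase. The names alloc_aligned and axpy_aligned are hypothetical, and the 32-byte alignment is an assumption chosen to match 256-bit AVX registers. The pragma only takes effect when compiling with OpenMP SIMD support (e.g., -fopenmp or -fopenmp-simd with GCC or Clang); otherwise it is ignored and the loop still runs correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 32 bytes, matching 256-bit AVX registers.
constexpr std::size_t kAlignment = 32;

// Hypothetical helper: allocate n doubles on a 32-byte boundary.
// std::aligned_alloc (C++17) expects the size to be a multiple of the
// alignment, so round the byte count up. Release with std::free().
inline double* alloc_aligned(std::size_t n) {
    std::size_t bytes = n * sizeof(double);
    bytes = (bytes + kAlignment - 1) / kAlignment * kAlignment;
    return static_cast<double*>(std::aligned_alloc(kAlignment, bytes));
}

// Hypothetical kernel: y[i] += a * x[i] for i in [0, n).
// The aligned clause promises the compiler that x and y are 32-byte
// aligned; the asserts check that promise in debug builds.
inline void axpy_aligned(double a, const double* x, double* y, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(x) % kAlignment == 0);
    assert(reinterpret_cast<std::uintptr_t>(y) % kAlignment == 0);
    #pragma omp simd aligned(x, y : 32)
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Passing a pointer that is not actually 32-byte aligned to a loop annotated this way is undefined behavior, which is why the annotation only makes sense once every buffer comes from an aligned allocator.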