Skip to content

Optimize include time, push definitions to object files#1774

Merged
ax3l merged 23 commits intoopenPMD:devfrom
franzpoeschel:optimize-include-time
Aug 6, 2025
Merged

Optimize include time, push definitions to object files#1774
ax3l merged 23 commits intoopenPMD:devfrom
franzpoeschel:optimize-include-time

Conversation

@franzpoeschel
Copy link
Copy Markdown
Contributor

@franzpoeschel franzpoeschel commented Aug 4, 2025

Fix BLAST-WarpX/warpx#5984 (comment)

Including openPMD currently takes too much time. The main culprit seems to be our use of std::variant for all datatypes in the Datatype enum.

This PR is somewhat macro-heavy to keep down compile times on our end.

  • Do not use std::variant in public headers of openPMD::Variant, move all use of it to object files, use std::any publically. This breaks the API, but I think that this part of the API was rarely used.
  • Move most storeChunk/loadChunk implementations to object files
  • Add an extra header to bring back the removed functionality?
  • Instantiate more things in object files, removing them from public includes

It is somewhat likely that std::variant was very costly due to things like Attribute::get<T>(): The requested type T must be matched against the actual type A, leading to 38²=1444 instantiations. If naively pushing templates to the object files, this leads to very big compiled binaries. Some short-circuiting logic (pre-filling matrixes of which types are actually convertible and only instantiating in those cases when they are) helped in there.

** Results **

The cut on compile times is significant. The SerialIOTests (which have gotten too huge and take ages to compile) used to compile in ~ a minute, now it's half a minute:
Bildschirmfoto vom 2025-08-05 19-18-25
Bildschirmfoto vom 2025-08-05 19-18-12

For a more normal example, I'll compile 10_streaming_write, which includes openPMD.hpp and does some basic transforma
Bildschirmfoto vom 2025-08-05 19-23-17
tions. Down to 2 seconds from 5 seconds:
Bildschirmfoto vom 2025-08-05 19-18-46

Funny enough, the library size slightly shrinks from 9.8MB (Clang) / 8.5MB (g++) to 8.5MB (Clang) / 7.6MB (g++), despite pulling definitions into object files.
My guess is that most definitions were instantiated anyway, but the additional logic to cut away type conversion instantiations brings some size benefit.


template <typename Visitor, typename... Args>
auto RecordComponent::visit(Args &&...args)
inline auto RecordComponent::visit(Args &&...args)

Check notice

Code scanning / CodeQL

Unused static variable Note

Static variable args is never read.

template <typename Visitor, typename... Args>
auto RecordComponent::visit(Args &&...args)
inline auto RecordComponent::visit(Args &&...args)

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable args is not used.
Comment thread src/RecordComponent.cpp
}

template <typename T>
RecordComponent &RecordComponent::makeConstant(T value)

Check notice

Code scanning / CodeQL

No raw arrays in interfaces Note

Raw arrays should not be used in interfaces. A container class should be used instead.
@franzpoeschel
Copy link
Copy Markdown
Contributor Author

franzpoeschel commented Aug 5, 2025

Headers whose implementations could be moved to a .cpp file:

include/openPMD/Span.hpp
include/openPMD/backend/PatchRecordComponent.hpp
✓ include/openPMD/backend/Container.hpp
✓ include/openPMD/backend/BaseRecord.hpp
✓ include/openPMD/auxiliary/UniquePtr.hpp // except those templates with a Del parameter
✓ include/openPMD/auxiliary/Memory.hpp
include/openPMD/auxiliary/StringManip.hpp
include/openPMD/auxiliary/ShareRawInternal.hpp
include/openPMD/auxiliary/OutOfRangeMsg.hpp
include/openPMD/auxiliary/Environment.hpp

include/openPMD/Datatype.hpp // but keep constexpr implementations public

some single functions:
include/openPMD/ParticleSpecies.hpp
include/openPMD/backend/Attributable.hpp
include/openPMD/backend/BaseRecordComponent.hpp

Comment thread src/auxiliary/Memory.cpp
namespace openPMD::auxiliary
{
std::unique_ptr<void, std::function<void(void *)>>
allocatePtr(Datatype dtype, uint64_t numPoints)

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 115 lines.
Comment on lines +712 to +719
// for (auto it = this->container().begin(); it != end; ++it)
// {
// if (it->first == RecordComponent::SCALAR)
// {
// this->container().erase(it);
// throw error::WrongAPIUsage(detail::NO_SCALAR_INSERT);
// }
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +731 to +738
// for (auto it = this->container().begin(); it != end; ++it)
// {
// if (it->first == RecordComponent::SCALAR)
// {
// this->container().erase(it);
// throw error::WrongAPIUsage(detail::NO_SCALAR_INSERT);
// }
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
@franzpoeschel franzpoeschel changed the title WIP: Optimize include time, push definitions to object files Optimize include time, push definitions to object files Aug 6, 2025
@franzpoeschel franzpoeschel requested a review from ax3l August 6, 2025 13:11
@ax3l ax3l self-assigned this Aug 6, 2025
Copy link
Copy Markdown
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for digging this!

@ax3l ax3l merged commit 9f42a5f into openPMD:dev Aug 6, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PhysicalParticleContainer.cpp takes a long time to compile

3 participants