Elements Lifetime (GPU)#1368
Conversation
7f272a7 to
7b00400
Compare
7b00400 to
8837f7d
Compare
ef43a24 to
6cf4c41
Compare
71c9e9c to
c82bdbf
Compare
c82bdbf to
19f5905
Compare
| /* Copyright 2022-2023 The Regents of the University of California, through Lawrence | ||
| * Berkeley National Laboratory (subject to receipt of any required | ||
| * approvals from the U.S. Dept. of Energy). All rights reserved. | ||
| * | ||
| * This file is part of ImpactX. | ||
| * | ||
| * FIXME: SHOULD BE REPLACED WITH amrex::Gpu::TrackedVector in v26.05+, see | ||
| * https://github.com/AMReX-Codes/amrex/pull/5253 | ||
| * | ||
| * Authors: Axel Huebl | ||
| * License: BSD-3-Clause-LBNL | ||
| */ |
Check notice
Code scanning / CodeQL
FIXME comment Note
There was a problem hiding this comment.
This will be replaced after we released 26.04 of ImpactX, because it goes in AMReX 26.05+ only
8363c50 to
c069964
Compare
f480634 to
cca268d
Compare
Template for all elements with dynamic GPU data.
Modernize dynamic data management
GPUDataRegistry in src/elements/mixin/dynamicdata.H uses function-local statics (next_id, registry) inside template member functions. With explicit template instantiation (IMPACTX_GPUDATA_INSTANTIATE / IMPACTX_GPUDATA_INSTANTIATE), this works for static builds. However, for DLL/shared library builds, each DLL that links the library could get its own copy of these statics unless the explicit instantiations are properly exported with visibility attributes.
cca268d to
1a83a74
Compare
|
@WeiqunZhang @EZoni this is ready to be reviewed/merged. |
| def create_soft_solenoid(name="sol"): | ||
| """Create a SoftSolenoid element with minimal coefficients.""" | ||
| return elements.SoftSolenoid( | ||
| name=name, | ||
| ds=1.0, | ||
| bscale=0.1, | ||
| cos_coefficients=[0.5, 0.3, 0.1], | ||
| sin_coefficients=[0.0, 0.0, 0.0], | ||
| mapsteps=10, | ||
| nslice=1, | ||
| ) | ||
|
|
||
|
|
||
| def create_soft_quadrupole(name="quad"): | ||
| """Create a SoftQuadrupole element with minimal coefficients.""" | ||
| return elements.SoftQuadrupole( | ||
| name=name, | ||
| ds=0.5, | ||
| gscale=1.0, | ||
| cos_coefficients=[1.0, 0.5], | ||
| sin_coefficients=[0.0, 0.0], | ||
| mapsteps=10, | ||
| nslice=1, | ||
| ) |
There was a problem hiding this comment.
Should we add coverage for all other elements that use the new data infrastructure or is it too difficult to set up a meaningful/reasonable test simulation with all of those?
There was a problem hiding this comment.
Good questions! No that is not needed, there are 6 elements in ImpactX that use dynamic GPU data and they all use the same pattern. Testing one of them is sufficient for demonstrate the general lifetime issue and to ensure it does not happen again.
|
|
||
| # Verify simulation completed | ||
| ref = sim.particle_container().ref_particle() | ||
| assert ref.s == pytest.approx(1.5, rel=1e-6) |
There was a problem hiding this comment.
Is the 1.5 here some sort of expected analytical value? If so, it may be useful to add a short comment about it.
There was a problem hiding this comment.
It's just the sum of the .ds of the elements in sim.lattice - I can write this more explicitly, this is just a test the simulation actually did anything. We can actually remove this part as well.
|
|
||
| ref1 = sim1.particle_container().ref_particle() | ||
| s_after_sim1 = ref1.s | ||
| assert s_after_sim1 == pytest.approx(1.0, rel=1e-6) |
There was a problem hiding this comment.
Is the 1.0 here some sort of expected analytical value? If so, it may be useful to add a short comment about it.
There was a problem hiding this comment.
same here, just the ds of the one lattice element we modeled. Not super needed to validate really.
|
|
||
| ref2 = sim2.particle_container().ref_particle() | ||
| s_after_sim2 = ref2.s | ||
| assert s_after_sim2 == pytest.approx(1.0, rel=1e-6) |
There was a problem hiding this comment.
Is the 1.0 here some sort of expected analytical value? If so, it may be useful to add a short comment about it.
There was a problem hiding this comment.
Same here, just checked the sim did anything. not needed to check.
|
As far as I can review, the PR looks good to me. I just left a few minor suggestions and will approve once a couple of inline comments are fixed (left-over from copy/paste). |
Not super needed to check that the simulation had a minimal effect on the reference particle. We just want to check element lifetimes here. Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
Co-authored-by: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
EZoni
left a comment
There was a problem hiding this comment.
Thanks for the updates! As far as my understanding goes, this looks good to me. I'd let you or @WeiqunZhang finalize it and decide when it's ready to merge.
|
Thank you! 🎉 |
## Summary This adds a helper class for synchronizing a pair of host & device vectors (w/o using managed memory). In BLAST codes, I find this to cause significant boilerplate in physics-focused parts of the code and this little helper class cuts down on lines and mental book-keeping. Additionally, especially in interactive pyAMReX and ImpactX workflows, this construct is a building block for enabling user-friendly and even multi-simulation-spanning user data (e.g., ImpactX element data) by enabling users to define their inputs even before AMReX was initialized and being able to reuse their input classes across many thousands of simulations, e.g., in optimization loops, even with AMReX device arenas being shut down/recreated in between. ## Additional background The unit test shows the most common usage patterns & needs. BLAST-ImpactX/impactx#1368 ## Checklist The proposed changes: - [x] add new capabilities to AMReX - [x] CPU build: write the optimized fallback path - [x] GPU build: use AMReX-session agnostic pinned memory for host OR do synchronous copies - [x] include documentation in the code and/or rst files, if appropriate - [x] is covered by unit tests --------- Co-authored-by: Alexander Sinn <alexander.sinn@desy.de>
NVCC 12.2 + C++20 evaluates the default initializer of an inline
static data member during implicit instantiation of the enclosing
class template, so any reference such as `DynamicData::get(id)`
from an element header triggers `inline static
std::map<int, std::shared_ptr<T>> s_registry{};` and fails with
"incomplete type is not allowed".
Move `s_next_id` / `s_registry` to plain `static` declarations
inside the class with out-of-class non-inline template definitions
in the same header. Implicit class instantiation no longer touches
the default initializer; the definitions are processed only when
an explicit instantiation actually requests them.
This preserves the cross-DSO guarantee from BLAST-ImpactX#1441: combined with
`IMPACTX_EXPORT`, `extern template struct` in element headers, and
`template struct` in each element .cpp, the data members emit as
`STB_GNU_UNIQUE` symbols and resolve to a single shared registry
even when pyImpactX is built with `-fvisibility=hidden` and LTO.
It also avoids the function-local-static design from BLAST-ImpactX#1368, which
under LTO + hidden visibility duplicated the registry inside
pyImpactX via the inlined accessor body of `emplace<Args...>`.
Only `dynamicdata.H` changes; call sites and the
`IMPACTX_GPUDATA_EXTERN` / `_INSTANTIATE` macros are unchanged.
* C++20
Like AMReX and WarpX, we transition to C++20 now.
* CI: CUDA 12.2
* Fix GPUDataRegistry for NVCC 12 + C++20
NVCC 12.2 + C++20 evaluates the default initializer of an inline
static data member during implicit instantiation of the enclosing
class template, so any reference such as `DynamicData::get(id)`
from an element header triggers `inline static
std::map<int, std::shared_ptr<T>> s_registry{};` and fails with
"incomplete type is not allowed".
Move `s_next_id` / `s_registry` to plain `static` declarations
inside the class with out-of-class non-inline template definitions
in the same header. Implicit class instantiation no longer touches
the default initializer; the definitions are processed only when
an explicit instantiation actually requests them.
This preserves the cross-DSO guarantee from #1441: combined with
`IMPACTX_EXPORT`, `extern template struct` in element headers, and
`template struct` in each element .cpp, the data members emit as
`STB_GNU_UNIQUE` symbols and resolve to a single shared registry
even when pyImpactX is built with `-fvisibility=hidden` and LTO.
It also avoids the function-local-static design from #1368, which
under LTO + hidden visibility duplicated the registry inside
pyImpactX via the inlined accessor body of `emplace<Args...>`.
Only `dynamicdata.H` changes; call sites and the
`IMPACTX_GPUDATA_EXTERN` / `_INSTANTIATE` macros are unchanged.
* CI: Bump ASAN
Clang 14 is known buggy/incomplete for std::source_location,
which toml11 uses (via openPMD-api).
* Doc: Update ROCm to 6+
* Fix NVCC Compile Issue in Envelope
On GPU, we have to more carefully deal with the lifetime of dynamic (array) data in elements. This PR is an infrastructure update for complex elements that have such dynamic GPU data, to address multiple usage bugs on GPU.
Part 1: This PR
sim.latticeBeamMonitor: Delay Open #1178Part 2: Follow-up PR
sim.latticebehave like a Python list: shared instead of copied elements [WIP] Bettersim.lattice#1381