Commit a98accc

einsum: double-fence each sub-World before destruction (incl. on exception unwind)
Add an inline RAII guard `FenceSubWorldsOnExit` to the generalized-contraction path of einsum, declared right after the `worlds` vector so it destructs *before* `worlds` (LIFO) and *after* AB/C.

On normal exit this is a final harmless drain; on exception unwind it drains any `lazy_sync_children` tasks that ~DistArray scheduled via lazy_deleter on sub-World taskqs before those sub-Worlds are torn down. Without this, those tasks survive into the global ThreadPool past ~World, then trip ~WorldObject's `World::exists(&world)` assertion when an enclosing scope's fence runs them, masking the real exception with a cryptic abort.

Two fences per sub-World are required: WorldGopInterface::fence_impl runs `deferred_->do_cleanup()` only after its task-drain loop, so the fresh lazy_sync_children tasks the destructors enqueue via lazy_sync are left pending when the first fence returns; the second fence drains them.

(A single sub-World's ranks all call this collectively, so the cross-rank lazy_sync handshake matches up correctly. The same trick would not be safe inside the global fence_impl because main-world peers have independent shared_ptr refcounts and need not reach the same do_cleanup in lockstep.)
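
The ordering argument above (guard declared after `worlds`, before AB/C) can be checked with a small standalone sketch. Everything here is hypothetical scaffolding: `ToyWorld`, `ToyArray`, `FenceOnExit`, and `scope_exit_order` are stand-ins invented for illustration, not MADNESS or TiledArray types. The sketch relies only on the C++ rule that automatic objects are destroyed in reverse order of declaration, on normal exit and during stack unwinding alike.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for a sub-World; it only records what happens to it.
struct ToyWorld {
  Log* log;
  explicit ToyWorld(Log* l) : log(l) { assert(log != nullptr); }
  void fence() { log->push_back("fence"); }
  ~ToyWorld() { log->push_back("~ToyWorld"); }
};

// Hypothetical stand-in for AB/C, whose destructors schedule work on a world.
struct ToyArray {
  Log* log;
  ~ToyArray() { log->push_back("~ToyArray"); }
};

// Records the order of events when the scope exits, with or without a throw.
Log scope_exit_order(bool do_throw) {
  Log log;
  try {
    std::vector<std::shared_ptr<ToyWorld>> worlds;  // declared first, dies last

    // Same shape as the guard in the commit: holds a reference, fences twice.
    struct FenceOnExit {
      std::vector<std::shared_ptr<ToyWorld>>& worlds_;
      ~FenceOnExit() {
        for (auto& w : worlds_) {
          if (!w) continue;
          try {
            w->fence();
            w->fence();
          } catch (...) {
            // A destructor must not let exceptions escape.
          }
        }
      }
    } guard{worlds};  // declared second

    ToyArray array{&log};  // declared last, so destroyed first
    worlds.push_back(std::make_shared<ToyWorld>(&log));

    if (do_throw) throw std::runtime_error("mid-contraction failure");
  } catch (const std::runtime_error&) {
    log.push_back("caught");
  }
  return log;
}
```

In this toy, both `scope_exit_order(false)` and `scope_exit_order(true)` record the array destructor first, then two fences, then the world destructor; the throwing case additionally records `caught` after unwinding completes.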
1 parent 8ca1ff2 commit a98accc

1 file changed

Lines changed: 33 additions & 0 deletions

src/TiledArray/einsum/tiledarray.h
@@ -653,6 +653,39 @@ auto einsum(expressions::TsrExpr<ArrayA_> A, expressions::TsrExpr<ArrayB_> B,
   // dead World (e.g. while unwinding an exception thrown mid-contraction).
   std::vector<std::shared_ptr<World>> worlds;
 
+  // RAII fencer: on normal exit and (critically) on exception unwind,
+  // fence every live sub-World before it is destroyed. ~DistArray ->
+  // lazy_deleter calls world.gop.lazy_sync(...) which enqueues a
+  // lazy_sync_children task onto the sub-World's taskq; without a fence
+  // those tasks survive into the global ThreadPool past the sub-World's
+  // ~World, then trip ~WorldObject's `World::exists(&world)` assertion
+  // when some later fence (e.g. an enclosing scope's fence run during
+  // unwind) picks them up. Declared *after* `worlds` so it destructs
+  // *before* `worlds` (LIFO); destructs *after* AB/C so it sees the
+  // tasks they scheduled via lazy_deleter.
+  //
+  // Two fences per sub-World are needed: WorldGopInterface::fence_impl
+  // runs `deferred_->do_cleanup()` only after its task-drain loop, and
+  // the destructors that cleanup triggers (TA's lazy_deleter ->
+  // lazy_sync) schedule a fresh lazy_sync_children task on the world's
+  // taskq. That task is left pending when fence_impl returns; a second
+  // fence drains it. (Within a single sub-World all participating ranks
+  // call fence collectively, so the cross-rank lazy_sync handshake
+  // matches up correctly.)
+  struct FenceSubWorldsOnExit {
+    std::vector<std::shared_ptr<World>> &worlds_;
+    ~FenceSubWorldsOnExit() {
+      for (auto &w : worlds_) {
+        if (!w) continue;
+        try {
+          w->gop.fence();
+          w->gop.fence();
+        } catch (...) {
+        }
+      }
+    }
+  } fence_subworlds_on_exit{worlds};
+
   std::tuple<ArrayTerm<ArrayA>, ArrayTerm<ArrayB>> AB{{A.array(), a},
                                                       {B.array(), b}};
 
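
The two-fence reasoning in the comment can be illustrated with a toy model. This is a sketch under stated assumptions, not the MADNESS implementation: `ToyWorld`, `add_task`, `defer_cleanup`, and `pending_after_fences` are hypothetical names, the model is single-threaded and single-rank, and the only property it borrows from the commit message is that a fence drains the task queue first and runs deferred cleanup afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Toy model only: mirrors the "drain tasks, then clean up" order described
// in the commit, nothing else about the real fence.
class ToyWorld {
 public:
  // Queue a task to be run by the next drain.
  void add_task(std::function<void()> t) { taskq_.push_back(std::move(t)); }

  // Register cleanup work that runs at the end of the next fence.
  void defer_cleanup(std::function<void()> c) {
    deferred_.push_back(std::move(c));
  }

  void fence() {
    // Step 1: drain every task currently queued (and any they enqueue).
    while (!taskq_.empty()) {
      auto t = std::move(taskq_.front());
      taskq_.pop_front();
      t();
    }
    // Step 2: run deferred cleanup. Anything it enqueues stays pending,
    // because the drain loop has already finished.
    auto cleanup = std::move(deferred_);
    deferred_.clear();
    for (auto& c : cleanup) c();
  }

  std::size_t pending() const { return taskq_.size(); }

 private:
  std::deque<std::function<void()>> taskq_;
  std::vector<std::function<void()>> deferred_;
};

// Returns the number of pending tasks after the first and second fence.
std::pair<std::size_t, std::size_t> pending_after_fences() {
  ToyWorld w;
  int synced = 0;
  // The cleanup stands in for a destructor that calls lazy_sync:
  // it enqueues a fresh task instead of doing the work inline.
  w.defer_cleanup([&] { w.add_task([&] { ++synced; }); });

  w.fence();
  std::size_t after_first = w.pending();  // task from cleanup is still queued

  w.fence();
  std::size_t after_second = w.pending();  // second fence drained it
  assert(synced == 1);
  return {after_first, after_second};
}
```

In this model one task is left pending after the first fence and none after the second, which is the situation the guard's back-to-back `w->gop.fence()` calls are meant to handle. Whether two fences always suffice in the real code depends on the cleanup not scheduling further deferred work, which the toy does not attempt to model.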
