
Commit 1910acc

einsum: fence sub-Worlds before destruction (incl. on exception unwind)
Add an inline RAII guard, `FenceSubWorldsOnExit`, to the generalized-contraction path of einsum. It is declared immediately after the `worlds` vector, so it is destroyed *before* `worlds` (LIFO) and *after* AB/C.

On normal exit the guard performs a final, harmless drain. On exception unwind it drains any `lazy_sync_children` tasks that ~DistArray scheduled (via lazy_deleter) on the sub-Worlds' taskqs before those sub-Worlds are torn down. Without the guard, those tasks survive into the global ThreadPool past ~World; when an enclosing scope's fence later runs them, they trip ~WorldObject's `World::exists(&world)` assertion, and the resulting cryptic abort masks the real exception.

One fence per sub-World suffices because lazy_deleter now bypasses lazy_sync when invoked from `do_cleanup` (gated by `world.gop.is_in_do_cleanup()`), so the deferred-cleanup path performs direct deletes instead of scheduling cross-rank tasks. The only tasks left for this fence to drain come from non-deferred ~DistArray calls (e.g. AB during exception unwind), and because all participating ranks of a sub-World reach the guard in lockstep, their lazy_sync handshakes match up.
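The destruction-order argument can be checked with a small, self-contained C++ sketch. Everything here except the guard's shape is hypothetical: `MockWorld`, `MockArray`, `contract`, `contract_unguarded`, and `run_demo` are stand-ins, not the MADNESS or TiledArray API. A mock world owns a plain task queue, and the mock array's destructor enqueues a task in place of lazy_deleter's lazy_sync. Because the guard is declared after `worlds` and before the array, unwinding destroys the array first, then the guard (which runs the pending task), then the world.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for a sub-World: owns a task queue that must be
// drained (fenced) before the object is destroyed.
struct MockWorld {
  std::vector<std::function<void()>> taskq;
  Log* log;
  explicit MockWorld(Log* l) : log(l) {}
  void fence() {
    auto tasks = std::move(taskq);
    taskq.clear();
    for (auto& t : tasks) t();
    // Simplification: tasks in this sketch never enqueue further tasks.
    assert(taskq.empty());
    log->push_back("fence");
  }
  ~MockWorld() {
    // Record tasks that would outlive the World (they are never run here).
    if (!taskq.empty()) log->push_back("LEAKED TASKS");
    log->push_back("~World");
  }
};

// Hypothetical stand-in for a DistArray whose destructor schedules deferred
// work on its World's queue (in place of lazy_deleter -> lazy_sync).
struct MockArray {
  MockWorld& world;
  ~MockArray() {
    Log* log = world.log;
    world.taskq.push_back([log] { log->push_back("lazy_sync task"); });
  }
};

// Same shape as the guard in the commit: fence every live world on exit.
struct FenceSubWorldsOnExit {
  std::vector<std::shared_ptr<MockWorld>>& worlds_;
  ~FenceSubWorldsOnExit() {
    for (auto& w : worlds_) {
      if (!w) continue;
      try {
        w->fence();
      } catch (...) {  // never let an exception escape a destructor
      }
    }
  }
};

void contract(Log& log, bool fail) {
  std::vector<std::shared_ptr<MockWorld>> worlds;  // destroyed last
  FenceSubWorldsOnExit guard{worlds};              // destroyed second
  worlds.push_back(std::make_shared<MockWorld>(&log));
  MockArray ab{*worlds[0]};                        // destroyed first
  if (fail) throw std::runtime_error("mid-contraction failure");
}

void contract_unguarded(Log& log, bool fail) {
  std::vector<std::shared_ptr<MockWorld>> worlds;
  worlds.push_back(std::make_shared<MockWorld>(&log));
  MockArray ab{*worlds[0]};
  if (fail) throw std::runtime_error("mid-contraction failure");
}

Log run_demo(bool guarded, bool fail) {
  Log log;
  try {
    if (guarded)
      contract(log, fail);
    else
      contract_unguarded(log, fail);
  } catch (const std::runtime_error& e) {
    log.push_back(std::string("caught: ") + e.what());
  }
  return log;
}
```

In this sketch, `run_demo(true, true)` records the queued task running, the fence, `~World`, and only then the caught exception. `run_demo(false, true)` destroys the world with the task still queued, which models the situation where, in the real library, tasks outlive ~World in the global ThreadPool.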
1 parent 9141b2e commit 1910acc

1 file changed

Lines changed: 32 additions & 0 deletions


src/TiledArray/einsum/tiledarray.h

@@ -653,6 +653,38 @@ auto einsum(expressions::TsrExpr<ArrayA_> A, expressions::TsrExpr<ArrayB_> B,
   // dead World (e.g. while unwinding an exception thrown mid-contraction).
   std::vector<std::shared_ptr<World>> worlds;
 
+  // RAII fencer: on normal exit and (critically) on exception unwind,
+  // fence every live sub-World before it is destroyed. ~DistArray ->
+  // lazy_deleter calls world.gop.lazy_sync(...) which enqueues a
+  // lazy_sync_children task onto the sub-World's taskq; without a fence
+  // those tasks survive into the global ThreadPool past the sub-World's
+  // ~World, then trip ~WorldObject's `World::exists(&world)` assertion
+  // when some later fence (e.g. an enclosing scope's fence run during
+  // unwind) picks them up. Declared *after* `worlds` so it destructs
+  // *before* `worlds` (LIFO); destructs *after* AB/C so it sees the
+  // tasks they scheduled via lazy_deleter.
+  //
+  // One fence per sub-World is sufficient: lazy_deleter's fast path
+  // skips lazy_sync when invoked from inside fence_impl's do_cleanup
+  // (gated by `world.gop.is_in_do_cleanup()`), so the deferred-cleanup
+  // path performs direct deletes rather than scheduling cross-rank
+  // tasks. Tasks scheduled by *non*-deferred ~DistArray's (e.g. AB
+  // during exception unwind) are drained by this fence's drain loop;
+  // all participating ranks of a sub-World reach this RAII guard in
+  // lockstep at function exit, so their lazy_sync handshakes match up.
+  struct FenceSubWorldsOnExit {
+    std::vector<std::shared_ptr<World>> &worlds_;
+    ~FenceSubWorldsOnExit() {
+      for (auto &w : worlds_) {
+        if (!w) continue;
+        try {
+          w->gop.fence();
+        } catch (...) {
+        }
+      }
+    }
+  } fence_subworlds_on_exit{worlds};
+
   std::tuple<ArrayTerm<ArrayA>, ArrayTerm<ArrayB>> AB{{A.array(), a},
                                                       {B.array(), b}};
 
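The commit's claim that one fence per sub-World suffices rests on lazy_deleter skipping lazy_sync inside `do_cleanup`. That gating can be modeled with a second hypothetical sketch: `GatedWorld`, its `gate_enabled` toggle (present only for contrast), `lazy_deleter`, `Payload`, and `tasks_left_after_one_fence` are illustrative names, not the MADNESS API. Here `fence()` first drains the task queue and then runs deferred cleanup with an `in_do_cleanup` flag raised; the mock `lazy_deleter` deletes directly while the flag is set and otherwise schedules a task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of the gating described in the commit message.
struct GatedWorld {
  std::vector<std::function<void()>> taskq;             // scheduled tasks
  std::vector<std::function<void()>> deferred_cleanup;  // run by do_cleanup
  bool in_do_cleanup = false;
  bool gate_enabled = true;  // hypothetical toggle, for contrast only
  int direct_deletes = 0;
  int scheduled_syncs = 0;

  bool is_in_do_cleanup() const { return in_do_cleanup; }

  void fence() {
    assert(!in_do_cleanup && "fence is not re-entrant in this sketch");
    // Phase 1: drain the task queue until it is empty.
    while (!taskq.empty()) {
      auto tasks = std::move(taskq);
      taskq.clear();
      for (auto& t : tasks) t();
    }
    // Phase 2 (do_cleanup): release deferred objects with the flag raised.
    in_do_cleanup = true;
    auto cleanup = std::move(deferred_cleanup);
    deferred_cleanup.clear();
    for (auto& c : cleanup) c();
    in_do_cleanup = false;
  }
};

// Stand-in for lazy_deleter: delete directly inside do_cleanup; otherwise
// schedule a task (standing in for lazy_sync) that performs the delete.
template <typename T>
void lazy_deleter(GatedWorld& world, T* p) {
  if (world.gate_enabled && world.is_in_do_cleanup()) {
    delete p;
    ++world.direct_deletes;
  } else {
    ++world.scheduled_syncs;
    world.taskq.push_back([p] { delete p; });
  }
}

struct Payload {
  int value = 0;
};

// One non-deferred release plus one deferred release, then a single fence.
// Returns how many tasks are still queued afterwards.
std::size_t tasks_left_after_one_fence(GatedWorld& w) {
  lazy_deleter(w, new Payload{});  // e.g. AB destroyed during unwind
  auto* deferred = new Payload{};
  w.deferred_cleanup.push_back([&w, deferred] { lazy_deleter(w, deferred); });
  w.fence();
  return w.taskq.size();
}
```

With the gate, the single fence leaves an empty queue: the non-deferred release ran as a task in phase 1, and the deferred release was a direct delete in phase 2. With `gate_enabled = false`, phase 2 schedules a new task after the drain, so a second fence would be needed; this is the case the commit avoids.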