Skip to content

Commit e81bceb

Browse files
ccottermaikelericniebler
authored
Fix race in MPSC algo (#1812)
* Fix race in MPSC algo Relacy helped identify a race in the existing MPSC algo. I am having a hard time exactly explaining what's going on, but in the newly added unit test (the non Relacy one), I am able to observe three different odd behaviors - a consumer consuming the same elemment in an finite loop, apparently due to the internal next pointers pointing in some sort of cycle - consumer returning &__nil_! - consumer never able to consume a produced value (node is lost) With the non-relacy unit test, in the existing algo, if I insert a random sleep of 0-10 microseconds in push_back after __back_ is exchanged, I can observe one of the above behaviors nearly every single time. The most common was the first behavior. The existing algo claims it came from Dmitry Vyukov's implementation, though one key difference is that the existing one uses an atomic pointer to a Node for the "nil" object, whereas Dmitry's stores an actual Node object embedded in the queue. I re-implemented the version in stdexec exactly as it appears on Dmitry's website (which I had to dig up on archive.org), and it passes newly added Relacy (exploring many thread interleavings) and non-Relacy unit tests. I originally tracked down a bug in timed_thread_scheduler.cpp, where sometimes `STDEXEC_ASSERT(op->command_ == command_type::command_type::stop);` failed. * Add Relacy tests to the build * Default relacy only fofr GNU * Remove old Relacy Makefile * Simplify tests * Improve implementation to be near to Dmitry's * Final fixups - Use STDEXEC::__std::atomic - Update compiler requirements for Relacy - Compile Relacy with TBB off --------- Co-authored-by: Maikel Nadolski <maikel.nadolski@gmail.com> Co-authored-by: Eric Niebler <eniebler@nvidia.com>
1 parent 8d59830 commit e81bceb

15 files changed

Lines changed: 744 additions & 112 deletions

CMakeLists.txt

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -150,6 +150,26 @@ else()
150150
set(stdexec_compiler_frontend ${CMAKE_CXX_COMPILER_ID})
151151
endif()
152152

153+
# Build relacy tests by default only for GNU compiler and if tests are enabled
154+
# It will take work to get other platforms to work in the future.
155+
# Additionally, do not build the Relacy tests when a sanitizer is active. TSAN
156+
# doesn't do anything since Relacy simulates threads with fibers anyway. Relacy
157+
# has primtive (compared to ASAN) use-after-free detection which doesn't compose
158+
# with ASAN.
159+
# Additionally, we require GCC-12 or newer for Relacy, at __atomic.hpp has fallback
160+
# when std::atomic<std::shared_ptr<>> is not provided, and GCC-12 is the first
161+
# libstdc++ that provides it.
162+
if(
163+
${STDEXEC_BUILD_TESTS} AND
164+
CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND
165+
CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 12 AND
166+
NOT CMAKE_CXX_FLAGS MATCHES "-fsanitize")
167+
set(STDEXEC_BUILD_RELACY_TESTS_DEFAULT ON)
168+
else()
169+
set(STDEXEC_BUILD_RELACY_TESTS_DEFAULT OFF)
170+
endif()
171+
option(STDEXEC_BUILD_RELACY_TESTS "Build stdexec relacy tests" ${STDEXEC_BUILD_RELACY_TESTS_DEFAULT})
172+
153173
set(stdexec_export_targets)
154174

155175
# Define the main library

include/exec/timed_thread_scheduler.hpp

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -47,16 +47,20 @@ namespace experimental::execution
4747
stop
4848
};
4949

50+
// Default ctor for __intrusive_mpsc_queue's internal stub node
51+
constexpr timed_thread_operation_base() = default;
52+
5053
constexpr timed_thread_operation_base(
5154
void (*set_value)(timed_thread_operation_base*) noexcept,
5255
command_type command = command_type::schedule) noexcept
5356
: command_{command}
5457
, set_value_{set_value}
5558
{}
5659

57-
STDEXEC::__std::atomic<void*> next_{nullptr};
58-
command_type command_;
59-
void (*set_value_)(timed_thread_operation_base*) noexcept;
60+
using set_value_fn_t = void (*)(timed_thread_operation_base*) noexcept;
61+
STDEXEC::__std::atomic<timed_thread_operation_base*> next_{nullptr};
62+
command_type command_;
63+
set_value_fn_t set_value_;
6064
};
6165

6266
template <class Tp>

include/stdexec/__detail/__atomic.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ namespace STDEXEC::__std
6464
using std::atomic_thread_fence;
6565
using std::atomic_signal_fence;
6666

67-
# if __cpp_lib_atomic_ref >= 2018'06L && !defined(STDEXEC_RELACY)
67+
# if __cpp_lib_atomic_ref >= 2018'06L
6868
using std::atomic_ref;
6969
# else
7070
inline constexpr int __atomic_flag_map[] = {
Lines changed: 52 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright (c) Dmitiy V'jukov
2+
* Copyright (c) Dmitry V'jukov
33
* Copyright (c) 2024 Maikel Nadolski
44
* Copyright (c) 2024 NVIDIA Corporation
55
*
@@ -23,72 +23,84 @@
2323

2424
#include "__atomic.hpp"
2525

26-
#include "./__spin_loop_pause.hpp"
26+
#include "stdexec/__detail/__config.hpp"
2727

2828
namespace STDEXEC
2929
{
3030
template <auto _Ptr>
3131
class __intrusive_mpsc_queue;
3232

33-
template <class _Node, __std::atomic<void*> _Node::* _Next>
33+
// _Node must be default_initializable only for the queue to construct an
34+
// internal "stub" node - only the _Next data element is accessed internally.
35+
template <class _Node, __std::atomic<_Node*> _Node::* _Next>
36+
requires __std::default_initializable<_Node>
3437
class __intrusive_mpsc_queue<_Next>
3538
{
36-
__std::atomic<void*> __back_{&__nil_};
37-
void* __front_{&__nil_};
38-
__std::atomic<_Node*> __nil_ = nullptr;
39+
__std::atomic<_Node*> __head_{&__stub_};
40+
_Node* __tail_{&__stub_};
41+
_Node __stub_{};
3942

40-
constexpr void push_back_nil()
43+
public:
44+
__intrusive_mpsc_queue()
4145
{
42-
__nil_.store(nullptr, __std::memory_order_relaxed);
43-
auto* __prev = static_cast<_Node*>(__back_.exchange(&__nil_, __std::memory_order_acq_rel));
44-
(__prev->*_Next).store(&__nil_, __std::memory_order_release);
46+
(__stub_.*_Next).store(nullptr, __std::memory_order_release);
4547
}
4648

47-
public:
4849
constexpr auto push_back(_Node* __new_node) noexcept -> bool
4950
{
5051
(__new_node->*_Next).store(nullptr, __std::memory_order_relaxed);
51-
void* __prev_back = __back_.exchange(__new_node, __std::memory_order_acq_rel);
52-
bool __is_nil = __prev_back == static_cast<void*>(&__nil_);
53-
if (__is_nil)
54-
{
55-
__nil_.store(__new_node, __std::memory_order_release);
56-
}
57-
else
58-
{
59-
(static_cast<_Node*>(__prev_back)->*_Next).store(__new_node, __std::memory_order_release);
60-
}
61-
return __is_nil;
52+
_Node* __prev = __head_.exchange(__new_node, __std::memory_order_acq_rel);
53+
(__prev->*_Next).store(__new_node, __std::memory_order_release);
54+
return __prev == &__stub_;
6255
}
6356

6457
constexpr auto pop_front() noexcept -> _Node*
6558
{
66-
if (__front_ == static_cast<void*>(&__nil_))
59+
_Node* __tail = this->__tail_;
60+
STDEXEC_ASSERT(__tail != nullptr);
61+
_Node* __next = (__tail->*_Next).load(__std::memory_order_acquire);
62+
// If tail is pointing to the stub node we need to advance it once more
63+
if (&__stub_ == __tail)
6764
{
68-
_Node* __next = __nil_.load(__std::memory_order_acquire);
69-
if (!__next)
65+
if (nullptr == __next)
7066
{
7167
return nullptr;
7268
}
73-
__front_ = __next;
69+
this->__tail_ = __next;
70+
__tail = __next;
71+
__next = (__next->*_Next).load(__std::memory_order_acquire);
72+
}
73+
// Normal case: there is a next node and we can just advance the tail
74+
if (nullptr != __next)
75+
{
76+
this->__tail_ = __next;
77+
return __tail;
7478
}
75-
auto* __front = static_cast<_Node*>(__front_);
76-
void* __next = (__front->*_Next).load(__std::memory_order_acquire);
77-
if (__next)
79+
// Next is nullptr here means that either:
80+
// 1) There are no more nodes in the queue
81+
// 2) A producer is in the middle of adding a new node
82+
_Node const * __head = this->__head_.load(__std::memory_order_acquire);
83+
// A producer is in the middle of adding a new node
84+
// we cannot return tail as we cannot link the next node yet
85+
if (__tail != __head)
7886
{
79-
__front_ = __next;
80-
return __front;
87+
return nullptr;
8188
}
82-
STDEXEC_ASSERT(!__next);
83-
push_back_nil();
84-
do
89+
// No more nodes in the queue - we need to insert a stub node
90+
// to be able to link to an eventual empty state (or new nodes)
91+
push_back(&__stub_);
92+
// Now re-attempt to load next
93+
__next = (__tail->*_Next).load(__std::memory_order_acquire);
94+
if (nullptr != __next)
8595
{
86-
__spin_loop_pause();
87-
__next = (__front->*_Next).load(__std::memory_order_acquire);
96+
// Successfully linked either a new node or the stub node
97+
this->__tail_ = __next;
98+
return __tail;
8899
}
89-
while (!__next);
90-
__front_ = __next;
91-
return __front;
100+
// A producer is in the middle of adding a new node since next is still nullptr
101+
// and not our stub node, thus we cannot link the next node yet
102+
return nullptr;
92103
}
93104
};
94-
} // namespace STDEXEC
105+
106+
} // namespace STDEXEC

include/stdexec/__detail/__stop_token.hpp

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,14 @@
2424
# include <stop_token> // IWYU pragma: export
2525
#endif
2626

27+
// This shouldn't be necessary, but some standard library implementations claim support
28+
// for jthread but don't actually provide std::stop_token.
29+
STDEXEC_NAMESPACE_STD_BEGIN
30+
class stop_token;
31+
template <class _Callback>
32+
class stop_callback;
33+
STDEXEC_NAMESPACE_STD_END
34+
2735
STDEXEC_P2300_NAMESPACE_BEGIN()
2836

2937
template <template <class> class _Callback>
@@ -82,7 +90,7 @@ STDEXEC_P2300_NAMESPACE_BEGIN()
8290
private:
8391
struct __callback_type
8492
{
85-
constexpr explicit __callback_type(never_stop_token, STDEXEC::__ignore) noexcept { }
93+
constexpr explicit __callback_type(never_stop_token, STDEXEC::__ignore) noexcept {}
8694
};
8795
public:
8896
template <class>

test/CMakeLists.txt

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,7 @@ set(stdexec_test_sources
6565
stdexec/detail/test_completion_signatures.cpp
6666
stdexec/detail/test_demangle.cpp
6767
stdexec/detail/test_utility.cpp
68+
stdexec/detail/test_intrusive_mpsc_queue.cpp
6869
stdexec/schedulers/test_task_scheduler.cpp
6970
stdexec/schedulers/test_parallel_scheduler.cpp
7071
stdexec/queries/test_env.cpp
@@ -154,6 +155,10 @@ icm_add_build_failure_test(
154155
FOLDER test
155156
)
156157

158+
if(STDEXEC_BUILD_RELACY_TESTS)
159+
add_subdirectory(rrd)
160+
endif()
161+
157162
# # Adding multiple tests with a glob
158163
# icm_glob_build_failure_tests(
159164
# PATTERN *_fail*.cpp

test/rrd/CMakeLists.txt

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
include(FetchContent)
2+
3+
FetchContent_Declare(
4+
relacy
5+
GIT_REPOSITORY https://github.com/dvyukov/relacy
6+
GIT_TAG master
7+
)
8+
9+
FetchContent_Populate(relacy)
10+
11+
add_library(relacy INTERFACE)
12+
target_include_directories(relacy INTERFACE
13+
$<BUILD_INTERFACE:${relacy_SOURCE_DIR}/relacy/fakestd>
14+
$<BUILD_INTERFACE:${relacy_SOURCE_DIR}>
15+
$<INSTALL_INTERFACE:include>
16+
)
17+
18+
function(add_relacy_test target_name)
19+
add_executable(${target_name} ${target_name}.cpp)
20+
21+
target_link_libraries(${target_name} PRIVATE relacy)
22+
# Disable TBB - Relacy simulates all threads with fibers anyway...
23+
target_compile_definitions(${target_name} PRIVATE _GLIBCXX_USE_TBB_PAR_BACKEND=0)
24+
target_include_directories(${target_name} PRIVATE ../../include)
25+
target_include_directories(${target_name} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
26+
set_target_properties(${target_name} PROPERTIES
27+
CXX_STANDARD 20
28+
CXX_STANDARD_REQUIRED ON
29+
CXX_EXTENSIONS OFF)
30+
31+
add_test(NAME relacy-${target_name} COMMAND ${target_name})
32+
endfunction()
33+
34+
set(relacy_tests
35+
async_scope
36+
intrusive_mpsc_queue
37+
split
38+
sync_wait
39+
)
40+
41+
foreach(test ${relacy_tests})
42+
add_relacy_test(${test})
43+
endforeach()
44+
45+
# Target to build all relacy tests
46+
add_custom_target(relacy-tests DEPENDS ${relacy_tests})

test/rrd/Makefile

Lines changed: 0 additions & 43 deletions
This file was deleted.

test/rrd/README.md

Lines changed: 24 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -15,13 +15,20 @@ STDEXEC library could needs to use `x.fetch_add(1)` to be compatible with Relacy
1515

1616
## Instructions
1717

18-
Run the following commands from within this directory (`./tests/rrd`).
18+
Configure and build stdexec following the build instructions in the top level
19+
[README.md](../../README.md). There are a couple relacy specific build and ctest
20+
targets, though they are part of the standard build and ctest and will be run
21+
automatically if cmake is configured with `-DSTDEXEC_BUILD_RELACY_TESTS=1`.
22+
`STDEXEC_BUILD_RELACY_TESTS` is set by default for GCC today.
23+
24+
Run the following on a Linux machine with GCC as the toolchain.
1925

2026
```
21-
git clone -b STDEXEC https://github.com/dvyukov/relacy
22-
CXX=g++-11 make -j 4
23-
./build/split
24-
./build/async_scope
27+
mkdir build && cd build
28+
cmake ..
29+
make relacy-tests -j 4
30+
ctest -R relacy # Run all relacy tests
31+
./test/rrd/sync_wait # Run a specific relacy test directly
2532
```
2633

2734
## Recommended use
@@ -35,8 +42,16 @@ out a more stable build on all environments/compilers, we should revisit this.
3542
## Supported platforms
3643

3744
The STDEXEC Relacy tests have been verified to build and run on
38-
* Linux based GCC+11 with libstdc++ (`x86_64`)
39-
* Mac with Apple Clang 15 with libc++ (`x86_64`)
45+
* Linux based GCC+12-14 with libstdc++ (`x86_64`)
46+
* Mac with Apple Clang 15 and 17 with libc++ (`x86_64`)
47+
48+
## Caveat
49+
50+
Relacy relies on a less than robust approach to implement its runtime: it replaces
51+
std:: names with its own versions, for example, std::atomic and std::mutex, as well
52+
as pthread_* APIs. As libstdc++/libc++ evolve, newer versions may not be compatible with
53+
Relacy. In these cases, changes to Relacy are needed to correctly intercept and replace
54+
std:: names.
4055

41-
G++12 and newer are known to have issues that could be addressed with patches
42-
to Relacy.
56+
When the compilers and standard libraries release new versions, we will need to test the
57+
new versions can compile the stdexec Relacy tests before enabling the new compiler.

0 commit comments

Comments
 (0)