Commit abf876b

Set thread QoS to USER_INITIATED on Apple Silicon
On Apple Silicon Macs, TBB worker threads are created with the default QoS class, which macOS may schedule to efficiency cores even when performance cores are available. This significantly degrades parallel performance.

This adds a pthread_set_qos_class_self_np() call in on_scheduler_entry() to set USER_INITIATED QoS, signaling to macOS that these are compute threads the user is waiting for. This causes macOS to prefer performance cores when available.

Fixes #3277
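The mechanism described above can be tried in isolation. The following is a minimal sketch, not code from this commit: the helper names `raise_and_confirm_qos` and `count_threads_with_raised_qos` are hypothetical, and plain `std::thread` workers stand in for TBB workers. It sets the QoS class on each new thread and reads it back with `pthread_get_qos_class_np()`. On any platform other than Apple Silicon the helper does nothing and reports false.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

#ifdef __APPLE__
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical helper (not part of Stan Math): request USER_INITIATED QoS
// for the calling thread and confirm it by reading the class back.
// Returns false on platforms other than Apple Silicon.
inline bool raise_and_confirm_qos() {
#if defined(__APPLE__) && (defined(__arm64__) || defined(__aarch64__))
  // Both calls return 0 on success.
  if (pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0) != 0) {
    return false;
  }
  qos_class_t qos = QOS_CLASS_UNSPECIFIED;
  int relative_priority = 0;
  if (pthread_get_qos_class_np(pthread_self(), &qos, &relative_priority)
      != 0) {
    return false;
  }
  return qos == QOS_CLASS_USER_INITIATED;
#else
  return false;
#endif
}

// Hypothetical driver: start n_threads workers, let each one raise its own
// QoS class, and count how many confirmed the change.
inline int count_threads_with_raised_qos(int n_threads) {
  assert(n_threads >= 0);
  std::atomic<int> raised{0};
  std::vector<std::thread> workers;
  workers.reserve(static_cast<std::size_t>(n_threads));
  for (int i = 0; i < n_threads; ++i) {
    workers.emplace_back([&raised] {
      if (raise_and_confirm_qos()) {
        ++raised;
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return raised.load();
}
```

The QoS class is a per-thread property, so the request has to be made from the thread it applies to. That is presumably why the commit places the call in the per-thread entry hook instead of setting it once at startup.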
1 parent 42063be commit abf876b

1 file changed: +16 -0 lines changed

stan/math/rev/core/init_chainablestack.hpp

Lines changed: 16 additions & 0 deletions
@@ -11,6 +11,11 @@
 #include <thread>
 #include <tuple>
 
+#ifdef __APPLE__
+#include <pthread.h>
+#include <sys/qos.h>
+#endif
+
 namespace stan {
 namespace math {
 
@@ -20,6 +25,10 @@ namespace math {
  * hook ensures that each worker thread has an initialized AD tape
  * ready for use.
  *
+ * On Apple Silicon, this also sets the thread QoS class to
+ * USER_INITIATED so that macOS prefers scheduling compute threads
+ * on performance cores rather than efficiency cores.
+ *
  * Refer to
  * https://software.intel.com/content/www/us/en/develop/documentation/tbb-documentation/top/intel-threading-building-blocks-developer-reference/task-scheduler/taskschedulerobserver.html
  * for details on the observer concept.
@@ -37,6 +46,13 @@ class ad_tape_observer final : public tbb::task_scheduler_observer {
   ~ad_tape_observer() { observe(false); }
 
   void on_scheduler_entry(bool worker) {
+#ifdef __APPLE__
+#if defined(__arm64__) || defined(__aarch64__)
+    // Set thread QoS to USER_INITIATED so macOS prefers scheduling
+    // TBB worker threads on performance cores rather than efficiency cores.
+    pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
+#endif
+#endif
     std::lock_guard<std::mutex> thread_tape_map_lock(thread_tape_map_mutex_);
     const std::thread::id thread_id = std::this_thread::get_id();
     if (thread_tape_map_.find(thread_id) == thread_tape_map_.end()) {
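
The shape of the modified hook can be reproduced without TBB. The sketch below is a hypothetical stand-in, not the Stan Math class: `entry_hook_sketch` does not derive from `tbb::task_scheduler_observer`, an `int` placeholder replaces the AD tape, and `run_entry_hook_sketch` is an invented driver. It keeps the two properties visible in the hunk above: the QoS request is made before the mutex is taken and regardless of the `worker` flag, and a thread is registered only the first time it enters.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

#ifdef __APPLE__
#include <pthread.h>
#include <sys/qos.h>
#endif

// Hypothetical stand-in for ad_tape_observer. It deliberately has no TBB
// base class so that it compiles on its own.
class entry_hook_sketch {
 public:
  // Same signature as the hook in the diff; the flag is unused here too.
  void on_scheduler_entry(bool /* worker */) {
#if defined(__APPLE__) && (defined(__arm64__) || defined(__aarch64__))
    // Made before the lock so it never lengthens the critical section.
    // The return value is ignored, as in the diff.
    pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
#endif
    std::lock_guard<std::mutex> lock(mutex_);
    const std::thread::id thread_id = std::this_thread::get_id();
    // Register the thread only on its first entry.
    if (registry_.find(thread_id) == registry_.end()) {
      registry_.emplace(thread_id, 0);  // 0 is a placeholder for the tape
    }
  }

  // Number of distinct threads that have entered so far.
  std::size_t threads_seen() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return registry_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::thread::id, int> registry_;
};

// Hypothetical driver: each worker enters the hook twice, so the registry
// should end up with exactly one record per worker.
inline std::size_t run_entry_hook_sketch(int n_threads) {
  entry_hook_sketch hook;
  std::vector<std::thread> workers;
  for (int i = 0; i < n_threads; ++i) {
    workers.emplace_back([&hook] {
      hook.on_scheduler_entry(true);
      hook.on_scheduler_entry(true);  // re-entry adds no second record
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return hook.threads_seen();
}
```

Because the hook can run more than once per thread, the QoS request is repeated on each entry. Requesting the same class again should be harmless, though that is an assumption about macOS behavior and not something the commit states.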
