feat(smp): re-add an anti-spurious-wakeup feature #2470

Open
zyuiop wants to merge 3 commits into hermit-os:main from zyuiop:feat/smp-reduce-interrupts

@zyuiop zyuiop commented Jun 6, 2026

Contributor

As a follow-up to #2468, this PR tries to re-implement the logic I removed previously, this time avoiding any potential race condition (hopefully??).

@zyuiop zyuiop force-pushed the feat/smp-reduce-interrupts branch from 54cfef7 to b69fae2 Compare June 6, 2026 16:16

@github-actions github-actions Bot left a comment


Benchmark Results

| Benchmark | Current: c88e434 | Previous: 30cb3f9 | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | 77.61 s | 78.59 s | 0.99 |
| startup_benchmark File Size | 0.76 MB | 0.76 MB | 1.00 |
| Startup Time - 1 core | 0.74 s (±0.02 s) | 0.73 s (±0.02 s) | 1.02 |
| Startup Time - 2 cores | 0.73 s (±0.02 s) | 0.75 s (±0.02 s) | 0.98 |
| Startup Time - 4 cores | 0.75 s (±0.02 s) | 0.76 s (±0.02 s) | 1.00 |
| multithreaded_benchmark Build Time | 80.67 s | 80.39 s | 1.00 |
| multithreaded_benchmark File Size | 0.82 MB | 0.82 MB | 1.01 |
| Multithreaded Pi Efficiency - 2 Threads | 90.93 % (±6.08 %) | 89.59 % (±5.95 %) | 1.02 |
| Multithreaded Pi Efficiency - 4 Threads | 44.49 % (±3.62 %) | 43.86 % (±2.44 %) | 1.01 |
| Multithreaded Pi Efficiency - 8 Threads | 26.16 % (±1.37 %) | 25.65 % (±1.39 %) | 1.02 |
| micro_benchmarks Build Time | 87.03 s | 87.27 s | 1.00 |
| micro_benchmarks File Size | 0.83 MB | 0.82 MB | 1.01 |
| Scheduling time - 1 thread | 59.80 ticks (±1.77 ticks) | 64.58 ticks (±2.95 ticks) | 0.93 |
| Scheduling time - 2 threads | 33.31 ticks (±2.54 ticks) | 35.27 ticks (±2.44 ticks) | 0.94 |
| Micro - Time for syscall (getpid) | 2.80 ticks (±0.20 ticks) | 2.72 ticks (±0.18 ticks) | 1.03 |
| Memcpy speed - (built_in) block size 4096 | 84981.71 MByte/s (±58667.63 MByte/s) | 84336.36 MByte/s (±58124.47 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 1048576 | 30873.90 MByte/s (±25037.42 MByte/s) | 30954.90 MByte/s (±25149.11 MByte/s) | 1.00 |
| Memcpy speed - (built_in) block size 16777216 | 27912.52 MByte/s (±23154.88 MByte/s) | 27618.34 MByte/s (±22854.11 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 4096 | 85091.75 MByte/s (±58745.19 MByte/s) | 84961.36 MByte/s (±58506.83 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 1048576 | 31757.79 MByte/s (±25600.09 MByte/s) | 31697.81 MByte/s (±25573.78 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 16777216 | 28360.23 MByte/s (±23342.98 MByte/s) | 28408.60 MByte/s (±23353.48 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 4096 | 75285.07 MByte/s (±52498.42 MByte/s) | 74740.31 MByte/s (±52172.06 MByte/s) | 1.01 |
| Memcpy speed - (rust) block size 1048576 | 31130.50 MByte/s (±25332.13 MByte/s) | 30989.85 MByte/s (±25131.37 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 27938.15 MByte/s (±23241.99 MByte/s) | 27766.19 MByte/s (±22932.85 MByte/s) | 1.01 |
| Memset speed - (rust) block size 4096 | 75500.38 MByte/s (±52636.73 MByte/s) | 75196.89 MByte/s (±52449.92 MByte/s) | 1.00 |
| Memset speed - (rust) block size 1048576 | 31927.34 MByte/s (±25795.65 MByte/s) | 31748.80 MByte/s (±25574.00 MByte/s) | 1.01 |
| Memset speed - (rust) block size 16777216 | 28478.15 MByte/s (±23491.83 MByte/s) | 28552.59 MByte/s (±23427.10 MByte/s) | 1.00 |
| alloc_benchmarks Build Time | 80.85 s | 81.63 s | 0.99 |
| alloc_benchmarks File Size | 0.84 MB | 0.84 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 4686.19 Ticks (±1512.86 Ticks) | 5722.73 Ticks (±63.96 Ticks) | 0.82 |
| Allocations - Average Allocation time (no fail) | 4686.19 Ticks (±1512.86 Ticks) | 5722.73 Ticks (±63.96 Ticks) | 0.82 |
| Allocations - Average Deallocation time | 860.91 Ticks (±149.07 Ticks) | 1530.42 Ticks (±212.44 Ticks) | 0.56 |
| mutex_benchmark Build Time | 103.71 s | 80.99 s | 1.28 |
| mutex_benchmark File Size | 0.83 MB | 0.82 MB | 1.01 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 12.02 ns (±0.32 ns) | 12.10 ns (±0.36 ns) | 0.99 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 13.66 ns (±0.51 ns) | 17.00 ns (±3.01 ns) | 0.80 |

This comment was automatically generated by workflow using github-action-benchmark.

@zyuiop zyuiop force-pushed the feat/smp-reduce-interrupts branch from b69fae2 to 8e50433 Compare June 6, 2026 17:00
@mkroening mkroening self-requested a review June 8, 2026 13:47
@mkroening mkroening self-assigned this Jun 8, 2026
This one has tons of comments to convince myself it works. I am relatively convinced, but concurrent processing is hard, you know?
@zyuiop zyuiop force-pushed the feat/smp-reduce-interrupts branch from 8e50433 to c88e434 Compare June 17, 2026 12:47

@mkroening mkroening left a comment

Member


Thanks for this PR! :)

This does fix the laplace performance issue on my nested Intel VM (Hermit VM started with 4 cores):

Regarding the code:
Could you move this into a sleep_state.rs file and call it SleepState? That way, the gates should be less scattered, and the name might be clearer.
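
For illustration only, here is a minimal sketch of what gathering the gates into one `SleepState` type in `sleep_state.rs` could look like. The PR's actual code is not shown in this conversation, so the field and every method name (`enter_sleep`, `leave_sleep`, `take_wakeup_request`) are hypothetical, and `SeqCst` is used purely as a conservative placeholder rather than as a statement about which ordering is required.

```rust
// sleep_state.rs (hypothetical layout; only the name `SleepState` comes from the review)
use core::sync::atomic::{AtomicBool, Ordering};

/// Tracks whether a core is halted and waiting for a wakeup interrupt.
pub struct SleepState {
    sleeping: AtomicBool,
}

impl SleepState {
    pub const fn new() -> Self {
        Self {
            sleeping: AtomicBool::new(false),
        }
    }

    /// Called by the owning core right before it halts.
    pub fn enter_sleep(&self) {
        // SeqCst is a conservative placeholder, not a claim about what the PR needs.
        self.sleeping.store(true, Ordering::SeqCst);
    }

    /// Called by the owning core once it is running again.
    pub fn leave_sleep(&self) {
        self.sleeping.store(false, Ordering::SeqCst);
    }

    /// Called by another core that has queued work for this one.
    /// Returns `true` only if the core was marked as sleeping, so the caller
    /// knows a wakeup interrupt is needed. The flag is cleared by the same
    /// atomic swap, so a second concurrent caller gets `false`.
    pub fn take_wakeup_request(&self) -> bool {
        self.sleeping.swap(false, Ordering::SeqCst)
    }
}
```

With every read and write of the flag behind these three methods, the scheduler would only call into `SleepState` and never touch the atomic directly, which is the "less scattered" property asked for above.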

Please also don't use #[inline(always)]. For reference, see "When to #[inline]" in the Standard library developers Guide. #[inline] is okay if the function is very small.
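
As a rough illustration of that guideline (the `CoreId` type and `pick_core` function are made up for this example and are not from the PR): a one-line accessor is a reasonable candidate for plain `#[inline]`, while a larger function is left without any attribution so the compiler decides.

```rust
/// Hypothetical newtype, used only to demonstrate the attribute.
pub struct CoreId(u32);

impl CoreId {
    /// Very small function: plain `#[inline]` is a hint, and it also makes
    /// the body available for inlining across crate boundaries.
    #[inline]
    pub fn as_u32(&self) -> u32 {
        self.0
    }
}

/// Larger function: no inline attribute, the compiler's heuristics decide.
/// Returns the index of the least-loaded core, or `None` for an empty slice.
pub fn pick_core(loads: &[u32]) -> Option<CoreId> {
    loads
        .iter()
        .enumerate()
        .min_by_key(|(_, load)| **load)
        .map(|(index, _)| CoreId(index as u32))
}
```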

Regarding the atomics: relaxed ordering should be fine here, no? We don't use atomics here to build a synchronization primitive to synchronize memory access and are only interested in the values to make decisions.
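
To make the "only interested in the values" case concrete, here is a small user-space sketch (the function `count_events` is hypothetical and unrelated to the PR's code). `Ordering::Relaxed` still guarantees that each operation is atomic and that all threads agree on a single modification order for that one variable; what it does not provide is any ordering relative to other memory locations. Whether that is enough for the PR depends on whether the sleep flag has to be ordered against other state such as a run queue, which this sketch deliberately does not model.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts events from several threads using only relaxed operations.
/// No other memory is published through the counter, so only its value matters.
pub fn count_events(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    // Scoped threads may borrow `counter`; all of them are joined when the scope ends.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Atomic read-modify-write: no increment is lost, even with Relaxed.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

Because the read-modify-write itself is atomic, the total is exact regardless of ordering; stronger orderings only become necessary once a decision based on the value must also observe writes to other data.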

Comment thread src/scheduler/mod.rs Outdated
Comment thread src/scheduler/mod.rs Outdated
Comment thread src/scheduler/mod.rs Outdated

2 participants