Test io-event against Ruby GC bug (blocking_operation safety)#171

Open
samuel-williams-shopify wants to merge 13 commits into main from test-ruby-gc-bug

@samuel-williams-shopify
Contributor

Purpose

This workflow builds Ruby from samuel-williams-shopify/ruby @ bug/blocking-operation-gc using the same configure flags as ruby-dev-builder, then runs io-event's full test suite.

At commit 97aa28abab (current, unfixed): the blocking_operation VALUE in rb_fiber_scheduler_blocking_operation_wait is not GC-guarded. In the -O3 --enable-shared build the compiler may keep it only in a register. When our scheduler causes a fiber switch, the conservative GC can miss the VALUE and collect the object, and the next access in get_blocking_operation() crashes with a [BUG] Segmentation fault.

After RB_GC_GUARD is added to the Ruby branch: the VALUE is forced onto the stack and the GC always finds it → tests pass.
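
For readers unfamiliar with the guard idiom, here is a minimal standalone sketch of the idea. It is not Ruby's actual macro: TOY_GC_GUARD, toy_value, toy_call and toy_wait are hypothetical names invented for illustration. The guard reads the variable through a volatile pointer after the last call that might trigger GC, so the compiler has to keep the variable live and addressable up to that point instead of discarding it early.

```c
#include <stdint.h>

/* Hypothetical stand-in for Ruby's VALUE type. */
typedef uintptr_t toy_value;

/* Toy guard (an assumption, not Ruby's definition): a read through a
 * volatile pointer to the variable itself. The compiler cannot drop the
 * read, so `v` must stay live and addressable until this point. */
#define TOY_GC_GUARD(v) (*(volatile toy_value *)&(v))

/* Stand-in for a call that could switch fibers and run the GC. */
static toy_value toy_call(toy_value argument)
{
    return argument + 1;
}

toy_value toy_wait(toy_value operation)
{
    toy_value result = toy_call(operation);

    /* The guard goes after the call: `operation` must outlive it. */
    (void)TOY_GC_GUARD(operation);

    return result;
}
```

The placement is the important part: the guard sits after the call, because that is the last point at which the value must still be discoverable by a conservative stack scan.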

Tracking

Made with Cursor

Builds Ruby from samuel-williams-shopify/ruby @ bug/blocking-operation-gc
with the same flags as ruby-dev-builder. At 97aa28abab (no fix) the
blocking_operation VALUE is not GC-guarded and the test suite should
crash. Once RB_GC_GUARD is applied the tests should pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 9, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, the C frame of rb_fiber_scheduler_blocking_operation_wait
is no longer active. In optimised builds (-O3 --enable-shared), blocking_operation
may be held only in a machine register not saved/scanned by the conservative GC,
allowing it to be collected. get_blocking_operation() at line 1104 then reads
freed/reused memory, crashing with rb_unexpected_object_type.

Confirmed by reproducing the crash using:
  ./configure --enable-shared --disable-install-doc --enable-yjit cppflags=-DENABLE_PATH_CHECK=0

RB_GC_GUARD(blocking_operation) after rb_funcall forces the compiler to keep
the VALUE on the stack (volatile read), ensuring the GC always finds it.

See: socketry/io-event#170
     socketry/io-event#171
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify and others added 2 commits May 9, 2026 22:50
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 9, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, blocking_operation may only be in a machine register
not scanned by the conservative GC, allowing collection. Confirmed by
reproducing the crash (segfault in get_blocking_operation) with:
  ./configure --enable-shared --disable-install-doc --enable-yjit
RB_GC_GUARD forces the VALUE onto the stack ensuring the GC always finds it.

See: socketry/io-event#171
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify and others added 3 commits May 9, 2026 23:26
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 10, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, blocking_operation may not be reachable via the
conservative GC scan of the suspended fiber's C stack.

rb_gc_register_address registers the address of blocking_operation in the
global GC root list, which is always walked regardless of fiber state. The
address is kept registered through the last implicit use of the VALUE,
including all accesses via the raw C pointer derived from it, so that a
compacting GC cannot move the object and leave that pointer dangling.

Confirmed by reproducing the crash in io-event CI:
  ./configure --enable-shared --disable-install-doc --enable-yjit
See: socketry/io-event#171
     ruby#16908

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 10, 2026
…eration_wait

Use rb_gc_register_address to pin blocking_operation as a precise GC root
during rb_funcall. The scheduler's blocking_operation_wait may cause a fiber
switch via rb_fiber_scheduler_block, which suspends the calling fiber. The
conservative GC does not find the VALUE on the suspended fiber's C stack
(possibly due to it being in a machine register not captured in the saved
context), so the object can be collected or moved without updating the local
VALUE. rb_gc_register_address ensures the object is a precise root that is
always found and properly handled by both the regular and compacting GC.
rb_gc_unregister_address is called after the last use of the raw
pointer (which is derived from blocking_operation) to avoid a dangling
registered address.

Confirmed by io-event CI which reliably crashes without this fix and passes
with it: socketry/io-event#171

Co-authored-by: Cursor <cursoragent@cursor.com>
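
To illustrate why a registered address survives a collection that a stack scan might miss, here is a toy model of a root list. It is a sketch under stated assumptions, not Ruby's GC: every toy_* name is hypothetical, and toy_mark_roots stands in for a GC cycle that runs while the calling fiber is suspended.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object and reference types for the toy model. */
typedef struct toy_object { bool marked; } toy_object;
typedef toy_object *toy_value;

#define TOY_MAX_ROOTS 16
static toy_value *toy_roots[TOY_MAX_ROOTS];
static size_t toy_root_count = 0;

/* Record the address of a variable that holds a reference. */
bool toy_register_address(toy_value *address)
{
    if (toy_root_count == TOY_MAX_ROOTS) return false;
    toy_roots[toy_root_count++] = address;
    return true;
}

/* Remove a previously registered address (order is not preserved). */
bool toy_unregister_address(toy_value *address)
{
    for (size_t i = 0; i < toy_root_count; i++) {
        if (toy_roots[i] == address) {
            toy_roots[i] = toy_roots[--toy_root_count];
            return true;
        }
    }
    return false;
}

/* Mark phase: visit every registered address and mark whatever it
 * currently points to. No stack or register scanning is involved. */
void toy_mark_roots(void)
{
    for (size_t i = 0; i < toy_root_count; i++) {
        toy_value object = *toy_roots[i];
        if (object) object->marked = true;
    }
}

/* Returns whether the object was marked by a collection that ran
 * while only the (optional) registered address referred to it. */
bool toy_survives_collection(bool registered)
{
    toy_object object = { .marked = false };
    toy_value operation = &object;

    if (registered) toy_register_address(&operation);
    toy_mark_roots(); /* stands in for GC during the fiber switch */
    if (registered) toy_unregister_address(&operation);

    return object.marked;
}
```

In this model, toy_survives_collection(true) reports the object as marked and toy_survives_collection(false) does not. The unregister call comes only after the last use of the reference, mirroring the ordering described in the commit message.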
samuel-williams-shopify and others added 3 commits May 10, 2026 09:30
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>