Skip to content

Commit fdbb204

Browse files
jhawthornluke-gruber
authored andcommitted
Only check malloc_increase atomically if necessary
Previously we did a non-atomic read of malloc_increase to determine whether or not we needed to gc on malloc. We also should only need to check the value when we have actually flushed/committed a new increase. Use relaxed load in atomic_sub_nounderflow The previous code did a non-atomic load of val, which would work fine (since the value would only be used for an atomic CAS) but resulted in TSAN errors. This alos adjusts the cas to use relaxed memory model, though I'm not sure that actually makes a difference anywhere. Use relaxed atomics for malloc increase as well Use atomics for loading deferred Use atomics for final slot count Use flag for is_lazy_sweeping WIP: simpler background thread (no-op for now) Adjustments Allow one more thread WIP: getting closer to checking end of sweep condition correctly dequeue usleep Attempt pre-sweep Fix accounting of free slots Sweep anything that !needs_cleanup Add TODO Free some T_OBJECTs in background thread get id2ref working Finish background sweeping before compaction gc_abort() waits for background sweeping to finish Add page->page_lock and lock it when changing freelist Right now only the mutator and the background thread need to lock it. We should re-init all these locks on fork due to Ractors (TODO). Make sure not background sweeping during Process.warmup Less locking/unlocking in gc_sweep_step_worker Better sweep_lock management reinit sweep_lock,sweep_cond after fork Allow sweep thread to acquire VM lock. It never joins a VM barrier. Add simply T_DATA freeing Also finish background freeing in ruby_vm_destruct. Allow taking VM lock in sweep thread Add fiber_pool_lock for cont.c Don't take VM lock in background sweep thread Make id2ref_tbl_lock re-entrant We can get the following: 1) RB_VM_LOCK() // need this to allow GC when inserting into tbl 2) id2ref_tbl_lock() // 3) insert into id2ref_tbl, causes GC 4) free object id, which acquires id2ref_tbl_lock again Therefore, the lock needs to be re-entrant. freeing of id2ref object_id in background thread Get gen fields freeing done in background sweep thread First pass at making zombies in background sweep thread add mutexes_lock lock for thread->keeping_mutexes We need to lock this when manipulating this linked list, because freeing a mutex, which can now be done in a background thread, manipulates it. I made this lock global for now, but it should really be either per-ractor or per-thread. Add autoload_free_lock We can't free autoload_data by 1 thread while also freeing an autoload_const that's associated with it concurrently. This can happen currently if they're on separate pages. Add assertions after major GC that background thread is inactive. I'm going to work on allowing background sweeping during a major (unless explicitly requested via GC.start). This is probably more important even than during minors. Add more assertions and comments whitelist making zombies in sweep thread sweep some imemos in background sweep thread tmp commit stuck tmp Get 1 pass over pages mostly working GC compaction tests are still broken. Not sure why. TODO: when in background thread, never modify the page's freelist directly in case user code is being run. Instead, each page should have a deferred_freelist that the user thread will link in when the page is swept. Merge freelist and deferred freelist when we process a page some cleanup Get GC compaction working, doesn't use background thread Fix running GC in cleanup finalizers stuck with GC compact Fix GC.compact Remove usage of page_lock mutex as we no longer need it. Keep actual lock around, but I'll remove it in a separate commit. GC: Remove unused page->page_lock mutex cleanup Remove unused code, add comments Background thread only sweeps until ruby thread is done with that heap There are some problems with the current approach: 1) The background thread can get ahead of the ruby thread on the current heap and sweep more than is necessary instead of moving on to the next heap. We should track `incremental_step_freed_objects` for each heap so ruby thread and background thread are in sync, and background thread can sweep next heap when necessary. 2) We need to restart the background sweeping when we exit from GC. There should be a `objspace->background sweep_mode` after GC exits and background sweeping begins. checkin Fix issues with parallel sweep Issues were: * post-fork issues * gc_sweep_dequeue_page/heap_is_sweep_done/has_sweeping_pages trio is tricky * rb_ec_cleanup issues (aborting bg sweeping, stopping thread) Fix issue with gc_sweep_rest() that could loop forever It could happen when background sweeping got ahead of the ruby thread. Fix more bugs Attempt to make has_sweeping_pages() faster We can make it even faster if we always let the ruby thread take the last page. This is what it used to do, and I think it was the right strategy in hindsight, just because of `has_sweeping_pages` and `gc_sweep_finish`. Otherwise, the ruby thread could need to wait on the background thread somtimes when it's called. Simplify end conditions by ruby thread taking last sweeping_page This is how it used to work, and I think it's a good idea to simplify checks for when sweeping is finished. Tracking down allocation bug Fixed allocation bug It had to do with adjacent bitfields being same memory object and used concurrently. Changed them to bools and it fixed the issue. Improve efficiency of requesting background sweep help Keep track of heap->latest_swept_page cleanup unlink pages in sweep thread Add to free_pages and pooled_pages in sweep thread remove redundant work Use atomic for heap->foreground_sweep_steps and separate swept_pages lock Add heap->skip_sweep_continue Add parallel sweep lock stats Output sweep time at end of process Add counts of sweep events less conditionals Change PSWEEP_LOCK_STATS to use per-callsite stats Add wall clock psweep timings first pass at rb_garbage_object_p with sweep thread Fix WB issues with sweep thread Use atomic operations for bitmaps that can be read/modified by both the mutator and the sweep thread. Avoid the tricky case of `gc_setup_mark_bits` by deferring it to sweep finish. This way, it doesn't conflict with write barriers. Make page->needs_setup_mark_bits its own memory object tmp commit before pairing Bug fix for mark T_NONE John found the fix.
1 parent e04267a commit fdbb204

22 files changed

Lines changed: 2893 additions & 316 deletions

class.c

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -625,6 +625,7 @@ class_get_subclasses_for_ns(struct st_table *tbl, VALUE box_id)
625625
static int
626626
remove_class_from_subclasses_replace_first_entry(st_data_t *key, st_data_t *value, st_data_t arg, int existing)
627627
{
628+
RUBY_ASSERT(existing);
628629
*value = arg;
629630
return ST_CONTINUE;
630631
}
@@ -647,6 +648,7 @@ remove_class_from_subclasses(struct st_table *tbl, VALUE box_id, VALUE klass)
647648

648649
if (first_entry) {
649650
if (next) {
651+
// NOTE: doesn't allocate, just replaces existing value
650652
st_update(tbl, box_id, remove_class_from_subclasses_replace_first_entry, (st_data_t)next);
651653
}
652654
else {

cont.c

Lines changed: 70 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -287,6 +287,51 @@ rb_free_shared_fiber_pool(void)
287287

288288
static ID fiber_initialize_keywords[3] = {0};
289289

290+
rb_nativethread_lock_t fiber_lock;
291+
#ifdef RUBY_THREAD_PTHREAD_H
292+
pthread_t fiber_pool_lock_owner;
293+
#endif
294+
295+
static inline void
296+
ASSERT_fiber_pool_locked(void)
297+
{
298+
#ifdef RUBY_THREAD_PTHREAD_H
299+
VM_ASSERT(pthread_self() == fiber_pool_lock_owner);
300+
#endif
301+
}
302+
303+
static inline void
304+
ASSERT_fiber_pool_unlocked(void)
305+
{
306+
#ifdef RUBY_THREAD_PTHREAD_H
307+
VM_ASSERT(pthread_self() != fiber_pool_lock_owner);
308+
#endif
309+
}
310+
311+
static inline void
312+
fiber_pool_lock(void) {
313+
ASSERT_fiber_pool_unlocked();
314+
rb_native_mutex_lock(&fiber_lock);
315+
#ifdef RUBY_THREAD_PTHREAD_H
316+
fiber_pool_lock_owner = pthread_self();
317+
#endif
318+
}
319+
320+
static inline void
321+
fiber_pool_unlock(void) {
322+
ASSERT_fiber_pool_locked();
323+
#ifdef RUBY_THREAD_PTHREAD_H
324+
fiber_pool_lock_owner = 0;
325+
#endif
326+
rb_native_mutex_unlock(&fiber_lock);
327+
}
328+
329+
void
330+
fiber_pool_lock_reset(void)
331+
{
332+
rb_native_mutex_initialize(&fiber_lock);
333+
}
334+
290335
/*
291336
* FreeBSD require a first (i.e. addr) argument of mmap(2) is not NULL
292337
* if MAP_STACK is passed.
@@ -508,10 +553,12 @@ fiber_pool_allocate_memory(size_t * count, size_t stride)
508553
// @return the allocated fiber pool.
509554
// @sa fiber_pool_allocation_free
510555
static struct fiber_pool_allocation *
511-
fiber_pool_expand(struct fiber_pool * fiber_pool, size_t count)
556+
fiber_pool_expand(struct fiber_pool * fiber_pool, size_t count, bool needs_lock, bool unlock_before_raise)
512557
{
513558
struct fiber_pool_allocation * allocation;
514-
RB_VM_LOCK_ENTER();
559+
allocation = RB_ALLOC(struct fiber_pool_allocation); // needs to be outside of fiber lock
560+
561+
if (needs_lock) fiber_pool_lock();
515562
{
516563
STACK_GROW_DIR_DETECTION;
517564

@@ -522,11 +569,12 @@ fiber_pool_expand(struct fiber_pool * fiber_pool, size_t count)
522569
void * base = fiber_pool_allocate_memory(&count, stride);
523570

524571
if (base == NULL) {
572+
ruby_sized_xfree(allocation, sizeof(struct fiber_pool_allocation));
573+
if (unlock_before_raise) fiber_pool_unlock();
525574
rb_raise(rb_eFiberError, "can't alloc machine stack to fiber (%"PRIuSIZE" x %"PRIuSIZE" bytes): %s", count, size, ERRNOMSG);
526575
}
527576

528577
struct fiber_pool_vacancy * vacancies = fiber_pool->vacancies;
529-
allocation = RB_ALLOC(struct fiber_pool_allocation);
530578

531579
// Initialize fiber pool allocation:
532580
allocation->base = base;
@@ -552,6 +600,8 @@ fiber_pool_expand(struct fiber_pool * fiber_pool, size_t count)
552600

553601
if (!VirtualProtect(page, RB_PAGE_SIZE, PAGE_READWRITE | PAGE_GUARD, &old_protect)) {
554602
VirtualFree(allocation->base, 0, MEM_RELEASE);
603+
ruby_sized_xfree(allocation, sizeof(struct fiber_pool_allocation));
604+
if (unlock_before_raise) fiber_pool_unlock();
555605
rb_raise(rb_eFiberError, "can't set a guard page: %s", ERRNOMSG);
556606
}
557607
#elif defined(__wasi__)
@@ -560,6 +610,8 @@ fiber_pool_expand(struct fiber_pool * fiber_pool, size_t count)
560610
#else
561611
if (mprotect(page, RB_PAGE_SIZE, PROT_NONE) < 0) {
562612
munmap(allocation->base, count*stride);
613+
ruby_sized_xfree(allocation, sizeof(struct fiber_pool_allocation));
614+
if (unlock_before_raise) fiber_pool_unlock();
563615
rb_raise(rb_eFiberError, "can't set a guard page: %s", ERRNOMSG);
564616
}
565617
#endif
@@ -590,7 +642,7 @@ fiber_pool_expand(struct fiber_pool * fiber_pool, size_t count)
590642
fiber_pool->vacancies = vacancies;
591643
fiber_pool->count += count;
592644
}
593-
RB_VM_LOCK_LEAVE();
645+
if (needs_lock) fiber_pool_unlock();
594646

595647
return allocation;
596648
}
@@ -612,7 +664,7 @@ fiber_pool_initialize(struct fiber_pool * fiber_pool, size_t size, size_t count,
612664

613665
fiber_pool->vm_stack_size = vm_stack_size;
614666

615-
fiber_pool_expand(fiber_pool, count);
667+
fiber_pool_expand(fiber_pool, count, false, false);
616668
}
617669

618670
#ifdef FIBER_POOL_ALLOCATION_FREE
@@ -665,7 +717,7 @@ static struct fiber_pool_stack
665717
fiber_pool_stack_acquire(struct fiber_pool * fiber_pool)
666718
{
667719
struct fiber_pool_vacancy * vacancy ;
668-
RB_VM_LOCK_ENTER();
720+
fiber_pool_lock();
669721
{
670722
vacancy = fiber_pool_vacancy_pop(fiber_pool);
671723

@@ -679,7 +731,7 @@ fiber_pool_stack_acquire(struct fiber_pool * fiber_pool)
679731
if (count > maximum) count = maximum;
680732
if (count < minimum) count = minimum;
681733

682-
fiber_pool_expand(fiber_pool, count);
734+
fiber_pool_expand(fiber_pool, count, false, true);
683735

684736
// The free list should now contain some stacks:
685737
VM_ASSERT(fiber_pool->vacancies);
@@ -700,10 +752,9 @@ fiber_pool_stack_acquire(struct fiber_pool * fiber_pool)
700752
#ifdef FIBER_POOL_ALLOCATION_FREE
701753
vacancy->stack.allocation->used += 1;
702754
#endif
703-
704755
fiber_pool_stack_reset(&vacancy->stack);
705756
}
706-
RB_VM_LOCK_LEAVE();
757+
fiber_pool_unlock();
707758

708759
return vacancy->stack;
709760
}
@@ -769,7 +820,7 @@ fiber_pool_stack_free(struct fiber_pool_stack * stack)
769820
#endif
770821
}
771822

772-
// Release and return a stack to the vacancy list.
823+
// Release and return a stack to the vacancy list. fiber_lock is acquired upon entry.
773824
static void
774825
fiber_pool_stack_release(struct fiber_pool_stack * stack)
775826
{
@@ -920,17 +971,6 @@ fiber_stack_release(rb_fiber_t * fiber)
920971
rb_ec_clear_vm_stack(ec);
921972
}
922973

923-
static void
924-
fiber_stack_release_locked(rb_fiber_t *fiber)
925-
{
926-
if (!ruby_vm_during_cleanup) {
927-
// We can't try to acquire the VM lock here because MMTK calls free in its own native thread which has no ec.
928-
// This assertion will fail on MMTK but we currently don't have CI for debug releases of MMTK, so we can assert for now.
929-
ASSERT_vm_locking_with_barrier();
930-
}
931-
fiber_stack_release(fiber);
932-
}
933-
934974
static const char *
935975
fiber_status_name(enum fiber_status s)
936976
{
@@ -1091,7 +1131,11 @@ cont_free(void *ptr)
10911131
else {
10921132
rb_fiber_t *fiber = (rb_fiber_t*)cont;
10931133
coroutine_destroy(&fiber->context);
1094-
fiber_stack_release_locked(fiber);
1134+
fiber_pool_lock();
1135+
{
1136+
fiber_stack_release(fiber);
1137+
}
1138+
fiber_pool_unlock();
10951139
}
10961140

10971141
RUBY_FREE_UNLESS_NULL(cont->saved_vm_stack.ptr);
@@ -2756,9 +2800,11 @@ fiber_switch(rb_fiber_t *fiber, int argc, const VALUE *argv, int kw_splat, rb_fi
27562800
// We cannot free the stack until the pthread is joined:
27572801
#ifndef COROUTINE_PTHREAD_CONTEXT
27582802
if (resuming_fiber && FIBER_TERMINATED_P(fiber)) {
2759-
RB_VM_LOCKING() {
2803+
fiber_pool_lock();
2804+
{
27602805
fiber_stack_release(fiber);
27612806
}
2807+
fiber_pool_unlock();
27622808
}
27632809
#endif
27642810

@@ -3475,6 +3521,7 @@ Init_Cont(void)
34753521
#endif
34763522
SET_MACHINE_STACK_END(&th->ec->machine.stack_end);
34773523

3524+
rb_native_mutex_initialize(&fiber_lock);
34783525
fiber_pool_initialize(&shared_fiber_pool, stack_size, FIBER_POOL_INITIAL_SIZE, vm_stack_size);
34793526

34803527
fiber_initialize_keywords[0] = rb_intern_const("blocking");

darray.h

Lines changed: 29 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,10 @@
4747
#define rb_darray_append_without_gc(ptr_to_ary, element) \
4848
rb_darray_append_impl(ptr_to_ary, element, rb_darray_realloc_mul_add_without_gc)
4949

50+
//#define rb_darray_clear_and_free_without_gc(ptr_to_ary) \
51+
//rb_darray_size(ptr_to_ary) ? (rb_darray_free_without_gc(ptr_to_ary)) : (void)0
52+
53+
5054
#define rb_darray_append_impl(ptr_to_ary, element, realloc_func) do { \
5155
rb_darray_ensure_space((ptr_to_ary), \
5256
sizeof(**(ptr_to_ary)), \
@@ -134,6 +138,21 @@ rb_darray_size(const void *ary)
134138
}
135139

136140

141+
/* Remove n items from the beginning of the array */
142+
#define rb_darray_shift_n(ary, n) rb_darray_shift_n_impl(ary, ary->data, n, sizeof((ary)->data[0]))
143+
144+
static inline void
145+
rb_darray_shift_n_impl(void *ary, void *data, size_t n, size_t type_sz)
146+
{
147+
rb_darray_meta_t *meta = ary;
148+
RUBY_ASSERT(meta->size >= n);
149+
char *dst = (char*)data;
150+
if (n > 0) {
151+
memmove(dst, dst + n * type_sz, (meta->size - n) * type_sz);
152+
meta->size -= n;
153+
}
154+
}
155+
137156
static inline void
138157
rb_darray_pop(void *ary, size_t count)
139158
{
@@ -199,14 +218,23 @@ rb_darray_realloc_mul_add(void *orig_ptr, size_t x, size_t y, size_t z)
199218
return ptr;
200219
}
201220

221+
bool is_sweep_thread_p(void);
222+
202223
/* Internal function. Like rb_xrealloc_mul_add but does not trigger GC. */
203224
static inline void *
204225
rb_darray_realloc_mul_add_without_gc(void *orig_ptr, size_t x, size_t y, size_t z)
205226
{
206227
size_t size = rbimpl_size_add_or_raise(rbimpl_size_mul_or_raise(x, y), z);
207228

208229
void *ptr = realloc(orig_ptr, size);
209-
if (ptr == NULL) rb_bug("rb_darray_realloc_mul_add_without_gc: failed");
230+
if (ptr == NULL) {
231+
if (!is_sweep_thread_p()) {
232+
rb_bug("rb_darray_realloc_mul_add_without_gc: failed");
233+
}
234+
else {
235+
fprintf(stderr, "darray: realloc failed (from sweep thread)\n");
236+
}
237+
}
210238

211239
return ptr;
212240
}

error.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1108,11 +1108,16 @@ rb_bug_without_die(const char *fmt, ...)
11081108
va_end(args);
11091109
}
11101110

1111+
bool is_sweep_thread_p(void);
1112+
11111113
void
11121114
rb_bug(const char *fmt, ...)
11131115
{
11141116
va_list args;
11151117
va_start(args, fmt);
1118+
if (is_sweep_thread_p()) {
1119+
fprintf(stderr, "rb_bug() called from sweep_thread!\n");
1120+
}
11161121
rb_bug_without_die_internal(fmt, args);
11171122
va_end(args);
11181123
die();

0 commit comments

Comments
 (0)