Commit caae9bf

paracycle and claude committed
Add cpu heap mode based on Tavakolisomeh et al. (MPLR '23)
Adds MMTK_HEAP_MODE=cpu, a dynamic heap-sizing policy that grows or shrinks the heap after each GC cycle to keep measured GC CPU overhead near a configurable target.

The control law follows Tavakolisomeh et al., 'Heap Size Adjustment with CPU Control', MPLR '23: the (averaged) GC CPU overhead error, which ranges over (-inf, +inf), is passed through a sigmoid and mapped to a heap-size adjustment factor in (0.5, 1.5).

Implementation lives alongside the existing 'ruby' delegated trigger in gc/mmtk/src/heap/. T_GC is wall-clock GC duration; T_APP is the process CPU time delta read via clock_gettime(CLOCK_PROCESS_CPUTIME_ID), which correctly credits multi-threaded mutator parallelism. Nursery-only generational GCs are skipped so the trigger only re-sizes at full collections.

Configuration:

  MMTK_GC_CPU_TARGET  target GC CPU overhead, percent. Default 5.
  MMTK_GC_CPU_WINDOW  number of recent cycles averaged. Default 3.

The default target differs from the paper's recommended 15. The paper targets ZGC, a concurrent generational collector; MMTk-Ruby currently ships stop-the-world Immix, where every percent of GC CPU also blocks the mutator. An empirical sweep of MMTK_GC_CPU_TARGET across ruby-bench (railsbench, lobsters, psych-load, liquid-render, lee) found 5-6 to be Pareto-optimal vs the existing 'ruby' heap mode: about 6 percent geomean throughput improvement at essentially equal peak RSS. Targets >=10 trade large amounts of throughput for modest RSS savings on this collector.

bin/smoke-test, bin/ruby-mmtk-mode, bin/compare-heap-modes, and doc/testing-cpu-heap-mode.md are included so reviewers and future contributors can reproduce the sweep against ruby/ruby-bench.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
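As a rough illustration of the control law described in the commit message, here is a minimal Ruby sketch. It assumes the error is expressed in percentage points (measured overhead minus target) and that the factor is `0.5 + sigmoid(error)`, which is one form that produces the stated (0.5, 1.5) range. The method names `adjustment_factor` and `next_heap_size`, and the clamping to min/max bounds, are hypothetical and are not taken from the Rust implementation in gc/mmtk/src/heap/.

```ruby
# Sketch only: names and units are assumptions, not the binding's real API.

# Maps the overhead error (any real number) to a factor in the range 0.5..1.5.
# A positive error (GC costs more than the target) gives a factor above 1.0,
# which grows the heap; a negative error gives a factor below 1.0.
def adjustment_factor(overhead_pct, target_pct)
  error = overhead_pct - target_pct
  sigmoid = 1.0 / (1.0 + Math.exp(-error))
  0.5 + sigmoid
end

# Applies the factor to the current heap size and keeps the result inside
# the configured bounds (hypothetical stand-ins for MMTK_HEAP_MIN / MMTK_HEAP_MAX).
def next_heap_size(current_bytes, overhead_pct, target_pct, min_bytes, max_bytes)
  scaled = (current_bytes * adjustment_factor(overhead_pct, target_pct)).round
  scaled.clamp(min_bytes, max_bytes)
end
```

When the measured overhead equals the target, the error is zero, the sigmoid is 0.5, and the factor is exactly 1.0, so the heap size is left unchanged.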
1 parent b0f7cb4 commit caae9bf

10 files changed

Lines changed: 923 additions & 17 deletions

README.md

Lines changed: 8 additions & 2 deletions
@@ -20,6 +20,12 @@ This repository holds the [MMTk](https://www.mmtk.io/) bindings for Ruby. The bi
 After building Ruby and the MMTk bindings, run Ruby with `RUBY_GC_LIBRARY=mmtk` environment variable. You can also configure the following environment variables:
 
 - `MMTK_PLAN=<NoGC|MarkSweep|Immix>`: Configures the GC algorithm used by MMTk. Defaults to `Immix`.
-- `MMTK_HEAP_MODE=<fixed|dynamic>`: Configures the MMTk heap used. `fixed` is a fixed size heap, `dynamic` is a dynamic sized heap that will grow and shrink in size based on heuristics using the [MemBalancer](https://dl.acm.org/doi/pdf/10.1145/3563323) algorithm. Defaults to `dynamic`.
-- `MMTK_HEAP_MIN=<size>`: Configures the lower bound in heap memory usage by MMTk. Only valid when `MMTK_HEAP_MODE=dynamic`. `size` is in bytes, but you can also append `KiB`, `MiB`, `GiB` for larger sizes. Defaults to 1MiB.
+- `MMTK_HEAP_MODE=<fixed|dynamic|ruby|cpu>`: Configures the MMTk heap used. Defaults to `dynamic`.
+  - `fixed`: a fixed size heap.
+  - `dynamic`: a dynamic sized heap that will grow and shrink in size based on heuristics using the [MemBalancer](https://dl.acm.org/doi/pdf/10.1145/3563323) algorithm.
+  - `ruby`: a dynamic sized heap that grows and shrinks based on the ratio of free to used slots, using the same `RUBY_GC_HEAP_FREE_SLOTS_*_RATIO` env vars as the default Ruby GC.
+  - `cpu`: a dynamic sized heap that adjusts itself to hit a target GC CPU overhead, using the algorithm from [Tavakolisomeh et al., "Heap Size Adjustment with CPU Control" (MPLR '23)](https://dl.acm.org/doi/10.1145/3617651.3622988). Tunable via `MMTK_GC_CPU_TARGET` and `MMTK_GC_CPU_WINDOW` (see below).
+- `MMTK_HEAP_MIN=<size>`: Configures the lower bound in heap memory usage by MMTk. Only valid when `MMTK_HEAP_MODE` is `dynamic`, `ruby`, or `cpu`. `size` is in bytes, but you can also append `KiB`, `MiB`, `GiB` for larger sizes. Defaults to 1MiB.
 - `MMTK_HEAP_MAX=<size>`: Configures the upper bound in heap memory usage by MMTk. Once this limit is reached and no objects can be garbage collected, it will crash with an out-of-memory. `size` is in bytes, but you can also append `KiB`, `MiB`, `GiB` for larger sizes. Defaults to 80% of your system RAM.
+- `MMTK_GC_CPU_TARGET=<percent>`: Target GC CPU overhead, as a percentage, when `MMTK_HEAP_MODE=cpu`. After each GC cycle, the heap is grown if the measured GC CPU overhead exceeds this target and shrunk if it falls below. Defaults to `5`. The paper recommends `15` for the concurrent collector it targets (ZGC), but on MMTk-Ruby's stop-the-world Immix every percent of GC CPU also blocks the mutator, so a smaller budget gives better throughput. Empirical sweeps across ruby-bench find 5 Pareto-optimal vs. the `ruby` heap mode (~6% geomean speedup at essentially equal peak RSS).
+- `MMTK_GC_CPU_WINDOW=<n>`: Number of recent GC cycles averaged when measuring GC CPU overhead for `MMTK_HEAP_MODE=cpu`. Larger values smooth the signal at the cost of responsiveness. Defaults to `3`.
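
The smoothing that `MMTK_GC_CPU_WINDOW` describes can be sketched as a small rolling average. This is an illustration only: the class name `OverheadWindow` is hypothetical, and the per-cycle overhead formula `100 * T_GC / (T_GC + T_APP)` is an assumption about how the two measured times are combined, not something the README states.

```ruby
# Sketch only: the class name and overhead formula are assumptions.
class OverheadWindow
  def initialize(size)
    raise ArgumentError, "window must be positive" unless size.positive?

    @size = size
    @samples = []
  end

  # Records one GC cycle and returns the averaged overhead in percent.
  # t_gc and t_app are durations in the same unit (for example seconds).
  def record(t_gc, t_app)
    total = t_gc + t_app
    return average if total <= 0 # nothing measurable happened; keep old average

    @samples << 100.0 * t_gc / total
    @samples.shift while @samples.size > @size # keep only the most recent cycles
    average
  end

  # Mean of the retained samples, or 0.0 before any cycle has been recorded.
  def average
    return 0.0 if @samples.empty?

    @samples.sum / @samples.size
  end
end
```

A larger window makes a single unusually expensive cycle move the average less, which matches the README's note that larger values smooth the signal at the cost of responsiveness.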

bin/compare-heap-modes

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+#!/usr/bin/env bash
+# Compare MMTk heap modes on ruby-bench.
+#
+# Runs the ruby-bench suite (expected checked out at $RUBY_BENCH_DIR, default
+# ../ruby-bench) with two Ruby "executables" that are the same modular-GC
+# Ruby, but wrapped so each sets a different MMTK_HEAP_MODE.
+#
+# Required env:
+#   RUBY_BIN            Path to a Ruby built with --with-modular-gc and the
+#                       MMTk binding installed. (e.g. ~/.rubies/ruby-mmtk/bin/ruby)
+#
+# Optional env:
+#   RUBY_BENCH_DIR      Path to ruby-bench checkout (default: ../ruby-bench)
+#   MODES               Space-separated list of heap modes to compare
+#                       (default: "ruby cpu"). Others: "fixed dynamic".
+#   BENCHES             Space-separated list of benchmarks to run (default: a
+#                       curated small-but-GC-sensitive set). Pass empty string
+#                       "" to run the whole default suite.
+#   WARMUP              WARMUP_ITRS (default 5)
+#   BENCH               MIN_BENCH_ITRS (default 10)
+#   TIME                MIN_BENCH_TIME (default 20)
+#   MMTK_GC_CPU_TARGET  CPU overhead target for `cpu` mode (default 5)
+#   MMTK_GC_CPU_WINDOW  averaging window for `cpu` mode (default 3)
+#
+# Example:
+#   RUBY_BIN=~/.rubies/ruby-mmtk/bin/ruby \
+#     bin/compare-heap-modes
+#
+#   RUBY_BIN=~/.rubies/ruby-mmtk/bin/ruby MODES="ruby cpu dynamic" \
+#     BENCHES="liquid-render psych-load railsbench" \
+#     bin/compare-heap-modes
+
+set -euo pipefail
+
+if [ -z "${RUBY_BIN:-}" ]; then
+  echo "error: RUBY_BIN must be set to a Ruby built with --with-modular-gc" >&2
+  exit 64
+fi
+if [ ! -x "$RUBY_BIN" ]; then
+  echo "error: RUBY_BIN=$RUBY_BIN is not executable" >&2
+  exit 64
+fi
+
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+BENCH_DIR="${RUBY_BENCH_DIR:-$REPO_ROOT/../ruby-bench}"
+if [ ! -d "$BENCH_DIR" ]; then
+  echo "error: ruby-bench checkout not found at $BENCH_DIR" >&2
+  echo " clone it with: git clone https://github.com/ruby/ruby-bench $BENCH_DIR" >&2
+  exit 66
+fi
+
+# Put RUBY_BIN's bin directory first on PATH so `bundle exec`, `ruby`, and any
+# shebangs invoking `ruby` resolve to the modular-GC Ruby instead of whatever
+# system Ruby comes first in the caller's environment.
+RUBY_BIN_DIR="$(cd "$(dirname "$RUBY_BIN")" && pwd)"
+export PATH="$RUBY_BIN_DIR:$PATH"
+
+# Clear gem-path vars that might still point at a different Ruby's gems.
+unset GEM_HOME GEM_PATH BUNDLE_PATH RUBYLIB RUBYOPT 2>/dev/null || true
+
+MODES=${MODES:-"ruby cpu"}
+# A curated GC-sensitive subset. Override with BENCHES="".
+DEFAULT_BENCHES="liquid-render psych-load railsbench lee binarytrees"
+if [ -z "${BENCHES+x}" ]; then
+  BENCHES="$DEFAULT_BENCHES"
+fi
+
+export WARMUP_ITRS="${WARMUP:-5}"
+export MIN_BENCH_ITRS="${BENCH:-10}"
+export MIN_BENCH_TIME="${TIME:-20}"
+
+# Export tunables so all wrapped runs see the same values. The `ruby` mode
+# ignores MMTK_GC_CPU_*; the `cpu` mode ignores RUBY_GC_HEAP_*.
+export MMTK_GC_CPU_TARGET="${MMTK_GC_CPU_TARGET:-5}"
+export MMTK_GC_CPU_WINDOW="${MMTK_GC_CPU_WINDOW:-3}"
+
+WRAPPER="$REPO_ROOT/bin/ruby-mmtk-mode"
+RUBY_ARGS=()
+for mode in $MODES; do
+  RUBY_ARGS+=(-e "mmtk-$mode::$WRAPPER $mode -- ")
+done
+
+cd "$BENCH_DIR"
+
+echo "== compare-heap-modes =="
+echo "ruby_bin: $RUBY_BIN"
+echo "modes: $MODES"
+echo "benches: ${BENCHES:-<all>}"
+echo "warmup: $WARMUP_ITRS"
+echo "bench: $MIN_BENCH_ITRS iters / $MIN_BENCH_TIME s min"
+echo "cpu target:$MMTK_GC_CPU_TARGET% window=$MMTK_GC_CPU_WINDOW"
+echo "---"
+
+# `--rss` records peak RSS per run, essential for comparing memory footprint
+# between heap-sizing policies.
+# `--no-sudo` skips CPU governor / turbo tweaks that would need root.
+export RUBY_BIN
+exec bundle exec ./run_benchmarks.rb --no-sudo --rss "${RUBY_ARGS[@]}" ${BENCHES:-}

bin/ruby-mmtk-mode

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+#!/bin/sh
+# Wrapper that invokes a modular-GC Ruby with MMTk + a specific MMTK_HEAP_MODE.
+#
+# ruby-bench's run_benchmarks.rb compares Ruby executables passed via `-e`.
+# Each `-e` entry is a single command line, so to compare two MMTk heap modes
+# we need one executable per mode with the relevant env vars already baked in.
+#
+# Usage:
+#   bin/ruby-mmtk-mode <mode> [VAR=VAL ...] -- <ruby args...>
+#
+# The caller is expected to set RUBY_BIN to the path of a Ruby built with
+# --with-modular-gc and the MMTk binding installed (or to have it on $PATH).
+#
+# Examples:
+#   RUBY_BIN=$HOME/.rubies/ruby-mmtk/bin/ruby bin/ruby-mmtk-mode ruby -- -e 'puts GC.config'
+#   RUBY_BIN=$HOME/.rubies/ruby-mmtk/bin/ruby bin/ruby-mmtk-mode cpu -- -e 'puts GC.config'
+
+set -eu
+
+if [ $# -lt 1 ]; then
+  echo "usage: $0 <heap_mode> [VAR=VAL ...] -- <ruby args>" >&2
+  exit 64
+fi
+
+MODE=$1
+shift
+
+# Optional additional env vars before the `--` separator.
+while [ $# -gt 0 ] && [ "$1" != "--" ]; do
+  case "$1" in
+    *=*) export "$1" ;;
+    *)
+      echo "$0: expected VAR=VAL or --, got: $1" >&2
+      exit 64
+      ;;
+  esac
+  shift
+done
+if [ $# -gt 0 ] && [ "$1" = "--" ]; then
+  shift
+fi
+
+RUBY=${RUBY_BIN:-ruby}
+
+exec env \
+  RUBY_GC_LIBRARY=mmtk \
+  MMTK_HEAP_MODE="$MODE" \
+  "$RUBY" "$@"

bin/smoke-test

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+# Smoke test for MMTk heap modes.
+#
+# Runs an allocation-heavy loop under a given MMTK_HEAP_MODE and reports:
+#   - the mode Ruby actually booted with (GC.config)
+#   - GC cycle count triggered during the loop
+#   - wall-clock time and process CPU time
+#   - peak resident set size (peak RSS)
+#
+# Usage (after `rake install:release` against a modular-GC Ruby):
+#
+#   bin/smoke-test # defaults to MMTK_HEAP_MODE=cpu
+#   MMTK_HEAP_MODE=ruby bin/smoke-test
+#   MMTK_HEAP_MODE=cpu MMTK_GC_CPU_TARGET=10 bin/smoke-test
+#   SMOKE_ITERATIONS=2_000_000 bin/smoke-test # longer run for trigger to adapt
+#
+# If this script is run without RUBY_GC_LIBRARY=mmtk set, it will re-exec
+# itself with that env var plus whatever other MMTK_* vars you passed.
+
+unless ENV["RUBY_GC_LIBRARY"] == "mmtk"
+  ENV["RUBY_GC_LIBRARY"] = "mmtk"
+  ENV["MMTK_HEAP_MODE"] ||= "cpu"
+  exec(RbConfig.ruby, __FILE__, *ARGV)
+end
+
+impl = GC.config[:implementation]
+unless impl == "mmtk"
+  abort "smoke-test: expected GC implementation 'mmtk', got #{impl.inspect}. " \
+        "Is your Ruby built with --with-modular-gc and is the binding installed?"
+end
+
+require "fiddle"
+
+# getrusage(RUSAGE_SELF) returns peak RSS in ru_maxrss. On macOS the value is
+# in bytes; on Linux it's in kilobytes.
+module Rusage
+  extend self
+
+  RUSAGE_SELF = 0
+
+  # struct rusage on macOS/Linux: first two fields are ru_utime / ru_stime
+  # (struct timeval = { long, long }), then a series of long integers.
+  # ru_maxrss is the first long integer after the 2 timevals (the 3rd field).
+  # Each field here is a 64-bit long on 64-bit platforms.
+  # Layout (all i64):
+  #   [0..1] ru_utime (sec, usec)
+  #   [2..3] ru_stime (sec, usec)
+  #   [4]    ru_maxrss <-- what we want
+  #   ... more fields we don't use
+  STRUCT_LONGS = 18
+
+  def peak_rss_bytes
+    libc = Fiddle::Handle::DEFAULT
+    getrusage = Fiddle::Function.new(
+      libc["getrusage"], [Fiddle::TYPE_INT, Fiddle::TYPE_VOIDP], Fiddle::TYPE_INT
+    )
+    buf = Fiddle::Pointer.malloc(STRUCT_LONGS * Fiddle::SIZEOF_LONG, Fiddle::RUBY_FREE)
+    raise "getrusage failed" unless getrusage.call(RUSAGE_SELF, buf) == 0
+    maxrss = buf[4 * Fiddle::SIZEOF_LONG, Fiddle::SIZEOF_LONG].unpack1("q")
+    # macOS reports bytes, Linux reports kilobytes.
+    RbConfig::CONFIG["host_os"].include?("darwin") ? maxrss : maxrss * 1024
+  end
+end
+
+puts "== MMTk smoke test =="
+puts "implementation: #{GC.config[:implementation]}"
+puts "mmtk_plan: #{GC.config[:mmtk_plan]}"
+puts "mmtk_heap_mode: #{GC.config[:mmtk_heap_mode]}"
+puts "mmtk_heap_min: #{GC.config[:mmtk_heap_min]}" if GC.config[:mmtk_heap_min]
+puts "mmtk_heap_max: #{GC.config[:mmtk_heap_max]}"
+puts "mmtk_worker_count: #{GC.config[:mmtk_worker_count]}"
+if GC.config[:mmtk_heap_mode] == "cpu"
+  puts "cpu target (env): #{ENV.fetch('MMTK_GC_CPU_TARGET', '5')}%"
+  puts "cpu window (env): #{ENV.fetch('MMTK_GC_CPU_WINDOW', '3')}"
+end
+puts "---"
+
+ITERATIONS = Integer(ENV.fetch("SMOKE_ITERATIONS", 500_000))
+OBJECT_SIZE = Integer(ENV.fetch("SMOKE_OBJECT_SIZE", 256))
+LIVE_SET = Integer(ENV.fetch("SMOKE_LIVE_SET", 2_000))
+
+# The workload: maintain a rolling working set of LIVE_SET objects, each
+# OBJECT_SIZE bytes. Each iteration allocates a new object and drops an old
+# one. This produces a steady stream of garbage and a predictable live-set
+# size, so the CPU trigger has a stable signal to converge on.
+
+gc_before = GC.count
+t_wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+t_cpu_start = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
+
+sink = Array.new(LIVE_SET) { String.new("x" * OBJECT_SIZE) }
+i = 0
+while i < ITERATIONS
+  sink[i % LIVE_SET] = String.new("x" * OBJECT_SIZE)
+  i += 1
+end
+
+t_wall_end = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+t_cpu_end = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
+gc_after = GC.count
+
+wall_s = t_wall_end - t_wall_start
+cpu_s = t_cpu_end - t_cpu_start
+rss = Rusage.peak_rss_bytes
+
+printf "iterations: %d (live set %d x %dB)\n", ITERATIONS, LIVE_SET, OBJECT_SIZE
+printf "gc cycles: %d (before=%d, after=%d)\n", (gc_after - gc_before), gc_before, gc_after
+printf "wall time: %.3fs\n", wall_s
+printf "cpu time: %.3fs (%.1f%% of wall)\n", cpu_s, (cpu_s / wall_s) * 100.0
+printf "peak rss: %.1f MiB (%d bytes)\n", rss / 1024.0 / 1024.0, rss
+puts "OK"
