A reference reading of wren_lift/src/runtime/gc*.rs (and the
interpreter / VM glue that drives it), focused on the parts Zyntax
should adopt as its first opt-in GC menu entry. Not a plan —
descriptive only. The plan happens after we agree on which pieces
to lift.
Source paths in this document are relative to
/Users/amaterasu/Vibranium/wren_lift/. Zyntax-side analogues are
relative to the Zyntax workspace root.
The session's earlier note ("wren_lift uses generational immix") is
wrong. There is no immix anywhere in wren_lift — no code,
docs, reference dir, or git history mentions it. The default is a
generational collector with a bump-pointer copying nursery and a
mark-sweep old gen (src/runtime/gc.rs:1). The body of this
survey reflects the actual algorithm.
That said, the menu architecture the user described — multiple GC algorithms behind a trait, runtime-selectable — is exactly what wren_lift ships. That pattern is the highest-leverage thing to lift, independent of which algorithms Zyntax eventually offers.
src/runtime/gc_trait.rs defines:
pub trait GcAllocator {
fn alloc_string(...) -> *mut ObjString;
fn alloc_list(...) -> *mut ObjList;
// ... one per object kind ...
fn intern_string(...) -> *mut ObjString;
fn write_barrier(&mut self, source: *mut ObjHeader, value: Value);
fn collect(&mut self, roots: &mut [Value]);
fn should_collect(&self) -> bool;
fn stats(&self) -> &GcStats;
}
pub enum GcImpl {
Generational(super::gc::Gc), // default
Arena(ArenaGc), // alloc-only, free on drop
MarkSweep(MarkSweepGc), // simple stop-the-world
}
pub enum GcStrategy {
#[default] Generational,
Arena,
MarkSweep,
}Dispatch is by enum-match (a gc_dispatch! macro), not trait
objects or generics. The comment at line 65-71 spells out why:
trait objects add vtable indirection on every allocation; generics
infect 30+ files with parameter pollution. Enum match becomes a
single predicted branch.
The CLI wires --gc to a GcStrategy enum at src/main.rs:223-238:
let gc_strategy = match cli.gc {
GcMode::Generational => GcStrategy::Generational,
GcMode::Arena => GcStrategy::Arena,
GcMode::MarkSweep => GcStrategy::MarkSweep,
};Lift directly: this trait+enum-dispatch shape is the right
seam for Zyntax. CompilationConfig.memory_strategy already exists;
extending its variants for opt-in GC algorithms is the natural
place to wire this.
Every heap object starts with ObjHeader (src/runtime/object.rs:127-141):
offset 0: obj_type u8 // type tag
offset 1: gc_mark u8 // 0=white, 1=gray, 2=black, 3=forwarded
offset 2: generation u8 // 0=young (nursery), 1=old
offset 3-7: padding
offset 8: next *mut // intrusive linked list (sweep walks this)
offset 16: class *mut // method dispatch
// total 24 bytes
Two GC-relevant decisions baked into this header:
- Tri-color marking inline in
gc_mark.FORWARDED(value 3) is overloaded — used during minor GC, where the surviving nursery slot stores a forwarding address in place of its data and the mark byte signals "look in the forwarding field". - Intrusive linked list via
next. The old gen is just a head pointer + linked chain; sweep walks the chain and unlinks dead objects. Cheap, but precludes parallel sweep without extra bookkeeping.
Zyntax mapping: today HirType::Ptr(Opaque(name)) carries no
GC-aware metadata. To make objects GC-managed, Zyntax would need
either a HirType::Gc(...) variant or a per-type attribute that
the lowering uses to allocate via the GC trait (vs. malloc/Box).
The header layout itself is opaque to the language — the
allocator owns it; user code never reads gc_mark.
Single contiguous Vec<u8> with a bump pointer:
struct Nursery {
buffer: Vec<u8>,
alloc_ptr: usize,
}try_alloc<T>(val)aligns, bumps, returns*mut T. O(1).reset()slamsalloc_ptrback to 0 after a collection.contains(ptr)is a simple address-range check — used by the write barrier to identify young pointers.
Allocation fast path (alloc<T>, gc.rs:711-724): if the nursery
has space, bump-allocate there and push the new header into
nursery_objects: Vec<*mut ObjHeader> (so the collector can
iterate them later). On nursery overflow, fall through to direct
old-gen allocation.
Default nursery size: 64 MB, overridable via WLIFT_GC_NURSERY_MB
env var. The comment at gc.rs:238-251 notes the tension: bigger
nursery → fewer collections but larger floor (matters on small
VMs like Fly's shared-cpu-1x); smaller nursery → more frequent
collections but tighter resident set.
intrusive linked list)
Chunk-based bump allocator:
pub struct OldArena {
chunks: Vec<Vec<u8>>, // each chunk is a separately heap-allocated block
alloc_ptr: usize,
chunk_size: usize, // 256 KB initial
}Plus the parallel state in Gc:
old_objects: *mut ObjHeader, // head of intrusive linked list
old_count: usize,
old_arena: OldArena,
remembered_set: Vec<*mut ObjHeader>, // old objs with young pointers
intern_table: HashMap<u64, Vec<*mut ObjString>>,Each new chunk is 256 KB. Pointers into earlier chunks stay valid
because the Vec<Vec<u8>> never reallocates the inner chunks
(only adds new ones). Sweep walks old_objects and reclaims
unmarked entries; chunks that end up entirely empty get released
back to the OS (release_empty_old_chunks, gc.rs:940).
A few decisions worth flagging:
-
alloc_classalways goes to old gen (gc.rs:653-661):// ObjClass instances live for the program's lifetime and // get baked into JIT class-check constants. Allocate them // straight to the old arena so the pointer is stable for // the lifetime of any JIT'd code that references it — a // nursery-then-promoted class would invalidate every // class-check that captured the old address.
This is a JIT-stability constraint baked into the GC. Zyntax has the same problem: any address baked into wasm-JIT'd code must not move. Zyntax's
register_static_pluginsis the natural analogue — the symbol pointers are similarly stable for-life and must skip nursery promotion. -
alloc_instanceallocates both header + fields in nursery in one shot (gc.rs:665-699). Falls back to old gen if the combined size doesn't fit. This avoids fragmenting a struct across nursery + old gen on the first cycle. -
String interning is hash-keyed (
intern_table: HashMap<u64, Vec<*mut ObjString>>). Collisions resolved by linear scan of the bucket's value list, comparing actual bytes. Interned strings still participate in normal GC — the table itself gets updated to track forwarded addresses during minor GC.
Multi-trigger. Fires when ANY of:
- Nursery used > 75% of capacity — primary trigger for short-lived churn.
- Nursery used > 64 MB absolute ceiling — caps idle floor on long-running event loops where the nursery would otherwise sit at hundreds of MB of dead objects.
- External bytes charged via
track_externalexceed 32 MB. Used byFiber.newto charge krio mmap stacks that don't show up in nursery accounting. Without this, a web server's RSS climbs unbounded per request. - Old gen count ≥
gc_threshold(starts at 256, grows byheap_grow_factor: f64 = 2.0per cycle). - Wall-clock heartbeat: 30 s since last major + any allocation since. Catches drip-feed workloads (favicon polls, prefetch hits) where neither byte threshold trips for minutes.
The 5-way OR is the kind of "every axis I've seen go wrong in production" heuristic that comes from running this in real deployments. Worth importing wholesale into Zyntax with the same env-var override for the nursery size.
Precise post-write barrier. Generational invariant: anytime an
old object stores a pointer to a young object, that old object
joins the remembered_set.
pub fn write_barrier(&mut self, source: *mut ObjHeader, value: Value) {
if !value.is_object() || source.is_null() {
return;
}
unsafe {
if (*source).generation != GEN_OLD {
return;
}
if let Some(obj_ptr) = value.as_object() {
let target = obj_ptr as *mut ObjHeader;
if !target.is_null() && (*target).generation == GEN_YOUNG {
self.remembered_set.push(source);
}
}
}
}The order of the checks is tuned for the common case: non-object
values bail before touching *source. The comment at lines 774-781
notes this was a deliberate reorder — the previous version
dereferenced source.generation first, which paid a memory
load on every store even for primitive writes.
Call sites in the interpreter (vm_interp.rs):
SetUpvalue→vm.gc.write_barrier(uv_ptr, v)(line 1506)SetStaticField→vm.gc.write_barrier(class_ptr, v)(line 1558)- Instance-field store →
vm.gc.write_barrier(ptr, val)(line 1744)
There's also a debug invariant validator at gc.rs:454-589:
validate_write_barriers() walks every old object, finds every
young reference reachable from it (by checking field-by-field
per object type), and panics if any are missing from the
remembered set. Exhaustive, expensive, debug-only — but exactly
the kind of belt-and-suspenders that catches the "I forgot to
barrier this store" class of bug at test time instead of in
production. Zyntax should adopt the same pattern.
1. Mark from roots (gray stack DFS).
2. Mark from remembered set (treat each entry as a root —
it's an old object that might reference young).
3. Process gray stack to completion.
4. process_nursery: walk nursery_objects.
For each live object, Box-promote to old gen.
Store forwarding address INLINE in the nursery slot
(gc_mark = FORWARDED, next 8 bytes = new address).
Dead objects: drop in place.
5. Update all pointers that pointed into nursery to their
forwarded addresses:
- update_roots_inline(roots, &nursery)
- update_old_gen_pointers_inline()
- update_intern_table_inline()
6. Reset nursery arena (bump pointer → 0).
7. Reset gc_mark to WHITE on all old objects (they were
marked during tracing).
The inline forwarding pointer is the clever bit: no separate
forwarding table. Step 4 writes the new address into the dead
nursery slot's header, and step 5 reads it back. The slot is
recycled when step 6 resets the arena. This works because nursery
slots are stable until reset (the underlying Vec<u8> isn't
touched until nursery.reset()).
Minor GC + sweep old gen + release empty arena chunks +
optional madvise(MADV_FREE) via wlift_alloc.
1. Mark from roots + remembered set (same as minor).
2. Promote nursery survivors (same as minor).
3. Sweep dead old-gen objects: walk old_objects, free
anything still WHITE, splice out of the linked list,
reset BLACK survivors to WHITE.
4. release_empty_old_chunks: free any OldArena chunk
that contains no surviving object. Critical for
long-running servers — without it, a chunk that
ever held a now-dead object stays allocated forever.
5. wlift_alloc::pressure_release() — calls madvise to
hand free pages back to the OS so process RSS
actually drops.
The combination of (3) (linked-list traversal sweep), (4) (chunk reclamation), and (5) (OS page release) is what makes the steady-state RSS track the working set rather than the high-water mark. Steps 4-5 are an operational detail that matters disproportionately for hosted environments (containers, serverless) but is invisible from the algorithm description.
let external_pressure = self.external_bytes_since_gc > 4 * 1024 * 1024;
if external_pressure || self.minor_since_major >= self.config.major_gc_interval {
self.collect_major(roots);
} else {
self.collect_minor(roots);
}- Force major when off-heap pressure exceeds 4 MB (about 16 krio fibers' worth of mmap stacks). Without this, dead fibers promoted by an earlier minor sweep accumulate in old gen.
- Otherwise, every 8th collection (
major_gc_interval: u32 = 8) is a major.
Two safepoint mechanisms run in parallel:
Inside the bytecode dispatch loop, every check_interval steps
(0xF / 0xFF / 0xFFF depending on step_limit config):
if steps & check_interval == 0 {
if vm.gc.should_collect() || vm.gc_requested {
// 1. Stash live registers back into frame.values
// so the GC can scan them as roots.
if let Some(frame) = (*fiber).mir_frames.last_mut() {
frame.values = std::mem::take(&mut values);
frame.pc = pc;
}
// 2. Trigger collection.
vm.collect_garbage();
vm.method_cache.invalidate();
vm.engine.invalidate_inline_caches();
// 3. Restore registers from (possibly forwarded) frame.
if let Some(frame) = (*fiber).mir_frames.last_mut() {
values = std::mem::take(&mut frame.values);
}
}
}The stash-around-collect dance is the crucial pattern: live interpreter registers go back into the frame structure before GC runs (so they're discoverable as roots) and come back out after. Without this, registers that were "live in a CPU register but not yet written back to memory" would be invisible to the collector.
The check is gated on steps & check_interval == 0 — a single AND
and a branch, hottest path stays fast.
JIT'd functions don't go through the interpreter dispatch loop, so the safepoint there isn't enough. The JIT lowering emits stack-map metadata at every potential GC point (call sites, loop back-edges, allocation sites):
struct LiveRootMetadata {
location: RootLocation, // e.g. Spill(offset)
// …other metadata
}
struct SafepointMetadata {
code_offset: u32, // PC offset within the function
live_roots: Vec<LiveRootMetadata>,
}At GC time, scan_native_stack_roots() does:
- Pass 1 — walk explicitly-pushed JIT frame list (each
wren_call_Nboundary pushed(fp, func_id, ret_addr)). For each frame, find the active safepoint by matchingret_addr - code_range.startagainstsafepoint.code_offset, then scan only the spill slots listed insafepoint.live_roots. - Pass 2 — aarch64 frame-pointer chain walking, for AOT bodies that don't push JIT frames (entered straight from the linker's main).
Precise scanning: the safepoint stack map tells the GC exactly which slots hold object pointers at that PC. The fallback is to skip scanning if no match found — wren_lift deliberately doesn't fall back to conservative (full-stack scan), because that would scan non-object spill slots and corrupt them.
Zyntax mapping: the BC interpreter pattern (4.1) ports
directly — Zyntax's HirInterpreter already has a dispatch loop
that could host a safepoint check at a similar interval. The
JIT stack-map machinery (4.2) is more involved; the wasm backend
would need to emit safepoint metadata alongside the wasm bytes,
and the host glue would need to know how to discover live spill
slots in wasm linear memory. Probably a later phase — start with
interpreter-only safepoints.
Three sources of roots, all consolidated when the safepoint fires:
- Fiber stacks — every
ObjFibercarriesstack: Vec<Value>and a chain ofmir_frames, each with its ownvalues: Vec<Value>register file andclosure/defining_classpointers. - JIT spill slots — discovered via the stack-map walk in 4.2.
- Global module variables —
ObjModulecarriesvariables: Vec<Value>; all loaded modules are roots.
The fiber set is discovered via GcImpl::for_each_fiber (gc_trait.rs:229),
which goes through the GC's own object list rather than a side
table — that way every fiber the closure sees is, by definition,
currently allocated. (The comment at gc_trait.rs:223-228 calls
this out as anti-dangling-pointer hygiene.)
Zyntax mapping: the natural analogue is the HirInterpreter's
register VM state — CompiledFunction register files, the value
stack, the symbol table. Plus krio-fiber stacks for async (which
Zyntax already shares with wren_lift via the krio adapter).
wren_lift ships two additional variants. Neither is being lifted into Zyntax. Recorded here only for reference.
"Allocates, never collects." All objects are Box::into_raw-ed
and tracked in a Vec<*mut ObjHeader>; the whole arena is freed
when the ArenaGc drops. O(1) alloc, zero pause time,
unbounded memory growth.
Excluded from Zyntax's menu by user decision. Even with docs warning "only for short-lived workloads," exposing it lets a user pick it for a long-running embedder and OOM. The menu shouldn't ship foot-guns.
Simple non-generational stop-the-world. All objects in a single
Vec, threshold-triggered mark phase + sweep phase. No nursery,
no promotion, no write barriers needed. Pause time proportional
to total heap; no compaction.
Not part of the first-lift scope. The user has scoped initial GC work to just the generational variant (§3 and §7). MarkSweep might join the menu later — its main appeal as a baseline / no-barrier fallback doesn't outweigh the cost of building it now when the generational collector is the only one being shipped first.
This section identifies the specific files / line ranges where the integration would land. Not a plan; a map.
| Concept | wren_lift | Zyntax analogue |
|---|---|---|
| Strategy enum | runtime/gc_trait.rs::GcStrategy |
compiler/src/lib.rs::MemoryStrategy (already exists, opt: Option<MemoryStrategy>) |
| Enum dispatch | GcImpl + gc_dispatch! macro |
New: compiler/src/gc/mod.rs::GcImpl (parallel structure) |
| Trait | GcAllocator |
New: compiler/src/gc/trait.rs::GcAllocator |
| CLI surface | main.rs::cli.gc |
zyntax_cli --memory-strategy flag (extend existing) |
HirType::Gc(...) or HirType::HeapPtr(Opaque(...)) would be the
likely shape. The lowering's Alloc/Box HIR ops (today they
emit malloc + free at drop sites under the default
speculative-drop strategy) would route to gc.alloc_* instead
when the strategy is GC-enabled.
Header layout itself sits in compiler/src/gc/object.rs — modeled
after wren_lift/src/runtime/object.rs:127-141, adapted for
Zyntax's actual object kinds (no ObjFn / ObjClass per se; we
have HirFunction id references, struct/enum instances,
List / Map / String stdlib types).
| wren_lift | Zyntax |
|---|---|
vm_interp.rs:1330-1373 (steps & check_interval == 0) |
crates/compiler/src/hir_interp.rs dispatch loop — same shape, same gc.should_collect() predicate |
| Stash registers back into frame before GC | HirInterpreter register file needs to be discoverable from outside the inner loop — minor refactor |
| Stack map for JIT spills | Phase out — the wasm backend would need a SafepointMetadata emission pass, deferred |
In Zyntax, every HirInstruction::Store with a heap-pointer
target needs to be barrier-aware when the GC strategy is
generational. Two ways:
- Lowering pass: a transformation that walks the HIR after
the SSA build, inserts
Call(Symbol("gc_write_barrier"), [src, val])before each Store whose target is heap-typed. Cleanly separated from the rest of lowering, and naturally disabled under non-GC strategies (default speculative-drop, future non-generational variants) by skipping the pass entirely. - Cranelift/Wasm backend hook: emit the barrier inline at Store sites. Faster (no extra HIR instruction, no extra regalloc pressure) but couples the backends to GC details.
Option 1 is the cleaner first pass. The barrier-as-Symbol-call also makes it cheap to A/B different barrier implementations (remembered set, card marking, etc.) without backend changes.
Lift the 5-trigger predicate from gc.rs:404-450 essentially
verbatim. Constants (75%, 64 MB, 32 MB, gc_threshold growth
factor, 30 s heartbeat) all become GcConfig fields with the
same env-var overrides. Don't try to invent something better
on first pass — these values were tuned against real wren_lift
deployments.
Zyntax-specific. Sources:
- HirInterpreter register files (
CompiledFunction.regs, the value stack). - HirInterpreter symbol table (
HirInterpreter::symbol_table_snapshot, the FFI registered symbols — these are weak roots: the underlying function pointer doesn't need GC tracing because it's a code address, but any boxed receiver state stored alongside might). - Global / module statics if/when Zyntax adds module-level state.
- Krio fiber stacks when async is in use. The krio adapter
already exposes fiber discovery; same pattern as
GcImpl::for_each_fiber.
Port gc.rs::validate_write_barriers (#[cfg(debug_assertions)],
exhaustive young-pointer walk per old object). High debugging
value, zero release cost.
- Concurrent / parallel GC. wren_lift is single-threaded stop-the-world. The interpreter loop stops, the collector runs, the loop resumes. For Zyntax's first opt-in GC entry, same constraint is fine — concurrent collection is a future menu entry of its own.
- Generational immix (mark-region). Not in wren_lift, not on this survey. A separate study if/when that algorithm joins the menu.
- The wasm-JIT path. Zyntax's wasm tier-1 emitter would need safepoint stack maps to fully integrate; the BC interpreter is the proper first integration target.
- Off-heap accounting (
track_external). Important once Zyntax has krio fibers in scope, but not on the critical path for a minimum-viable GC menu entry.
wren_lift/src/runtime/gc.rs— default GC (2289 LOC)wren_lift/src/runtime/gc_arena.rs— Arena variant (274 LOC)wren_lift/src/runtime/gc_marksweep.rs— MarkSweep variant (532 LOC)wren_lift/src/runtime/gc_trait.rs— trait + enum dispatch (317 LOC)wren_lift/src/runtime/object.rs:127-141— ObjHeader layoutwren_lift/src/runtime/vm_interp.rs:1312-1373— interpreter safepointwren_lift/src/runtime/vm.rs:2915-2984— JIT precise stack scanningwren_lift/src/main.rs:223-238— CLI strategy plumbing