This document is a practical, action-oriented reference for converting the NeoVM
codebase from the ObjId-based Value enum to the TaggedValue tagged-pointer
system. Every section gives before and after code, the mechanical rule,
and known gotchas.
- Architecture Overview
- Value Type Rename
- Constructor Conversions
- Pattern Matching Conversions
- Accessor Conversions
- Type Check Conversions
- Heap Allocation Changes
- Exhaustive Match via ValueKind
- Eliminated Concepts
- GC and Rooting Changes
- File-by-File Conversion Order
- HashKey and Equality
- Common Pitfalls
Value (enum, 16+ bytes) ObjId (8 bytes)
├── Nil ├── index: u32
├── True └── generation: u32
├── Int(i64)
├── Float(f64, u32) LispHeap
├── Symbol(SymId) ├── objects: Vec<HeapObject>
├── Keyword(SymId) ├── generations: Vec<u32>
├── Char(char) ├── marks: Vec<bool>
├── Subr(SymId) └── free_list: Vec<u32>
├── Str(ObjId) ──────►
├── Cons(ObjId) ──────► Thread-local access:
├── Vector(ObjId) ──────► with_heap(|h| ...)
├── Record(ObjId) ──────► with_heap_mut(|h| ...)
├── HashTable(ObjId) ──────►
├── Lambda(ObjId) ──────►
├── Macro(ObjId) ──────►
├── ByteCode(ObjId) ──────►
├── Marker(ObjId) ──────►
├── Overlay(ObjId) ──────►
├── Buffer(BufferId)
├── Window(u64)
├── Frame(u64)
└── Timer(u64)
TaggedValue (usize, 8 bytes on 64-bit)
Low 3 bits = tag
┌────────────────────────────────────────────────────────────────┐
│ Tag Type Payload Fast check │
│ 000 Symbol sym_index << 3 (v & 7) == 0 │
│ x01 Fixnum integer << 2 (v & 3) == 1 │
│ 010 Cons pointer | 2 (v & 7) == 2 │
│ 011 Veclike pointer | 3 (v & 7) == 3 │
│ 100 String pointer | 4 (v & 7) == 4 │
│ 110 Float pointer | 6 (v & 7) == 6 │
│ 111 Immediate sub-tag in bits 3-7 (v & 7) == 7 │
│ ├── 00000 Char (21-bit codepoint in bits 8+) │
│ ├── 00001 Keyword (SymId in bits 8+) │
│ └── 00010 Subr (SymId in bits 8+) │
└────────────────────────────────────────────────────────────────┘
Veclike sub-types (VecLikeType in VecLikeHeader):
Vector, HashTable, Lambda, Macro, ByteCode,
Record, Overlay, Marker, Buffer, Window, Frame, Timer
Heap objects:
ConsCell — 16 bytes, no header, block-allocated
StringObj — GcHeader + LispString
FloatObj — GcHeader + f64
VectorObj — VecLikeHeader + Vec<TaggedValue>
HashTableObj — VecLikeHeader + LispHashTable
LambdaObj — VecLikeHeader + LambdaData
MacroObj — VecLikeHeader + LambdaData
ByteCodeObj — VecLikeHeader + ByteCodeFunction
RecordObj — VecLikeHeader + Vec<TaggedValue>
OverlayObj — VecLikeHeader + OverlayData
MarkerObj — VecLikeHeader + MarkerData
BufferObj — VecLikeHeader + BufferId
WindowObj — VecLikeHeader + u64
FrameObj — VecLikeHeader + u64
TimerObj — VecLikeHeader + u64
Thread-local access:
with_tagged_heap(|h| ...) // replaces with_heap / with_heap_mut
Key differences:
- 8 bytes per value instead of 16+ bytes (enum discriminant + payload).
- Heap types accessed via direct pointer dereference -- no
ObjIdlookup. - Floats are heap-allocated (no inline
f64); fixnums cover 62-bit range. Buffer,Window,Frame,Timerbecome veclike sub-types with heap allocation, unifying all heap types under the same GC model.- No generation counter. No stale-handle panics. Dangling pointer = UB (prevented by correct GC rooting).
The type alias strategy: during migration, Value can be aliased to
TaggedValue so most call sites don't need renaming.
// In value.rs (after migration):
pub type Value = TaggedValue;Files that reference Value directly generally need no change beyond ensuring
the import resolves to TaggedValue.
// BEFORE
Value::Nil
let x = Value::Nil;
// AFTER
Value::NIL // associated constant, not a variant
let x = Value::NIL;Rule: Find-and-replace Value::Nil with Value::NIL.
Watch out for pattern match arms -- those change differently (see Section 4).
// BEFORE
Value::True
Value::t()
// AFTER
Value::T // associated constant
Value::T // .t() method still exists as compat aliasRule: Replace Value::True with Value::T.
Value::t() still works but is just Self::T.
// BEFORE
Value::Int(42)
Value::Int(n) // constructor from variable
// AFTER
Value::fixnum(42)
Value::fixnum(n)Rule: Replace Value::Int(expr) constructors with Value::fixnum(expr).
Note: old Value::Int was i64. New fixnum() is also i64 but with 62-bit
range. Values outside [MOST_NEGATIVE_FIXNUM, MOST_POSITIVE_FIXNUM] will
silently truncate via bit shifting -- add range checks if needed.
// BEFORE
Value::Float(3.14, next_float_id())
// Often constructed as:
let f = Value::Float(val, next_float_id());
// AFTER
with_tagged_heap(|h| h.alloc_float(3.14))
// Or via a convenience method (to be added):
Value::make_float(3.14)Rule: Floats become heap-allocated. Every Value::Float(f, id) constructor
becomes a heap allocation. The next_float_id() call is eliminated entirely.
Key change: Float identity is now pointer-based (like GNU Emacs), not ID-based.
Two alloc_float(3.14) calls produce different float objects (different
pointers), matching GNU eq semantics.
// BEFORE
Value::Symbol(sym_id)
Value::symbol("defun") // constructor that interns
// AFTER
Value::symbol(sym_id) // from SymId (method, not variant)
Value::symbol_by_name("defun") // constructor that internsRule: Value::Symbol(id) (variant construction) becomes Value::symbol(id)
(method call). The Value::symbol(string) constructor is renamed to
Value::symbol_by_name(string) to avoid ambiguity.
Watch out: The old Value::symbol("name") accepted impl AsRef<str> and
also canonicalized "nil" -> Value::Nil, "t" -> Value::True, and
:keyword -> Value::Keyword. The new Value::symbol(SymId) does no such
canonicalization. Use Value::symbol_by_name() if you need the old behavior.
// BEFORE
Value::Keyword(sym_id)
Value::keyword("test")
// AFTER
Value::keyword(sym_id) // from SymId
Value::keyword_by_name("test") // from stringRule: Same pattern as Symbol.
// BEFORE
Value::Char('A')
Value::Char(c)
// AFTER
Value::char('A')
Value::char(c)Rule: Value::Char(expr) -> Value::char(expr).
// BEFORE
Value::Subr(sym_id)
// AFTER
Value::subr(sym_id)Rule: Value::Subr(id) -> Value::subr(id).
// BEFORE
Value::Str(obj_id) // direct variant construction (rare)
Value::string("hello") // constructor (common)
Value::multibyte_string("hi")
Value::unibyte_string("hi")
// AFTER
// Value::string("hello") -- will call with_tagged_heap internally
// The low-level variant construction is eliminated entirely.
// String values come only from heap allocation:
with_tagged_heap(|h| h.alloc_string(LispString::new(s, multibyte)))The high-level constructors (Value::string(), Value::multibyte_string(),
etc.) will be reimplemented as methods on TaggedValue that call
with_tagged_heap(|h| h.alloc_string(...)) internally.
// BEFORE
Value::Cons(obj_id) // direct variant construction (rare)
Value::cons(car, cdr) // constructor (very common)
Value::list(vec![a, b, c])
// AFTER
with_tagged_heap(|h| h.alloc_cons(car, cdr))
// The Value::cons() convenience wrapper will be reimplemented:
// pub fn cons(car: Value, cdr: Value) -> Value {
// with_tagged_heap(|h| h.alloc_cons(car, cdr))
// }
// Value::list() continues to work (calls cons() in a fold).// BEFORE
Value::Vector(obj_id)
Value::Record(obj_id)
Value::vector(vec![a, b])
// AFTER
with_tagged_heap(|h| h.alloc_vector(items)) // returns TaggedValue
with_tagged_heap(|h| h.alloc_record(items))Both Vector and Record become veclike sub-types. Their TaggedValue has the
same 011 tag; the difference is VecLikeType::Vector vs VecLikeType::Record
in the header.
// BEFORE
Value::HashTable(obj_id)
Value::Lambda(obj_id)
Value::Macro(obj_id)
Value::ByteCode(obj_id)
// AFTER -- all become veclike allocations
with_tagged_heap(|h| h.alloc_hash_table(ht))
with_tagged_heap(|h| h.alloc_lambda(data))
with_tagged_heap(|h| h.alloc_macro(data))
with_tagged_heap(|h| h.alloc_bytecode(data))High-level convenience methods on TaggedValue (e.g., Value::make_lambda())
will wrap these.
// BEFORE
Value::Buffer(buffer_id)
Value::Window(id)
Value::Frame(id)
Value::Timer(id)
// AFTER -- become veclike sub-types
with_tagged_heap(|h| h.alloc_buffer(buffer_id))
with_tagged_heap(|h| h.alloc_window(id))
with_tagged_heap(|h| h.alloc_frame(id))
with_tagged_heap(|h| h.alloc_timer(id))These were previously inline in the Value enum (no heap allocation). Now they are heap-allocated veclike objects. This is a semantic change -- each creation allocates. Callers that previously constructed these cheaply may need caching.
This is the largest category of changes. The old Value enum allowed direct
match with variant destructuring. The new TaggedValue is an opaque usize
-- you must use accessor methods or value.kind().
// BEFORE
match value {
Value::Cons(id) => {
let car = with_heap(|h| h.cons_car(id));
// ...
}
_ => {}
}
// AFTER (preferred: use predicate + accessor)
if value.is_cons() {
let car = value.cons_car();
// ...
}// BEFORE
match value {
Value::Int(n) => do_something(n),
_ => {}
}
// AFTER
if let Some(n) = value.as_fixnum() {
do_something(n);
}// BEFORE
match value {
Value::Symbol(id) => use_sym(id),
_ => {}
}
// AFTER
if let Some(id) = value.as_symbol_id() {
use_sym(id);
}// BEFORE
match value {
Value::Float(f, _id) => use_float(f),
_ => {}
}
// AFTER
if let Some(f) = value.as_float() {
use_float(f);
}// BEFORE
match value {
Value::Char(c) => use_char(c),
_ => {}
}
// AFTER
if let Some(c) = value.as_char() {
use_char(c);
}// BEFORE
match value {
Value::Keyword(id) => use_kw(id),
_ => {}
}
// AFTER
if let Some(id) = value.as_keyword_id() {
use_kw(id);
}// BEFORE
match value {
Value::Subr(id) => use_subr(id),
_ => {}
}
// AFTER
if let Some(id) = value.as_subr_id() {
use_subr(id);
}For exhaustive or multi-arm matches, use value.kind():
// BEFORE
match value {
Value::Nil => handle_nil(),
Value::True => handle_t(),
Value::Int(n) => handle_int(n),
Value::Float(f, _) => handle_float(f),
Value::Symbol(id) => handle_sym(id),
Value::Keyword(id) => handle_kw(id),
Value::Char(c) => handle_char(c),
Value::Subr(id) => handle_subr(id),
Value::Cons(_) => handle_cons(value),
Value::Str(_) => handle_str(value),
Value::Vector(_) => handle_vec(value),
Value::Record(_) => handle_rec(value),
Value::HashTable(_) => handle_ht(value),
Value::Lambda(_) => handle_lambda(value),
Value::Macro(_) => handle_macro(value),
Value::ByteCode(_) => handle_bc(value),
Value::Marker(_) => handle_marker(value),
Value::Overlay(_) => handle_overlay(value),
Value::Buffer(_) => handle_buffer(value),
Value::Window(_) => handle_window(value),
Value::Frame(_) => handle_frame(value),
Value::Timer(_) => handle_timer(value),
}
// AFTER
match value.kind() {
ValueKind::Nil => handle_nil(),
ValueKind::T => handle_t(),
ValueKind::Fixnum(n) => handle_int(n),
ValueKind::Float => handle_float(value.xfloat()),
ValueKind::Symbol(id) => handle_sym(id),
ValueKind::Keyword(id) => handle_kw(id),
ValueKind::Char(c) => handle_char(c),
ValueKind::Subr(id) => handle_subr(id),
ValueKind::Cons => handle_cons(value),
ValueKind::String => handle_str(value),
ValueKind::Veclike(VecLikeType::Vector) => handle_vec(value),
ValueKind::Veclike(VecLikeType::Record) => handle_rec(value),
ValueKind::Veclike(VecLikeType::HashTable) => handle_ht(value),
ValueKind::Veclike(VecLikeType::Lambda) => handle_lambda(value),
ValueKind::Veclike(VecLikeType::Macro) => handle_macro(value),
ValueKind::Veclike(VecLikeType::ByteCode) => handle_bc(value),
ValueKind::Veclike(VecLikeType::Marker) => handle_marker(value),
ValueKind::Veclike(VecLikeType::Overlay) => handle_overlay(value),
ValueKind::Veclike(VecLikeType::Buffer) => handle_buffer(value),
ValueKind::Veclike(VecLikeType::Window) => handle_window(value),
ValueKind::Veclike(VecLikeType::Frame) => handle_frame(value),
ValueKind::Veclike(VecLikeType::Timer) => handle_timer(value),
ValueKind::Unknown => panic!("unknown value kind"),
}Key points:
ValueKind::Fixnum(n)carries the i64 value inline.ValueKind::Floatdoes NOT carry the f64 -- usevalue.xfloat()separately.ValueKind::ConsandValueKind::Stringcarry no data -- use accessors.ValueKind::Veclike(sub_type)needs a nested match for the sub-type.ValueKind::Unknownshould be unreachable in well-formed code.
// BEFORE
match value {
Value::Nil => return default,
Value::Cons(id) => {
let pair = read_cons(id);
process(pair.car, pair.cdr)
}
other => error(other),
}
// AFTER
if value.is_nil() {
return default;
} else if value.is_cons() {
let car = value.cons_car();
let cdr = value.cons_cdr();
process(car, cdr)
} else {
error(value)
}Or using kind() if you prefer match syntax:
match value.kind() {
ValueKind::Nil => return default,
ValueKind::Cons => {
let car = value.cons_car();
let cdr = value.cons_cdr();
process(car, cdr)
}
_ => error(value),
}// BEFORE
if let Value::Int(n) = value { ... }
if let Value::Symbol(id) = value { ... }
if let Value::Cons(id) = value { ... }
if let Value::Str(id) = value { ... }
if let Value::Float(f, _) = value { ... }
if let Value::Char(c) = value { ... }
if let Value::Keyword(id) = value { ... }
if let Value::Subr(id) = value { ... }
// AFTER
if let Some(n) = value.as_fixnum() { ... }
if let Some(id) = value.as_symbol_id() { ... }
if value.is_cons() { ... } // no id to extract
if value.is_string() { ... } // no id to extract
if let Some(f) = value.as_float() { ... }
if let Some(c) = value.as_char() { ... }
if let Some(id) = value.as_keyword_id() { ... }
if let Some(id) = value.as_subr_id() { ... }// BEFORE
if let Value::Cons(id) = value {
let car = with_heap(|h| h.cons_car(id));
let cdr = with_heap(|h| h.cons_cdr(id));
// ...
}
// AFTER
if value.is_cons() {
let car = value.cons_car();
let cdr = value.cons_cdr();
// ...
}// BEFORE
if let Value::Cons(outer) = value {
let pair = read_cons(outer);
if let Value::Cons(inner) = pair.car {
let inner_pair = read_cons(inner);
// use inner_pair.car, inner_pair.cdr
}
}
// AFTER
if value.is_cons() {
let car = value.cons_car();
if car.is_cons() {
let inner_car = car.cons_car();
let inner_cdr = car.cons_cdr();
// use inner_car, inner_cdr
}
}// BEFORE (heap-level, with ObjId)
with_heap(|h| h.cons_car(id))
with_heap(|h| h.cons_cdr(id))
with_heap_mut(|h| h.set_car(id, val))
with_heap_mut(|h| h.set_cdr(id, val))
// BEFORE (Value-level methods)
value.cons_car()
value.cons_cdr()
value.set_car(val)
value.set_cdr(val)
// AFTER (direct pointer dereference -- same method names)
value.cons_car() // direct, no heap closure
value.cons_cdr()
value.set_car(val)
value.set_cdr(val)The method signatures are the same, but the implementation changes from
with_heap(|h| h.cons_car(*id)) to unsafe { (*self.xcons_ptr()).car }.
Most call sites need no change if they were already using the Value-level
methods.
Call sites using the with_heap pattern directly with an ObjId must be
refactored to use Value-level accessors.
// BEFORE
let pair = read_cons(id);
use_value(pair.car);
use_value(pair.cdr);
// AFTER (read_cons and ConsSnapshot are eliminated)
let car = value.cons_car();
let cdr = value.cons_cdr();
use_value(car);
use_value(cdr);If the code had let pair = read_cons(id) and then used pair.car /
pair.cdr in multiple places, just call cons_car() / cons_cdr() directly
on the tagged value.
Warning: If code was using read_cons() to snapshot both car and cdr
atomically before mutation, the new code must also read both before mutating:
// BEFORE
let pair = read_cons(id); // snapshot
set_car(id, pair.cdr); // swap car and cdr
set_cdr(id, pair.car);
// AFTER
let car = value.cons_car();
let cdr = value.cons_cdr();
value.set_car(cdr);
value.set_cdr(car);// BEFORE
with_heap(|h| h.get_string(id)) // -> &str
with_heap(|h| h.get_lisp_string(id)) // -> &LispString
with_heap_mut(|h| h.get_string_mut(id)) // -> &mut String
with_heap(|h| h.string_is_multibyte(id)) // -> bool
value.as_str() // -> Option<&str>
// AFTER
value.as_str() // -> Option<&'static str>
// For LispString access:
unsafe { (*value.as_string_ptr().unwrap()).data.as_str() }
// For mutable string access:
unsafe { (*value.as_string_ptr().unwrap() as *mut StringObj).data.make_mut() }
// For multibyte check:
unsafe { (*value.as_string_ptr().unwrap()).data.multibyte }value.as_str() continues to work with the same signature but the
implementation changes from ObjId lookup to direct pointer dereference.
For LispString-level access (multibyte flag, slicing, mutation), you dereference
the StringObj pointer directly. Consider adding convenience methods:
impl TaggedValue {
pub fn as_lisp_string(&self) -> Option<&LispString> {
self.as_string_ptr().map(|p| unsafe { &(*p).data })
}
pub fn string_is_multibyte(&self) -> bool {
self.as_lisp_string().map_or(false, |s| s.multibyte)
}
}// BEFORE
with_heap(|h| h.get_vector(id)) // -> &Vec<Value>
with_heap(|h| h.vector_ref(id, idx)) // -> Value
with_heap(|h| h.vector_len(id)) // -> usize
with_heap_mut(|h| h.vector_set(id, idx, v)) // -> ()
with_heap_mut(|h| h.get_vector_mut(id)) // -> &mut Vec<Value>
// AFTER -- cast veclike pointer to VectorObj
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const VectorObj;
let data: &Vec<TaggedValue> = &(*obj).data;
// vector_ref:
data[idx]
// vector_len:
data.len()
// vector_set (via *mut):
(*(obj as *mut VectorObj)).data[idx] = v;
}Add convenience methods on TaggedValue:
impl TaggedValue {
pub fn vector_ref(self, idx: usize) -> Self { ... }
pub fn vector_set(self, idx: usize, val: Self) { ... }
pub fn vector_len(self) -> usize { ... }
}// BEFORE
with_heap(|h| h.get_hash_table(id)) // -> &LispHashTable
with_heap_mut(|h| h.get_hash_table_mut(id)) // -> &mut LispHashTable
// AFTER
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const HashTableObj;
let ht: &LispHashTable = &(*obj).table;
// or mutable:
let ht: &mut LispHashTable = &mut (*(obj as *mut HashTableObj)).table;
}// BEFORE
with_heap(|h| h.get_lambda(id)) // -> &LambdaData
with_heap(|h| h.get_macro_data(id)) // -> &LambdaData
value.get_lambda_data() // -> Option<&LambdaData>
// AFTER
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const LambdaObj;
let data: &LambdaData = &(*obj).data;
}
// Or for macros:
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const MacroObj;
let data: &LambdaData = &(*obj).data;
}// BEFORE
with_heap(|h| h.get_bytecode(id)) // -> &ByteCodeFunction
value.get_bytecode_data() // -> Option<&ByteCodeFunction>
// AFTER
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const ByteCodeObj;
let data: &ByteCodeFunction = &(*obj).data;
}// BEFORE
with_heap(|h| h.get_marker(id)) // -> &MarkerData
with_heap(|h| h.get_overlay(id)) // -> &OverlayData
// AFTER
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const MarkerObj;
&(*obj).data
}
// Similarly for overlay:
unsafe {
let obj = value.as_veclike_ptr().unwrap() as *const OverlayObj;
&(*obj).data
}// BEFORE
if let Some(id) = value.str_id() {
set_string_text_properties(id, runs);
}
// AFTER
// String identity is now pointer-based. Text property storage
// must be keyed by pointer address or by the value itself:
if value.is_string() {
let ptr = value.as_string_ptr().unwrap() as usize;
set_string_text_properties_by_ptr(ptr, runs);
}The str_id() -> Option<ObjId> method is eliminated because there is no ObjId.
Any code that used ObjId as a map key for string properties must switch to using
the raw pointer value (obtainable via value.bits() or as_string_ptr()).
// BEFORE // AFTER
matches!(value, Value::Nil) value.is_nil()
matches!(value, Value::True) value.is_t()
matches!(value, Value::Cons(_)) value.is_cons()
matches!(value, Value::Str(_)) value.is_string()
matches!(value, Value::Int(_)) value.is_fixnum()
matches!(value, Value::Float(_, _)) value.is_float()
matches!(value, Value::Symbol(_)) value.is_symbol()
matches!(value, Value::Keyword(_)) value.is_keyword()
matches!(value, Value::Char(_)) value.is_char()
matches!(value, Value::Subr(_)) value.is_subr()
matches!(value, Value::Vector(_)) value.is_vector()
matches!(value, Value::Record(_)) value.is_record()
matches!(value, Value::HashTable(_)) value.is_hash_table()
matches!(value, Value::Lambda(_)) value.veclike_type() == Some(VecLikeType::Lambda)
matches!(value, Value::Macro(_)) value.veclike_type() == Some(VecLikeType::Macro)
matches!(value, Value::ByteCode(_)) value.veclike_type() == Some(VecLikeType::ByteCode)
matches!(value, Value::Marker(_)) value.veclike_type() == Some(VecLikeType::Marker)
matches!(value, Value::Overlay(_)) value.veclike_type() == Some(VecLikeType::Overlay)
matches!(value, Value::Buffer(_)) value.veclike_type() == Some(VecLikeType::Buffer)
matches!(value, Value::Window(_)) value.veclike_type() == Some(VecLikeType::Window)
matches!(value, Value::Frame(_)) value.veclike_type() == Some(VecLikeType::Frame)
matches!(value, Value::Timer(_)) value.veclike_type() == Some(VecLikeType::Timer)
matches!(value, Value::Nil | Value::Cons(_)) value.is_list()For the veclike sub-types used frequently, add convenience predicates:
impl TaggedValue {
pub fn is_lambda(self) -> bool {
self.veclike_type() == Some(VecLikeType::Lambda)
}
pub fn is_macro(self) -> bool {
self.veclike_type() == Some(VecLikeType::Macro)
}
pub fn is_bytecode(self) -> bool {
self.veclike_type() == Some(VecLikeType::ByteCode)
}
pub fn is_marker(self) -> bool {
self.veclike_type() == Some(VecLikeType::Marker)
}
pub fn is_overlay(self) -> bool {
self.veclike_type() == Some(VecLikeType::Overlay)
}
pub fn is_buffer(self) -> bool {
self.veclike_type() == Some(VecLikeType::Buffer)
}
pub fn is_window(self) -> bool {
self.veclike_type() == Some(VecLikeType::Window)
}
pub fn is_frame(self) -> bool {
self.veclike_type() == Some(VecLikeType::Frame)
}
pub fn is_timer(self) -> bool {
self.veclike_type() == Some(VecLikeType::Timer)
}
}// BEFORE
matches!(value, Value::Nil | Value::True | Value::Symbol(_) | Value::Keyword(_))
// AFTER (value.is_symbol() already covers Nil and True because they use TAG_SYMBOL)
// BUT: Keyword uses TAG_IMMEDIATE, not TAG_SYMBOL!
// So the correct conversion depends on intent:
// "is this value a symbol in the Elisp sense?" (nil, t, interned symbols, keywords)
value.is_symbol() || value.is_keyword() // nil/t are symbols; keywords are separate
// "is this a function?"
// BEFORE: matches!(value, Value::Lambda(_) | Value::Subr(_) | Value::ByteCode(_))
// AFTER:
value.is_function() // covers subr + lambda + bytecodeImportant tag difference: In the old system, Value::Nil and Value::True
were separate enum variants. In the new system, nil is Symbol(0) and t is
Symbol(1) -- both have tag 000. So value.is_symbol() returns true for
nil and t. If you need to exclude nil/t, check explicitly:
// "is this a non-nil, non-t symbol?"
value.is_symbol() && !value.is_nil() && !value.is_t()// BEFORE
let id = with_heap_mut(|heap| heap.alloc_cons(car, cdr));
let value = Value::Cons(id);
// AFTER
let value = with_tagged_heap(|h| h.alloc_cons(car, cdr));
// alloc_cons returns TaggedValue directly, no separate id+wrapping stepThe old Value::cons(), Value::string(), Value::vector(), etc. called
with_heap_mut internally. The new versions call with_tagged_heap internally.
Call sites using these convenience methods need no change beyond the type rename.
// These continue to work (implementation changes, API doesn't):
Value::cons(car, cdr)
Value::string("hello")
Value::list(vec![a, b, c])
Value::vector(vec![a, b])
Value::make_lambda(data)
Value::make_bytecode(bc)
Value::hash_table(HashTableTest::Eq)// BEFORE
set_current_heap(&mut heap); // sets CURRENT_HEAP thread-local
with_heap(|h| ...); // reads via CURRENT_HEAP
with_heap_mut(|h| ...); // writes via CURRENT_HEAP
clear_current_heap();
// AFTER
set_tagged_heap(&mut heap); // sets TAGGED_HEAP thread-local
with_tagged_heap(|h| ...); // unified read/write (always &mut)with_heap and with_heap_mut are eliminated. There is only
with_tagged_heap which always gives &mut TaggedHeap (needed because
allocation mutates the heap and may trigger GC).
When you need to handle all possible value types, use value.kind():
match value.kind() {
// === Immediates (no heap allocation) ===
ValueKind::Nil => { /* nil */ }
ValueKind::T => { /* t */ }
ValueKind::Fixnum(n) => { /* integer n */ }
ValueKind::Symbol(id) => { /* interned symbol */ }
ValueKind::Char(c) => { /* unicode char */ }
ValueKind::Keyword(id) => { /* keyword */ }
ValueKind::Subr(id) => { /* builtin function */ }
// === Heap-allocated (direct pointer) ===
ValueKind::Cons => {
let car = value.cons_car();
let cdr = value.cons_cdr();
}
ValueKind::String => {
let s = value.as_str().unwrap();
}
ValueKind::Float => {
let f = value.xfloat();
}
// === Veclike sub-types ===
ValueKind::Veclike(vt) => match vt {
VecLikeType::Vector => { /* vector */ }
VecLikeType::Record => { /* record */ }
VecLikeType::HashTable => { /* hash table */ }
VecLikeType::Lambda => { /* closure */ }
VecLikeType::Macro => { /* macro */ }
VecLikeType::ByteCode => { /* bytecode function */ }
VecLikeType::Overlay => { /* overlay */ }
VecLikeType::Marker => { /* marker */ }
VecLikeType::Buffer => { /* buffer ref */ }
VecLikeType::Window => { /* window ref */ }
VecLikeType::Frame => { /* frame ref */ }
VecLikeType::Timer => { /* timer ref */ }
},
ValueKind::Unknown => unreachable!("corrupt tagged value"),
}Performance note: value.kind() decodes the tag into an enum. For hot
paths, prefer the direct predicate methods (is_cons(), is_fixnum(), etc.)
which compile to a single AND + CMP instruction. Use kind() only when you
actually need multi-way dispatch.
The following types/functions/patterns are completely removed in the tagged pointer system:
// ELIMINATED
pub struct ObjId {
index: u32,
generation: u32,
}Every use of ObjId is replaced by either a TaggedValue (which embeds the
heap pointer in its bits) or a raw pointer to a heap object.
// ELIMINATED
pub fn with_heap<R>(f: impl FnOnce(&LispHeap) -> R) -> R { ... }
pub fn with_heap_mut<R>(f: impl FnOnce(&mut LispHeap) -> R) -> R { ... }Replaced by with_tagged_heap(|h| ...) for allocation operations, and by
direct pointer dereference for accessors (cons_car, as_str, etc.).
// ELIMINATED
pub struct ConsSnapshot { pub car: Value, pub cdr: Value }
pub fn read_cons(id: ObjId) -> ConsSnapshot { ... }Use value.cons_car() and value.cons_cdr() directly.
Occurrence count: ~319 uses of read_cons() across 66 files.
// ELIMINATED
pub enum HeapObject {
Cons { car: Value, cdr: Value },
Vector(Vec<Value>),
HashTable(LispHashTable),
Str(LispString),
Lambda(LambdaData),
Macro(LambdaData),
ByteCode(ByteCodeFunction),
Overlay(OverlayData),
Marker(MarkerData),
Free,
}Replaced by concrete struct types (ConsCell, StringObj, FloatObj,
VectorObj, HashTableObj, LambdaObj, MacroObj, ByteCodeObj,
RecordObj, OverlayObj, MarkerObj, BufferObj, WindowObj, FrameObj,
TimerObj) that live at stable memory addresses.
// ELIMINATED
pub struct LispHeap {
objects: Vec<HeapObject>,
generations: Vec<u32>,
marks: Vec<bool>,
free_list: Vec<u32>,
...
}Replaced by TaggedHeap which uses:
- Block allocator for cons cells (external mark bitmap).
- Intrusive linked list (
GcHeader.next) for all other heap objects. - System allocator (
Box) for individual non-cons objects.
// ELIMINATED
fn check(&self, id: ObjId) {
// Panic if generation mismatch (stale handle)
}No generation counters exist. Correctness relies on the GC not collecting reachable objects. Use-after-free is undefined behavior, not a checked panic. This is the same model as GNU Emacs.
// ELIMINATED
pub fn next_float_id() -> u32 { ... }
Value::Float(f64, u32) // the u32 was the alloc IDFloat identity is now pointer-based. Each alloc_float() returns a unique
pointer. Two floats with the same f64 value are eq only if they are the
same allocation (same pointer).
// ELIMINATED (in LispHeap)
fn value_objid(val: &Value) -> Option<ObjId> { ... }
fn push_value_ids(val: &Value, worklist: &mut Vec<ObjId>) { ... }The GC now works directly with TaggedValue. Tracing checks
value.is_heap_object() and follows the pointer directly.
// BEFORE (ObjId-based)
fn push_value_ids(val: &Value, worklist: &mut Vec<ObjId>) {
match val {
Value::Cons(id) | Value::Vector(id) | ... => worklist.push(*id),
_ => {}
}
}
// AFTER (tagged pointer)
fn push_if_heap(val: TaggedValue, queue: &mut Vec<TaggedValue>) {
if val.is_heap_object() {
queue.push(val);
}
}The old system scanned the stack for (u32 index, u32 generation) pairs.
The new system scans for usize values with valid heap-pointer tags
(010, 011, 100, 110) that point into known heap regions.
// BEFORE
temp_roots: Vec<Value>, // in eval.rs Context
temp_roots.push(value);
temp_roots.truncate(saved_len);
// AFTER -- same concept, different inner type
temp_roots: Vec<TaggedValue>,
temp_roots.push(value);
temp_roots.truncate(saved_len);The temp_roots mechanism continues to exist for explicit GC rooting. The
only change is the element type.
// BEFORE
fn write_barrier(&mut self, id: ObjId) {
if self.gc_phase == GcPhase::Marking && self.marks[id.index] {
self.marks[id.index] = false;
self.gray_queue.push(id);
}
}
// AFTER
// For cons cells: mark the cons cell in the bitmap.
// For non-cons: use the GcHeader.marked flag.
// Write barrier behavior is the same (push to gray on mutation during mark).Recommended order (dependencies flow downward):
-
tagged/value.rs-- Add remaining convenience methods (constructor wrappers, accessor methods for veclike sub-types). -
tagged/header.rs-- Complete, no changes needed. -
tagged/gc.rs-- Complete Phase 2 TODOs:- Typed deallocation in
free_gc_object(). - Tracing for HashTable, Lambda, Macro, ByteCode, Overlay children.
- Typed deallocation in
-
gc/types.rs-- KeepLispString,OverlayData,MarkerData. RemoveObjId,HeapObject. -
gc/heap.rs-- RemoveLispHeapentirely (replaced byTaggedHeap). -
gc/mod.rs-- Re-export fromtagged/instead ofgc/. -
emacs_core/value.rs-- The big one:- Replace
pub enum Value { ... }withpub type Value = TaggedValue; - Move convenience constructors to
TaggedValueimpl. - Remove
with_heap,with_heap_mut,set_current_heap, etc. - Remove
ConsSnapshot,read_cons(). - Update
PartialEq,Display, all accessor methods. - Update
eq_value,eql_value,equal_value. - Update
HashKeyto use pointers instead of ObjId.
- Replace
emacs_core/eval.rs-- (~236 Value constructor matches, ~91 temp_roots)- Convert all
Value::Cons(id)matches. - Convert all
read_cons()calls. - Update
with_heap/with_heap_mutcalls. - Keep
temp_roots(just change element type).
- Convert all
These files have the highest match counts and can be converted independently:
emacs_core/builtins/symbols.rs(~131 matches)emacs_core/builtins/cons_list.rs(~93 matches)emacs_core/builtins/buffers.rs(~127 matches)emacs_core/builtins/stubs.rs(~77 matches)emacs_core/builtins/arithmetic.rs(~66 matches)emacs_core/builtins/collections.rs(~49 matches)emacs_core/builtins/strings.rs(~59 matches)emacs_core/builtins/types.rs(~61 matches)emacs_core/builtins/misc_eval.rs(~59 matches)emacs_core/builtins/search.rs(~43 matches)emacs_core/builtins/misc_pure.rs(~18 matches)emacs_core/builtins/higher_order.rs(~22 matches)emacs_core/builtins/keymaps.rs(~20 matches)emacs_core/builtins/hooks.rs(~4 matches)emacs_core/builtins/mod.rs(~35 matches)
emacs_core/bytecode/vm.rs(~123 matches) -- critical pathemacs_core/print.rs(~84 matches)emacs_core/reader.rs(~52 matches)emacs_core/keymap.rs(~87 matches)emacs_core/xdisp.rs(~102 matches)emacs_core/syntax.rs(~89 matches)emacs_core/chartable.rs(~76 matches)emacs_core/interactive.rs(~64 matches)emacs_core/font.rs(~160 matches)emacs_core/fileio.rs(~64 matches)emacs_core/cl_lib.rs(~62 matches)emacs_core/hashtab.rs(~66 matches)emacs_core/display.rs(~58 matches)emacs_core/process.rs(~175 matches)emacs_core/textprop.rs(~56 matches)emacs_core/fns.rs(~35 matches)
- All remaining
.rsfiles (mostly tests and smaller modules).
emacs_core/builtins/tests.rs(~734 matches -- the largest file)- All
*_test.rsfiles -- update patterns to match new API.
emacs_core/pdump/convert.rs-- serialize/deserialize TaggedValue instead of Value enum + ObjId.emacs_core/pdump/types.rs-- update pdump type definitions.
// BEFORE
HashKey::Str(ObjId) // keyed by ObjId
HashKey::ObjId(u32, u32) // identity by index+generation
// AFTER
HashKey::Str(TaggedValue) // keyed by tagged pointer (contains string ptr)
HashKey::Ptr(usize) // identity by raw bits (pointer value)The ObjId variant in HashKey is replaced by pointer-based identity.
// BEFORE (match on enum variants)
fn eq_value(left: &Value, right: &Value) -> bool {
match (left, right) {
(Value::Nil, Value::Nil) => true,
(Value::Int(a), Value::Int(b)) => a == b,
(Value::Cons(a), Value::Cons(b)) => a == b,
...
}
}
// AFTER (TaggedValue already derives Eq based on raw bits)
// eq is just: left.bits() == right.bits()
// But: we need special cases for float eql and string/cons equal.
fn eq_value(left: TaggedValue, right: TaggedValue) -> bool {
left == right // bitwise comparison -- handles symbols, fixnums, cons identity
}
fn eql_value(left: TaggedValue, right: TaggedValue) -> bool {
if left == right { return true; }
// Floats: different pointers but same f64 value
if left.is_float() && right.is_float() {
return left.xfloat().to_bits() == right.xfloat().to_bits();
}
false
}
fn equal_value(left: TaggedValue, right: TaggedValue, depth: usize) -> bool {
if left == right { return true; }
match (left.kind(), right.kind()) {
(ValueKind::String, ValueKind::String) => {
left.as_str() == right.as_str()
}
(ValueKind::Cons, ValueKind::Cons) => {
equal_value(left.cons_car(), right.cons_car(), depth + 1)
&& equal_value(left.cons_cdr(), right.cons_cdr(), depth + 1)
}
(ValueKind::Float, ValueKind::Float) => {
left.xfloat().to_bits() == right.xfloat().to_bits()
}
// ... vectors, records, etc.
_ => false,
}
}Key insight: TaggedValue derives PartialEq and Eq via bitwise
comparison of the usize. This means:
- Fixnum equality works correctly (same bits = same value).
- Symbol equality works correctly (same SymId = same bits).
- Cons
eqis pointer identity (correct). - Float
eqis pointer identity (correct -- different allocations are noteq). - String
eqis pointer identity (correct foreq;equalcompares content).
// WRONG: assuming nil has its own tag
if value.tag() == TAG_SOME_SPECIAL_NIL { ... }
// RIGHT: nil is Symbol(0)
value.is_nil() // checks value.0 == 0
value.is_symbol() // true for nil!The old code uses Value::True. The new code uses Value::T. These are
different names for the same concept. Search-and-replace carefully:
Value::True->Value::T- But NOT in strings:
"true"stays"true". - In match arms:
Value::True =>becomesValueKind::T =>.
In the old system, Value::Keyword(SymId) and Value::Symbol(SymId) were
sibling enum variants. In the new system:
- Symbol uses tag
000(same as nil/t). - Keyword uses tag
111(immediate) with sub-tag00001.
So value.is_symbol() returns false for keywords. If your code checked
matches!(value, Value::Symbol(_) | Value::Keyword(_)), you need:
value.is_symbol() || value.is_keyword()// BEFORE: no heap allocation
let a = Value::Float(3.14, next_float_id());
let b = Value::Float(3.14, next_float_id());
// a != b (different float IDs)
// AFTER: heap allocation
let a = with_tagged_heap(|h| h.alloc_float(3.14));
let b = with_tagged_heap(|h| h.alloc_float(3.14));
// a != b (different pointers) -- same semantics, but now heap-allocatedThese were previously inline in the Value enum (zero-cost construction). Now they require heap allocation through the veclike system. If there are hot paths that create these frequently, consider caching.
The old system was safe Rust (checked via generation counters). The new system uses unsafe pointer dereferences. Every cons/string/float/veclike accessor is inherently unsafe. Wrap unsafe blocks tightly and document invariants.
// DANGEROUS
let s = value.as_str().unwrap(); // borrows from heap
do_something_that_triggers_gc(); // GC may free the string object
use(s); // dangling pointer!
// SAFE
let s = value.as_str().unwrap().to_owned(); // copy the string data
do_something_that_triggers_gc();
use(&s); // safe, we own the copyThis was also a concern in the old system but was masked by generation counters catching the bug. In the new system it's UB.
// BEFORE
with_heap(|h| match value {
Value::Cons(id) => Some(h.cons_car(id)),
_ => None,
})
// AFTER (no closure needed)
if value.is_cons() { Some(value.cons_car()) } else { None }The elimination of with_heap closures often simplifies code by removing
a nesting level.
// BEFORE (single heap borrow for multiple accesses)
with_heap(|h| {
let a_car = h.cons_car(a_id);
let b_car = h.cons_car(b_id);
a_car == b_car
})
// AFTER (no closure needed -- direct access)
let a_car = a.cons_car();
let b_car = b.cons_car();
a_car == b_car// BEFORE
Value::bool(condition) // returns Value::True or Value::Nil
// AFTER
Value::bool_val(condition) // renamed to avoid conflict with Rust's bool type
// Or keep Value::bool() if the type alias makes it unambiguousThese counts indicate the scale of changes needed per pattern:
| Pattern | Occurrences | Files |
|---|---|---|
Value::Variant(...) constructors/matches |
~8025 | 185 |
with_heap(...) / with_heap_mut(...) |
~605 | 91 |
matches!(value, Value::...) |
~507 | 81 |
read_cons(...) |
~319 | 66 |
ObjId references |
~241 | 32 |
if let Value::... |
~191 | 60 |
Total estimated touch points: ~10,000 across ~185 files.
| Old | New | Notes |
|---|---|---|
Value::Nil |
Value::NIL |
const |
Value::True |
Value::T |
const |
Value::Int(n) |
Value::fixnum(n) |
method |
Value::Float(f, id) |
heap.alloc_float(f) |
heap-allocated |
Value::Symbol(id) |
Value::symbol(id) |
method |
Value::Keyword(id) |
Value::keyword(id) |
method |
Value::Char(c) |
Value::char(c) |
method |
Value::Subr(id) |
Value::subr(id) |
method |
Value::Cons(id) |
heap.alloc_cons(car, cdr) |
returns TaggedValue |
Value::Str(id) |
heap.alloc_string(ls) |
returns TaggedValue |
Value::Vector(id) |
heap.alloc_vector(items) |
veclike |
Value::Record(id) |
heap.alloc_record(items) |
veclike |
Value::HashTable(id) |
heap.alloc_hash_table(ht) |
veclike |
Value::Lambda(id) |
heap.alloc_lambda(data) |
veclike |
Value::Macro(id) |
heap.alloc_macro(data) |
veclike |
Value::ByteCode(id) |
heap.alloc_bytecode(data) |
veclike |
Value::Marker(id) |
heap.alloc_marker(data) |
veclike |
Value::Overlay(id) |
heap.alloc_overlay(data) |
veclike |
Value::Buffer(bid) |
heap.alloc_buffer(bid) |
veclike (new) |
Value::Window(id) |
heap.alloc_window(id) |
veclike (new) |
Value::Frame(id) |
heap.alloc_frame(id) |
veclike (new) |
Value::Timer(id) |
heap.alloc_timer(id) |
veclike (new) |
with_heap(|h| ...) |
direct access or with_tagged_heap |
eliminated |
with_heap_mut(|h| ...) |
with_tagged_heap(|h| ...) |
unified |
read_cons(id) |
value.cons_car() / .cons_cdr() |
eliminated |
value.str_id() |
value.as_string_ptr() |
pointer-based |
ObjId |
eliminated | no equivalent |
HeapObject |
concrete *Obj structs |
per-type |
LispHeap |
TaggedHeap |
new impl |
check(id) |
n/a (no generation check) | unsafe invariant |
ConsSnapshot |
n/a | eliminated |
next_float_id() |
n/a | eliminated |