This guide documents effective debugging patterns discovered while working on this Ruby compiler. These techniques have proven successful for diagnosing and fixing compilation issues, runtime crashes, and language feature gaps.
- Quick Reference
- Debugging Compilation Failures
- Debugging Segfaults and Runtime Crashes
- Debugging Parser Issues
- Debugging Operator and Method Issues
- Testing and Verification
- Common Patterns and Solutions
| Symptom | Likely Cause | First Steps |
|---|---|---|
| "Method missing X#method" | Method not in vtable | Check if method exists in class, add if missing |
| "Method missing X#__get_raw" | Calling __get_raw on wrong type | Add type check with is_a?(Integer) before calling __get_raw |
| FPE (Floating Point Exception) | Division by zero or arg mismatch | Check __eqarg calls with GDB, or look for explicit 1/0 error signaling |
| SIGSEGV at address 0x1 or similar | Calling through nil/fixnum | Check vtable entry exists, method properly defined |
| SIGSEGV at 0xfffffffd or similar | Invalid vtable call in s-expression | Move method call out of %s() to use normal dispatch |
| "Unable to resolve X statically" | Missing class/constant | Check if class is defined, may need stub |
| Parse error | Parser doesn't support syntax | Check parser.rb, may need new operator/keyword support |
| Lambda at top-level crashes or doesn't compile | KNOWN LIMITATION: Top-level lambdas not supported | IMPORTANT: Lambdas work FINE in methods but CRASH at top-level; use method wrapper for testing |
| Test fails at top level but works in spec | KNOWN LIMITATION: Top-level scope issues | CRITICAL: Many language features (eigenclasses, etc.) work in methods but fail at top-level; ALWAYS test inside methods, NOT at top-level |
CRITICAL TECHNIQUE: When a large test file crashes or fails, systematically reduce it to the smallest reproducible case.
Process:
- Start with the full failing test
- Remove half the tests and check if it still fails
- Binary search to find the minimal failing case
- Once you have the minimal case, analyze WHY it fails
Example from bit_and_spec.rb investigation:
# Full spec: 104 lines, crashes with no output
./compile rubyspec_temp_bit_and_spec.rb # CRASH
# Test just fixnum context (50 lines): still crashes
# Test just first 4 tests: still crashes
# Test just first 2 tests: still crashes
# Test just first test: WORKS (just has assertion failure)
# Test just second test alone: CRASHES
# Narrow down second test:
# Line 1 alone: works
# Line 2 alone: works
# Lines 1-2 together: works
# Lines 1-3 together: CRASHES
# Found: The crash occurs when three or more of certain bignum expressions appear together
# NOT caused by the & operator itself, but by compiler bug with expression combinations
Key insight: The crash was NOT in the code being tested (& operator with coercion), but in how the compiler handles certain combinations of bignum expressions. This would have been impossible to find without systematic reduction.
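The reduction process above can be partly automated. The following is a minimal sketch, not part of the compiler's tooling: reduce_failing is a hypothetical helper that repeatedly drops chunks from a list (lines, or whole test blocks) and keeps each smaller candidate for which the supplied block still reports a failure. It is meant to run under MRI, and the block here is a stand-in predicate; in practice it would write the candidate to a file and check whether compiling or running it crashes.

```ruby
# Hypothetical helper: shrink a list of items (spec lines, `it` blocks, ...)
# to a smaller list for which the block still reports a failure.
def reduce_failing(items, &fails)
  chunk = items.length / 2
  while chunk >= 1
    i = 0
    while i < items.length
      # Candidate = current items with one chunk removed.
      candidate = items[0...i] + items[(i + chunk)..-1].to_a
      if !candidate.empty? && fails.call(candidate)
        items = candidate   # chunk was not needed to reproduce; drop it
      else
        i += chunk          # chunk matters; keep it and move on
      end
    end
    chunk /= 2              # retry with finer granularity
  end
  items
end

# Stand-in predicate: pretend the crash needs items 3, 5 and 6 together.
crashes = lambda { |c| c.include?(3) && c.include?(5) && c.include?(6) }
p reduce_failing((1..8).to_a, &crashes)   # prints [3, 5, 6]
```

The result is only as small as the chunk removal allows, so the remaining items still need the manual "why does this fail" analysis described above.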
When compilation fails, examine the parse tree to understand how the compiler interprets your code.
# View the full parse tree
ruby -I. ./driver.rb test_file.rb --parsetree
# Search for specific patterns
ruby -I. ./driver.rb test_file.rb --parsetree 2>&1 | grep "pattern"
Example: When investigating unary +:
ruby -I. ./driver.rb test_simple_uplus.rb --parsetree 2>&1 | grep -A10 "assign y"
# Found: (assign y (callm ((sexp 11)) +@ ()))
# Issue: Double-wrapped receiver ((sexp 11)) instead of (sexp 11)
When a complex file fails to compile:
- Create a minimal reproduction case
- Add complexity incrementally until it breaks
- Identify the exact construct causing failure
Example: Rational literal investigation:
# Start simple
puts 5r # Works?
# Add complexity
x = 6/5r # Works?
puts x
# Use in expression
result = 3.ceildiv(6/5r) # Where does it fail?
Compilation failures often reveal missing language features. Check:
- Parser support: Is the syntax recognized? (parser.rb, tokens.rb)
- Transformation: Does it transform correctly? (transform.rb)
- Compilation: Does it generate assembly? (compiler.rb, compile_*.rb)
- Runtime: Does the method exist? (lib/core/*.rb)
Always start with a backtrace to identify the crash location:
gdb ./out/test_program <<'EOF'
run < /dev/null
bt 20
quit
EOF
What to look for:
- Method names in stack: Identifies which method failed
- __printerr: Indicates error handler was called (check for div 1 0)
- __eqarg: Argument count mismatch
- Address like 0x1, 0xb: Attempting to call a fixnum value as function
When crashing at specific addresses:
# Check what the crash address represents
gdb ./out/test_program <<'EOF'
run < /dev/null
info registers
x/10i $eip-20 # Disassemble around crash
quit
EOF
Address meanings:
- 0x00000001: Fixnum 0 (value (0 << 1) | 1)
- 0x0000000b: Fixnum 5 (value (5 << 1) | 1)
- Odd addresses: Tagged fixnums
- Even addresses: Likely pointers (objects, strings, etc.)
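When reading crash addresses out of GDB, the decoding above is easy to get wrong by hand. The following is a small sketch under the tagging scheme stated in this guide, (value << 1) | 1; describe_address is a hypothetical helper for use under MRI, and it only covers small non-negative addresses (values near 0xffffffff are not interpreted).

```ruby
# Hypothetical helper for interpreting a crash address from GDB.
# Assumes fixnums are tagged as (value << 1) | 1, as described above.
def describe_address(addr)
  if addr.odd?
    "tagged fixnum #{addr >> 1}"
  else
    "even value, likely a pointer"
  end
end

[0x1, 0x3, 0xb, 0x8049f00].each do |addr|
  puts format("0x%08x: %s", addr, describe_address(addr))
end
```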
For "Method missing" errors:
- Verify the class:
# Check which class is being used
gdb ./out/test <<'EOF'
break __method_missing
run < /dev/null
frame 1
# Examine the object's class
quit
EOF
- Check vtable:
# Search for method in vtable
grep "__method_ClassName_methodname" out/program.s
grep "__voff__methodname" out/program.s
- Verify method definition:
# Check if method is defined in class
grep "def methodname" lib/core/classname.rb
If accessing an uninitialized global causes crashes:
Symptom: Calling .method on nil causes "Method missing NilClass#method"
Solution:
- Check initialization order in files
- Move global initializations before methods that use them
- Verify globals are initialized at file top-level, not in method definitions
Example Fix:
# WRONG: Used before defined
def it(&block)
$before_each_blocks.each { |b| b.call } # Crashes if nil!
end
$before_each_blocks = [] # Too late!
# RIGHT: Define before use
$before_each_blocks = []
def it(&block)
$before_each_blocks.each { |b| b.call } # Now safe
end
For bitwise/arithmetic operations that crash:
Check if result is properly tagged:
# Wrong: Returns untagged integer
def & other
%s(bitand (callm self __get_raw) (callm other __get_raw))
end
# Right: Wrap result to restore tag
def & other
%s(__int (bitand (callm self __get_raw) (callm other __get_raw)))
end
Why: Bitwise operations strip the type tag. Must re-apply with __int().
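To see why the untagged result is a problem, the arithmetic can be modelled in plain Ruby. This is an illustrative sketch only: tag and untag are hypothetical stand-ins for what __int and __get_raw do, assuming the (value << 1) | 1 scheme described in this guide, and the code runs under MRI rather than the compiler.

```ruby
# Plain-Ruby model of fixnum tagging; not the compiler's own __int / __get_raw.
def tag(value)
  (value << 1) | 1
end

def untag(word)
  word >> 1
end

a = tag(6)   # stored as 13
b = tag(3)   # stored as 7

raw   = untag(a) & untag(b)  # 2: numerically correct, but an untagged word
wrong = raw                  # returned as-is: an even word, so it reads as a pointer
right = tag(raw)             # 5: a proper tagged word that decodes to fixnum 2

puts "untagged result: #{wrong} (odd? #{wrong.odd?})"
puts "re-tagged result: #{right} -> fixnum #{untag(right)}"
```

In this model the unwrapped version hands back the even word 2, which any later code expecting a tagged value would misread; wrapping the result restores the low tag bit.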
Problem: Calling __get_raw on non-Integer types causes segfaults.
__get_raw is type-specific and only exists on Integer, Float, and String. Calling it within an s-expression %s(callm other __get_raw) on a non-compatible type causes a segfault at an invalid address (like 0xfffffffd).
Solution:
# Wrong: Calls __get_raw without type checking
def & other
%s(__int (bitand (callm self __get_raw) (callm other __get_raw)))
end
# Right: Check type before calling __get_raw
def & other
if other.is_a?(Integer)
other_raw = other.__get_raw
%s(__int (bitand (callm self __get_raw) other_raw))
else
STDERR.puts("TypeError: Integer can't be coerced")
nil
end
end
Key principles:
- Never call __get_raw without knowing the receiver's type
- Use is_a?(Integer) to verify type before calling __get_raw
- Call __get_raw in Ruby code (not in s-expression) to get proper method dispatch
- For error signaling without exceptions, print to STDERR and return nil (not 1/0 which crashes)
Problem: FPE crashes in RubySpecs that test wrong argument counts.
Background: The compiler uses FPE (Floating Point Exception) via 1/0 as intentional error signaling in place of exceptions. This is by design, not a bug. However, RubySpecs that test ArgumentError handling will crash with FPE.
Workaround: Use *args pattern to validate argument count and return safe values.
# Wrong: Fixed argument count, FPE on wrong count
def method_name(arg1, arg2)
# ... implementation
end
# Right: Validate argument count, return safe value on error
def method_name(*args)
if args.length != 2
STDERR.puts("ArgumentError: wrong number of arguments (given #{args.length}, expected 2)")
return nil # or appropriate safe value (0, false, self, etc.)
end
arg1, arg2 = args
# ... normal implementation
end
When to use:
- Methods that crash with FPE in RubySpecs testing ArgumentError
- Typically <=>, **, fdiv, and comparison methods with wrong arg counts
Important notes:
- This is a temporary workaround until exceptions are implemented
- Document with comment: # WORKAROUND: No exceptions - validate args manually
- Choose appropriate safe return value based on method contract
- FPE signaling (1/0) is still used elsewhere and is intentional
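If several methods need the same workaround, the check can be factored out. The sketch below is an assumption, not code from the compiler's runtime: arity_ok? and compare_sketch are hypothetical names, and the example has only been reasoned about under MRI, so whether the compiler handles the helper identically would need to be verified with selftest.

```ruby
# Hypothetical helper: true when the argument count matches; otherwise
# reports to STDERR (no exceptions available) and returns false.
def arity_ok?(args, expected)
  if args.length != expected
    STDERR.puts("ArgumentError: wrong number of arguments (given #{args.length}, expected #{expected})")
    return false
  end
  true
end

# WORKAROUND: No exceptions - validate args manually
def compare_sketch(*args)
  return nil if !arity_ok?(args, 2)
  a, b = args
  a <=> b
end

p compare_sketch(1, 2)   # prints -1
p compare_sketch(1)      # reports to STDERR, then prints nil
```

Returning nil matches the "safe value" convention above; a method with a different contract would return 0, false, or self instead.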
When parser fails on specific syntax:
- Check tokenizer output:
# Add debug output in tokens.rb
def self.expect(s)
puts "Parsing: #{s.peek(10)}" # See what's being parsed
# ... rest of method
end
- Verify operator precedence:
grep "your_operator" operators.rb
- Check if operator needs special handling:
grep "your_operator" shunting.rb
For operator symbols (:+, :-@, etc.):
Check: Does sym.rb recognize the symbol?
# sym.rb pattern
elsif s.peek == ?-
s.get
if s.peek == ?@ # Check for unary operator
s.get
return :":-@"
end
return :":-"
Verification:
# Test that symbols parse correctly
require_relative 'sym'
# Symbol :-@ should be recognized as single token
For unary operators, check transformation in transform.rb:
# transform.rb - rewrite_operators method
if e[0] == :+ && e.length == 2 # Unary plus
e[3] = E[] # args = []
e[2] = :+@ # method = :+@
e[1] = e[1] # object = operand (DON'T wrap with E[])
e[0] = :callm # op = :callm
end
Common mistake: Using E[e[1]] double-wraps the argument.
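The difference between the correct rewrite and the double-wrapping mistake is easier to see on a concrete node. The following is a simplified model that uses plain arrays in place of the compiler's E[...] nodes; rewrite_unary_plus and rewrite_unary_plus_wrong are hypothetical names for illustration and are not the code in transform.rb.

```ruby
# Plain arrays stand in for the compiler's expression nodes (E[...]).
def rewrite_unary_plus(e)
  if e[0] == :+ && e.length == 2
    return [:callm, e[1], :+@, []]     # operand passed through unchanged
  end
  e
end

def rewrite_unary_plus_wrong(e)
  if e[0] == :+ && e.length == 2
    return [:callm, [e[1]], :+@, []]   # extra wrapping: the mistake
  end
  e
end

node = [:+, [:sexp, 11]]
p rewrite_unary_plus(node)         # prints [:callm, [:sexp, 11], :+@, []]
p rewrite_unary_plus_wrong(node)   # prints [:callm, [[:sexp, 11]], :+@, []]
```

The second output corresponds to the (callm ((sexp 11)) +@ ()) shape seen in the --parsetree output during the unary + investigation.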
Symptoms:
- "FIXME: Dummy" in error output
- Wrong results from operations
- Operations don't work at all
Check locations:
- compile_arithmetic.rb - Compilation methods (compile_add, compile_sall, etc.)
- emitter.rb - Assembly instruction emitters (addl, sall, etc.)
- lib/core/fixnum.rb - Method definitions (def +, def <<, etc.)
Example fix for shift operators:
# 1. Add emitter (emitter.rb)
def sall src, dest; emit(:sall, src, dest); end
# 2. Implement compiler (compile_arithmetic.rb)
def compile_sall(scope, left, right)
# Evaluate shift amount -> %ecx
compile_eval_arg(scope, left)
@e.movl(@e.result, :ecx)
# Evaluate value to shift
compile_eval_arg(scope, right)
# Perform shift
@e.sall(:cl, @e.result)
Value.new([:subexpr])
end
# 3. Define method (lib/core/fixnum.rb)
def << other
%s(__int (sall (sar other) (sar self)))
end
When methods need to handle non-Integer arguments:
def ceildiv(other)
# Convert to integer if it responds to to_int
if other.respond_to?(:to_int)
other = other.to_int
end
# Now use as integer
# ...
end
Where to put coercion:
- In the method (e.g., ceildiv) - CORRECT for application-level
- Not in __get_raw - That's too low-level
- Operators should implement proper coercion protocol
- Create minimal test:
# test_feature.rb
puts "Testing..."
result = 5 << 2 # Or whatever you're testing
puts result # Should print 20
puts "Done!"
- Compile and run:
./compile test_feature.rb && ./out/test_feature
- Debug if crashes:
gdb ./out/test_feature <<'EOF'
run < /dev/null
bt
quit
EOF
- Check assembly if wrong result:
grep -A20 "test pattern" out/test_feature.s
- Verify selftest still passes:
make selftest
For RubySpec tests:
# Run single spec
./run_rubyspec rubyspec/core/integer/method_spec.rb
# Count failures
./run_rubyspec rubyspec/ --count-failures
# Debug specific failure
./run_rubyspec rubyspec/core/integer/method_spec.rb
gdb ./out/rubyspec_temp_method_spec
After fixing a bug:
- Verify the fix:
  - Test the specific failing case
  - Test edge cases
  - Test with different types
- Verify no regressions:
# Always run selftest
make selftest
# Run related specs
./run_rubyspec rubyspec/core/integer/*.rb
- Document the fix:
  - Update TODO.md with what was fixed
  - Add comments explaining why the fix works
  - Reference issue in commit message
Symptom: Parser error or "FIXME" in output
Investigation:
- Search codebase for "FIXME" near the error
- Check if similar features are implemented
- Look for stubs or placeholder code
Example:
# Find FIXME comments
grep -r "FIXME.*shift" .
# Found: compile_sall and compile_sarl are dummy implementations
# Solution: Implement them properly
Symptom: Operations produce wrong results but don't crash
Check: S-expression argument order in %s(...) expressions
Example:
# WRONG order for left shift
def << other
%s(__int (sall (sar self) (sar other)))
end
# Result: shifts 'other' by 'self' amount
# RIGHT order
def << other
%s(__int (sall (sar other) (sar self)))
end
# Result: shifts 'self' by 'other' amount
Verification: Check the compile_* method to see argument order.
Symptom: "Method missing ClassName#methodname"
But: Method IS defined in the class file
Cause: Method not added to vtable during compilation
Solution:
- Ensure method is defined in class file loaded by compiler
- Check that class file is properly required
- Verify method name matches exactly (including special chars like +@)
Symptom: Compiler generates invalid assembly
Debug:
# Try to assemble the output
gcc -m32 -c out/test.s 2>&1 | head -20
# Look for error line
grep "Error" output
Common issues:
- Missing emitter method (e.g., andl, orl)
- Invalid instruction (check x86 documentation)
- Wrong operand order (AT&T syntax, as used by the GNU assembler, is instruction source, dest)
Symptom: Compiler can't compile itself or certain constructs
Constraints:
- Can't use exceptions (begin/rescue) in compiler source
- Can't use regexps
- Can't use eval
- Can't use unless (use if ! instead)
- Can't use return with s-expressions (%s(...))
- Limited metaprogramming
Solutions:
- Use simple Ruby constructs only
- Avoid features not yet implemented
- Mark with @bug comment if working around compiler limitation
- Test with MRI first, then with compiled compiler
When debugging RubySpec failures, the preprocessed test files can help:
Location: Preprocessed files are in the repository root as rubyspec_temp_*.rb
Usage:
# Find the preprocessed file
ls rubyspec_temp_bit_length_spec.rb
# Check specific lines mentioned in backtrace
sed -n '460,465p' rubyspec_temp_bit_length_spec.rb
# Compare with original spec
diff rubyspec/core/integer/bit_length_spec.rb rubyspec_temp_bit_length_spec.rb
Note: Line numbers in GDB backtraces refer to the preprocessed file, not the original spec file.
When GDB shows crashes at small odd addresses like 0x3, 0x5, 0xb:
Symptom: SIGSEGV at address 0x00000003 or similar small odd number
Diagnosis:
- These are tagged fixnum values: (value << 1) | 1
- 0x3 = fixnum 1, 0x5 = fixnum 2, 0xb = fixnum 5
- Program is trying to call through a fixnum as if it were a function pointer
Likely causes:
- Closure/proc calling issue where fixnum is treated as callable
- Vtable entry contains wrong value
- Method returning fixnum instead of proc
- Incorrect function pointer handling in generated assembly
Debug steps:
# Check what's in the register being called
gdb ./out/program <<'EOF'
run < /dev/null
info registers
# Check eax - often contains the bad address
quit
EOF
When encountering an issue, check these in order:
- Does it compile with MRI Ruby? (Rules out syntax errors)
- Does selftest pass? (Ensures compiler itself works)
- Is the feature implemented? (Check for FIXMEs)
- Is the method in the vtable? (Search assembly output)
- Are types properly tagged? (Check for __int wrapping)
- Is __get_raw only called on known types? (Add type checks with is_a?)
- Is argument order correct? (Check s-expression vs implementation)
- Are globals initialized before use? (Check file order)
- Is the transformation correct? (Use --parsetree)
- Are error signals using nil not 1/0? (Avoid FPE crashes in specs)
- Do methods need ArgumentError handling? (Use *args pattern to validate arg count)
- Parser debugging: Use --parsetree flag with driver.rb
- Assembly inspection: Generated .s files in out/ directory
- GDB debugging: Use gdb ./out/program for runtime issues
- Selftest verification: Always run make selftest after changes
- ARCHITECTURE.md - Overall compiler architecture
- TODO.md - Known issues and planned improvements
- segfault_analysis_2025-10-09.md - Detailed analysis of spec failures
- bitwise_operator_coercion_bug.md - Specific bug investigation example