TinyTotVM uses a modular architecture designed for extensibility, performance, and educational clarity. The codebase is split into modules with clearly separated concerns, which keeps it easy to maintain and extend.
TinyTotVM has been refactored from a monolithic 7,000+ line main.rs file into a well-organized, modular structure:
src/
├── main.rs # Entry point and CLI
├── lib.rs # Library interface
├── bytecode.rs # Unified instruction parsing
├── compiler.rs # Source compilation
├── lisp_compiler.rs # Lisp transpilation
├── optimizer.rs # Optimization passes
├── vm/ # Virtual machine core
│ ├── machine.rs # Execution engine
│ ├── value.rs # Type system
│ ├── opcode.rs # Instruction definitions
│ ├── stack.rs # Stack management
│ ├── memory.rs # Memory management
│ └── errors.rs # Error handling
├── concurrency/ # BEAM-style concurrency
│ ├── pool.rs # SMP scheduler
│ ├── process.rs # Process isolation
│ ├── scheduler.rs # Individual schedulers
│ ├── registry.rs # Process registry
│ ├── supervisor.rs # Supervision trees
│ └── messages.rs # Message types
├── gc/ # Garbage collection
│ ├── mark_sweep.rs # Mark-sweep GC
│ ├── no_gc.rs # No-op GC
│ └── stats.rs # GC statistics
├── profiling/ # Performance analysis
│ ├── profiler.rs # Profiling engine
│ └── stats.rs # Statistics
├── testing/ # Test framework
│ ├── harness.rs # Test execution
│ └── runner.rs # Test runner
├── ir/ # Intermediate Representation
│ ├── mod.rs # Core IR structures
│ ├── lowering.rs # Stack-to-register translation
│ └── vm.rs # Register-based execution
└── cli/ # Command line interface
  ├── args.rs # Argument parsing
  └── commands.rs # Command dispatch
Benefits of Modular Architecture:
- Maintainability - Clear separation of concerns
- Testability - Individual modules can be tested in isolation
- Extensibility - Easy to add new features without affecting existing code
- Performance - Optimized compilation and reduced compile times
- Collaboration - Multiple developers can work on different modules
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Source Code   │───▶│    Compiler     │───▶│    Bytecode     │
│  (.ttvm/.lisp)  │    │    (parser)     │    │     (.ttb)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Optimizer    │◀───│       VM        │◀───│     Loader      │
│   (8 passes)    │    │   (execution)   │    │   (bytecode)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │ Standard Library │
                       │    (modules)     │
                       └──────────────────┘
The heart of TinyTotVM, organized into specialized modules:
- Stack-based execution model
- Instruction dispatch loop
- Module loading and circular dependency detection
- Function call handling
- Exception processing
- Garbage collection integration
- Profiling and tracing support
- Dynamic type system
- Type coercion and conversion
- Value serialization/deserialization
- Stack operations and bounds checking
- Variable scoping and frame management
- Call stack management
- Comprehensive error types
- Stack unwinding and cleanup
- Error reporting and context
- OpCode definitions
- Message patterns for concurrency
- Instruction parameter handling
BEAM-style actor model with complete fault tolerance:
- SMP work-stealing scheduler
- Multi-core process distribution
- Load balancing and fairness
- Isolated process state
- Message passing and mailboxes
- Process monitoring and linking
- Supervision tree integration
- Named process registration
- Process lifecycle management
- Name resolution and cleanup
- Fault tolerance strategies
- Automatic process restart
- Supervision tree management
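As a rough illustration of isolated process state combined with mailbox-based message passing, the sketch below models a process as an OS thread whose mailbox is a standard-library `mpsc` channel. The `Message` enum, the `spawn_counter` function, and the one-thread-per-process mapping are assumptions made for this example only; TinyTotVM's own processes run on its SMP scheduler and use the message types defined in `messages.rs`.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message type used only in this sketch.
#[derive(Debug)]
pub enum Message {
    /// Add a number to the process-local counter.
    Add(i64),
    /// Ask the process to report its counter on the given reply channel.
    Get(Sender<i64>),
    /// Ask the process to terminate.
    Stop,
}

/// Spawns a "process" that owns a private counter and only changes it
/// in response to messages taken from its mailbox.
pub fn spawn_counter() -> (Sender<Message>, JoinHandle<i64>) {
    let (mailbox_tx, mailbox_rx) = channel::<Message>();
    let handle = thread::spawn(move || {
        // Isolated state: nothing outside this closure can touch `total`.
        let mut total = 0i64;
        // Iterating a Receiver blocks until a message arrives and ends
        // once every sender has been dropped.
        for msg in mailbox_rx {
            match msg {
                Message::Add(n) => total += n,
                Message::Get(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(total);
                }
                Message::Stop => break,
            }
        }
        total
    });
    (mailbox_tx, handle)
}
```

Because the counter lives entirely inside the spawned closure, the only way to observe or change it is to send a message, which is the property the actor model relies on.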
Pluggable garbage collection architecture:
- Traditional mark-and-sweep algorithm
- Root set identification
- Memory compaction
- Disabled garbage collection
- Testing and benchmarking
- Performance metrics
- Memory usage tracking
- Collection frequency analysis
Comprehensive performance analysis:
- Function-level timing
- Instruction counting
- Call frequency analysis
- Memory allocation tracking
- Performance reporting
- Color-coded output
- Trend analysis
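The instruction counting, call frequency, and function-level timing listed above could be collected along the following lines. This is a minimal sketch, not the project's profiler: the method names `enter_function` and `exit_function`, the `active` stack, and passing the instruction name as a string are assumptions. Timings here are inclusive, so time spent in nested calls is also counted in the caller.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Default)]
pub struct Profiler {
    pub instruction_counts: HashMap<String, usize>,
    pub call_counts: HashMap<String, usize>,
    pub function_timings: HashMap<String, Duration>,
    /// Functions currently executing, innermost last (hypothetical field).
    active: Vec<(String, Instant)>,
}

impl Profiler {
    /// Counts one executed instruction under its mnemonic.
    pub fn record_instruction(&mut self, name: &str) {
        *self.instruction_counts.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Records a call and starts timing it.
    pub fn enter_function(&mut self, name: &str) {
        *self.call_counts.entry(name.to_string()).or_insert(0) += 1;
        self.active.push((name.to_string(), Instant::now()));
    }

    /// Stops timing the innermost function; does nothing if none is active.
    pub fn exit_function(&mut self) {
        if let Some((name, started)) = self.active.pop() {
            *self
                .function_timings
                .entry(name)
                .or_insert(Duration::ZERO) += started.elapsed();
        }
    }
}
```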
Comprehensive test execution and reporting:
- Test discovery and execution
- Result collection and aggregation
- Progress reporting
- Individual test execution
- Error handling and reporting
- Test isolation
User-friendly command line interface:
- Command line option parsing
- Configuration validation
- Help and usage information
- Command routing and execution
- Error handling and reporting
- User feedback
use std::collections::HashMap;

enum Value {
    Int(i64),
    Float(f64),
    Str(String),
    Bool(bool),
    Null,
    List(Vec<Value>),
    Object(HashMap<String, Value>),
    Bytes(Vec<u8>),
    Connection(String),
    Stream(String),
    Future(String),
    Function { addr: usize, params: Vec<String> },
    Closure { addr: usize, params: Vec<String>, captured: HashMap<String, Value> },
    Exception { message: String, stack_trace: Vec<String> },
}

- Main Stack - Value storage with 1024 pre-allocated slots
- Call Stack - Return addresses with 64 pre-allocated slots
- Variable Frames - Lexical scoping with frame stack
- Exception Stack - Try/catch blocks with unwinding support
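One plausible way to realize the pre-allocated main and call stacks is `Vec::with_capacity`, paired with checked pops that return errors instead of panicking. The sketch below assumes that approach; the `Stacks` struct, the `pop_return_addr` helper, and the cut-down `Value` and `VMError` types are illustrative stand-ins, not the project's definitions.

```rust
/// Cut-down value type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Bool(bool),
}

/// Cut-down error type mirroring two of the VM's error variants.
#[derive(Debug, PartialEq)]
pub enum VMError {
    StackUnderflow(String),
    CallStackUnderflow,
}

pub struct Stacks {
    pub stack: Vec<Value>,
    pub call_stack: Vec<usize>,
}

impl Stacks {
    pub fn new() -> Self {
        Stacks {
            // Reserve space up front; the vectors still grow if exceeded.
            stack: Vec::with_capacity(1024),
            call_stack: Vec::with_capacity(64),
        }
    }

    /// Pops a value, naming the operation that needed it on failure.
    pub fn pop_stack(&mut self, operation: &str) -> Result<Value, VMError> {
        self.stack
            .pop()
            .ok_or_else(|| VMError::StackUnderflow(operation.to_string()))
    }

    /// Pops a return address for a function return.
    pub fn pop_return_addr(&mut self) -> Result<usize, VMError> {
        self.call_stack.pop().ok_or(VMError::CallStackUnderflow)
    }
}
```

Pre-allocating avoids reallocation during typical runs, while the `Result`-returning pops keep an empty stack from turning into a panic.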
┌─────────────────┐ ← Top
│ Value N │
├─────────────────┤
│ Value 2 │
├─────────────────┤
│ Value 1 │ ← Bottom
└─────────────────┘
Main Stack (1024 slots)
┌─────────────────┐ ← Current Call
│ Return Addr │
├─────────────────┤
│ Return Addr │ ← Previous Call
└─────────────────┘
Call Stack (64 slots)
variables: Vec<HashMap<String, Value>>
//         │   └─ Variables in scope
//         └─ Frame stack (one per function call)

Pluggable GC architecture:
trait GcEngine {
    fn alloc(&mut self, value: Value) -> GcRef;
    fn mark_from_roots(&mut self, roots: &[&Value]);
    fn sweep(&mut self) -> usize;
    fn stats(&self) -> GcStats;
}

Available Engines:
- MarkSweepGc - Traditional mark & sweep
- NoGc - Disabled garbage collection
- Future: Reference counting, generational GC
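To make the mark and sweep phases concrete, here is a small self-contained sketch. It is not the `MarkSweepGc` implementation: the index-based `GcRef`, the `Object` and `Slot` types, and roots passed as `GcRef` handles (rather than `&[&Value]` as in the trait) are simplifying assumptions, and the sketch does not compact memory.

```rust
/// Handle to a heap slot (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GcRef(usize);

/// Simplified heap object: an integer or a list of references.
#[derive(Debug)]
pub enum Object {
    Int(i64),
    List(Vec<GcRef>),
}

struct Slot {
    object: Object,
    marked: bool,
}

#[derive(Default)]
pub struct MarkSweepHeap {
    slots: Vec<Option<Slot>>,
}

impl MarkSweepHeap {
    pub fn alloc(&mut self, object: Object) -> GcRef {
        let slot = Some(Slot { object, marked: false });
        // Reuse a freed slot if one exists, otherwise grow the heap.
        if let Some(i) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[i] = slot;
            GcRef(i)
        } else {
            self.slots.push(slot);
            GcRef(self.slots.len() - 1)
        }
    }

    /// Mark phase: flag every object reachable from the roots.
    pub fn mark_from_roots(&mut self, roots: &[GcRef]) {
        let mut worklist: Vec<GcRef> = roots.to_vec();
        while let Some(GcRef(i)) = worklist.pop() {
            if let Some(Some(slot)) = self.slots.get_mut(i) {
                if !slot.marked {
                    slot.marked = true;
                    if let Object::List(children) = &slot.object {
                        worklist.extend(children.iter().copied());
                    }
                }
            }
        }
    }

    /// Sweep phase: free unmarked slots, clear marks, return the number freed.
    pub fn sweep(&mut self) -> usize {
        let mut freed = 0;
        for entry in self.slots.iter_mut() {
            // Read the mark and reset it for the next collection cycle.
            let keep = match entry {
                Some(slot) => std::mem::replace(&mut slot.marked, false),
                None => continue,
            };
            if !keep {
                *entry = None;
                freed += 1;
            }
        }
        freed
    }

    pub fn live_count(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }
}
```

The explicit worklist avoids deep recursion on long reference chains, and the `marked` check makes cycles terminate.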
Source Code → Tokenization → AST → Instruction Generation
Features:
- Symbolic and numeric label resolution
- Parameter validation
- Syntax error reporting
- Instruction optimization hints
8-pass optimization engine:
pub struct Optimizer {
    stats: OptimizationStats,
    // Pass implementations
}

impl Optimizer {
    pub fn optimize(&mut self, instructions: Vec<OpCode>) -> Vec<OpCode> {
        // 8 optimization passes, applied in sequence (bodies elided here)
        instructions
    }
}

Optimization Passes:
- Constant folding
- Constant propagation
- Dead code elimination
- Peephole optimizations
- Instruction combining
- Jump threading
- Tail call optimization
- Memory layout optimization
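As an example of what a single pass can look like, the sketch below implements a basic form of constant folding over a cut-down instruction set. The four-variant `OpCode` enum and the `fold_constants` function are assumptions for illustration. Because this reduced instruction set has no jumps, the sketch does not deal with remapping jump targets after instructions are removed, which a real pass would have to handle.

```rust
/// Cut-down instruction set for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum OpCode {
    PushInt(i64),
    Add,
    Mul,
    Print,
}

/// Folds `PushInt a; PushInt b; Add|Mul` into a single `PushInt`.
/// Overflowing operations are left untouched so runtime behavior is unchanged.
pub fn fold_constants(instructions: &[OpCode]) -> Vec<OpCode> {
    let mut out: Vec<OpCode> = Vec::with_capacity(instructions.len());
    for op in instructions {
        let n = out.len();
        if n >= 2 {
            // Look at the two most recently emitted instructions, so a
            // folded result can itself take part in the next fold.
            if let (OpCode::PushInt(a), OpCode::PushInt(b)) = (&out[n - 2], &out[n - 1]) {
                let folded = match op {
                    OpCode::Add => a.checked_add(*b),
                    OpCode::Mul => a.checked_mul(*b),
                    _ => None,
                };
                if let Some(value) = folded {
                    out.truncate(n - 2);
                    out.push(OpCode::PushInt(value));
                    continue;
                }
            }
        }
        out.push(op.clone());
    }
    out
}
```

For instance, `PUSH_INT 2; PUSH_INT 3; ADD; PUSH_INT 4; MUL` collapses to a single `PUSH_INT 20`, because the result of the first fold is available when the `MUL` is examined.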
Binary format for faster loading and unified instruction parsing:
Magic Header | Version | Instruction Count | Instructions | Metadata
Key Features:
- Unified parsing - Single parser for all instruction types
- Label resolution - Symbolic address resolution
- Module imports - Automatic dependency loading
- Error reporting - Detailed parse error messages
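The header portion of the layout above (magic, version, instruction count) might be encoded and validated roughly as follows. The field widths, little-endian byte order, and the `TTB0` magic value are all assumptions, since this document does not specify them; the instruction and metadata sections are left out.

```rust
/// Hypothetical magic bytes; the real .ttb magic is not given here.
const MAGIC: [u8; 4] = *b"TTB0";
/// Assumed header size: 4-byte magic + 2-byte version + 4-byte count.
const HEADER_LEN: usize = 10;

#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u16,
    pub instruction_count: u32,
}

#[derive(Debug, PartialEq)]
pub enum HeaderError {
    TooShort,
    BadMagic,
}

/// Serializes the header in the assumed little-endian layout.
pub fn write_header(header: &Header) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(HEADER_LEN);
    bytes.extend_from_slice(&MAGIC);
    bytes.extend_from_slice(&header.version.to_le_bytes());
    bytes.extend_from_slice(&header.instruction_count.to_le_bytes());
    bytes
}

/// Validates the magic and decodes the remaining header fields.
pub fn read_header(bytes: &[u8]) -> Result<Header, HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if bytes[0..4] != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    let version = u16::from_le_bytes([bytes[4], bytes[5]]);
    let instruction_count = u32::from_le_bytes([bytes[6], bytes[7], bytes[8], bytes[9]]);
    Ok(Header { version, instruction_count })
}
```

Checking the magic and length before reading anything else lets a loader reject non-bytecode files early with a specific error.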
fn run(&mut self) -> VMResult<()> {
    while self.ip < self.instructions.len() {
        let instruction = &self.instructions[self.ip];

        // Profiling hooks
        if let Some(ref mut profiler) = self.profiler {
            profiler.record_instruction();
        }

        // Tracing hooks
        if self.trace_enabled {
            println!("[trace] {:?} @ 0x{:04X}", instruction, self.ip);
        }

        // Execute instruction
        match instruction {
            OpCode::PushInt(n) => self.stack.push(Value::Int(*n)),
            OpCode::Add => { /* arithmetic implementation */ },
            // ... other instructions
        }

        self.ip += 1;
    }

    // All instructions executed without error
    Ok(())
}

// Function call mechanism
OpCode::Call { addr, params } => {
    // Save return address
    self.call_stack.push(self.ip + 1);

    // Create new variable frame
    let mut frame = HashMap::new();
    for param_name in params.iter().rev() {
        let value = self.pop_stack("CALL")?;
        frame.insert(param_name.clone(), value);
    }
    self.variables.push(frame);

    // Jump to function
    self.ip = *addr;
}

struct ExceptionHandler {
    catch_addr: usize,
    stack_size: usize,
    call_stack_size: usize,
    variable_frames: usize,
}

// Exception unwinding
fn unwind_to_handler(&mut self, exception: Value) {
    if let Some(handler) = self.try_stack.pop() {
        // Restore stack state
        self.stack.truncate(handler.stack_size);
        self.call_stack.truncate(handler.call_stack_size);
        self.variables.truncate(handler.variable_frames);

        // Jump to catch block
        self.stack.push(exception);
        self.ip = handler.catch_addr;
    }
}

fn import_module(&mut self, module_path: &str) -> VMResult<()> {
    // Circular dependency detection
    if self.loading_stack.contains(&module_path.to_string()) {
        return Err(VMError::FileError {
            filename: module_path.to_string(),
            error: "Circular dependency detected".to_string(),
        });
    }

    // Load and execute module
    // (`module_instructions` is the module's parsed bytecode; loading is elided here)
    self.loading_stack.push(module_path.to_string());
    let mut module_vm = VM::new(module_instructions);
    module_vm.run()?;

    // Import exports
    self.loaded_modules.insert(module_path.to_string(), module_vm.exports);
    self.loading_stack.pop();
    Ok(())
}

fn adjust_instruction_addresses(&self, instruction: &OpCode, base_addr: usize) -> OpCode {
    match instruction {
        OpCode::Call { addr, params } => OpCode::Call {
            addr: addr + base_addr,
            params: params.clone(),
        },
        OpCode::Jmp(addr) => OpCode::Jmp(addr + base_addr),
        // ... other address-containing instructions
        // Instructions that carry no address are copied unchanged
        other => other.clone(),
    }
}

struct Profiler {
    function_timings: HashMap<String, Duration>,
    instruction_counts: HashMap<String, usize>,
    call_counts: HashMap<String, usize>,
    current_function_stack: Vec<(String, FunctionProfiler)>,
    // ... other metrics
}

struct FunctionProfiler {
    start_time: Instant,
    instruction_count: usize,
}

// Instruction-level tracing
if self.trace_enabled {
    let indent = " ".repeat(self.call_depth);
    println!("[trace] {}{:?} @ 0x{:04X}", indent, instruction, self.ip);
}

// Function call tracing
if self.trace_enabled {
    let indent = " ".repeat(self.call_depth);
    println!("[trace] {}CALL {} with {} params", indent, function_name, params.len());
}

#[derive(Debug)]
pub enum VMError {
    StackUnderflow(String),
    TypeMismatch { expected: String, got: String, operation: String },
    UndefinedVariable(String),
    IndexOutOfBounds { index: i64, length: usize },
    FileError { filename: String, error: String },
    ParseError { line: usize, instruction: String },
    CallStackUnderflow,
    NoVariableScope,
    UnknownLabel(String),
    InsufficientStackItems { needed: usize, available: usize },
}

- Graceful Degradation - No crashes or panics
- Detailed Messages - Clear error descriptions with context
- Stack Preservation - Safe stack unwinding
- Resource Cleanup - Automatic cleanup on errors
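The "no panics, detailed messages" principles can be illustrated with a checked list access that reports a structured error. This is a sketch under assumptions: the `list_get` helper and the exact message wording are hypothetical, and only two of the `VMError` variants are reproduced.

```rust
use std::fmt;

/// Two of the VM's error variants, reproduced for this sketch.
#[derive(Debug, PartialEq)]
pub enum VMError {
    IndexOutOfBounds { index: i64, length: usize },
    TypeMismatch { expected: String, got: String, operation: String },
}

impl fmt::Display for VMError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VMError::IndexOutOfBounds { index, length } => {
                write!(f, "index {} out of bounds for list of length {}", index, length)
            }
            VMError::TypeMismatch { expected, got, operation } => {
                write!(f, "{}: expected {}, got {}", operation, expected, got)
            }
        }
    }
}

/// Checked list access: negative or too-large indices become errors
/// rather than panics.
pub fn list_get(items: &[i64], index: i64) -> Result<i64, VMError> {
    let out_of_bounds = VMError::IndexOutOfBounds { index, length: items.len() };
    if index < 0 {
        return Err(out_of_bounds);
    }
    items.get(index as usize).copied().ok_or(out_of_bounds)
}
```

Carrying the offending index and the list length in the error gives the caller enough context to print a useful message without re-inspecting VM state.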
// 1. Add to OpCode enum
#[derive(Debug, Clone)]
pub enum OpCode {
    // ... existing instructions
    NewInstruction(String),
}

// 2. Add execution logic
match instruction {
    // ... existing cases
    OpCode::NewInstruction(param) => {
        // Implementation
    }
}

// 3. Add parsing support
"NEW_INSTRUCTION" => {
    let param = parts[1].to_string();
    OpCode::NewInstruction(param)
}

// 1. Add to Value enum
#[derive(Debug, Clone)]
pub enum Value {
    // ... existing types
    NewType(CustomData),
}

// 2. Add type-specific operations
// 3. Add serialization support
// 4. Add GC integration if needed

; custom_module.ttvm
LABEL custom_function
; Implementation
RET
MAKE_FUNCTION custom_function params
STORE custom_function
EXPORT custom_function

- Instruction Execution: O(1) per instruction
- Function Calls: O(1) call overhead
- Variable Access: O(1) on average with hash map lookup
- Exception Handling: O(n) stack unwinding
- Garbage Collection: O(n) mark & sweep
- Stack Space: O(n) program stack depth
- Call Stack: O(d) call depth
- Variable Storage: O(v) total variables
- Module Cache: O(m) number of modules
- Constant Folding: 37% instruction reduction
- Dead Code Elimination: 71% instruction reduction
- Combined Optimizations: Up to 46% overall improvement
- Memory Access: Reduced redundant operations
TinyTotVM includes an experimental register-based execution mode that translates stack-based bytecode to register-based intermediate representation.
// Core IR data structures
pub enum RegInstr {
    Mov(RegId, RegValue),     // Move value to register
    Add(RegId, RegId, RegId), // dst = src1 + src2
    Sub(RegId, RegId, RegId), // dst = src1 - src2
    Jmp(usize),               // Unconditional jump
    Jz(RegId, usize),         // Jump if register is zero
    Print(RegId),             // Print register value
    Halt,                     // Stop execution
}

pub enum RegValue {
    Const(Value), // Immediate constant
    Reg(RegId),   // Register reference
}

pub struct RegBlock {
    instructions: Vec<RegInstr>,
    register_count: u32,
    entry: usize,
}

The lowering pass converts stack-based operations to register-based operations:
// Stack-based: PUSH_INT 5; PUSH_INT 3; ADD
// Becomes register-based:
Mov(r0, Const(Value::Int(5))) // r0 = 5
Mov(r1, Const(Value::Int(3))) // r1 = 3
Add(r2, r0, r1)               // r2 = r0 + r1

- First Pass: Build address mapping from bytecode to IR instructions
- Second Pass: Translate each bytecode instruction to equivalent IR
- Stack Simulation: Maintain virtual stack state using register allocation
- Register Allocation: Allocate registers for intermediate values
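The stack-simulation and register-allocation steps can be sketched for straight-line code as shown below. This is an illustrative simplification, not the contents of `lowering.rs`: the reduced `OpCode` and `RegInstr` enums, the `Mov` that takes a plain `i64`, and the `lower` function are assumptions, and the first pass (address mapping for jumps) is omitted because the example has no control flow.

```rust
pub type RegId = u32;

/// Cut-down stack instruction set for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum OpCode {
    PushInt(i64),
    Add,
    Sub,
    Print,
}

/// Cut-down register IR; `Mov` carries an integer constant directly.
#[derive(Debug, Clone, PartialEq)]
pub enum RegInstr {
    Mov(RegId, i64),
    Add(RegId, RegId, RegId),
    Sub(RegId, RegId, RegId),
    Print(RegId),
}

/// Lowers straight-line stack code to register IR.
/// Returns the IR and the number of registers used, or `None` if the
/// stack code would underflow.
pub fn lower(code: &[OpCode]) -> Option<(Vec<RegInstr>, u32)> {
    let mut out = Vec::new();
    // Virtual stack: each entry names the register holding that stack slot.
    let mut virtual_stack: Vec<RegId> = Vec::new();
    let mut next_reg: RegId = 0;

    for op in code {
        match op {
            OpCode::PushInt(n) => {
                out.push(RegInstr::Mov(next_reg, *n));
                virtual_stack.push(next_reg);
                next_reg += 1;
            }
            OpCode::Add | OpCode::Sub => {
                // The right operand was pushed last, so it is popped first.
                let rhs = virtual_stack.pop()?;
                let lhs = virtual_stack.pop()?;
                let dst = next_reg;
                next_reg += 1;
                out.push(match op {
                    OpCode::Add => RegInstr::Add(dst, lhs, rhs),
                    _ => RegInstr::Sub(dst, lhs, rhs),
                });
                virtual_stack.push(dst);
            }
            OpCode::Print => {
                let src = virtual_stack.pop()?;
                out.push(RegInstr::Print(src));
            }
        }
    }
    Some((out, next_reg))
}
```

Allocating a fresh register for every result keeps the sketch simple; a production lowering pass would typically reuse registers once their values are dead.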
pub struct RegisterVM {
    registers: Vec<Value>,             // Register file
    variables: HashMap<String, Value>, // Named variables
    ip: usize,                         // Instruction pointer
    block: RegBlock,                   // IR program
    halted: bool,                      // Execution state
}

- Research Platform: Experimental register-based execution
- Performance Potential: Register operations can be more efficient
- Educational Value: Demonstrates register allocation and IR translation
- Architecture Comparison: Direct comparison between stack and register execution
- Basic Operations: Full support for arithmetic, logic, comparisons
- Variable Operations: Complete STORE, LOAD, DELETE support
- Control Flow: Jump and conditional operations (with some edge cases being refined)
- Concurrency Integration: Automatic delegation to SMP scheduler for SPAWN, SEND, RECEIVE operations
- Future Work: Direct IR execution of concurrency operations, function calls, exception handling
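For comparison with the stack-based dispatch loop, a register-based interpreter for the basic arithmetic and control-flow instructions might look like the sketch below. It is a simplification rather than the code in `ir/vm.rs`: registers hold plain `i64` values, `Mov` takes an integer constant, output is collected in a vector instead of printed, and `run_ir` is a hypothetical name. The sketch assumes every register index is below `register_count`; a production VM would return an error instead of indexing directly.

```rust
pub type RegId = usize;

/// Cut-down register IR mirroring the instruction names used above.
#[derive(Debug, Clone)]
pub enum RegInstr {
    Mov(RegId, i64),
    Add(RegId, RegId, RegId),
    Sub(RegId, RegId, RegId),
    Jmp(usize),
    Jz(RegId, usize),
    Print(RegId),
    Halt,
}

/// Runs the program and returns every "printed" value in order.
/// Assumes all register indices are less than `register_count`.
pub fn run_ir(program: &[RegInstr], register_count: usize) -> Vec<i64> {
    let mut registers = vec![0i64; register_count];
    let mut output = Vec::new();
    let mut ip = 0;

    while ip < program.len() {
        match &program[ip] {
            RegInstr::Mov(dst, value) => registers[*dst] = *value,
            RegInstr::Add(dst, a, b) => {
                registers[*dst] = registers[*a].wrapping_add(registers[*b])
            }
            RegInstr::Sub(dst, a, b) => {
                registers[*dst] = registers[*a].wrapping_sub(registers[*b])
            }
            RegInstr::Jmp(target) => {
                // Jumps set ip directly and skip the increment below.
                ip = *target;
                continue;
            }
            RegInstr::Jz(reg, target) => {
                if registers[*reg] == 0 {
                    ip = *target;
                    continue;
                }
            }
            RegInstr::Print(reg) => output.push(registers[*reg]),
            RegInstr::Halt => break,
        }
        ip += 1;
    }
    output
}
```

Operands are read straight from the register file, so an `Add` needs no pushes or pops, which is the main reason register execution can do less work per instruction than a stack machine.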
# Enable IR execution mode
ttvm --use-ir examples/program.ttvm
# Compare with traditional stack execution
ttvm --no-smp examples/program.ttvm

- Simplicity - Clean, understandable code structure
- Modularity - Separate concerns, pluggable components
- Safety - No crashes, comprehensive error handling
- Performance - Efficient execution with optimization
- Extensibility - Easy to add features and instructions
- Educational Value - Clear demonstration of VM concepts
- Cross-Platform - Pure Rust, runs anywhere
- Hybrid Architecture - Support both stack and register execution modes