Skip to content

Latest commit

 

History

History
461 lines (355 loc) · 10.8 KB

File metadata and controls

461 lines (355 loc) · 10.8 KB

LOOM Usage Guide

Complete guide to using the LOOM WebAssembly optimizer.

Table of Contents

Quick Start

Installation

# Build from source
cargo build --release

# The binary will be at target/release/loom
./target/release/loom --help

Basic Usage

# Optimize a WAT file
loom optimize input.wat --output output.wasm

# Optimize a WASM binary
loom optimize input.wasm --output optimized.wasm

# Show optimization statistics
loom optimize input.wat --output output.wasm --stats

# Enable verification
loom optimize input.wat --output output.wasm --verify

# Output WAT text format instead of binary
loom optimize input.wasm --output output.wat --wat

Command-Line Interface

Optimize Command

loom optimize <INPUT> [OPTIONS]

Arguments:
  <INPUT>          Input WebAssembly file (.wat or .wasm)

Options:
  -o, --output <OUTPUT>    Output file path (default: output.wasm)
  --wat                    Output WAT text format instead of binary
  --stats                  Show detailed optimization statistics
  --verify                 Run verification checks after optimization

Example Commands

# Optimize and show stats
loom optimize my_module.wat --output optimized.wasm --stats

# Optimize and verify correctness
loom optimize input.wasm -o output.wasm --verify

# Convert WASM to optimized WAT
loom optimize input.wasm --output output.wat --wat

Optimization Features

LOOM applies a multi-phase optimization pipeline. Two surfaces exist with slightly different pass sets:

  • Library optimize_module() (loom-core/src/lib.rs): Inline → Constant Folding → Advanced Instructions → Simplify Locals → DCE → Code Folding → LICM → Remove Unused Branches → Optimize Added Constants → DCE → Coalesce Locals.
  • CLI loom optimize --passes (loom-cli/src/main.rs) selectable passes: inline, precompute, constant-folding, cse, advanced, branches, dce, merge-blocks, vacuum, simplify-locals.

The pass descriptions below cover the union of both pipelines (some passes run in only one surface):

Precompute (Global Constant Propagation)

  • Replaces immutable global.get with constants
  • Enables further constant folding

ISLE-Based Constant Folding

  • Folds constant arithmetic at compile time
  • Example: i32.const 10 + i32.const 20i32.const 30

Advanced Instruction Optimizations

  • Strength Reduction: Replaces expensive operations with cheaper equivalents
    • x * 8x << 3 (multiply to shift)
    • x / 4x >> 2 (divide to shift)
    • x % 32x & 31 (modulo to bitwise AND)
  • Bitwise Optimizations:
    • x | 0x
    • x & -1x
    • x ^ 0x

Common Subexpression Elimination (CSE)

  • Caches duplicate computations in local variables
  • Skips simple constants (they're cheap)
  • Example:
    ;; Before
    local.get $x
    i32.const 4
    i32.mul
    local.get $x
    i32.const 4
    i32.mul  ;; Duplicate!
    
    ;; After
    local.get $x
    i32.const 4
    i32.mul
    local.tee $temp
    local.get $temp  ;; Reuse cached result

Function Inlining

  • Inlines small, frequently-called functions
  • Reduces call overhead
  • Enables further optimizations across function boundaries

ISLE Optimization (Post-Inlining)

  • Second pass of constant folding
  • Optimizes code exposed by inlining

Code Folding

  • Flattens nested blocks
  • Removes unnecessary control flow structures

Loop-Invariant Code Motion (LICM)

  • Hoists loop-invariant computations outside loops
  • Currently handles:
    • Constants
    • Unmodified local variables

Branch Simplification

  • Simplifies conditional branches
  • Removes redundant conditions

Dead Code Elimination (DCE)

  • Removes unreachable code after terminators
  • Cleans up code that never executes

Block Merging

  • Merges consecutive blocks
  • Reduces control flow overhead

Vacuum & Simplify Locals

  • Removes empty blocks
  • Eliminates unused local variables

Verification

LOOM supports two types of verification:

1. ISLE Property-Based Verification (Always Available)

  • Checks idempotence (running optimization twice gives same result)
  • Validates constant folding correctness
  • Fast and lightweight

2. Z3 SMT-Based Formal Verification (Optional)

  • Proves optimization correctness via translation validation
  • Requires Z3 to be installed

Installing Z3

macOS (Homebrew):

brew install z3

Ubuntu/Debian:

sudo apt-get install z3

From source:

git clone https://github.com/Z3Prover/z3
cd z3
python scripts/mk_make.py
cd build
make
sudo make install

Using Z3 Verification

# Build LOOM with verification support
cargo build --release --features verification

# Run with verification
./target/release/loom optimize input.wat -o output.wasm --verify

Expected output:

🔍 Running verification...
🔬 Running Z3 SMT verification...
  ✅ Z3 verification passed: optimizations are semantically equivalent
🧪 Running ISLE property-based verification...
  Running property tests...
✓ Verification: 38/38 tests passed
✓ All verification tests passed!

If verification fails, LOOM will show:

  ❌ Z3 verification failed: counterexample found
  ⚠️  The optimization may have changed program semantics!

Performance Tips

1. Use Statistics to Guide Optimization

loom optimize input.wasm -o output.wasm --stats

Look for:

  • Instruction count reduction: Fewer instructions = faster execution
  • Binary size reduction: Smaller = faster download/load
  • Constant folds: More folds = more optimization opportunities

2. Verify Critical Code

# Always verify safety-critical or production code
loom optimize production.wasm -o prod.opt.wasm --verify --stats

3. Benchmark Before and After

# Optimize
loom optimize app.wasm -o app.opt.wasm --stats

# Compare sizes
ls -lh app.wasm app.opt.wasm

# Test performance with your WebAssembly runtime

4. Iterative Optimization

LOOM's optimizations are idempotent - running optimization multiple times produces the same result:

loom optimize input.wasm -o pass1.wasm
loom optimize pass1.wasm -o pass2.wasm
# pass1.wasm and pass2.wasm should be identical

Examples

Example 1: Optimizing Math-Heavy Code

Input (math.wat):

(module
  (func $calculate (param $x i32) (result i32)
    ;; Multiply by 8 (expensive)
    local.get $x
    i32.const 8
    i32.mul

    ;; Divide by 4 (expensive)
    i32.const 4
    i32.div_u

    ;; Add constants (can be folded)
    i32.const 10
    i32.const 20
    i32.add
    i32.add
  )
)

Optimize:

loom optimize math.wat -o math.opt.wasm --stats

Output:

Instructions: 9 → 5 (44.4% reduction)
Binary size:  156 → 98 bytes (37.2% reduction)
Constant folds: 1
Optimization time: 0 ms

What happened:

  • x * 8x << 3 (strength reduction)
  • / 4>> 2 (strength reduction)
  • 10 + 2030 (constant folding)

Example 2: Optimizing with Verification

Input (critical.wat):

(module
  (func $process (param $data i32) (result i32)
    local.get $data
    i32.const 16
    i32.div_u
    i32.const 100
    i32.add
  )
)

Optimize with verification:

cargo build --release --features verification
./target/release/loom optimize critical.wat -o critical.opt.wasm --verify --stats

Output:

✓ Parsed in 234µs
⚡ Optimizing...
✓ Optimized in 0 ms

🔍 Running verification...
🔬 Running Z3 SMT verification...
  ✅ Z3 verification passed: optimizations are semantically equivalent
🧪 Running ISLE property-based verification...
✓ Verification: 38/38 tests passed

Instructions: 5 → 4 (20.0% reduction)
Binary size:  120 → 95 bytes (20.8% reduction)

Example 3: Batch Optimization

Optimize multiple files:

#!/bin/bash
for file in modules/*.wasm; do
  echo "Optimizing $file..."
  loom optimize "$file" -o "optimized/$(basename $file)" --stats
done

Example 4: CI/CD Integration

GitHub Actions workflow:

name: Optimize WebAssembly

on: [push]

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Build LOOM
        run: cargo build --release

      - name: Optimize WASM modules
        run: |
          for wasm in dist/*.wasm; do
            ./target/release/loom optimize "$wasm" -o "$wasm" --stats --verify
          done

      - name: Upload optimized modules
        uses: actions/upload-artifact@v2
        with:
          name: optimized-wasm
          path: dist/*.wasm

Benchmark Results

Based on comprehensive benchmarking:

Optimization Phase Average Time Description
Constant Folding 8-11 µs Fast and effective
Strength Reduction 10 µs Consistent performance
CSE 9-14 µs Scales with complexity
Function Inlining 16-18 µs Moderate overhead
Full Pipeline 19-28 µs Complete optimization
Parser 6.8 µs Very fast
Encoder 183 ns Extremely fast

Real-World Results

Fixture Instruction Reduction Binary Size Reduction
bench_bitops 16.7% 88.5%
test_input 22.2% 81.6%
fibonacci 0% 92.6%
quicksort varies 92.5%
game_logic varies 92.5%

Key Takeaway: Binary size reductions are consistently excellent (80-93%), while instruction count improvements vary by code complexity.

Troubleshooting

"Optimization failed"

Check that your input is valid WebAssembly:

# Validate with wabt tools
wasm-validate input.wasm

# Or try parsing without optimization
loom optimize input.wasm -o test.wasm

Z3 verification not available

If you see "Z3 verification feature not enabled":

# Rebuild with verification feature
cargo clean
cargo build --release --features verification

Performance degradation

If optimized code is slower:

  1. Check instruction count - it should decrease or stay same
  2. Profile with your WebAssembly runtime
  3. Some optimizations trade instructions for code size
  4. File an issue with your test case

Getting Help

Next Steps

  • Read formal-verification.md for detailed verification info
  • Check wasm-build.md for building loom to WebAssembly
  • Explore the fixtures in tests/fixtures/ for real-world examples