Skip to content

Latest commit

 

History

History
282 lines (199 loc) · 7.78 KB

File metadata and controls

282 lines (199 loc) · 7.78 KB

Mutation Testing Implementation for Mostro

Overview

This document describes the mutation testing implementation for Mostro, a critical Rust daemon handling Bitcoin P2P trades over Lightning Network and Nostr.

What is Mutation Testing?

Mutation testing is a technique to measure the quality and effectiveness of the test suite. Unlike code coverage (which only checks if code is executed), mutation testing verifies that tests actually detect bugs.

How it works:

  1. Creating mutants: The tool makes small, controlled changes (mutations) to the source code

    • Change == to !=, + to -, true to false
    • Remove a line of code, change a condition boundary
    • Replace && with ||, > with >=
    • Replace function return value with default
  2. Running the test suite: Tests are executed against each mutated version

  3. Measuring survival rate:

    • Mutant killed: Test failed → Good! Tests detected the artificial bug
    • Mutant survived: Test passed → Bad! Tests did not catch the change

Mutation Score

Score = (Mutants killed / Total mutants) × 100
  • > 80%: Excellent test quality
  • 50-80%: Acceptable, room for improvement
  • < 50%: Poor test quality, needs immediate attention

Why Mutation Testing for Mostro?

  1. Financial security: Trading logic, escrow handling, and dispute resolution must be bulletproof
  2. Detect weak tests: Find tests that "cover" code but don't actually verify correctness
  3. Force better assertions: Encourages specific, strict assertions instead of generic ones
  4. Find edge cases: Surviving mutants often reveal untested boundary conditions
  5. CI integration: Can fail builds if mutation score drops below threshold
  6. Documentation: Living documentation of what behavior is actually tested
  7. Refactoring safety: High mutation score gives confidence when refactoring critical code

Tool Selection: cargo-mutants

We use cargo-mutants as it is the most mature mutation testing tool for Rust.

Installation

cargo install cargo-mutants

Basic Usage

# Test all mutants
cargo mutants

# Test with specific packages
cargo mutants -p mostro-core

# Test specific files
cargo mutants --file src/flow.rs

# Test with sharding (for CI parallelization)
cargo mutants --shard 1/4

# Output to specific directory
cargo mutants --output mutants.out

Configuration

The .mutants.toml file configures mutation testing behavior:

# Examine only source files
examine_globs = [
    "src/**/*.rs",
]

# Exclude generated code, tests, and non-critical modules
exclude_globs = [
    "**/target/**",
    "**/tests/**",
    "**/examples/**",
    "src/main.rs",           # Entry point, minimal logic
    "src/cli.rs",            # CLI parsing
    "src/config/**",         # Configuration loading
    "src/scheduler.rs",      # Background jobs
    "src/bitcoin_price.rs",  # External API calls
    "src/rpc/**",            # RPC server
]

# Timeout for each mutant (seconds)
# Mostro tests involve DB operations, so we need generous timeout
timeout = 600

# Number of parallel jobs
jobs = 4

# Output directory
output_dir = "mutants.out"

# Additional cargo test arguments
test_tool_options = ["--", "--test-threads=1"]

CI/CD Integration

Mutation testing runs in CI on every PR and on main branch:

  1. PR workflow: Runs mutation testing on changed files only (faster feedback)
  2. Main branch: Runs full mutation testing weekly (baseline tracking)
  3. Release gate: Mutation score must not decrease from previous release

Initial Setup (Non-blocking)

Initially, mutation testing runs in "report only" mode:

  • Results are uploaded as artifacts
  • CI does NOT fail on low mutation score
  • Team reviews results and improves tests incrementally

Phase 2 (Enforcing)

Once baseline is established:

  • CI fails if mutation score drops below threshold
  • New code must maintain or improve mutation score

Priority Areas

Critical modules that should have high mutation scores (>80%):

Module Priority Rationale
src/flow.rs Critical Order state transitions, trade logic
src/db.rs Critical Database operations, state persistence
src/util.rs Critical Utility functions used across codebase
src/nip33.rs High Nostr event tagging
src/lnurl.rs High LNURL handling
src/messages.rs Medium Message formatting
src/models.rs Medium Data models

Implementation Phases

Phase 1: Infrastructure (This PR)

  • Install and configure cargo-mutants
  • Create .mutants.toml configuration
  • Add mutation testing CI workflow
  • Document the strategy

Phase 2: Baseline Assessment

  • Run full mutation testing on current codebase
  • Document current mutation score per module
  • Identify "survivors" (mutants that passed tests)
  • Prioritize critical modules

Phase 3: Critical Module Improvements

  • Improve tests for src/flow.rs (target: 80%)
  • Improve tests for src/db.rs (target: 80%)
  • Improve tests for src/util.rs (target: 80%)

Phase 4: Enable Enforcement

  • Set minimum mutation score threshold in CI
  • Fail builds if score drops
  • Track score trends over time

Interpreting Results

Example Output

INFO Found 245 mutants to test
INFO 189 mutants killed (77.1%)
INFO 56 mutants survived (22.9%)
INFO 0 mutants timed out
INFO Mutation score: 77.1%

Analyzing Survivors

Each surviving mutant represents a potential gap in testing:

INFO src/flow.rs:245:9: replace Order::validate -> bool with true

This mutant replaced the validate method with return true, and tests still passed. This means:

  • Either the validation logic is not tested
  • Or tests don't verify validation failures

Fixing Survivors

Add tests that would catch the mutation:

#[test]
fn test_order_validation_rejects_invalid() {
    let order = Order::new(/* invalid data */);
    assert!(!order.validate()); // This would catch the mutant
}

Running Locally

# Quick check (mutants only in changed files)
./scripts/mutation-test.sh quick

# Full run (takes ~30-60 min)
./scripts/mutation-test.sh full

# Specific module
./scripts/mutation-test.sh file src/flow.rs

# Show results summary
./scripts/mutation-test.sh report

# Or run cargo-mutants directly:
cargo mutants --file src/flow.rs --output mutants.out

Performance Considerations

Mutation testing is computationally expensive:

  • Full run: ~30-60 minutes depending on hardware
  • Each mutant requires a full test suite run
  • Use sharding for parallelization in CI
  • Start with critical modules only

Troubleshooting

Timeout Issues

If mutants timeout, increase the timeout in .mutants.toml:

timeout = 900  # 15 minutes

False Positives

Some mutations may be equivalent (changing code that doesn't affect behavior). Add to exclude list:

exclude_re = [
    "replace .*::default\\(\\) -> Self with",  # Default impls often equivalent
]

Build Failures

If mutation causes compilation errors (not test failures), cargo-mutants should handle this automatically. If not:

# Exclude files with heavy macros
exclude_globs = [
    "src/proto/**",
]

References

Checklist for This Implementation

  • .mutants.toml created with appropriate configuration
  • CI workflow added for mutation testing
  • Documentation written (docs/MUTATION_TESTING.md)
  • Non-blocking in CI (report-only mode initially)
  • Baseline mutation score documented (Phase 2)
  • Tests improved for critical survivors (Phase 3)