Thank you for your interest in contributing to mbase! This guide explains how to add new codecs and commands, and gives an overview of the codebase architecture.
- Adding a New Codec (15 minutes)
- Adding a New Command (30 minutes)
- Testing Guidelines
- Architecture Overview
Create src/codec/mynew.rs:
use super::{util, Codec};
use crate::error::{MbaseError, Result};
use crate::types::{CaseSensitivity, CodecMeta, DetectCandidate, Mode, PaddingRule};
pub struct MyNewCodec;
impl Codec for MyNewCodec {
fn meta(&self) -> CodecMeta {
CodecMeta {
name: "mynew",
aliases: &["mn", "mynewcodec"],
alphabet: "0123456789ABCDEF", // Your alphabet
multibase_code: Some('x'), // Optional multibase prefix
padding: PaddingRule::None, // Or Required/Optional
case_sensitivity: CaseSensitivity::Sensitive,
description: "My new encoding format",
}
}
fn encode(&self, input: &[u8]) -> Result<String> {
// Your encoding logic here
Ok(String::new())
}
fn decode(&self, input: &str, mode: Mode) -> Result<Vec<u8>> {
// Clean whitespace according to mode
let cleaned = util::clean_for_mode(input, mode);
// Your decoding logic here
Ok(Vec::new())
}
// validate() has a default implementation that calls decode()
// Only override if you need custom validation logic
fn detect_score(&self, input: &str) -> DetectCandidate {
let mut confidence = 0.0;
let mut reasons = Vec::new();
// Check for multibase prefix
if input.starts_with('x') {
confidence = util::confidence::MULTIBASE_MATCH; // 0.95
reasons.push("multibase prefix detected".to_string());
}
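// Check alphabet match (count characters, not bytes, and guard against empty input)
let total = input.chars().count();
let valid_chars = input.chars()
.filter(|c| "0123456789ABCDEF".contains(*c))
.count();
let ratio = if total == 0 { 0.0 } else { valid_chars as f64 / total as f64 };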
if ratio == 1.0 {
confidence = confidence.max(util::confidence::ALPHABET_MATCH); // 0.70
reasons.push("all characters valid".to_string());
} else if ratio >= 0.9 {
confidence = confidence.max(util::confidence::WEAK_MATCH); // 0.30
}
DetectCandidate {
codec: self.name().to_string(),
confidence: confidence.min(1.0),
reasons,
warnings: vec![],
}
}
}
Edit src/codec/mod.rs:
// Add to module declarations (around line 15)
mod mynew;
// Codec structs are NOT exported from mod.rs
// They are registered directly in registry.rs
Edit src/codec/registry.rs in the register_codecs! macro invocation (around lines 47-80):
register_codecs! {
// ... existing codecs ...
mynew::MyNewCodec, // Add your codec here (alphabetical order recommended)
}
That's it! The macro automatically:
- Registers your codec in the global registry
- Builds the name and alias maps
- Detects duplicate multibase codes at compile time
- Generates test expectations
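This guide does not show the macro's internals. As a rough illustration of how a compile-time duplicate check can be written in Rust, here is a minimal sketch; the constant MULTIBASE_CODES and the function has_duplicate are hypothetical names, not part of mbase, and the real macro may work differently.

```rust
/// Hypothetical list of multibase prefix codes, one per registered codec.
const MULTIBASE_CODES: [char; 3] = ['f', 'z', 'x'];

/// Returns true if any code appears more than once.
/// Uses `while` loops because `for` loops are not allowed in a `const fn`.
const fn has_duplicate(codes: &[char]) -> bool {
    let mut i = 0;
    while i < codes.len() {
        let mut j = i + 1;
        while j < codes.len() {
            if codes[i] == codes[j] {
                return true;
            }
            j += 1;
        }
        i += 1;
    }
    false
}

// Evaluated during compilation: a duplicate code becomes a build error.
const _: () = assert!(!has_duplicate(&MULTIBASE_CODES), "duplicate multibase code");
```

Changing one of the entries to repeat another (for example a second 'x') would make the build fail at the const assertion rather than at runtime.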
cargo test
cargo run -- list # Should see your codec
cargo run -- enc --codec mynew -i "Hello"
cargo run -- detect -i "xYourEncodedData"
The util module provides common helpers:
// Clean whitespace according to mode
let cleaned = util::clean_for_mode(input, mode);
// Validate alphabet (rejects invalid characters)
util::validate_alphabet(input, "0123456789", mode)?;
// With padding support
util::validate_alphabet_with_padding(input, "ABCD", true)?;
Use named constants instead of magic numbers:
use super::util::confidence;
// Available constants:
confidence::MULTIBASE_MATCH // 0.95 - Has correct multibase prefix
confidence::ALPHABET_MATCH // 0.70 - All characters in alphabet
confidence::PARTIAL_MATCH // 0.50 - Partial match
confidence::WEAK_MATCH // 0.30 - Weak indicators
Most codecs can use the default validate() implementation, which calls decode():
// DEFAULT - No need to implement validate()
impl Codec for MyCodec {
// ... meta, encode, decode ...
// validate() automatically calls self.decode()
}
Only override validate() if you need custom logic:
// CUSTOM - When validation differs from decode
fn validate(&self, input: &str, mode: Mode) -> Result<()> {
util::validate_alphabet(input, MY_ALPHABET, mode)?;
// Additional checks...
Ok(())
}
Map external library errors to structured variants:
use crate::error::MbaseError;
// GOOD - Preserve error context
.map_err(|e| match e {
ExternalError::InvalidChar { ch, pos } => {
MbaseError::InvalidCharacter {
char: ch,
position: pos
}
},
ExternalError::BadLength(msg) => {
MbaseError::InvalidLength(msg)
},
_ => MbaseError::InvalidInput(e.to_string()),
})
// AVOID - Loses error information
.map_err(|e| MbaseError::InvalidInput(e.to_string()))
Available error variants:
- InvalidInput(String) - Generic validation error
- InvalidCharacter { char, position } - Specific bad character
- InvalidLength(String) - Wrong input length
- ChecksumMismatch - Checksum validation failed
- IoError(io::Error) - File I/O problems
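The mapping advice above can also be expressed as a From implementation so that the ? operator performs the conversion. The sketch below is self-contained and uses stand-in types: ExternalError, external_check, and check are hypothetical, and the MbaseError here is a trimmed copy of the variants listed above, not the real definition.

```rust
use std::fmt;

/// Stand-in for an error type from an external decoding library (hypothetical).
#[derive(Debug)]
enum ExternalError {
    InvalidChar { ch: char, pos: usize },
    BadLength(String),
    #[allow(dead_code)]
    Other(String),
}

impl fmt::Display for ExternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExternalError::InvalidChar { ch, pos } => {
                write!(f, "invalid char {:?} at {}", ch, pos)
            }
            ExternalError::BadLength(msg) | ExternalError::Other(msg) => write!(f, "{}", msg),
        }
    }
}

/// Trimmed-down copy of the variants listed above (assumption, for illustration).
#[derive(Debug, PartialEq)]
enum MbaseError {
    InvalidInput(String),
    InvalidCharacter { char: char, position: usize },
    InvalidLength(String),
}

impl From<ExternalError> for MbaseError {
    fn from(e: ExternalError) -> Self {
        match e {
            // Preserve the structured details where a matching variant exists.
            ExternalError::InvalidChar { ch, pos } => {
                MbaseError::InvalidCharacter { char: ch, position: pos }
            }
            ExternalError::BadLength(msg) => MbaseError::InvalidLength(msg),
            // Fall back to the generic variant only for everything else.
            other => MbaseError::InvalidInput(other.to_string()),
        }
    }
}

/// Hypothetical external validator: even length, hex digits only.
fn external_check(input: &str) -> Result<(), ExternalError> {
    if input.len() % 2 != 0 {
        return Err(ExternalError::BadLength(format!("odd length {}", input.len())));
    }
    if let Some((pos, ch)) = input.chars().enumerate().find(|(_, c)| !c.is_ascii_hexdigit()) {
        return Err(ExternalError::InvalidChar { ch, pos });
    }
    Ok(())
}

/// `?` converts ExternalError into MbaseError through the From impl.
fn check(input: &str) -> Result<(), MbaseError> {
    external_check(input)?;
    Ok(())
}
```

Whether a From impl or an explicit map_err closure is preferable depends on how many call sites share the same external error type; the closure form keeps the mapping local to one codec.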
Add unit tests in your codec file:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode_decode_roundtrip() {
let codec = MyNewCodec;
let data = b"Hello World";
let encoded = codec.encode(data).unwrap();
let decoded = codec.decode(&encoded, Mode::Strict).unwrap();
assert_eq!(decoded, data);
}
#[test]
fn test_empty_input() {
let codec = MyNewCodec;
assert_eq!(codec.encode(&[]).unwrap(), "");
assert_eq!(codec.decode("", Mode::Strict).unwrap(), Vec::<u8>::new());
}
#[test]
fn test_lenient_mode_whitespace() {
let codec = MyNewCodec;
let encoded = "AB CD EF";
let result = codec.decode(encoded, Mode::Lenient);
assert!(result.is_ok());
}
#[test]
fn test_strict_mode_rejects_whitespace() {
let codec = MyNewCodec;
let encoded = "AB CD EF";
let result = codec.decode(encoded, Mode::Strict);
assert!(result.is_err());
}
#[test]
fn test_invalid_characters() {
let codec = MyNewCodec;
let result = codec.decode("INVALID!", Mode::Strict);
assert!(result.is_err());
}
#[test]
fn test_multibase_detection() {
let codec = MyNewCodec;
let score = codec.detect_score("xABCDEF");
assert!(score.confidence >= 0.9);
}
}
Create src/commands/mynew.rs:
use crate::error::Result;
use crate::io::read_input;
use mbase::types::{Context, InputSource};
pub fn run_mynew(ctx: &Context, input: &InputSource) -> Result<String> {
let data = read_input(input)?;
// Access registry via context
let codec = ctx.registry.get("base64")?;
// Your command logic here
let output = String::from_utf8_lossy(&data).to_string();
Ok(output)
}
Edit src/commands/mod.rs:
mod mynew;
pub use mynew::run_mynew;
Edit src/cli.rs:
#[derive(Subcommand)]
pub enum Command {
// ... existing commands ...
#[command(about = "My new command description")]
Mynew {
#[arg(long, short = 'i', default_value = "-")]
r#in: String,
#[arg(long, short = 'o', default_value = "-")]
out: String,
},
}
Edit src/commands/mod.rs to add your command struct:
pub struct MynewCommand {
pub input: InputSource,
pub output: OutputDest,
}
impl CommandHandler for MynewCommand {
fn execute(&self, ctx: &Context) -> Result<()> {
let result = run_mynew(ctx, &self.input)?;
let config = OutputConfig {
dest: self.output.clone(),
force: true,
};
write_output(result.as_bytes(), &config)?;
Ok(())
}
}
Edit src/main.rs in the run() function:
fn run(cli: Cli) -> error::Result<()> {
let ctx = Context::default();
let handler: Box<dyn CommandHandler> = match cli.command {
// ... existing commands ...
Command::Mynew { r#in, out } => Box::new(commands::MynewCommand {
input: types::InputSource::parse(&r#in),
output: types::OutputDest::parse(&out),
}),
};
handler.execute(&ctx)
}
Edit tests/cli.rs:
#[test]
fn test_mynew_command() {
cmd()
.arg("mynew")
.arg("-i").arg("test input")
.assert()
.success();
}
# Run all tests
cargo test
# Run specific test file
cargo test --test cli
# Run tests for specific module
cargo test codec::base64
# Run with output visible
cargo test -- --nocapture
# Run integration tests only
cargo test --test '*'
Your codec should have tests for:
- Roundtrip encoding/decoding - encode → decode → original data
- Empty input - Both encoding and decoding empty data
- Mode handling - Strict vs Lenient mode behavior
- Invalid input - Proper error handling
- Edge cases - Padding, leading zeros, special characters
- Detection - Confidence scoring works correctly
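To make the checklist concrete, here is a self-contained toy hex codec that the listed cases can be exercised against. It is only a sketch: the free functions encode and decode and the local Mode enum are stand-ins written for this example, and they do not use the Codec trait or the util helpers from mbase.

```rust
/// Local stand-in for mbase's Mode (assumption, for illustration).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode {
    Strict,
    Lenient,
}

/// Encode bytes as uppercase hexadecimal.
fn encode(input: &[u8]) -> String {
    const ALPHABET: &[u8; 16] = b"0123456789ABCDEF";
    let mut out = String::with_capacity(input.len() * 2);
    for &b in input {
        out.push(ALPHABET[(b >> 4) as usize] as char);
        out.push(ALPHABET[(b & 0x0F) as usize] as char);
    }
    out
}

/// Decode hexadecimal. Strict rejects whitespace and lowercase digits;
/// Lenient skips whitespace and accepts either case.
fn decode(input: &str, mode: Mode) -> Result<Vec<u8>, String> {
    let mut nibbles = Vec::with_capacity(input.len());
    for (pos, ch) in input.chars().enumerate() {
        if ch.is_whitespace() {
            if mode == Mode::Lenient {
                continue;
            }
            return Err(format!("whitespace at position {}", pos));
        }
        let normalized = if mode == Mode::Lenient { ch.to_ascii_uppercase() } else { ch };
        match normalized {
            '0'..='9' => nibbles.push(normalized as u8 - b'0'),
            'A'..='F' => nibbles.push(normalized as u8 - b'A' + 10),
            _ => return Err(format!("invalid character {:?} at position {}", ch, pos)),
        }
    }
    if nibbles.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    // Two 4-bit values make one byte; an empty input yields an empty Vec.
    Ok(nibbles.chunks(2).map(|pair| (pair[0] << 4) | pair[1]).collect())
}
```

For a codec with a small input domain like this one, a roundtrip over all 256 byte values is cheap and covers leading zeros and high-bit bytes in a single assertion.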
For robust codecs, consider randomized roundtrip tests, a lightweight form of property-based testing (the example below uses the rand crate):
#[test]
fn test_arbitrary_roundtrip() {
use rand::Rng;
let mut rng = rand::thread_rng();
for _ in 0..100 {
let len = rng.gen_range(0..1000);
let data: Vec<u8> = (0..len).map(|_| rng.gen()).collect();
let codec = MyCodec;
let encoded = codec.encode(&data).unwrap();
let decoded = codec.decode(&encoded, Mode::Strict).unwrap();
assert_eq!(decoded, data);
}
}
mbase/
├── src/
│   ├── main.rs              # Entry point, CLI dispatch
│   ├── cli.rs               # Clap CLI definitions
│   ├── error.rs             # Error types, exit codes
│   ├── types.rs             # Core types (Mode, CodecMeta, etc.)
│   ├── codec/
│   │   ├── mod.rs           # Codec trait definition
│   │   ├── registry.rs      # Global codec registry
│   │   ├── util.rs          # Shared codec utilities
│   │   └── *.rs             # Individual codec implementations (18 files)
│   ├── commands/
│   │   ├── mod.rs           # Command exports
│   │   └── *.rs             # Command implementations (9 files)
│   └── io/
│       ├── input.rs         # Input reading (files, stdin, strings)
│       └── output.rs        # Output writing (files, stdout)
└── tests/
    ├── cli.rs               # Integration tests
    └── codec_registration.rs # Registry verification tests
pub trait Codec: Send + Sync {
fn meta(&self) -> CodecMeta;
fn encode(&self, input: &[u8]) -> Result<String>;
fn decode(&self, input: &str, mode: Mode) -> Result<Vec<u8>>;
fn detect_score(&self, input: &str) -> DetectCandidate;
// Default implementations:
fn validate(&self, input: &str, mode: Mode) -> Result<()> {
self.decode(input, mode)?;
Ok(())
}
fn name(&self) -> &'static str {
self.meta().name
}
}
pub enum Mode {
Strict, // Reject whitespace, enforce case, strict validation
Lenient, // Allow whitespace, case-insensitive, permissive
}
pub enum MbaseError {
InvalidInput(String),
InvalidCharacter { char: char, position: usize },
InvalidLength(String),
ChecksumMismatch,
CodecNotFound(String),
IoError(io::Error),
// ... more variants
}
The global registry uses a singleton pattern with OnceLock:
static REGISTRY: OnceLock<Registry> = OnceLock::new();
impl Registry {
pub fn global() -> &'static Registry {
REGISTRY.get_or_init(Registry::new)
}
pub fn get(&self, name: &str) -> Result<&dyn Codec> {
// Lookup by name or alias
}
}
Commands receive the registry via a Context struct for testability:
pub struct Context {
pub registry: &'static Registry,
}
impl Default for Context {
fn default() -> Self {
Self { registry: Registry::global() }
}
}
Used in command implementations:
pub fn run_encode(ctx: &Context, codec: &str, input: &InputSource) -> Result<String> {
let codec = ctx.registry.get(codec)?;
let data = read_input(input)?;
codec.encode(&data)
}
Commands use the CommandHandler trait for uniform execution:
pub trait CommandHandler {
fn execute(&self, ctx: &Context) -> Result<()>;
}
Each command is a struct implementing CommandHandler:
pub struct DetectCommand {
pub input: InputSource,
pub json: bool,
pub top: usize,
}
impl CommandHandler for DetectCommand {
fn execute(&self, ctx: &Context) -> Result<()> {
// 1. Call business logic function
let candidates = run_detect(ctx, &self.input, self.top)?;
// 2. Handle output formatting
if self.json {
println!("{}", serde_json::to_string_pretty(&candidates)?);
} else {
for candidate in candidates {
println!("{}: {:.0}%", candidate.codec, candidate.confidence * 100.0);
}
}
Ok(())
}
}
- Business logic in src/commands/*.rs - Pure functions accepting &Context
- I/O and formatting in CommandHandler::execute() - Handles JSON/text output
- Dispatch in src/main.rs - Creates command structs and calls execute()
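The three layers can be tried out in isolation with a reduced, self-contained model. This is a sketch under simplifying assumptions: the Registry here stores plain function pointers instead of Codec trait objects, errors are plain strings, and EncodeCommand and hex are hypothetical names invented for the example.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Simplified encoder signature standing in for the Codec trait.
type EncodeFn = fn(&[u8]) -> String;

fn hex(input: &[u8]) -> String {
    input.iter().map(|b| format!("{:02X}", b)).collect()
}

struct Registry {
    codecs: HashMap<&'static str, EncodeFn>,
}

impl Registry {
    fn new() -> Self {
        let mut codecs: HashMap<&'static str, EncodeFn> = HashMap::new();
        codecs.insert("hex", hex);
        Registry { codecs }
    }

    /// Lazily initialised singleton, built on first access.
    fn global() -> &'static Registry {
        static REGISTRY: OnceLock<Registry> = OnceLock::new();
        REGISTRY.get_or_init(Registry::new)
    }

    fn get(&self, name: &str) -> Result<EncodeFn, String> {
        self.codecs
            .get(name)
            .copied()
            .ok_or_else(|| format!("codec not found: {}", name))
    }
}

/// Commands receive the registry through this struct rather than calling
/// Registry::global() themselves.
struct Context {
    registry: &'static Registry,
}

/// Business logic: no printing, returns a value that tests can inspect.
fn run_encode(ctx: &Context, codec: &str, data: &[u8]) -> Result<String, String> {
    let encode = ctx.registry.get(codec)?;
    Ok(encode(data))
}

trait CommandHandler {
    fn execute(&self, ctx: &Context) -> Result<(), String>;
}

struct EncodeCommand {
    codec: String,
    data: Vec<u8>,
}

/// Handler: calls the business logic, then deals with output.
impl CommandHandler for EncodeCommand {
    fn execute(&self, ctx: &Context) -> Result<(), String> {
        let out = run_encode(ctx, &self.codec, &self.data)?;
        println!("{}", out);
        Ok(())
    }
}
```

Because run_encode returns its result instead of printing it, a test can assert on the value directly and leave the handler to be covered by CLI-level tests.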
Example flow:
// Business logic (testable, pure)
pub fn run_detect(ctx: &Context, input: &InputSource, top: usize) -> Result<Vec<DetectCandidate>> {
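let data = read_input(input)?;
// detect_score() takes &str, so convert the raw bytes first
let text = String::from_utf8_lossy(&data);
let mut candidates = ctx.registry.list()
.iter()
.map(|meta| {
let codec = ctx.registry.get(meta.name).unwrap();
codec.detect_score(&text)
})
.collect::<Vec<_>>();
// total_cmp avoids a panic if a confidence value is ever NaN
candidates.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));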
Ok(candidates.into_iter().take(top).collect())
}
// Command handler (I/O, formatting)
impl CommandHandler for DetectCommand {
fn execute(&self, ctx: &Context) -> Result<()> {
let candidates = run_detect(ctx, &self.input, self.top)?;
// ... formatting logic ...
Ok(())
}
}
# Build
cargo build
# Run
cargo run -- enc --codec base64 -i "Hello"
# Test
cargo test
# Check for issues
cargo clippy
cargo fmt --check
# Format code
cargo fmt
# Run all tests
cargo test --all
# Check clippy warnings
cargo clippy --all-targets
# Verify CLI works
cargo run -- list
cargo run -- enc --codec base64 -i "test"
Follow conventional commits:
feat: add base93 codec support
fix: handle empty input in base64 padding
docs: update CONTRIBUTING with testing guide
refactor: extract validation utility function
test: add roundtrip tests for base58
Problem: The codec implementation exists but is not registered in registry.rs
Symptom: cargo test fails in the codec_registration test
Fix: Add the codec to the register_codecs! invocation in src/codec/registry.rs, as described in the codec guide above
Problem: Codec panics or returns error on empty input
Best Practice: Always handle empty input gracefully:
if input.is_empty() {
return Ok(Vec::new()); // or Ok(String::new()) for encode
}
Problem: Using magic numbers like 0.7 instead of named constants
Fix: Use util::confidence::* constants
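A small self-contained sketch of scoring with named constants follows. The confidence module below mirrors the values quoted earlier in this guide, while the score function, its parameters, and the early return for empty input are assumptions made for the example rather than mbase's actual detection code.

```rust
/// Local copy of the constant values quoted in this guide.
mod confidence {
    pub const MULTIBASE_MATCH: f64 = 0.95;
    pub const ALPHABET_MATCH: f64 = 0.70;
    pub const WEAK_MATCH: f64 = 0.30;
}

/// Hypothetical scoring helper: returns a confidence in [0.0, 1.0].
fn score(input: &str, prefix: char, alphabet: &str) -> f64 {
    let total = input.chars().count();
    if total == 0 {
        // Nothing to inspect, so report no confidence instead of dividing by zero.
        return 0.0;
    }

    let mut result = 0.0_f64;
    if input.starts_with(prefix) {
        result = confidence::MULTIBASE_MATCH;
    }

    let valid = input.chars().filter(|ch| alphabet.contains(*ch)).count();
    let ratio = valid as f64 / total as f64;
    if valid == total {
        result = result.max(confidence::ALPHABET_MATCH);
    } else if ratio >= 0.9 {
        result = result.max(confidence::WEAK_MATCH);
    }
    result
}
```

Comparing valid against total as integers avoids relying on exact floating-point equality for the "all characters valid" case.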
Problem: Only testing Mode::Strict, forgetting Mode::Lenient
Fix: Add tests for both modes, especially whitespace handling
Problem: Converting all errors to generic strings
Fix: Map to specific error variants when possible (see Error Handling section)
- Questions? Open an issue on GitHub
- Bug found? Open an issue with minimal reproduction
- Feature idea? Open an issue for discussion first
- Formatting: Use cargo fmt (follows rustfmt defaults)
- Linting: Address cargo clippy warnings
- Naming: Follow Rust conventions (snake_case for functions, PascalCase for types)
- Comments: Prefer self-documenting code; use comments for "why" not "what"
- Tests: Test names should describe the scenario: test_empty_input_returns_empty_vec
By contributing, you agree that your contributions will be licensed under the project's license.