Problem: You want to run neural network inference without manually specifying tokenizer files.
Solution: Use BitNet-rs automatic tokenizer discovery to extract tokenizers from GGUF metadata.
Time Required: 5 minutes
- BitNet-rs installed (
cargo build --no-default-features --features cpu) - GGUF model file (e.g., from HuggingFace Hub)
The simplest approach lets BitNet-rs handle everything:
# Download model with embedded tokenizer
cargo run -p xtask -- download-model --id microsoft/bitnet-b1.58-2B-4T-gguf --file ggml-model-i2_s.gguf
# Run inference with automatic tokenizer discovery
cargo run -p xtask -- infer --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf --prompt "Explain quantization:"What Happens:
- BitNet-rs opens GGUF file with memory mapping
- Extracts tokenizer from
tokenizer.jsonortokenizer.ggml.tokensmetadata - Detects model architecture (BitNet, LLaMA, GPT-2, etc.)
- Applies model-specific tokenizer configuration
- Ready for inference
For programmatic control, use the Rust API:
use bitnet_tokenizers::{TokenizerDiscovery, TokenizerStrategyResolver};
use std::path::Path;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Automatic discovery from GGUF
let discovery = TokenizerDiscovery::from_gguf(Path::new("model.gguf"))?;
let resolver = TokenizerStrategyResolver::new(discovery).await?;
let tokenizer = resolver.resolve_with_fallback().await?;
// Use tokenizer
let tokens = tokenizer.encode("Hello, world!", true, false)?;
println!("Tokens: {:?}", tokens);
Ok(())
}See what BitNet-rs discovered:
# Verify model and show tokenizer metadata
cargo run -p xtask -- verify --model model.ggufuse bitnet_tokenizers::TokenizerDiscovery;
use std::path::Path;
fn inspect_discovery() -> Result<(), Box<dyn std::error::Error>> {
let discovery = TokenizerDiscovery::from_gguf(Path::new("model.gguf"))?;
println!("Model type: {}", discovery.model_type());
println!("Vocabulary size: {}", discovery.vocab_size());
println!("GPU optimization: {}", discovery.requires_large_vocab_optimization());
// Check discovered strategy
let strategy = discovery.discover_tokenizer_strategy()?;
println!("Strategy: {}", strategy.description());
Ok(())
}If GGUF doesn't have embedded tokenizer, BitNet-rs uses fallback chain:
# Place tokenizer.json in same directory as model.gguf
cp tokenizer.json /path/to/models/
# BitNet-rs automatically finds co-located tokenizer
cargo run -p xtask -- infer --model /path/to/models/model.gguf --prompt "Test"Fallback Order:
- Embedded: Extract from GGUF metadata
- Co-located: Search model directory for
tokenizer.json,tokenizer.model, etc. - Cache: Check
$BITNET_CACHE_DIRand HuggingFace cache - Download: Fetch from HuggingFace Hub (if online)
- Mock: Testing only (disabled in strict mode)
Enable strict mode for production to prevent mock fallbacks:
# Strict mode: fail if no real tokenizer available
export BITNET_STRICT_TOKENIZERS=1
cargo run -p xtask -- infer --model model.gguf --prompt "Production test"// Rust API: Enable strict mode
std::env::set_var("BITNET_STRICT_TOKENIZERS", "1");
let resolver = TokenizerStrategyResolver::new(discovery).await?;
let tokenizer = resolver.resolve_with_fallback().await?;
// ✅ Guaranteed real tokenizer or error# GGUF contains `tokenizer.json` string metadata
cargo run -p xtask -- infer --model llama3.gguf --prompt "Test"
# ✅ Extracts HuggingFace tokenizer automatically# GGUF contains `tokenizer.ggml.tokens` array metadata
cargo run -p xtask -- infer --model llama2.gguf --prompt "Test"
# ✅ Creates tokenizer from embedded vocabulary# Directory structure:
# /models/gpt2/
# ├── model.gguf
# └── tokenizer.json
cargo run -p xtask -- infer --model /models/gpt2/model.gguf --prompt "Test"
# ✅ Finds co-located tokenizer.json automatically# Disable downloads in air-gapped environment
export BITNET_OFFLINE=1
cargo run -p xtask -- infer --model model.gguf --prompt "Test"
# ✅ Uses only embedded, co-located, or cached tokenizersProblem: "No compatible tokenizer found (strict mode)"
Solution: Provide explicit tokenizer or disable strict mode:
cargo run -p xtask -- infer --model model.gguf --tokenizer tokenizer.json --prompt "Test"
# OR
unset BITNET_STRICT_TOKENIZERSProblem: "Vocabulary size mismatch"
Solution: GGUF metadata may be incorrect. Use explicit tokenizer:
cargo run -p xtask -- infer --model model.gguf --tokenizer correct_tokenizer.json --prompt "Test"Problem: "Download failed from HuggingFace Hub"
Solution: Enable offline mode or cache tokenizer manually:
export BITNET_OFFLINE=1
# OR
cp tokenizer.json ~/.cache/bitnet/llama/32000/- Extract Embedded Tokenizers - Manual extraction guide
- Troubleshoot Tokenizer Discovery - Advanced fallback configuration and error handling
- Environment Variables - Offline mode and strict configuration
Task Complete! You now know how to use automatic tokenizer discovery with GGUF models. ✅