Audience: Developers and researchers working with GGUF models who need to ensure model quality and catch quantization errors before deployment.
Goal: Learn the complete 3-stage validation workflow to verify GGUF models have correct LayerNorm weights and healthy projection weights.
BitNet-rs provides a comprehensive validation system to catch common model export issues:
- Quantized LayerNorm weights: LayerNorm gamma weights quantized to I2_S/Q4 (should be F16/F32)
- Incorrect projection scales: Inverted I2_S dequantization or corrupted weight scales
- Tokenizer mismatches: Wrong tokenizer causing gibberish outputs
- Export corruption: Metadata errors or tensor misalignment
The validation system uses a 3-stage pipeline:
- LayerNorm & Projection RMS Check: Architecture-aware statistical validation
- Model Loading Check: Verify weights load correctly with healthy RMS values
- Linguistic Sanity Check: Greedy inference produces coherent output
# Validate with automatic architecture detection
./scripts/validate_gguf.sh \
models/bitnet-2b.gguf \
models/tokenizer.json
# Output:
# ===================================================
# 1/3: LayerNorm and Projection Weight Statistics Check
# ===================================================
# ✅ LN RMS gate passed (bitnet-b1.58:f16)
# ✅ Projection RMS gate passed
#
# ===================================================
# 2/3: Projection Weight RMS Check (via model loading)
# ===================================================
# ✅ Projection weights loaded
#
# ===================================================
# 3/3: Greedy Inference Probe
# ===================================================
# ✅ Output contains recognizable words
#
# ✅✅✅ ALL VALIDATION CHECKS PASSED ✅✅✅

# Export F16 GGUF with LayerNorm preservation
./scripts/export_clean_gguf.sh \
models/safetensors-checkpoint \
models/tokenizer.json \
models/clean
# Validate the exported model
./scripts/validate_gguf.sh \
models/clean/clean-f16.gguf \
models/tokenizer.json

After the model and tokenizer are present locally, use the strict CPU proof command to check the user-facing CLI path without mock or minimal-loader ambiguity:
BITNET_DISABLE_MINIMAL_LOADER=1 \
BITNET_STRICT_MODE=1 \
RUST_LOG=warn \
cargo run --locked -p bitnet-cli --no-default-features --features cpu,full-cli -- run \
--model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
--tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
--strict-loader \
--strict-tokenizer \
--prompt "Answer with a single digit: 2+2=" \
--max-tokens 1 \
--temperature 0.0 \
--greedy \
--json-out target/cpu-proof.json

This is a strict smoke/proof command, not a model-quality or performance claim. The JSON artifact must distinguish the enhanced loader from compatibility fallback. Receipt validation, kernel IDs, and throughput claims are handled by follow-up CPU proof items.
Purpose: Detect quantized LayerNorm weights and projection weight anomalies using architecture-aware statistical validation.
What it checks:
- LayerNorm gamma RMS values are in expected envelope (architecture-specific)
- Projection weight RMS values are reasonable for the model format
- Uses pattern-based thresholds tailored to BitNet b1.58 F16/I2_S or LLaMA-style models
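The arithmetic behind this stage can be sketched in a few lines of POSIX shell and awk. This is only an illustration of the RMS-versus-envelope comparison, not the bitnet-cli implementation: the helper names `rms_of` and `in_envelope` are hypothetical, and the bounds are passed in by the caller rather than read from a ruleset.

```shell
# rms_of VALUE...: print the root-mean-square of the given numbers (4 decimals).
# Hypothetical helper for illustration; bitnet-cli computes this internally.
rms_of() {
  printf '%s\n' "$@" | awk '{ s += $1 * $1 } END { printf "%.4f\n", sqrt(s / NR) }'
}

# in_envelope RMS MIN MAX: succeed (exit 0) when MIN <= RMS <= MAX.
in_envelope() {
  awk -v r="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(r >= lo && r <= hi) }'
}
```

For example, `in_envelope "$(rms_of 0.95 1.02 0.98)" 0.80 1.20` succeeds, while an RMS of 0.0127 tested against the same bounds fails. Real rulesets are pattern-based, so the right bounds depend on the tensor name.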
How to run:
# Auto-detect architecture from GGUF metadata
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto \
models/model.gguf
# With strict mode (fail on warnings)
BITNET_STRICT_MODE=1 \
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto \
models/model.gguf

Expected output (healthy model):
model_sha256: a1b2c3d4e5f6...
ruleset: bitnet-b1.58:f16
blk.0.attn_norm.weight [LN] rms=0.9523 ✅
blk.0.ffn_norm.weight [LN] rms=0.0847 ✅
blk.1.attn_norm.weight [LN] rms=0.9412 ✅
blk.1.ffn_norm.weight [LN] rms=0.0851 ✅
...
output_norm.weight [LN] rms=0.9998 ✅
blk.0.attn_q.weight [PROJ] rms=0.0214 ✅
blk.0.attn_k.weight [PROJ] rms=0.0218 ✅
blk.0.attn_v.weight [PROJ] rms=0.0216 ✅
...
✅ LN RMS gate passed (bitnet-b1.58:f16)
✅ Projection RMS gate passed (bitnet-b1.58:f16)
Expected output (quantized LayerNorm - BAD):
model_sha256: f9e8d7c6b5a4...
ruleset: bitnet-b1.58:f16
blk.0.attn_norm.weight [LN] rms=0.0127 ❌
blk.0.ffn_norm.weight [LN] rms=0.0093 ❌
blk.1.attn_norm.weight [LN] rms=0.0131 ❌
...
❌ LN RMS gate failed: 24/24 out of envelope (bitnet-b1.58:f16)
ERROR: Model has suspicious LayerNorm weights (quantized or corrupted).
Recommendation: Regenerate GGUF with LayerNorm weights in float format (F16/F32).
See docs/howto/export-clean-gguf.md for proper export workflow.
Exit codes:
- 0: All checks passed
- 8 (EXIT_LN_SUSPICIOUS): LayerNorm or projection validation failed in strict mode
See also: Validation Gates Reference for detailed threshold definitions.
Purpose: Verify weights load correctly and have healthy RMS values during actual model initialization.
What it checks:
- All projection weights (Q/K/V/O, FFN gate/up/down) load successfully
- RMS values are in the expected range for the selected ruleset (roughly 0.002-0.40; see the projection envelopes in the auto-detection table)
- No NaN/Inf values in loaded tensors
- I2_S dequantization produces reasonable scales
How to run:
# Enable RUST_LOG=info to see projection RMS values
RUST_LOG=info \
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
run --model models/model.gguf --tokenizer models/tokenizer.json \
--prompt "Warmup." --max-new-tokens 1 --temperature 0.0

Expected output:
INFO PROJ load: blk.0.attn_q.weight RMS=0.0214 (inv=false)
INFO PROJ load: blk.0.attn_k.weight RMS=0.0218 (inv=false)
INFO PROJ load: blk.0.attn_v.weight RMS=0.0216 (inv=false)
INFO PROJ load: blk.0.attn_o.weight RMS=0.0219 (inv=false)
INFO PROJ load: blk.0.ffn_gate.weight RMS=0.0201 (inv=false)
INFO PROJ load: blk.0.ffn_up.weight RMS=0.0198 (inv=false)
INFO PROJ load: blk.0.ffn_down.weight RMS=0.0203 (inv=false)
...
Warning signs:
# Extremely high RMS (inverted scales?)
INFO PROJ load: blk.0.attn_q.weight RMS=150.3 (inv=false) ⚠️
# Wildly different RMS values (corruption?)
INFO PROJ load: blk.0.attn_q.weight RMS=0.02 (inv=false)
INFO PROJ load: blk.0.attn_k.weight RMS=100.5 (inv=false) ⚠️
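Scanning these log lines by eye gets tedious on models with many layers. The sketch below filters `PROJ load` lines and reports tensors whose RMS falls outside a caller-supplied range. It assumes the log format shown above (an `RMS=<value>` field preceded by the tensor name) and that non-finite values would be logged as text such as `NaN` or `inf`; the function name `flag_proj_rms` is hypothetical.

```shell
# flag_proj_rms MIN MAX: read log lines on stdin and print "<tensor> <rms>"
# for every "PROJ load" entry whose RMS is outside [MIN, MAX] or not finite.
# Exits 1 if anything was flagged, 0 otherwise.
flag_proj_rms() {
  awk -v lo="$1" -v hi="$2" '
    /PROJ load:/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^RMS=/) {
          raw = substr($i, 5)   # text after "RMS="
          val = raw + 0         # numeric value for range comparison
          # Assumption: NaN/Inf appear as text, so test the raw string too.
          if (raw ~ /[Nn]a[Nn]|[Ii]nf/ || val < lo || val > hi) {
            print $(i - 1), raw
            bad = 1
          }
        }
      }
    }
    END { exit (bad ? 1 : 0) }'
}
```

Piping the combined output of the Stage 2 command (with `2>&1`) into `flag_proj_rms 0.01 0.40` would apply the F16 projection envelope from the auto-detection table; pick bounds that match your ruleset.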
Exit codes:
- 0: Model loaded successfully
- 1: Model loading failed (missing tensors, format errors, etc.)
Purpose: Ensure the model produces coherent output, not gibberish or tied logits.
What it checks:
- Greedy deterministic inference produces recognizable words
- Output contains at least one word with 3+ ASCII letters
- No immediate tokenizer decode errors
- Model doesn't repeat same token indefinitely
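Two of these checks are simple enough to sketch in POSIX shell. The helpers below are illustrative only and are not taken from `validate_gguf.sh`: the names are hypothetical, and the repetition threshold is an assumption chosen by the caller. Apply them to the generated continuation, not the prompt, since the prompt itself always contains words.

```shell
# has_recognizable_word TEXT: succeed when TEXT contains a run of 3+ ASCII letters.
has_recognizable_word() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '[A-Za-z]{3,}'
}

# has_repeated_run TEXT N: succeed when one whitespace-separated token
# appears N or more times in a row (a crude repetition signal).
has_repeated_run() {
  printf '%s\n' "$1" | awk -v n="$2" '
    {
      for (i = 1; i <= NF; i++) {
        run = ($i == prev) ? run + 1 : 1
        prev = $i
        if (run >= n) found = 1
      }
    }
    END { exit (found ? 0 : 1) }'
}
```

A continuation such as " Paris." passes the first check, while "the the the the the the" trips the second with a threshold of 4.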
How to run:
# Deterministic greedy inference
BITNET_DETERMINISTIC=1 \
BITNET_SEED=42 \
RAYON_NUM_THREADS=1 \
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
run --model models/model.gguf --tokenizer models/tokenizer.json \
--prompt "The capital of France is" \
--max-new-tokens 8 \
--temperature 0.0

Expected output (healthy):
The capital of France is Paris.
Warning signs (issues):
# Gibberish (quantized LayerNorm or wrong tokenizer)
The capital of France is █▓▒░█▓▒
# Repetition (tied logits or attention collapse)
The capital of France is the the the the the the
# Empty or decode errors (tokenizer mismatch)
The capital of France is
Exit codes:
- 0: Linguistic sanity check passed
- 1: Inference failed to run
- Non-zero: Check logs for specific failure mode
Automatically selects validation rules based on GGUF metadata:
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto \
models/model.gguf

Auto-detection logic:
| Architecture | File Type | Selected Ruleset | LayerNorm Envelope | Projection Envelope |
|---|---|---|---|---|
| bitnet or b1.58 | 1 (F16) | bitnet-b1.58:f16 | Pattern-based (0.05-2.0 typical) | [0.01, 0.40] |
| bitnet or b1.58 | Other (quantized) | bitnet-b1.58:i2_s | Pattern-based (0.01-2.0 typical) | [0.002, 0.20] |
| Other | Any | generic | [0.80, 1.20] | None |
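The table can be read as a small decision function. The sketch below is a hypothetical shell rendering, not the bitnet-cli source: the name `select_ruleset` is invented for illustration, and it assumes the architecture check is a substring match on the GGUF architecture string.

```shell
# select_ruleset ARCHITECTURE FILE_TYPE: print the ruleset name the table implies.
# Assumption: "bitnet or b1.58" means the architecture string contains either.
select_ruleset() {
  case "$1" in
    *bitnet*|*b1.58*)
      if [ "$2" = "1" ]; then
        echo "bitnet-b1.58:f16"    # file type 1 is F16
      else
        echo "bitnet-b1.58:i2_s"   # any other file type is treated as quantized
      fi
      ;;
    *)
      echo "generic"               # everything else falls back to generic rules
      ;;
  esac
}
```

For instance, `select_ruleset bitnet 1` prints `bitnet-b1.58:f16`, and `select_ruleset llama 1` prints `generic`.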
When to use:
- ✅ Standard BitNet b1.58 models (F16 or I2_S)
- ✅ LLaMA/Mistral/standard RMSNorm architectures
- ✅ CI/CD pipelines requiring deterministic validation
- ✅ When you trust your GGUF metadata is correct
Environment variables:
# Set auto mode via environment
export BITNET_VALIDATION_GATE=auto
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

Use custom validation policies for non-standard architectures:
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats \
--gate policy \
--policy examples/policies/custom-model.yml \
--policy-key my-model:f16 \
models/model.gguf

When to use:
- ✅ Custom or experimental architectures
- ✅ Models with unusual LayerNorm patterns
- ✅ Overriding auto-detection for specific requirements
- ✅ Testing new policy definitions
Example policy file:
version: 1
rules:
  my-model:f16:
    name: "My Custom Model F16"
    ln:
      - pattern: "attn_norm\\.weight$"
        min: 0.85
        max: 1.15
        description: "Attention LayerNorm (observed RMS ~0.92-1.05)"
      - pattern: "ffn_norm\\.weight$"
        min: 0.35
        max: 0.60
        description: "FFN LayerNorm (architectural low gamma)"
    proj_weight_rms_min: 0.015
    proj_weight_rms_max: 0.35

Environment variables:
export BITNET_VALIDATION_GATE=policy
export BITNET_VALIDATION_POLICY=examples/policies/custom-model.yml
export BITNET_VALIDATION_POLICY_KEY=my-model:f16
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

See also: Policy Examples README for creating custom policies.
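To make the pattern-based rules concrete, the sketch below hard-codes the two `ln` entries from the example policy and checks a tensor's RMS against the first pattern that matches. It is a hypothetical illustration, not how bitnet-cli parses policies: the function names are invented, rule order is assumed to be first-match-wins, and the treatment of tensors that match no rule is a guess. The YAML's `\\.` becomes the regular expression `\.` once the string is unescaped.

```shell
# ln_envelope_for TENSOR: print "MIN MAX" for the first matching rule, or fail.
# The two rules mirror the example policy file above.
ln_envelope_for() {
  if printf '%s\n' "$1" | grep -Eq 'attn_norm\.weight$'; then
    echo "0.85 1.15"
  elif printf '%s\n' "$1" | grep -Eq 'ffn_norm\.weight$'; then
    echo "0.35 0.60"
  else
    return 1
  fi
}

# check_ln TENSOR RMS: print ok, out-of-envelope, or no-rule.
check_ln() {
  bounds=$(ln_envelope_for "$1") || { echo "no-rule"; return 2; }
  lo=${bounds% *}   # text before the space
  hi=${bounds#* }   # text after the space
  if awk -v r="$2" -v lo="$lo" -v hi="$hi" 'BEGIN { exit !(r >= lo && r <= hi) }'; then
    echo "ok"
  else
    echo "out-of-envelope"
    return 1
  fi
}
```

With these rules, `check_ln blk.0.attn_norm.weight 0.95` prints `ok`, and `check_ln blk.0.ffn_norm.weight 0.0847` prints `out-of-envelope` because this example policy expects FFN gamma between 0.35 and 0.60.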
Disable validation entirely:
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate none \
models/model.gguf

When to use:
- ⚠️ Debugging validation system itself
- ⚠️ Experimental models where validation rules don't exist yet
- ⚠️ Testing inference without validation overhead
Warning: This disables important safety checks. Only use for development.
Scenario: You have a GGUF model from Hugging Face or a third-party export tool and need to verify it's valid for inference.
Steps:
# 1. Inspect LayerNorm and projection statistics
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto \
models/model.gguf
# 2. Run full 3-stage validation
./scripts/validate_gguf.sh \
models/model.gguf \
models/tokenizer.json
# 3. If validation passes, model is ready for use
cargo run -p bitnet-cli --no-default-features --features cpu -- \
run --model models/model.gguf --tokenizer models/tokenizer.json \
--prompt "Your prompt here"

If validation fails:
See Troubleshooting section below.
Scenario: You have a SafeTensors checkpoint (from training or fine-tuning) and need to create a validated GGUF.
Steps:
# 1. Export to F16 GGUF with LayerNorm preservation
./scripts/export_clean_gguf.sh \
models/safetensors-checkpoint \
models/tokenizer.json \
models/clean
# Output:
# INFO: Using Rust st2gguf converter
# INFO: Converting SafeTensors to GGUF (F16 output, LayerNorm preserved)...
# ✅ Export complete!
# Output: models/clean/clean-f16.gguf
# Fingerprint: sha256-abc123...
# 2. Validate the exported GGUF
./scripts/validate_gguf.sh \
models/clean/clean-f16.gguf \
models/tokenizer.json
# 3. If validation passes, you're done!
# If validation fails, check export logs and retry

Advanced: Use Rust st2gguf directly
# Build st2gguf converter
cargo build --release -p bitnet-st2gguf
# Convert with strict validation
target/release/st2gguf \
--input models/checkpoint.safetensors \
--output models/clean-f16.gguf \
--config models/config.json \
--strict
# Validate
./scripts/validate_gguf.sh \
models/clean-f16.gguf \
models/tokenizer.json

See also: Export Clean GGUF Guide for detailed export instructions.
Scenario: You have a custom or experimental architecture that doesn't match BitNet b1.58 or standard LLaMA patterns.
Steps:
# 1. Inspect LayerNorm statistics to understand patterns
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate none \
models/custom-model.gguf > ln-stats.txt
# Review the output to identify RMS patterns
cat ln-stats.txt
# 2. Create custom policy based on observed patterns
cp examples/policies/custom-model-example.yml my-model-policy.yml
nano my-model-policy.yml
# Define rules based on your inspection:
# - LayerNorm patterns and RMS envelopes
# - Projection weight RMS ranges
# - Architecture-specific quirks
# 3. Validate with custom policy
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats \
--gate policy \
--policy my-model-policy.yml \
--policy-key my-model:f16 \
models/custom-model.gguf
# 4. Run linguistic sanity check
BITNET_DETERMINISTIC=1 BITNET_SEED=42 \
cargo run -p bitnet-cli --no-default-features --features cpu -- \
run --model models/custom-model.gguf --tokenizer models/tokenizer.json \
--prompt "Test prompt" --max-new-tokens 32 --temperature 0.0
# 5. If output is coherent, commit your policy
git add my-model-policy.yml
git commit -m "feat(validation): add policy for my-model architecture"

See also: Policy Examples README for policy creation guide.
Scenario: You have a known-bad model (quantized LayerNorm) and need to unblock inference development while waiting for proper GGUF regeneration.
Steps:
# 1. Diagnose the issue
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto \
models/bad-model.gguf
# Output:
# ❌ LN RMS gate failed: 24/24 out of envelope
# blk.0.attn_norm.weight RMS=0.0127 [SUSPICIOUS - expected ~1.0]
# 2. Create correction policy (see docs/explanation/correction-policy.md)
nano config/correction-policy.yml
# Example correction policy:
# version: 1
# models:
# - fingerprint: "sha256-abc123..."
# corrections:
# - type: LN_GAMMA_RESCALE_RMS
# target_rms: 1.0
# 3. Enable runtime corrections (DEVELOPMENT ONLY)
export BITNET_CORRECTION_POLICY=./config/correction-policy.yml
export BITNET_ALLOW_RUNTIME_CORRECTIONS=1
export BITNET_DETERMINISTIC=1
export BITNET_SEED=42
# 4. Run inference with corrections
cargo run -p bitnet-cli --no-default-features --features cpu -- \
run --model models/bad-model.gguf --tokenizer models/tokenizer.json \
--prompt "Test prompt"
# 5. IMPORTANT: Regenerate clean GGUF for production use
./scripts/export_clean_gguf.sh \
models/original-checkpoint \
models/tokenizer.json \
models/clean
# 6. Validate clean GGUF and retire correction policy
unset BITNET_CORRECTION_POLICY BITNET_ALLOW_RUNTIME_CORRECTIONS
./scripts/validate_gguf.sh models/clean/clean-f16.gguf models/tokenizer.json

See also: Correction Policy Documentation for detailed correction workflow.
Symptom:
❌ LN RMS gate failed: 24/24 out of envelope (bitnet-b1.58:f16)
blk.0.attn_norm.weight RMS=0.0127 [SUSPICIOUS - expected ~1.0]
Root Cause: LayerNorm gamma weights were quantized during export (should be F16/F32).
Solutions:
- Best solution: Regenerate GGUF with LayerNorm weights in float format

  # Using Rust st2gguf (automatic LayerNorm preservation)
  cargo run --release -p bitnet-st2gguf -- \
  --input models/checkpoint.safetensors \
  --output models/clean-f16.gguf \
  --strict
  # Validate
  ./scripts/validate_gguf.sh models/clean-f16.gguf models/tokenizer.json

- Temporary workaround: Use correction policy (development only)

  See Workflow 4 above.

- Alternative: Check if you're using the wrong policy

  # BitNet I2_S models legitimately have low attn_norm RMS (~0.01-0.02)
  # Use correct policy for quantized models
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --policy examples/policies/bitnet-b158-i2s-quantized.yml \
  --policy-key bitnet-b1.58:i2_s \
  models/model-i2s.gguf
Symptom:
⚠️ WARNING: suspicious projection weights detected (6/144 tensors)
blk.0.attn_q.weight RMS=150.3 [OUT OF RANGE: expected 0.01-0.40]
Root Cause: I2_S dequantization scales are inverted or weights are corrupted.
Solutions:
- Inspect RMS distribution:

  RUST_LOG=info \
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  run --model model.gguf --tokenizer tokenizer.json \
  --prompt "Test" --max-new-tokens 1 2>&1 | grep "PROJ load"

  Look for patterns:
  - Q/K/V have very high RMS (~100-150) but FFN is normal (~0.8-1.0) → Inverted scales
  - All projections have similar anomalous RMS → Export corruption
  - Single layer has issues → Layer-specific corruption

- Re-export from source checkpoint:

  ./scripts/export_clean_gguf.sh \
  models/source-checkpoint \
  models/tokenizer.json \
  models/clean

- Check GGUF metadata:

  # Verify file_type matches actual quantization
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --metadata model.gguf
Symptom:
The capital of France is █▓▒░█▓▒░▓▒
⚠️ Output does not contain recognizable words
Root Causes and Solutions:
- Tokenizer mismatch:

  # Try different tokenizer
  cargo run -p bitnet-cli --no-default-features --features cpu -- \
  run --model model.gguf --tokenizer different-tokenizer.json \
  --prompt "The capital of France is"

- Quantized LayerNorm:

  See LayerNorm RMS Validation Failed above.

- RoPE parameter mismatch:

  Check config.json for RoPE settings:
  - rope_theta (base frequency)
  - rope_scaling (scaling factors)
  - Verify they match model training configuration

- Model corruption:

  # Check SHA256 fingerprint
  sha256sum model.gguf
  # Re-download or re-export if hash doesn't match
Symptom:
Error: policy key not found: my-model:f16
Solutions:
- List available policy keys:

  # View policy file structure
  cat examples/policies/custom-model-example.yml
  # Look for keys under "rules:" section
  # Example:
  # rules:
  #   bitnet-b1.58:f16: ← This is the policy key
  #     name: "BitNet b1.58 F16"

- Use correct key format:

  # Key format: architecture:variant
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --policy examples/policies/bitnet-b158-f16-clean.yml \
  --policy-key bitnet-b1.58:f16 \
  model.gguf

- Create missing policy:

  See Workflow 3: Validate Custom Architecture above.
Purpose: Examine LayerNorm and projection weight statistics with architecture-aware validation.
Syntax:
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats \
[--gate none|auto|policy] \
[--policy PATH] \
[--policy-key KEY] \
[--json] \
MODEL

Arguments:
| Argument | Required | Description |
|---|---|---|
| MODEL | Yes | Path to GGUF model file |
| --ln-stats | Yes | Enable LayerNorm statistics analysis |
| --gate | No | Validation mode: none, auto, policy (default: auto) |
| --policy | No | Path to YAML policy file (required for gate=policy) |
| --policy-key | No | Policy key for rules lookup (default: uses architecture from GGUF) |
| --json | No | Output results as JSON |
Examples:
# Auto-detect architecture
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto model.gguf
# Use custom policy
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats \
--gate policy \
--policy examples/policies/custom.yml \
--policy-key my-model:f16 \
model.gguf
# JSON output for CI
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats --gate auto --json model.gguf > validation.json

Exit Codes:
| Code | Name | Description |
|---|---|---|
| 0 | EXIT_SUCCESS | All validation checks passed |
| 8 | EXIT_LN_SUSPICIOUS | LayerNorm or projection validation failed (strict mode only) |
See also: Validation Gates Reference for technical details.
Purpose: Run complete 3-stage validation pipeline (LayerNorm, projection, linguistic sanity).
Syntax:
./scripts/validate_gguf.sh MODEL TOKENIZER

Arguments:
| Argument | Required | Description |
|---|---|---|
| MODEL | Yes | Path to GGUF model file |
| TOKENIZER | Yes | Path to tokenizer.json file |
Examples:
# Basic validation
./scripts/validate_gguf.sh \
models/bitnet-2b.gguf \
models/tokenizer.json
# Validation in CI (exit code check)
./scripts/validate_gguf.sh model.gguf tokenizer.json
if [ $? -ne 0 ]; then
echo "Validation failed - model did not pass checks"
exit 1
fi

Exit Codes:
| Code | Description |
|---|---|
| 0 | All validation checks passed |
| 10 | LayerNorm validation failed |
| 13 | Model loading failed |
| 14 | Inference probe failed |
| 15 | Linguistic sanity check failed |
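Because each stage has its own exit code, a CI wrapper can report which stage failed instead of a generic error. The helper below is a small sketch built directly from the table above; the function name `explain_validate_exit` is hypothetical.

```shell
# explain_validate_exit CODE: translate a validate_gguf.sh exit code into a message.
explain_validate_exit() {
  case "$1" in
    0)  echo "all validation checks passed" ;;
    10) echo "LayerNorm validation failed" ;;
    13) echo "model loading failed" ;;
    14) echo "inference probe failed" ;;
    15) echo "linguistic sanity check failed" ;;
    *)  echo "unexpected exit code $1" ;;
  esac
}

# Usage (not executed here):
#   status=0
#   ./scripts/validate_gguf.sh model.gguf tokenizer.json || status=$?
#   explain_validate_exit "$status"
```

Capturing the status with `|| status=$?` keeps the wrapper working under `set -e`, where a bare failing command would abort the script before the code could be inspected.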
Purpose: Convert SafeTensors to clean F16 GGUF with LayerNorm preservation.
Syntax:
./scripts/export_clean_gguf.sh MODEL_DIR TOKENIZER OUT_DIR

Arguments:
| Argument | Required | Description |
|---|---|---|
| MODEL_DIR | Yes | Directory containing SafeTensors or HF checkpoint |
| TOKENIZER | Yes | Path to tokenizer.json file |
| OUT_DIR | Yes | Output directory for GGUF |
Environment Variables:
| Variable | Values | Description |
|---|---|---|
| CONVERTER | Path or rust/st2gguf | Override converter selection |
| STRICT | 1 | Enable strict validation in st2gguf |
Examples:
# Export with automatic converter selection
./scripts/export_clean_gguf.sh \
models/safetensors-checkpoint \
models/tokenizer.json \
models/clean
# Force Rust st2gguf converter with strict validation
CONVERTER=rust STRICT=1 \
./scripts/export_clean_gguf.sh \
models/checkpoint \
models/tokenizer.json \
models/clean

Output Files:
| File | Description |
|---|---|
| clean-f16.gguf | Main GGUF model (F16 precision) |
| clean-f16.fingerprint | SHA256 fingerprint (sha256-...) |
| clean-f16.meta.json | Export metadata (source, date, converter, etc.) |
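A fingerprint file is only useful if it is re-checked before deployment. The sketch below assumes, and this is an assumption rather than something the export script documents here, that the fingerprint is the SHA-256 of the GGUF file's bytes written as `sha256-<hex>`. The helper names are hypothetical, and `sha256sum` is the GNU coreutils tool.

```shell
# gguf_fingerprint FILE: print "sha256-<hex digest>" for FILE.
# Assumption: the .fingerprint file uses this exact format over the raw file bytes.
gguf_fingerprint() {
  sha256sum "$1" | awk '{ print "sha256-" $1 }'
}

# fingerprint_matches FILE EXPECTED: succeed when FILE hashes to EXPECTED.
fingerprint_matches() {
  [ "$(gguf_fingerprint "$1")" = "$2" ]
}
```

Under that assumption, `fingerprint_matches models/clean/clean-f16.gguf "$(cat models/clean/clean-f16.fingerprint)"` would confirm the model has not changed since export.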
See also: Export Clean GGUF Guide for detailed export documentation.
Receipts generated during CPU inference must contain CPU quantized kernels to ensure honest computation claims.
Valid CPU Kernel Patterns:
- I2S: i2s_gemv, i2s_matmul_*, quantized_matmul_i2s
- TL1 (ARM NEON): tl1_neon_*, tl1_lookup_*
- TL2 (x86 AVX): tl2_avx_*, tl2_avx512_*

Rejected Fallback Patterns:
- Dequantization: dequant_*, dequant_i2s_to_fp32
- FP32 Computation: fp32_matmul, fp32_gemm
- Generic Fallback: fallback_*, scalar_*
- Mock/Test: mock_*, test_stub
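The two pattern lists above translate naturally into prefix matching. The sketch below uses shell `case` globs anchored at the start of the kernel ID, so a name that merely contains `dequant` somewhere in the middle is not rejected. It is an illustration, not the xtask validator: the function names are hypothetical, and the rule in `cpu_receipt_ok` (no rejected kernel and at least one valid one) is an assumption about how the lists combine.

```shell
# classify_kernel ID: print rejected, valid, or unknown using prefix patterns.
classify_kernel() {
  case "$1" in
    dequant_*|fp32_matmul|fp32_gemm|fallback_*|scalar_*|mock_*|test_stub)
      echo "rejected" ;;
    i2s_gemv|i2s_matmul_*|quantized_matmul_i2s|tl1_neon_*|tl1_lookup_*|tl2_avx_*|tl2_avx512_*)
      echo "valid" ;;
    *)
      echo "unknown" ;;
  esac
}

# cpu_receipt_ok KERNEL...: succeed when no kernel is rejected and at least
# one is a valid CPU quantized kernel (assumed combination rule).
cpu_receipt_ok() {
  seen_valid=1
  for k in "$@"; do
    case "$(classify_kernel "$k")" in
      rejected) return 1 ;;
      valid)    seen_valid=0 ;;
    esac
  done
  return "$seen_valid"
}
```

For example, `cpu_receipt_ok i2s_gemv tl1_neon_matmul quantized_matmul_i2s` succeeds, while `cpu_receipt_ok dequant_i2s_to_fp32 fp32_matmul` fails on the first kernel.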
Validation Commands:
# Run benchmark and generate receipt
cargo run -p xtask -- benchmark --model model.gguf --tokens 128
# Verify receipt contains CPU quantized kernels
cargo run -p xtask -- verify-receipt ci/inference.json
# The validator checks:
# - backend="cpu" requires CPU quantized kernel IDs
# - Rejects dequant_*, fp32_*, fallback_* patterns
# - Uses starts_with matching (not contains) to prevent false positives
# - Detects silent CPU fallback when backend claims GPU but uses CPU kernels

Example Valid CPU Receipt:
{
"schema_version": "1.0.0",
"backend": "cpu",
"compute_path": "real",
"kernels": [
"i2s_gemv",
"tl1_neon_matmul",
"quantized_matmul_i2s"
],
"tokens_per_second": 18.5,
"tokens_generated": 128
}

Example Invalid CPU Receipt (Fallback Detected):
{
"schema_version": "1.0.0",
"backend": "cpu",
"compute_path": "real",
"kernels": [
"dequant_i2s_to_fp32", // ❌ Dequantization fallback
"fp32_matmul" // ❌ FP32 computation
],
"tokens_per_second": 18.5,
"tokens_generated": 128
}

Exit Codes:
| Code | Description |
|---|---|
| 0 | Receipt validation passed |
| 1 | Receipt validation failed (fallback kernels detected) |
See also: Issue #462 for receipt CPU validation implementation (88% mutation testing score).
| Variable | Values | Default | Description |
|---|---|---|---|
| BITNET_VALIDATION_GATE | none, auto, policy | auto | Validation gate mode |
| BITNET_VALIDATION_POLICY | Path | None | Policy file path (for gate=policy) |
| BITNET_VALIDATION_POLICY_KEY | String | Architecture | Policy key (for gate=policy) |
| BITNET_STRICT_MODE | 0, 1 | 0 | Enable strict validation (fail on warnings) |

| Variable | Values | Default | Description |
|---|---|---|---|
| BITNET_DETERMINISTIC | 0, 1 | 0 | Enable deterministic inference |
| BITNET_SEED | Integer | Random | Random seed (requires BITNET_DETERMINISTIC=1) |
| RAYON_NUM_THREADS | Integer | Auto | Thread count for parallel operations |
| RUST_LOG | error, warn, info, debug, trace | error | Logging level |
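Since `BITNET_SEED` only takes effect together with `BITNET_DETERMINISTIC=1`, it can help to set the related variables as a group. The wrapper below is a hypothetical convenience, not part of the repository; it uses the same values as the Stage 3 example (seed 42, one Rayon thread) and scopes them to a single command.

```shell
# with_determinism CMD [ARG...]: run CMD with the deterministic settings used
# in this guide. When CMD is an external program, the variables apply to that
# process only and are not exported into the calling shell.
with_determinism() {
  BITNET_DETERMINISTIC=1 BITNET_SEED=42 RAYON_NUM_THREADS=1 "$@"
}
```

Prefixing any of the `cargo run ... run --model ...` commands in this guide with `with_determinism` would apply all three settings at once.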
| Variable | Values | Default | Description |
|---|---|---|---|
| BITNET_CORRECTION_POLICY | Path | None | Correction policy file path |
| BITNET_ALLOW_RUNTIME_CORRECTIONS | 0, 1 | 0 | Enable runtime corrections (dev only) |
Warning: CI should block BITNET_ALLOW_RUNTIME_CORRECTIONS=1 to prevent production deployment with workarounds.
# .github/workflows/validate-model.yml
name: Validate GGUF Model
on:
  push:
    paths:
      - 'models/**/*.gguf'
  workflow_dispatch:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Build BitNet-rs CLI
        run: |
          cargo build --release -p bitnet-cli \
            --no-default-features --features cpu,full-cli
      - name: Run Validation
        run: |
          ./scripts/validate_gguf.sh \
            models/model.gguf \
            models/tokenizer.json
      - name: Check for corrections (security gate)
        run: |
          if [ -n "$BITNET_CORRECTION_POLICY" ] || \
             [ -n "$BITNET_ALLOW_RUNTIME_CORRECTIONS" ]; then
            echo "ERROR: Runtime correction flags detected"
            echo "Corrections are dev-only - regenerate clean GGUF"
            exit 1
          fi
- Always use strict mode:

  BITNET_STRICT_MODE=1 ./scripts/validate_gguf.sh model.gguf tokenizer.json

- Block correction flags:

  # Fail CI if correction flags are set
  if [ -n "$BITNET_ALLOW_RUNTIME_CORRECTIONS" ]; then
    echo "ERROR: Corrections not allowed in CI"
    exit 1
  fi

- Validate on every model change:

  on:
    push:
      paths:
        - 'models/**/*.gguf'
        - 'models/**/tokenizer.json'

- Archive validation reports:

  - name: Upload validation report
    uses: actions/upload-artifact@v4
    with:
      name: validation-report
      path: validation.json
A:
- Validation policies define acceptable RMS ranges for weights (example: examples/policies/bitnet-b158-f16-clean.yml)
- Correction policies provide runtime fixes for known-bad models (example: see docs/explanation/correction-policy.md)
Validation policies are used during inspection to check if a model is healthy. Correction policies are temporary workarounds to fix known defects at runtime (dev only).
A:
Use strict mode in:
- ✅ CI/CD pipelines
- ✅ Production validation
- ✅ Release qualification
- ✅ When you need zero-tolerance for warnings
Skip strict mode when:
- ⚠️ Debugging validation rules
- ⚠️ Working with experimental models
- ⚠️ You understand the warnings and accept the risk
A:
No. LayerNorm weights should never be quantized, even in I2_S models. The validator uses architecture-specific rules that account for legitimate RMS variations in quantized models (e.g., BitNet I2_S attn_norm legitimately has RMS ~0.01-0.02).
If validation fails, either:
- Regenerate GGUF with LayerNorm weights in float format, or
- Create a custom policy if your architecture legitimately has unusual RMS values
A:
Use explicit policy mode:
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
inspect --ln-stats \
--gate policy \
--policy examples/policies/correct-policy.yml \
--policy-key correct-key \
model.gguf

If auto-detection is consistently wrong, file an issue with:
- GGUF metadata (cargo run -p bitnet-cli -- inspect --metadata model.gguf)
- Expected architecture
- Inspection output
A:
- Test on multiple checkpoints: Validate 3-5 models from different training stages
- Check false positives: Policy should not reject healthy models
- Check false negatives: Policy should catch known-bad models
- Compare with reference: Use known-good model as baseline
- Test inference: Models passing validation should produce coherent output
See Workflow 3 for detailed custom policy creation.
- Export Clean GGUF Guide: How to create clean GGUF models with proper LayerNorm format
- Validation Gates Reference: Technical details on validation system architecture
- Correction Policy Documentation: Runtime correction system for known-bad models
- Policy Examples README: Example policies and creation guide
- Build Commands Reference: CLI build instructions with full-cli feature
- CLAUDE.md Quick Reference: Quick command reference and troubleshooting
| Task | Command | Purpose |
|---|---|---|
| Validate existing GGUF | ./scripts/validate_gguf.sh model.gguf tokenizer.json | 3-stage validation pipeline |
| Inspect LayerNorm stats | cargo run -p bitnet-cli -- inspect --ln-stats --gate auto model.gguf | Architecture-aware RMS validation |
| Export clean GGUF | ./scripts/export_clean_gguf.sh checkpoint/ tokenizer.json output/ | Convert SafeTensors to F16 GGUF |
| Custom policy validation | cargo run -p bitnet-cli -- inspect --ln-stats --gate policy --policy policy.yml model.gguf | Validate with custom rules |
| Strict mode validation | BITNET_STRICT_MODE=1 cargo run -p bitnet-cli -- inspect --ln-stats model.gguf | Fail on warnings |
Key Principle: Always validate models before deployment. Clean models must pass all 3 stages without corrections or workarounds.
For questions or issues, see:
- GitHub Issues: BitNet-rs/issues
- Documentation Index: docs/ directory
- Quick Reference: CLAUDE.md