How to Validate BitNet Models

Audience: Developers and researchers working with GGUF models who need to ensure model quality and catch quantization errors before deployment.

Goal: Learn the complete 3-stage validation workflow to verify GGUF models have correct LayerNorm weights and healthy projection weights.

Overview

BitNet-rs provides a comprehensive validation system to catch common model export issues:

- Quantized LayerNorm weights: LayerNorm gamma weights quantized to I2_S/Q4 (should be F16/F32)
- Incorrect projection scales: Inverted I2_S dequantization or corrupted weight scales
- Tokenizer mismatches: Wrong tokenizer causing gibberish outputs
- Export corruption: Metadata errors or tensor misalignment

The validation system uses a 3-stage pipeline:

1. LayerNorm & Projection RMS Check: Architecture-aware statistical validation
2. Model Loading Check: Verify weights load correctly with healthy RMS values
3. Linguistic Sanity Check: Greedy inference produces coherent output

Quick Start

Validate an Existing GGUF

# Validate with automatic architecture detection
./scripts/validate_gguf.sh \
  models/bitnet-2b.gguf \
  models/tokenizer.json

# Output:
# ===================================================
# 1/3: LayerNorm and Projection Weight Statistics Check
# ===================================================
# ✅ LN RMS gate passed (bitnet-b1.58:f16)
# ✅ Projection RMS gate passed
#
# ===================================================
# 2/3: Projection Weight RMS Check (via model loading)
# ===================================================
# ✅ Projection weights loaded
#
# ===================================================
# 3/3: Greedy Inference Probe
# ===================================================
# ✅ Output contains recognizable words
#
# ✅✅✅ ALL VALIDATION CHECKS PASSED ✅✅✅

Convert SafeTensors to Clean GGUF

# Export F16 GGUF with LayerNorm preservation
./scripts/export_clean_gguf.sh \
  models/safetensors-checkpoint \
  models/tokenizer.json \
  models/clean

# Validate the exported model
./scripts/validate_gguf.sh \
  models/clean/clean-f16.gguf \
  models/tokenizer.json

Strict CPU Proof Smoke

After the model and tokenizer are present locally, use the strict CPU proof command to check the user-facing CLI path without mock or minimal-loader ambiguity:

BITNET_DISABLE_MINIMAL_LOADER=1 \
BITNET_STRICT_MODE=1 \
RUST_LOG=warn \
cargo run --locked -p bitnet-cli --no-default-features --features cpu,full-cli -- run \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
  --strict-loader \
  --strict-tokenizer \
  --prompt "Answer with a single digit: 2+2=" \
  --max-tokens 1 \
  --temperature 0.0 \
  --greedy \
  --json-out target/cpu-proof.json

This is a strict smoke/proof command, not a model-quality or performance claim. The JSON artifact must distinguish the enhanced loader from compatibility fallback. Receipt validation, kernel IDs, and throughput claims are handled by follow-up CPU proof items.

The 3-Stage Validation Pipeline

Stage 1: LayerNorm & Projection RMS Check

Purpose: Detect quantized LayerNorm weights and projection weight anomalies using architecture-aware statistical validation.

What it checks:

LayerNorm gamma RMS values are in expected envelope (architecture-specific)
Projection weight RMS values are reasonable for the model format
Uses pattern-based thresholds tailored to BitNet b1.58 F16/I2_S or LLaMA-style models
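
As a reminder of what the `rms=` column measures: RMS is the root mean square of a tensor's elements, sqrt(mean(w²)). The sketch below computes it with awk from one value per line. The `rms` helper is hypothetical and only illustrates the statistic; it is not how bitnet-cli reads tensors.

```shell
# Hypothetical helper: root mean square of numbers read from stdin, one per line.
rms() {
  awk '{ sum += $1 * $1; n++ } END { if (n > 0) printf "%.4f\n", sqrt(sum / n) }'
}

# A gamma vector of all ones has RMS 1.0.
printf '1\n1\n1\n1\n' | rms   # prints 1.0000
```

A gamma vector of all ones has RMS 1.0, which is the reference point behind messages such as "expected ~1.0" in the failure output.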

How to run:

# Auto-detect architecture from GGUF metadata
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto \
  models/model.gguf

# With strict mode (fail on warnings)
BITNET_STRICT_MODE=1 \
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto \
  models/model.gguf

Expected output (healthy model):

model_sha256: a1b2c3d4e5f6...
ruleset: bitnet-b1.58:f16

blk.0.attn_norm.weight                                     [LN]     rms=0.9523   ✅
blk.0.ffn_norm.weight                                      [LN]     rms=0.0847   ✅
blk.1.attn_norm.weight                                     [LN]     rms=0.9412   ✅
blk.1.ffn_norm.weight                                      [LN]     rms=0.0851   ✅
...
output_norm.weight                                         [LN]     rms=0.9998   ✅

blk.0.attn_q.weight                                        [PROJ]   rms=0.0214   ✅
blk.0.attn_k.weight                                        [PROJ]   rms=0.0218   ✅
blk.0.attn_v.weight                                        [PROJ]   rms=0.0216   ✅
...

✅ LN RMS gate passed (bitnet-b1.58:f16)
✅ Projection RMS gate passed (bitnet-b1.58:f16)

Expected output (quantized LayerNorm - BAD):

model_sha256: f9e8d7c6b5a4...
ruleset: bitnet-b1.58:f16

blk.0.attn_norm.weight                                     [LN]     rms=0.0127   ❌
blk.0.ffn_norm.weight                                      [LN]     rms=0.0093   ❌
blk.1.attn_norm.weight                                     [LN]     rms=0.0131   ❌
...

❌ LN RMS gate failed: 24/24 out of envelope (bitnet-b1.58:f16)

ERROR: Model has suspicious LayerNorm weights (quantized or corrupted).
Recommendation: Regenerate GGUF with LayerNorm weights in float format (F16/F32).
See docs/howto/export-clean-gguf.md for proper export workflow.

Exit codes:

0: All checks passed
8 (EXIT_LN_SUSPICIOUS): LayerNorm or projection validation failed in strict mode

See also: Validation Gates Reference for detailed threshold definitions.

Stage 2: Model Loading Check

Purpose: Verify weights load correctly and have healthy RMS values during actual model initialization.

What it checks:

All projection weights (Q/K/V/O, FFN gate/up/down) load successfully
RMS values are in the expected range (roughly 0.02 in the healthy example below, far below the ~100 seen with inverted scales)
No NaN/Inf values in loaded tensors
I2_S dequantization produces reasonable scales

How to run:

# Enable RUST_LOG=info to see projection RMS values
RUST_LOG=info \
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  run --model models/model.gguf --tokenizer models/tokenizer.json \
  --prompt "Warmup." --max-new-tokens 1 --temperature 0.0

Expected output:

INFO PROJ load: blk.0.attn_q.weight RMS=0.0214 (inv=false)
INFO PROJ load: blk.0.attn_k.weight RMS=0.0218 (inv=false)
INFO PROJ load: blk.0.attn_v.weight RMS=0.0216 (inv=false)
INFO PROJ load: blk.0.attn_o.weight RMS=0.0219 (inv=false)
INFO PROJ load: blk.0.ffn_gate.weight RMS=0.0201 (inv=false)
INFO PROJ load: blk.0.ffn_up.weight RMS=0.0198 (inv=false)
INFO PROJ load: blk.0.ffn_down.weight RMS=0.0203 (inv=false)
...

Warning signs:

# Extremely high RMS (inverted scales?)
INFO PROJ load: blk.0.attn_q.weight RMS=150.3 (inv=false)  ⚠️

# Wildly different RMS values (corruption?)
INFO PROJ load: blk.0.attn_q.weight RMS=0.02 (inv=false)
INFO PROJ load: blk.0.attn_k.weight RMS=100.5 (inv=false)  ⚠️
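
Scanning these log lines by eye gets tedious on models with many layers. A minimal sketch of a filter follows, assuming the log lines keep the `PROJ load: <tensor> RMS=<value>` shape shown above. The `flag_proj_rms` name and its two arguments (lower and upper bound) are hypothetical and not part of the CLI.

```shell
# Hypothetical filter: print tensors whose logged RMS falls outside [min, max].
# Reads "PROJ load" log lines on stdin; other lines are ignored.
flag_proj_rms() {
  awk -v lo="$1" -v hi="$2" '
    /PROJ load:/ {
      name = ""; rms = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "load:") name = $(i + 1)          # tensor name follows "load:"
        if ($i ~ /^RMS=/)  rms = substr($i, 5) + 0  # numeric part after "RMS="
      }
      if (name != "" && rms != "" && (rms < lo + 0 || rms > hi + 0))
        printf "%s %s\n", name, rms
    }'
}
```

To use it, pipe the Stage 2 command's combined output (`2>&1`) into `flag_proj_rms 0.01 0.40`, where the bounds are the F16 projection envelope listed under Auto-Detection Mode. Empty output means no logged tensor fell outside the bounds.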

Exit codes:

0: Model loaded successfully
1: Model loading failed (missing tensors, format errors, etc.)

Stage 3: Linguistic Sanity Check

Purpose: Ensure the model produces coherent output, not gibberish or tied logits.

What it checks:

Greedy deterministic inference produces recognizable words
Output contains at least one word with 3+ ASCII letters
No immediate tokenizer decode errors
Model doesn't repeat the same token indefinitely
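
Two of these checks are easy to approximate in a shell script. The sketch below is an assumption about how they could be written, not the logic inside `validate_gguf.sh`: both function names are hypothetical, "word" is read as a run of three or more ASCII letters, and "repetition" as three or more identical whitespace-separated tokens. Pass only the generated continuation, since the prompt itself contains real words.

```shell
# Hypothetical checks on the generated continuation only (prompt removed).

# True if the text contains a run of 3 or more ASCII letters.
has_recognizable_word() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '[A-Za-z]{3,}'
}

# True if the text has at least 3 whitespace-separated tokens and all are identical.
is_repetitive() {
  printf '%s\n' "$1" | awk '
    { for (i = 1; i <= NF; i++) { n++; seen[$i] = 1 } }
    END { u = 0; for (k in seen) u++; exit !(n >= 3 && u == 1) }'
}
```

With these definitions, a continuation such as " Paris." passes the word check, while "the the the the the the" is flagged as repetitive.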

How to run:

# Deterministic greedy inference
BITNET_DETERMINISTIC=1 \
BITNET_SEED=42 \
RAYON_NUM_THREADS=1 \
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  run --model models/model.gguf --tokenizer models/tokenizer.json \
  --prompt "The capital of France is" \
  --max-new-tokens 8 \
  --temperature 0.0

Expected output (healthy):

The capital of France is Paris.

Warning signs (issues):

# Gibberish (quantized LayerNorm or wrong tokenizer)
The capital of France is █▓▒░█▓▒

# Repetition (tied logits or attention collapse)
The capital of France is the the the the the the

# Empty or decode errors (tokenizer mismatch)
The capital of France is

Exit codes:

0: Linguistic sanity check passed
1: Inference failed to run
Other non-zero: Check logs for the specific failure mode

Validation Modes

Auto-Detection Mode (Recommended)

Automatically selects validation rules based on GGUF metadata:

cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto \
  models/model.gguf

Auto-detection logic:

| Architecture | File Type | Selected Ruleset | LayerNorm Envelope | Projection Envelope |
|---|---|---|---|---|
| `bitnet` or `b1.58` | `1` (F16) | `bitnet-b1.58:f16` | Pattern-based (0.05-2.0 typical) | [0.01, 0.40] |
| `bitnet` or `b1.58` | Other (quantized) | `bitnet-b1.58:i2_s` | Pattern-based (0.01-2.0 typical) | [0.002, 0.20] |
| Other | Any | `generic` | [0.80, 1.20] | None |
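
The selection rule in the table can be sketched as a small shell function. This is an illustration, not the CLI's implementation: `select_ruleset` is a hypothetical name, and matching the architecture string by substring (`*bitnet*`, `*b1.58*`) is an assumption, since the table only says the architecture is `bitnet` or `b1.58`.

```shell
# Hypothetical sketch of the auto-detection table.
# $1 = architecture string from GGUF metadata, $2 = file type value.
select_ruleset() {
  local arch="$1" file_type="$2"
  case "$arch" in
    *bitnet*|*b1.58*)
      if [ "$file_type" = "1" ]; then
        echo "bitnet-b1.58:f16"    # file type 1 means F16
      else
        echo "bitnet-b1.58:i2_s"   # any other file type is treated as quantized
      fi
      ;;
    *)
      echo "generic"               # non-BitNet architectures
      ;;
  esac
}
```

For example, `select_ruleset bitnet 1` prints `bitnet-b1.58:f16`, and `select_ruleset llama 1` prints `generic`.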

When to use:

✅ Standard BitNet b1.58 models (F16 or I2_S)
✅ LLaMA/Mistral/standard RMSNorm architectures
✅ CI/CD pipelines requiring deterministic validation
✅ When you trust your GGUF metadata is correct

Environment variables:

# Set auto mode via environment
export BITNET_VALIDATION_GATE=auto
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

Policy Mode (Custom Architectures)

Use custom validation policies for non-standard architectures:

cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy examples/policies/custom-model.yml \
  --policy-key my-model:f16 \
  models/model.gguf

When to use:

✅ Custom or experimental architectures
✅ Models with unusual LayerNorm patterns
✅ Overriding auto-detection for specific requirements
✅ Testing new policy definitions

Example policy file:

version: 1

rules:
  my-model:f16:
    name: "My Custom Model F16"
    ln:
      - pattern: "attn_norm\\.weight$"
        min: 0.85
        max: 1.15
        description: "Attention LayerNorm (observed RMS ~0.92-1.05)"

      - pattern: "ffn_norm\\.weight$"
        min: 0.35
        max: 0.60
        description: "FFN LayerNorm (architectural low gamma)"

    proj_weight_rms_min: 0.015
    proj_weight_rms_max: 0.35
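
To make the semantics of the `ln` rules concrete, the sketch below hard-codes the two patterns and envelopes from the example policy above and checks a tensor name and RMS against them. The function names are hypothetical, and treating a tensor that matches no pattern as passing is an assumption; the real validator may handle unmatched tensors differently.

```shell
# True if value $1 lies within [$2, $3]; awk handles the float comparison.
rms_in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Hypothetical check mirroring the example policy's two LayerNorm rules.
# $1 = tensor name, $2 = observed RMS.
check_ln() {
  local name="$1" rms="$2"
  if printf '%s\n' "$name" | grep -Eq 'attn_norm\.weight$'; then
    rms_in_range "$rms" 0.85 1.15
  elif printf '%s\n' "$name" | grep -Eq 'ffn_norm\.weight$'; then
    rms_in_range "$rms" 0.35 0.60
  else
    return 0   # assumption: tensors without a matching rule are not gated
  fi
}
```

Under this policy, `check_ln blk.0.attn_norm.weight 0.9523` succeeds and `check_ln blk.0.attn_norm.weight 0.0127` fails. Note that the YAML string `"attn_norm\\.weight$"` unescapes to the regex `attn_norm\.weight$` used here.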

Environment variables:

export BITNET_VALIDATION_GATE=policy
export BITNET_VALIDATION_POLICY=examples/policies/custom-model.yml
export BITNET_VALIDATION_POLICY_KEY=my-model:f16
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

See also: Policy Examples README for creating custom policies.

None Mode (Skip Validation)

Disable validation entirely:

cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate none \
  models/model.gguf

When to use:

⚠️ Debugging the validation system itself
⚠️ Experimental models where validation rules don't exist yet
⚠️ Testing inference without validation overhead

Warning: This disables important safety checks. Only use for development.

Complete Workflows

Workflow 1: Validate Existing GGUF

Scenario: You have a GGUF model from Hugging Face or a third-party export tool and need to verify it's valid for inference.

Steps:

# 1. Inspect LayerNorm and projection statistics
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto \
  models/model.gguf

# 2. Run full 3-stage validation
./scripts/validate_gguf.sh \
  models/model.gguf \
  models/tokenizer.json

# 3. If validation passes, model is ready for use
cargo run -p bitnet-cli --no-default-features --features cpu -- \
  run --model models/model.gguf --tokenizer models/tokenizer.json \
  --prompt "Your prompt here"

If validation fails:

See Troubleshooting section below.

Workflow 2: Convert SafeTensors to Clean GGUF

Scenario: You have a SafeTensors checkpoint (from training or fine-tuning) and need to create a validated GGUF.

Steps:

# 1. Export to F16 GGUF with LayerNorm preservation
./scripts/export_clean_gguf.sh \
  models/safetensors-checkpoint \
  models/tokenizer.json \
  models/clean

# Output:
# INFO: Using Rust st2gguf converter
# INFO: Converting SafeTensors to GGUF (F16 output, LayerNorm preserved)...
# ✅ Export complete!
#   Output: models/clean/clean-f16.gguf
#   Fingerprint: sha256-abc123...

# 2. Validate the exported GGUF
./scripts/validate_gguf.sh \
  models/clean/clean-f16.gguf \
  models/tokenizer.json

# 3. If validation passes, you're done!
# If validation fails, check export logs and retry

Advanced: Use Rust st2gguf directly

# Build st2gguf converter
cargo build --release -p bitnet-st2gguf

# Convert with strict validation
target/release/st2gguf \
  --input models/checkpoint.safetensors \
  --output models/clean-f16.gguf \
  --config models/config.json \
  --strict

# Validate
./scripts/validate_gguf.sh \
  models/clean-f16.gguf \
  models/tokenizer.json

See also: Export Clean GGUF Guide for detailed export instructions.

Workflow 3: Validate Custom Architecture

Scenario: You have a custom or experimental architecture that doesn't match BitNet b1.58 or standard LLaMA patterns.

Steps:

# 1. Inspect LayerNorm statistics to understand patterns
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate none \
  models/custom-model.gguf > ln-stats.txt

# Review the output to identify RMS patterns
cat ln-stats.txt

# 2. Create custom policy based on observed patterns
cp examples/policies/custom-model-example.yml my-model-policy.yml
nano my-model-policy.yml

# Define rules based on your inspection:
# - LayerNorm patterns and RMS envelopes
# - Projection weight RMS ranges
# - Architecture-specific quirks

# 3. Validate with custom policy
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy my-model-policy.yml \
  --policy-key my-model:f16 \
  models/custom-model.gguf

# 4. Run linguistic sanity check
BITNET_DETERMINISTIC=1 BITNET_SEED=42 \
  cargo run -p bitnet-cli --no-default-features --features cpu -- \
  run --model models/custom-model.gguf --tokenizer models/tokenizer.json \
  --prompt "Test prompt" --max-new-tokens 32 --temperature 0.0

# 5. If output is coherent, commit your policy
git add my-model-policy.yml
git commit -m "feat(validation): add policy for my-model architecture"
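
For step 2, the observed minimum and maximum RMS per tensor pattern are a natural starting point for the policy's `min` and `max`. A rough sketch follows, assuming `ln-stats.txt` has the column layout shown in the Stage 1 expected output (tensor name first, then a field of the form `rms=<value>`). The `rms_range` helper is hypothetical.

```shell
# Hypothetical helper: print "min max" of rms= values for tensors whose name
# matches the regex in $1. Reads inspect output on stdin.
rms_range() {
  awk -v pat="$1" '
    $1 ~ pat {
      for (i = 2; i <= NF; i++) if ($i ~ /^rms=/) {
        v = substr($i, 5) + 0            # numeric part after "rms="
        if (n == 0 || v < lo) lo = v
        if (n == 0 || v > hi) hi = v
        n++
      }
    }
    END { if (n) printf "%.4f %.4f\n", lo, hi }'
}

# Example call, kept as a comment because it needs the file from step 1:
#   rms_range 'attn_norm[.]weight$' < ln-stats.txt
```

The pattern uses `[.]` instead of `\.` because awk processes backslash escapes in `-v` assignments. Widen the printed range by a margin before writing it into the policy, so that healthy checkpoints from other training stages are not rejected.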

See also: Policy Examples README for policy creation guide.

Workflow 4: Policy-Based Runtime Corrections (Development Only)

Scenario: You have a known-bad model (quantized LayerNorm) and need to unblock inference development while waiting for proper GGUF regeneration.

⚠️ WARNING: This is a temporary workaround for development only. CI blocks correction flags to prevent production deployment.

Steps:

# 1. Diagnose the issue
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto \
  models/bad-model.gguf

# Output:
# ❌ LN RMS gate failed: 24/24 out of envelope
# blk.0.attn_norm.weight RMS=0.0127 [SUSPICIOUS - expected ~1.0]

# 2. Create correction policy (see docs/explanation/correction-policy.md)
nano config/correction-policy.yml

# Example correction policy:
# version: 1
# models:
#   - fingerprint: "sha256-abc123..."
#     corrections:
#       - type: LN_GAMMA_RESCALE_RMS
#         target_rms: 1.0

# 3. Enable runtime corrections (DEVELOPMENT ONLY)
export BITNET_CORRECTION_POLICY=./config/correction-policy.yml
export BITNET_ALLOW_RUNTIME_CORRECTIONS=1
export BITNET_DETERMINISTIC=1
export BITNET_SEED=42

# 4. Run inference with corrections
cargo run -p bitnet-cli --no-default-features --features cpu -- \
  run --model models/bad-model.gguf --tokenizer models/tokenizer.json \
  --prompt "Test prompt"

# 5. IMPORTANT: Regenerate clean GGUF for production use
./scripts/export_clean_gguf.sh \
  models/original-checkpoint \
  models/tokenizer.json \
  models/clean

# 6. Validate clean GGUF and retire correction policy
unset BITNET_CORRECTION_POLICY BITNET_ALLOW_RUNTIME_CORRECTIONS
./scripts/validate_gguf.sh models/clean/clean-f16.gguf models/tokenizer.json

See also: Correction Policy Documentation for detailed correction workflow.

Troubleshooting Validation Failures

Issue: LayerNorm RMS Validation Failed

Symptom:

❌ LN RMS gate failed: 24/24 out of envelope (bitnet-b1.58:f16)
blk.0.attn_norm.weight RMS=0.0127 [SUSPICIOUS - expected ~1.0]

Root Cause: LayerNorm gamma weights were quantized during export (should be F16/F32).

Solutions:

Best solution: Regenerate GGUF with LayerNorm weights in float format

# Using Rust st2gguf (automatic LayerNorm preservation)
cargo run --release -p bitnet-st2gguf -- \
  --input models/checkpoint.safetensors \
  --output models/clean-f16.gguf \
  --strict

# Validate
./scripts/validate_gguf.sh models/clean-f16.gguf models/tokenizer.json

Temporary workaround: Use correction policy (development only)

See Workflow 4 above.

Alternative: Check if you're using the wrong policy

# BitNet I2_S models legitimately have low attn_norm RMS (~0.01-0.02)
# Use correct policy for quantized models
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy examples/policies/bitnet-b158-i2s-quantized.yml \
  --policy-key bitnet-b1.58:i2_s \
  models/model-i2s.gguf

Issue: Projection Weight RMS Out of Range

Symptom:

⚠️ WARNING: suspicious projection weights detected (6/144 tensors)
blk.0.attn_q.weight RMS=150.3 [OUT OF RANGE: expected 0.01-0.40]

Root Cause: I2_S dequantization scales are inverted or weights are corrupted.

Solutions:

Inspect RMS distribution:
```
RUST_LOG=info \
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  run --model model.gguf --tokenizer tokenizer.json \
  --prompt "Test" --max-new-tokens 1 2>&1 | grep "PROJ load"
```
Look for patterns:
- Q/K/V have very high RMS (~100-150) but FFN stays near the healthy ~0.02 shown in Stage 2 → Inverted scales
- All projections have similar anomalous RMS → Export corruption
- Single layer has issues → Layer-specific corruption

Re-export from source checkpoint:

./scripts/export_clean_gguf.sh \
  models/source-checkpoint \
  models/tokenizer.json \
  models/clean

Check GGUF metadata:

# Verify file_type matches actual quantization
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --metadata model.gguf

Issue: Gibberish Output in Linguistic Sanity Check

Symptom:

The capital of France is █▓▒░█▓▒░▓▒
⚠️ Output does not contain recognizable words

Root Causes and Solutions:

Tokenizer mismatch:

# Try different tokenizer
cargo run -p bitnet-cli --no-default-features --features cpu -- \
  run --model model.gguf --tokenizer different-tokenizer.json \
  --prompt "The capital of France is"

Quantized LayerNorm:

See LayerNorm RMS Validation Failed above.

RoPE parameter mismatch:

Check config.json for RoPE settings:
- rope_theta (base frequency)
- rope_scaling (scaling factors)
- Verify they match model training configuration

Model corruption:

# Check SHA256 fingerprint
sha256sum model.gguf

# Re-download or re-export if hash doesn't match
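
If the model came from `export_clean_gguf.sh`, the hash can be compared against the recorded fingerprint instead of by eye. The sketch below assumes the fingerprint is a single string of the form `sha256-<hex digest>`, as the export output suggests. The `fingerprint_matches` helper is hypothetical.

```shell
# Hypothetical helper: compare a file's SHA256 with a recorded fingerprint.
# $1 = model path, $2 = expected fingerprint ("sha256-<hex digest>").
fingerprint_matches() {
  local model="$1" expected="$2" actual
  actual="sha256-$(sha256sum "$model" | cut -d ' ' -f 1)"
  [ "$actual" = "$expected" ]
}

# Example call, kept as a comment because it needs the exported files:
#   fingerprint_matches models/clean/clean-f16.gguf \
#     "$(cat models/clean/clean-f16.fingerprint)" || echo "hash mismatch"
```

Reading the expected value with `cat` assumes the `.fingerprint` file holds exactly that one string; check the file's contents before relying on it.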

Issue: Policy Key Not Found

Symptom:

Error: policy key not found: my-model:f16

Solutions:

List available policy keys:

# View policy file structure
cat examples/policies/custom-model-example.yml

# Look for keys under "rules:" section
# Example:
# rules:
#   bitnet-b1.58:f16:  ← This is the policy key
#     name: "BitNet b1.58 F16"

Use correct key format:

# Key format: architecture:variant
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy examples/policies/bitnet-b158-f16-clean.yml \
  --policy-key bitnet-b1.58:f16 \
  model.gguf

Create missing policy:

See Workflow 3: Validate Custom Architecture above.

Command Reference

Inspect Command

Purpose: Examine LayerNorm and projection weight statistics with architecture-aware validation.

Syntax:

cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  [--gate none|auto|policy] \
  [--policy PATH] \
  [--policy-key KEY] \
  [--json] \
  MODEL

Arguments:

| Argument | Required | Description |
|---|---|---|
| `MODEL` | Yes | Path to GGUF model file |
| `--ln-stats` | Yes | Enable LayerNorm statistics analysis |
| `--gate` | No | Validation mode: `none`, `auto`, `policy` (default: `auto`) |
| `--policy` | No | Path to YAML policy file (required for `gate=policy`) |
| `--policy-key` | No | Policy key for rules lookup (default: uses architecture from GGUF) |
| `--json` | No | Output results as JSON |

Examples:

# Auto-detect architecture
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto model.gguf

# Use custom policy
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy examples/policies/custom.yml \
  --policy-key my-model:f16 \
  model.gguf

# JSON output for CI
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto --json model.gguf > validation.json

Exit Codes:

| Code | Name | Description |
|---|---|---|
| `0` | `EXIT_SUCCESS` | All validation checks passed |
| `8` | `EXIT_LN_SUSPICIOUS` | LayerNorm or projection validation failed (strict mode only) |

See also: Validation Gates Reference for technical details.

Validation Script

Purpose: Run complete 3-stage validation pipeline (LayerNorm, projection, linguistic sanity).

Syntax:

./scripts/validate_gguf.sh MODEL TOKENIZER

Arguments:

| Argument | Required | Description |
|---|---|---|
| `MODEL` | Yes | Path to GGUF model file |
| `TOKENIZER` | Yes | Path to tokenizer.json file |

Examples:

# Basic validation
./scripts/validate_gguf.sh \
  models/bitnet-2b.gguf \
  models/tokenizer.json

# Validation in CI (exit code check)
if ! ./scripts/validate_gguf.sh model.gguf tokenizer.json; then
  echo "Validation failed - model did not pass checks"
  exit 1
fi

Exit Codes:

| Code | Description |
|---|---|
| `0` | All validation checks passed |
| `10` | LayerNorm validation failed |
| `13` | Model loading failed |
| `14` | Inference probe failed |
| `15` | Linguistic sanity check failed |
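
In CI logs it helps to report which stage failed rather than a bare exit code. A minimal sketch follows, using only the codes from the table above. The `describe_validation_exit` function is hypothetical and not shipped with the scripts.

```shell
# Hypothetical helper: map validate_gguf.sh exit codes to a short label.
describe_validation_exit() {
  case "$1" in
    0)  echo "all checks passed" ;;
    10) echo "LayerNorm validation failed" ;;
    13) echo "model loading failed" ;;
    14) echo "inference probe failed" ;;
    15) echo "linguistic sanity check failed" ;;
    *)  echo "unexpected exit code $1" ;;
  esac
}

# Example use, kept as comments because it needs the script and a model:
#   rc=0
#   ./scripts/validate_gguf.sh model.gguf tokenizer.json || rc=$?
#   echo "validate_gguf.sh: $(describe_validation_exit "$rc")"
#   exit "$rc"
```

Capturing the code with `|| rc=$?` keeps the wrapper working under `set -e`, where a bare failing command would abort the script before the code could be inspected.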

Export Script

Purpose: Convert SafeTensors to clean F16 GGUF with LayerNorm preservation.

Syntax:

./scripts/export_clean_gguf.sh MODEL_DIR TOKENIZER OUT_DIR

Arguments:

| Argument | Required | Description |
|---|---|---|
| `MODEL_DIR` | Yes | Directory containing SafeTensors or HF checkpoint |
| `TOKENIZER` | Yes | Path to tokenizer.json file |
| `OUT_DIR` | Yes | Output directory for GGUF |

Environment Variables:

| Variable | Values | Description |
|---|---|---|
| `CONVERTER` | Path or `rust`/`st2gguf` | Override converter selection |
| `STRICT` | `1` | Enable strict validation in st2gguf |

Examples:

# Export with automatic converter selection
./scripts/export_clean_gguf.sh \
  models/safetensors-checkpoint \
  models/tokenizer.json \
  models/clean

# Force Rust st2gguf converter with strict validation
CONVERTER=rust STRICT=1 \
  ./scripts/export_clean_gguf.sh \
  models/checkpoint \
  models/tokenizer.json \
  models/clean

Output Files:

| File | Description |
|---|---|
| `clean-f16.gguf` | Main GGUF model (F16 precision) |
| `clean-f16.fingerprint` | SHA256 fingerprint (sha256-...) |
| `clean-f16.meta.json` | Export metadata (source, date, converter, etc.) |

See also: Export Clean GGUF Guide for detailed export documentation.

Receipt Validation

CPU Receipt Validation

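Receipts generated during CPU inference must list CPU quantized kernel IDs. A receipt that lists dequantization, FP32, or generic fallback kernels is rejected, so that a claimed CPU compute path reflects quantized computation.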

Valid CPU Kernel Patterns:

I2S: i2s_gemv, i2s_matmul_*, quantized_matmul_i2s
TL1 (ARM NEON): tl1_neon_*, tl1_lookup_*
TL2 (x86 AVX): tl2_avx_*, tl2_avx512_*

Rejected Fallback Patterns:

Dequantization: dequant_*, dequant_i2s_to_fp32
FP32 Computation: fp32_matmul, fp32_gemm
Generic Fallback: fallback_*, scalar_*
Mock/Test: mock_*, test_stub

Validation Commands:

# Run benchmark and generate receipt
cargo run -p xtask -- benchmark --model model.gguf --tokens 128

# Verify receipt contains CPU quantized kernels
cargo run -p xtask -- verify-receipt ci/inference.json

# The validator checks:
# - backend="cpu" requires CPU quantized kernel IDs
# - Rejects dequant_*, fp32_*, fallback_* patterns
# - Uses starts_with matching (not contains) to prevent false positives
# - Detects silent CPU fallback when backend claims GPU but uses CPU kernels
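
The difference between prefix matching and substring matching can be illustrated with shell `case` patterns, which must match the whole string and therefore behave like starts-with checks when they end in `*`. This is a sketch, not the xtask validator: both function names are hypothetical, and the rule "no fallback IDs and at least one quantized ID" is an assumption drawn from the pattern lists above.

```shell
# Hypothetical classifier using prefix (starts-with) matching.
# "dequant_*" only matches IDs that begin with "dequant_".
classify_kernel() {
  case "$1" in
    i2s_gemv|i2s_matmul_*|quantized_matmul_i2s|tl1_neon_*|tl1_lookup_*|tl2_avx_*|tl2_avx512_*)
      echo "quantized" ;;
    dequant_*|fp32_*|fallback_*|scalar_*|mock_*|test_stub)
      echo "fallback" ;;
    *)
      echo "unknown" ;;
  esac
}

# Assumed rule for backend="cpu": no fallback IDs, at least one quantized ID.
cpu_kernels_ok() {
  local id quantized=0
  for id in "$@"; do
    case "$(classify_kernel "$id")" in
      fallback)  return 1 ;;
      quantized) quantized=1 ;;
    esac
  done
  [ "$quantized" -eq 1 ]
}
```

With prefix matching, an ID such as `pre_dequant_x` is classified as `unknown` rather than `fallback`, because it merely contains `dequant_`. A substring match would have rejected it, which is the false positive the validator avoids.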

Example Valid CPU Receipt:

{
  "schema_version": "1.0.0",
  "backend": "cpu",
  "compute_path": "real",
  "kernels": [
    "i2s_gemv",
    "tl1_neon_matmul",
    "quantized_matmul_i2s"
  ],
  "tokens_per_second": 18.5,
  "tokens_generated": 128
}

Example Invalid CPU Receipt (Fallback Detected):

Both kernel IDs below are rejected: `dequant_i2s_to_fp32` is a dequantization fallback and `fp32_matmul` is FP32 computation.

{
  "schema_version": "1.0.0",
  "backend": "cpu",
  "compute_path": "real",
  "kernels": [
    "dequant_i2s_to_fp32",
    "fp32_matmul"
  ],
  "tokens_per_second": 18.5,
  "tokens_generated": 128
}

Exit Codes:

| Code | Description |
|---|---|
| `0` | Receipt validation passed |
| `1` | Receipt validation failed (fallback kernels detected) |

See also: Issue #462 for receipt CPU validation implementation (88% mutation testing score).

Environment Variables

Validation Configuration

| Variable | Values | Default | Description |
|---|---|---|---|
| `BITNET_VALIDATION_GATE` | `none`, `auto`, `policy` | `auto` | Validation gate mode |
| `BITNET_VALIDATION_POLICY` | Path | None | Policy file path (for `gate=policy`) |
| `BITNET_VALIDATION_POLICY_KEY` | String | Architecture | Policy key (for `gate=policy`) |
| `BITNET_STRICT_MODE` | `0`, `1` | `0` | Enable strict validation (fail on warnings) |

Inference Configuration

| Variable | Values | Default | Description |
|---|---|---|---|
| `BITNET_DETERMINISTIC` | `0`, `1` | `0` | Enable deterministic inference |
| `BITNET_SEED` | Integer | Random | Random seed (requires `BITNET_DETERMINISTIC=1`) |
| `RAYON_NUM_THREADS` | Integer | Auto | Thread count for parallel operations |
| `RUST_LOG` | `error`, `warn`, `info`, `debug`, `trace` | `error` | Logging level |

Correction Configuration (Development Only)

| Variable | Values | Default | Description |
|---|---|---|---|
| `BITNET_CORRECTION_POLICY` | Path | None | Correction policy file path |
| `BITNET_ALLOW_RUNTIME_CORRECTIONS` | `0`, `1` | `0` | Enable runtime corrections (dev only) |

⚠️ Warning: CI blocks BITNET_ALLOW_RUNTIME_CORRECTIONS=1 to prevent production deployment with workarounds.

Integration with CI/CD

GitHub Actions Example

# .github/workflows/validate-model.yml
name: Validate GGUF Model

on:
  push:
    paths:
      - 'models/**/*.gguf'
  workflow_dispatch:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Build BitNet-rs CLI
        run: |
          cargo build --release -p bitnet-cli \
            --no-default-features --features cpu,full-cli

      - name: Run Validation
        run: |
          ./scripts/validate_gguf.sh \
            models/model.gguf \
            models/tokenizer.json

      - name: Check for corrections (security gate)
        run: |
          if [ -n "$BITNET_CORRECTION_POLICY" ] || \
             [ -n "$BITNET_ALLOW_RUNTIME_CORRECTIONS" ]; then
            echo "ERROR: Runtime correction flags detected"
            echo "Corrections are dev-only - regenerate clean GGUF"
            exit 1
          fi

CI Best Practices

Always use strict mode:

BITNET_STRICT_MODE=1 ./scripts/validate_gguf.sh model.gguf tokenizer.json

Block correction flags:

# Fail CI if correction flags are set
if [ -n "$BITNET_ALLOW_RUNTIME_CORRECTIONS" ]; then
  echo "ERROR: Corrections not allowed in CI"
  exit 1
fi

Validate on every model change:

on:
  push:
    paths:
      - 'models/**/*.gguf'
      - 'models/**/tokenizer.json'

Archive validation reports:

- name: Upload validation report
  uses: actions/upload-artifact@v4
  with:
    name: validation-report
    path: validation.json

FAQ

Q: What's the difference between validation and correction policies?

Validation policies define acceptable RMS ranges for weights (example: examples/policies/bitnet-b158-f16-clean.yml)
Correction policies provide runtime fixes for known-bad models (example: see docs/explanation/correction-policy.md)

Validation policies are used during inspection to check if a model is healthy. Correction policies are temporary workarounds to fix known defects at runtime (dev only).

Q: When should I use `BITNET_STRICT_MODE=1`?

Use strict mode in:

✅ CI/CD pipelines
✅ Production validation
✅ Release qualification
✅ When you need zero-tolerance for warnings

Skip strict mode when:

⚠️ Debugging validation rules
⚠️ Working with experimental models
⚠️ You understand the warnings and accept the risk

Q: Can I skip LayerNorm validation for quantized models?

No. LayerNorm weights should never be quantized, even in I2_S models. The validator uses architecture-specific rules that account for legitimate RMS variations in quantized models (e.g., BitNet I2_S attn_norm legitimately has RMS ~0.01-0.02).

If validation fails, either:

Regenerate GGUF with LayerNorm weights in float format, or
Create a custom policy if your architecture legitimately has unusual RMS values

Q: What if auto-detection selects the wrong ruleset?

Use explicit policy mode:

cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy examples/policies/correct-policy.yml \
  --policy-key correct-key \
  model.gguf

If auto-detection is consistently wrong, file an issue with:

GGUF metadata (cargo run -p bitnet-cli -- inspect --metadata model.gguf)
Expected architecture
Inspection output

Q: How do I know if my custom policy is correct?

Test on multiple checkpoints: Validate 3-5 models from different training stages
Check false positives: Policy should not reject healthy models
Check false negatives: Policy should catch known-bad models
Compare with reference: Use known-good model as baseline
Test inference: Models passing validation should produce coherent output

See Workflow 3 for detailed custom policy creation.

Summary

| Task | Command | Purpose |
|---|---|---|
| Validate existing GGUF | `./scripts/validate_gguf.sh model.gguf tokenizer.json` | 3-stage validation pipeline |
| Inspect LayerNorm stats | `cargo run -p bitnet-cli -- inspect --ln-stats --gate auto model.gguf` | Architecture-aware RMS validation |
| Export clean GGUF | `./scripts/export_clean_gguf.sh checkpoint/ tokenizer.json output/` | Convert SafeTensors to F16 GGUF |
| Custom policy validation | `cargo run -p bitnet-cli -- inspect --ln-stats --gate policy --policy policy.yml model.gguf` | Validate with custom rules |
| Strict mode validation | `BITNET_STRICT_MODE=1 cargo run -p bitnet-cli -- inspect --ln-stats model.gguf` | Fail on warnings |

Key Principle: Always validate models before deployment. Clean models must pass all 3 stages without corrections or workarounds.

For questions or issues, see:

GitHub Issues: BitNet-rs/issues
Documentation Index: docs/ directory
Quick Reference: CLAUDE.md
