feat(hslm): progressive quantization — FP32 warmup → ternary anneal #321

@gHashTag

Task

Start training in full precision, then progressively anneal to ternary over the training schedule.
This lets the model establish good representations before the ternary constraint is applied.

Scientific Background

BitNet b1.58 Training Recipe (JMLR 2026)

  • Two-stage schedule: high LR (4-8×10⁻⁴) for the first 80% of training → cosine anneal over the final 20%
  • Higher LR improves ternary convergence (counterintuitive!)
  • Weight decay: 0.1 → near zero (decaying, not constant)
  • Warmup: linear over 375 steps
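
The recipe above can be sketched as a pair of schedule functions. This is a minimal sketch, not the trainer's actual code: `Recipe`, `learningRate`, and `weightDecay` are hypothetical names, the 6×10⁻⁴ peak is an arbitrary pick inside the quoted range, and the linear weight-decay ramp is an assumption, since the bullets only say that it decays toward zero.

```zig
const std = @import("std");

/// Hypothetical knobs; values mirror the bullets above where they are given.
pub const Recipe = struct {
    peak_lr: f32 = 6e-4, // assumption: a value inside the 4-8e-4 range
    warmup_steps: u64 = 375, // linear warmup length
    stable_frac: f32 = 0.8, // high-LR stage covers the first 80% of training
    wd_start: f32 = 0.1, // initial weight decay
};

/// Learning rate at `step` of `total`: linear warmup, plateau, then cosine anneal.
pub fn learningRate(r: Recipe, step: u64, total: u64) f32 {
    if (total == 0) return 0.0;
    if (step < r.warmup_steps) {
        const num = @as(f32, @floatFromInt(step + 1));
        const den = @as(f32, @floatFromInt(r.warmup_steps));
        return r.peak_lr * num / den;
    }
    const ratio = @as(f32, @floatFromInt(step)) / @as(f32, @floatFromInt(total));
    if (ratio < r.stable_frac) return r.peak_lr;
    // Progress through the final stage, in [0, 1].
    const t = @min((ratio - r.stable_frac) / (1.0 - r.stable_frac), 1.0);
    return 0.5 * r.peak_lr * (1.0 + @cos(std.math.pi * t));
}

/// Weight decay at `step`. Assumption: a linear ramp from `wd_start` to zero.
pub fn weightDecay(r: Recipe, step: u64, total: u64) f32 {
    if (total == 0) return 0.0;
    const ratio = @min(@as(f32, @floatFromInt(step)) / @as(f32, @floatFromInt(total)), 1.0);
    return r.wd_start * (1.0 - ratio);
}
```

With the defaults, `learningRate(.{}, step, total)` reaches the peak at the end of warmup, holds it until 80% of the run, and falls to zero at the last step.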

Progressive Quantization (ACM 2024)

  • Train the first 10% of steps at FP32 → next 20% at INT8 → next 20% at INT4 → final 50% at ternary
  • Gradually increasing constraint preserves gradient flow
  • Temperature-annealed STE: soft quantization → hard discrete
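
One way to read the staged schedule above is as a single symmetric quantizer whose grid gets coarser at each stage. The sketch below is illustrative only: `maxLevel` and `fakeQuantize` are hypothetical names, and max-abs scaling is an assumption, since the issue does not say which quantizer each stage uses.

```zig
const std = @import("std");

pub const Precision = enum { fp32, int8, int4, ternary };

/// Largest integer level of the symmetric grid; null means "leave weights untouched".
fn maxLevel(p: Precision) ?f32 {
    return switch (p) {
        .fp32 => null,
        .int8 => 127.0,
        .int4 => 7.0,
        .ternary => 1.0, // grid {-1, 0, +1}
    };
}

/// Snaps each weight to the nearest grid point in place (quantize, then dequantize).
/// Assumption: one scale per tensor, taken from the largest absolute weight.
pub fn fakeQuantize(weights: []f32, p: Precision) void {
    const qmax = maxLevel(p) orelse return;
    var max_abs: f32 = 0.0;
    for (weights) |w| {
        max_abs = @max(max_abs, if (w < 0) -w else w);
    }
    if (max_abs == 0.0) return; // all-zero tensor: nothing to scale
    const scale = max_abs / qmax;
    for (weights) |*w| {
        const q = std.math.clamp(@round(w.* / scale), -qmax, qmax);
        w.* = q * scale;
    }
}
```

Because only `qmax` changes between stages, moving from INT8 to INT4 to ternary shrinks the number of representable levels from 255 to 15 to 3 without touching the rest of the forward pass.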

BitNet Shadow Weights

  • Maintain FP16 shadow weights during training
  • Shadow weights exist only for gradient flow
  • Quantize to ternary for forward pass only
  • Discard shadow weights after training
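
The shadow-weight loop above can be sketched as two small functions. This is a rough sketch under stated assumptions: `quantizeTernary` and `steUpdate` are hypothetical names, the shadow weights are held as `f32` for simplicity, the mean-absolute-value scale follows the absmean scheme described for BitNet b1.58, and plain SGD stands in for whatever optimizer the trainer uses.

```zig
const std = @import("std");

/// Forward pass only: writes ternary codes {-1, 0, +1} into `out` and returns
/// the per-tensor scale (mean absolute value of the shadow weights).
pub fn quantizeTernary(shadow: []const f32, out: []i8) f32 {
    std.debug.assert(shadow.len == out.len);
    if (shadow.len == 0) return 0.0;
    var sum_abs: f32 = 0.0;
    for (shadow) |w| {
        sum_abs += if (w < 0) -w else w;
    }
    const scale = sum_abs / @as(f32, @floatFromInt(shadow.len));
    if (scale == 0.0) {
        @memset(out, 0);
        return 0.0;
    }
    for (shadow, out) |w, *q| {
        const level: f32 = std.math.clamp(@round(w / scale), -1.0, 1.0);
        q.* = @intFromFloat(level);
    }
    return scale;
}

/// Straight-through estimator: the gradient computed with respect to the
/// quantized weights is applied to the shadow weights unchanged.
pub fn steUpdate(shadow: []f32, grad_wrt_quantized: []const f32, lr: f32) void {
    std.debug.assert(shadow.len == grad_wrt_quantized.len);
    for (shadow, grad_wrt_quantized) |*w, g| {
        w.* -= lr * g;
    }
}
```

After training, only the `i8` codes and the scale need to be kept; the shadow buffer can be freed.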

Implementation

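const Precision = enum { fp32, int8, int4, ternary };

const QuantSchedule = struct {
    /// Maps training progress (step / total) to the precision used at that step.
    pub fn getPrecision(step: u64, total: u64) Precision {
        if (total == 0) return .ternary; // degenerate schedule: go straight to the target
        const ratio = @as(f32, @floatFromInt(step)) / @as(f32, @floatFromInt(total));
        if (ratio < 0.1) return .fp32; // first 10%: warm start
        if (ratio < 0.3) return .int8; // next 20%: gentle quantization
        if (ratio < 0.5) return .int4; // next 20%: harder constraint
        return .ternary; // final 50%: full constraint
    }
};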

Temperature annealing alternative:

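Q_soft(w, T) = tanh(w / T)  // T→0: approaches sign(w), i.e. hard ±1; large T: ≈ w / T (nearly linear)
Note: tanh alone saturates to two levels, so reaching {-1, 0, +1} needs an extra zero band (e.g. a threshold on |w|).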
Schedule: T = 10.0 → 0.01 over training (exponential decay)
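
A minimal sketch of the temperature schedule above, assuming geometric interpolation between the two endpoints as the meaning of "exponential decay". `temperature` and `softQuant` are hypothetical names. Note that `tanh(w / T)` saturates toward ±1 as T shrinks, so it hardens toward a sign function rather than a three-level quantizer on its own.

```zig
const std = @import("std");

/// Temperature at `step` of `total`, decaying geometrically from 10.0 to 0.01.
pub fn temperature(step: u64, total: u64) f32 {
    const t_start: f32 = 10.0;
    const t_end: f32 = 0.01;
    if (total == 0) return t_end;
    const ratio = @min(@as(f32, @floatFromInt(step)) / @as(f32, @floatFromInt(total)), 1.0);
    // T(ratio) = t_start * (t_end / t_start)^ratio
    return t_start * std.math.pow(f32, t_end / t_start, ratio);
}

/// Soft quantizer from the formula above: tanh(w / T).
pub fn softQuant(w: f32, temp: f32) f32 {
    return std.math.tanh(w / temp);
}
```

Halfway through training the temperature is about 0.316, so a weight of 0.5 already maps to roughly 0.92; by the end it maps to 1.0 within float precision.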

Changes

  • src/hslm/trainer.zig: QuantSchedule with step-dependent precision
  • src/hslm/quantize.zig: temperature-annealed soft quantization
  • Shadow weights buffer (FP32, same size as model)
  • Flag: --progressive-quant to enable

Expected

  • 10-15% lower perplexity (PPL) from better initialization
  • Largest effect expected in the first 50K steps
  • Compound: progressive quant + TTQ + OHEM = potentially PPL < 60
