Commit 5ba7745
fix: Add ternary weight quantization at model load time
BitNet b1.58 requires both:
- 8-bit activation quantization (per-token)
- Ternary weight quantization (per-tensor mean-based)
All linear projection weights are now quantized to ternary values
during model loading, which correct inference requires.
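
The two quantization steps named above can be sketched roughly as follows. This is not the code from this commit. It is a minimal pure-Python illustration that assumes "mean-based" means scaling by the mean absolute weight (the absmean scheme usually associated with BitNet b1.58) and that activations use per-token absmax scaling to signed 8-bit. The names `ternary_quantize` and `quantize_activations` are hypothetical, and a real loader would operate on tensors rather than nested lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def ternary_quantize(weight: Matrix, eps: float = 1e-5) -> Tuple[List[List[int]], float]:
    """Per-tensor quantization to {-1, 0, +1} (assumed absmean scheme)."""
    magnitudes = [abs(w) for row in weight for w in row]
    # One scale for the whole tensor: the mean absolute weight.
    scale = max(sum(magnitudes) / len(magnitudes), eps)
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weight
    ]
    return quantized, scale


def quantize_activations(tokens: Matrix, eps: float = 1e-5) -> Tuple[List[List[int]], List[float]]:
    """Per-token quantization to signed 8-bit integers (assumed absmax scheme)."""
    quantized, scales = [], []
    for token in tokens:
        # Each token (row) gets its own scale from its largest magnitude.
        absmax = max(max(abs(x) for x in token), eps)
        scale = 127.0 / absmax
        quantized.append([max(-128, min(127, round(x * scale))) for x in token])
        scales.append(scale)
    return quantized, scales


if __name__ == "__main__":
    w_q, w_scale = ternary_quantize([[0.9, -0.05], [-1.2, 0.4]])
    print(w_q, w_scale)
    x_q, x_scales = quantize_activations([[0.25, -1.0], [2.0, 0.25]])
    print(x_q, x_scales)
```

In this sketch the weight scale is returned alongside the ternary values so that a caller could rescale the output of a matrix multiply; whether the commit stores the scale this way is not stated in the message.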
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 03bbdc2, commit 5ba7745
2 files changed
Lines changed: 412 additions & 3 deletions