Commit 393253c
feat: implement full BitNet 28-layer inference pipeline
Components:
- RMSNorm (root mean square normalization)
- RoPE (rotary position embeddings with precomputed cache)
- KVCache (key-value cache for autoregressive generation)
- Multi-Head Attention with GQA support
- MLP with SiLU activation (gate/up/down projections)
- BitNetLayer (attention + MLP with residual connections)
- BitNetModel (full forward pass + sampling + generation)
Performance (mini config 512 hidden, 4 layers):
- Single layer: ~17ms
- Estimated 28 layers: ~487ms/token
- Throughput: ~2.1 tok/s
All 9 tests passing including end-to-end generation.
Co-authored-by: Ona <no-reply@ona.com>1 parent 7f6c163 commit 393253c
1 file changed
Lines changed: 1028 additions & 0 deletions
0 commit comments