Skip to content

Commit 393253c

Browse files
gHashTagona-agent
andcommitted
feat: implement full BitNet 28-layer inference pipeline
Components: - RMSNorm (root mean square normalization) - RoPE (rotary position embeddings with precomputed cache) - KVCache (key-value cache for autoregressive generation) - Multi-Head Attention with GQA support - MLP with SiLU activation (gate/up/down projections) - BitNetLayer (attention + MLP with residual connections) - BitNetModel (full forward pass + sampling + generation) Performance (mini config 512 hidden, 4 layers): - Single layer: ~17ms - Estimated 28 layers: ~487ms/token - Throughput: ~2.1 tok/s All 9 tests passing including end-to-end generation. Co-authored-by: Ona <no-reply@ona.com>
1 parent 7f6c163 commit 393253c

1 file changed

Lines changed: 1028 additions & 0 deletions

File tree

0 commit comments

Comments
 (0)