Quadtrix v1.0
Efficiency metrics
First release: a token-level language model trained on CPU.
Training run
| Metric | Value |
| --- | --- |
| Loss reduction | 69.7% (10.82 → 3.25) |
| Best loss | 3.252 (step 2510) |
| Peak throughput | 435 tok/s |
| Wall time | ~61 min |
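
The headline numbers can be cross-checked with a few lines of arithmetic. The sketch below recomputes the loss reduction from the rounded endpoints and the average throughput from values reported elsewhere in this README (2,690 steps, batch 16, block size 32, about 61 minutes). It assumes that every step consumes exactly batch × block tokens and that the wall time covers only training steps; neither is confirmed here.

```python
# Back-of-the-envelope check of the training-run metrics.
initial_loss = 10.82
best_loss = 3.252
steps = 2690
batch_size, block_size = 16, 32
wall_minutes = 61  # approximate, as reported

# Relative drop from the first reported loss to the best one.
loss_reduction = (initial_loss - best_loss) / initial_loss

# Tokens processed per optimizer step and over the whole run.
tokens_per_step = batch_size * block_size
tokens_seen = steps * tokens_per_step

# Average throughput over the full wall time.
avg_tok_per_s = tokens_seen / (wall_minutes * 60)

print(f"loss reduction: {loss_reduction:.1%}")
print(f"tokens per step: {tokens_per_step}")
print(f"tokens seen: {tokens_seen:,}")
print(f"average throughput: {avg_tok_per_s:.0f} tok/s")
```

With these inputs the run processes 1,377,280 tokens at an average of roughly 376 tok/s, which sits below the 435 tok/s peak as expected. The recomputed loss reduction comes out near 69.9%; the 69.7% in the table presumably uses slightly different endpoints, for example unrounded values or the final-step loss rather than the best one.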
Loss curve
Model config
| Parameter | Value |
| --- | --- |
| Parameters | 6,684,497 |
| Architecture | 4 layers · 4 heads · 64d embedding |
| Batch · block size | 16 · 32 |
| Learning rate | 1e-3 |
| Dropout | 0.1 |
| Train tokens | 7,065,137 |
| Val tokens | 785,016 |
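
The parameter count can be reproduced from the architecture row, but only under assumptions this README does not state: a GPT-2 BPE vocabulary of 50,257 tokens, untied input and output embeddings, learned position embeddings, bias-free query/key/value projections, and a 4x feed-forward expansion. The vocabulary guess is at least consistent with the initial loss, since a uniform prediction over 50,257 tokens scores ln(50257) ≈ 10.82. A sketch of that decomposition:

```python
import math

# Assumed, not stated in the README: GPT-2 BPE vocabulary size.
vocab_size = 50257
n_layer, n_embd, block_size = 4, 64, 32

def block_params(d):
    """Parameters in one transformer block of width d (assumed layout)."""
    attn_qkv = 3 * d * d                          # q, k, v projections, no bias
    attn_proj = d * d + d                         # output projection with bias
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # two linear layers, 4x expansion
    layer_norms = 2 * (2 * d)                     # two LayerNorms, weight + bias
    return attn_qkv + attn_proj + mlp + layer_norms

tok_emb = vocab_size * n_embd                     # token embedding table
pos_emb = block_size * n_embd                     # learned position embeddings
final_ln = 2 * n_embd                             # final LayerNorm
lm_head = n_embd * vocab_size + vocab_size        # untied output head with bias

total = tok_emb + pos_emb + n_layer * block_params(n_embd) + final_ln + lm_head
print(f"{total:,}")
print(f"uniform-guess loss: {math.log(vocab_size):.2f}")
```

Under these assumptions the total is 6,684,497, matching the table. The head count does not enter the calculation, because splitting a 64-dimensional projection across 4 heads leaves the number of weights unchanged. About 97% of the parameters sit in the two vocabulary-sized matrices.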
How to run
python engine/main.py
python engine/inference.py
Notes
- Training ran on CPU (PyTorch 2.4.1) at a steady 60% bf16 MFU
- Loss fell from 10.82 to 3.25 over 2,690 steps in ~61 minutes
- Gradient norms stable; no spikes or divergence observed
- Checkpoint saved at step 2510 (best validation loss)
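
MFU (model FLOPs utilization) is the ratio of achieved FLOP/s to the hardware's peak FLOP/s. This README gives neither the CPU's peak nor how MFU was measured, so the sketch below only works backwards. It estimates achieved FLOP/s with the common 6 × parameters × tokens/s approximation, which ignores attention FLOPs and, by counting every parameter, overstates the cost of the embedding lookup. It then derives the peak that a 60% MFU would imply. Treat it as a rough consistency check rather than the project's actual measurement.

```python
# Rough MFU arithmetic using the 6 * N * tokens/s approximation.
n_params = 6_684_497      # from the model config table
peak_tok_per_s = 435      # from the training run table
reported_mfu = 0.60       # from the notes

def achieved_flops_per_s(params, tok_per_s):
    # Roughly 2 FLOPs per parameter in the forward pass and 4 in the
    # backward pass, for every token processed.
    return 6 * params * tok_per_s

achieved = achieved_flops_per_s(n_params, peak_tok_per_s)

# Hardware peak that would make the reported MFU hold.
implied_peak = achieved / reported_mfu

print(f"achieved: {achieved / 1e9:.1f} GFLOP/s")
print(f"implied hardware peak: {implied_peak / 1e9:.1f} GFLOP/s")
```

Comparing the implied peak against the bf16 peak of the CPU that was actually used would show whether the 60% figure and the throughput figure agree.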