
Commit b8d2aef

docs: update model size and inference time tables for LLMs

1 parent 61560ff

2 files changed: 76 additions & 16 deletions

docs/docs/02-benchmarks/inference-time.md
(38 additions & 8 deletions)
@@ -106,14 +106,44 @@ The values below represent the averages across all runs for the benchmark image.
 
 ## LLMs
 
-| Model                 | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
-| --------------------- | :--------------------------------: | :--------------------------------: | :------------------------------: | :-------------------------------------: | :-----------------------------: |
-| LLAMA3_2_1B           | 16.1 | 11.4 |      | 15.6 | 19.3 |
-| LLAMA3_2_1B_SPINQUANT | 40.6 | 16.7 | 16.5 | 40.3 | 48.2 |
-| LLAMA3_2_1B_QLORA     | 31.8 | 11.4 | 11.2 | 37.3 | 44.4 |
-| LLAMA3_2_3B           |      |      |      |      | 7.1  |
-| LLAMA3_2_3B_SPINQUANT | 17.2 | 8.2  |      | 16.2 | 19.4 |
-| LLAMA3_2_3B_QLORA     | 14.5 |      |      | 14.8 | 18.1 |
+| Model                          | Google Pixel 10 (XNNPACK) [tokens/s] | iPhone 17 Pro (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] |
+| ------------------------------ | :----------------------------------: | :--------------------------------: | :-----------------------------: | :------------------------------: |
+| LLAMA3_2_1B                    | 8  | 8   | 15  | N/A |
+| LLAMA3_2_1B_QLORA              | 22 | 22  | 45  | 19  |
+| LLAMA3_2_1B_SPINQUANT          | 24 | 36  | 48  | 17  |
+| LLAMA3_2_3B                    | 2  | 3   | 6   | N/A |
+| LLAMA3_2_3B_QLORA              | 8  | 7   | 17  | N/A |
+| LLAMA3_2_3B_SPINQUANT          | 11 | 12  | 18  | N/A |
+| QWEN3_0_6B                     | 7  | 9   | 15  | 9   |
+| QWEN3_0_6B_QUANTIZED           | 20 | 27  | 37  | 35  |
+| QWEN3_1_7B                     | 3  | 5   | 8   | N/A |
+| QWEN3_1_7B_QUANTIZED           | 10 | 14  | 20  | 13  |
+| QWEN3_4B                       | 2  | N/A | 4   | N/A |
+| QWEN3_4B_QUANTIZED             | 5  | 7   | 10  | N/A |
+| HAMMER2_1_0_5B                 | 13 | 13  | 25  | 16  |
+| HAMMER2_1_0_5B_QUANTIZED       | 34 | 97  | 72  | 56  |
+| HAMMER2_1_1_5B                 | 5  | 5   | 10  | N/A |
+| HAMMER2_1_1_5B_QUANTIZED       | 14 | 16  | 36  | 22  |
+| HAMMER2_1_3B                   | 2  | 3   | 5   | N/A |
+| HAMMER2_1_3B_QUANTIZED         | 9  | 10  | 20  | N/A |
+| SMOLLM2_1_135M                 | 25 | 24  | 33  | 42  |
+| SMOLLM2_1_135M_QUANTIZED       | 20 | 32  | 64  | 47  |
+| SMOLLM2_1_360M                 | 12 | 13  | 20  | 15  |
+| SMOLLM2_1_360M_QUANTIZED       | 12 | 15  | 29  | 18  |
+| SMOLLM2_1_1_7B                 | 3  | 5   | 7   | N/A |
+| SMOLLM2_1_1_7B_QUANTIZED       | 12 | 14  | 27  | 23  |
+| QWEN2_5_0_5B                   | 12 | 12  | 21  | 15  |
+| QWEN2_5_0_5B_QUANTIZED         | 33 | 31  | 55  | 48  |
+| QWEN2_5_1_5B                   | 5  | 5   | 9   | N/A |
+| QWEN2_5_1_5B_QUANTIZED         | 15 | 15  | 28  | 16  |
+| QWEN2_5_3B                     | 2  | 3   | 5   | N/A |
+| QWEN2_5_3B_QUANTIZED           | 9  | 10  | 18  | N/A |
+| PHI_4_MINI_4B                  | 2  | 3   | 4   | N/A |
+| PHI_4_MINI_4B_QUANTIZED        | 4  | 7   | 10  | N/A |
+| LFM2_5_350M                    | 16 | 26  | 34  | 21  |
+| LFM2_5_350M_QUANTIZED          | 58 | 67  | 103 | 51  |
+| LFM2_5_1_2B_INSTRUCT           | 6  | 10  | 13  | N/A |
+| LFM2_5_1_2B_INSTRUCT_QUANTIZED | 8  | 26  | 47  | 24  |
 
 ❌ - Insufficient RAM.

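The updated rows make the throughput gain from quantization easy to read off. As a rough sketch (values hand-copied from the OnePlus 12 column of the table above; the `speedup` helper is illustrative and not part of the docs):

```python
# Tokens/s on OnePlus 12 (XNNPACK), copied from the updated table above.
oneplus12_tps = {
    "LLAMA3_2_1B": 15,
    "LLAMA3_2_1B_SPINQUANT": 48,
    "QWEN3_0_6B": 15,
    "QWEN3_0_6B_QUANTIZED": 37,
    "LFM2_5_350M": 34,
    "LFM2_5_350M_QUANTIZED": 103,
}

def speedup(base_tps: float, quantized_tps: float) -> float:
    """Throughput ratio of the quantized variant over its base model."""
    return quantized_tps / base_tps

for base, quant in [
    ("LLAMA3_2_1B", "LLAMA3_2_1B_SPINQUANT"),
    ("QWEN3_0_6B", "QWEN3_0_6B_QUANTIZED"),
    ("LFM2_5_350M", "LFM2_5_350M_QUANTIZED"),
]:
    print(f"{quant}: {speedup(oneplus12_tps[base], oneplus12_tps[quant]):.1f}x")
```

On this device the quantized exports come out roughly 2.5x to 3.2x faster than their base variants.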
docs/docs/02-benchmarks/model-size.md
(38 additions & 8 deletions)

@@ -60,14 +60,44 @@ title: Model Size
 
 ## LLMs
 
-| Model                 | XNNPACK [GB] |
-| --------------------- | :----------: |
-| LLAMA3_2_1B           |     2.47     |
-| LLAMA3_2_1B_SPINQUANT |     1.14     |
-| LLAMA3_2_1B_QLORA     |     1.18     |
-| LLAMA3_2_3B           |     6.43     |
-| LLAMA3_2_3B_SPINQUANT |     2.55     |
-| LLAMA3_2_3B_QLORA     |     2.65     |
+| Model                          | XNNPACK [GB] |
+| ------------------------------ | :----------: |
+| LLAMA3_2_1B                    |     2.47     |
+| LLAMA3_2_1B_SPINQUANT          |     1.14     |
+| LLAMA3_2_1B_QLORA              |     1.18     |
+| LLAMA3_2_3B                    |     6.43     |
+| LLAMA3_2_3B_SPINQUANT          |     2.55     |
+| LLAMA3_2_3B_QLORA              |     2.65     |
+| QWEN3_0.6B                     |     1.11     |
+| QWEN3_0.6B_QUANTIZED           |     0.47     |
+| QWEN3_1.7B                     |     3.21     |
+| QWEN3_1.7B_QUANTIZED           |     1.21     |
+| QWEN3_4B                       |     7.49     |
+| QWEN3_4B_QUANTIZED             |     2.50     |
+| QWEN2_5_0.5B                   |     0.92     |
+| QWEN2_5_0.5B_QUANTIZED         |     0.39     |
+| QWEN2_5_1.5B                   |     2.88     |
+| QWEN2_5_1.5B_QUANTIZED         |     1.06     |
+| QWEN2_5_3B                     |     5.75     |
+| QWEN2_5_3B_QUANTIZED           |     1.95     |
+| HAMMER2_1_0.5B                 |     0.92     |
+| HAMMER2_1_0.5B_QUANTIZED       |     0.39     |
+| HAMMER2_1_1.5B                 |     2.88     |
+| HAMMER2_1_1.5B_QUANTIZED       |     1.06     |
+| HAMMER2_1_3B                   |     5.75     |
+| HAMMER2_1_3B_QUANTIZED         |     1.91     |
+| PHI4_MINI                      |     7.15     |
+| PHI4_MINI_QUANTIZED            |     2.62     |
+| SMOLLM2_135M                   |     0.25     |
+| SMOLLM2_135M_QUANTIZED         |     0.52     |
+| SMOLLM2_360M                   |     0.67     |
+| SMOLLM2_360M_QUANTIZED         |     1.27     |
+| SMOLLM2_1.7B                   |     3.19     |
+| SMOLLM2_1.7B_QUANTIZED         |     0.95     |
+| LFM2_5_1.2B_INSTRUCT           |     2.43     |
+| LFM2_5_1.2B_INSTRUCT_QUANTIZED |     0.74     |
+| LFM2_5_350M_FP16               |     0.79     |
+| LFM2_5_350M_QUANTIZED          |     0.26     |
 
 ## Speech to text

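The size table pairs each base export with its quantized counterpart, so the on-disk saving is a simple ratio. As a hedged sketch (sizes hand-copied from the table above; the `compression_ratio` helper is illustrative, not part of the docs):

```python
# XNNPACK export sizes in GB, copied from the table above.
size_gb = {
    "LLAMA3_2_3B": 6.43,
    "LLAMA3_2_3B_SPINQUANT": 2.55,
    "QWEN3_0.6B": 1.11,
    "QWEN3_0.6B_QUANTIZED": 0.47,
}

def compression_ratio(base_gb: float, quantized_gb: float) -> float:
    """How many times smaller the quantized export is on disk."""
    return base_gb / quantized_gb

print(f"{compression_ratio(size_gb['LLAMA3_2_3B'], size_gb['LLAMA3_2_3B_SPINQUANT']):.2f}x")  # 2.52x
```

For these pairs the quantized exports are roughly 2.4x to 2.5x smaller than the base models.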