Commit 708be7e

Add benchmark results for different quantizations
1 parent 9684f36 commit 708be7e

1 file changed

Lines changed: 30 additions & 1 deletion

README.md

@@ -207,9 +207,12 @@ In order to test the inference speed on your machine, you can run the following
 # run the benchmark of PyTorch
 python scripts/benchmark.py
 
-# run the benchmark of vit.cpp
+# run the benchmark of vit.cpp for the non-quantized model
 ./scripts/benchmark.sh
 
+# run the benchmark for quantized models (arguments: 4 threads, quantize flag)
+./scripts/benchmark.sh 4 1
+
 Both scripts use 4 threads by default. In Python, the `threadpoolctl` library is used to limit the number of threads used by PyTorch.
 
 ## Quantization
@@ -234,6 +237,32 @@ For example, you can run the following to convert the model to q5_1:
 
 Then you can use `tiny-ggml-model-f16-quant.gguf` just like the model in F16.
 
+### Results
+
+Here are the benchmarks for the different models and quantizations on my machine:
+
+| Model | Quantization | Speed (ms) | Mem (MB) |
+| :----: | :----------: | :-----------: | :---------------: |
+| tiny | q4_0 | 100 ms | 12 MB |
+| tiny | q4_1 | 102 ms | 12 MB |
+| tiny | q5_0 | 116 ms | 13 MB |
+| tiny | q5_1 | 112 ms | 13 MB |
+| tiny | q8_0 | 92 ms | 15 MB |
+| small | q4_0 | 261 ms | 23 MB |
+| small | q4_1 | 229 ms | 24 MB |
+| small | q5_0 | 291 ms | 25 MB |
+| small | q5_1 | 276 ms | 27 MB |
+| small | q8_0 | 232 ms | 33 MB |
+| base | q4_0 | 714 ms | 61 MB |
+| base | q4_1 | 657 ms | 66 MB |
+| base | q5_0 | 879 ms | 71 MB |
+| base | q5_1 | 838 ms | 76 MB |
+| base | q8_0 | 658 ms | 102 MB |
+| large | q4_0 | 2189 ms | 181 MB |
+| large | q4_1 | 1935 ms | 199 MB |
+| large | q5_0 | 2708 ms | 217 MB |
+| large | q5_1 | 2560 ms | 235 MB |
+| large | q8_0 | 2042 ms | 325 MB |
 
 ## To-Do List
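
One way to read the Results table added in this commit is to pick, for each model, the quantization with the lowest latency and the one with the lowest memory use. The sketch below is illustrative Python and is not part of the repository: the rows are copied from the table, and the names `RESULTS`, `fastest_per_model`, and `smallest_per_model` are made up for this example.

```python
# Rows copied from the Results table: (model, quantization, speed_ms, mem_mb).
# RESULTS and the helper names below are hypothetical, not part of vit.cpp.
RESULTS = [
    ("tiny", "q4_0", 100, 12),
    ("tiny", "q4_1", 102, 12),
    ("tiny", "q5_0", 116, 13),
    ("tiny", "q5_1", 112, 13),
    ("tiny", "q8_0", 92, 15),
    ("small", "q4_0", 261, 23),
    ("small", "q4_1", 229, 24),
    ("small", "q5_0", 291, 25),
    ("small", "q5_1", 276, 27),
    ("small", "q8_0", 232, 33),
    ("base", "q4_0", 714, 61),
    ("base", "q4_1", 657, 66),
    ("base", "q5_0", 879, 71),
    ("base", "q5_1", 838, 76),
    ("base", "q8_0", 658, 102),
    ("large", "q4_0", 2189, 181),
    ("large", "q4_1", 1935, 199),
    ("large", "q5_0", 2708, 217),
    ("large", "q5_1", 2560, 235),
    ("large", "q8_0", 2042, 325),
]


def fastest_per_model(rows):
    """Return {model: (quantization, speed_ms)} with the lowest latency per model."""
    best = {}
    for model, quant, speed_ms, _mem_mb in rows:
        if model not in best or speed_ms < best[model][1]:
            best[model] = (quant, speed_ms)
    return best


def smallest_per_model(rows):
    """Return {model: (quantization, mem_mb)} with the lowest memory per model.

    Ties keep the row that appears first in the table.
    """
    best = {}
    for model, quant, _speed_ms, mem_mb in rows:
        if model not in best or mem_mb < best[model][1]:
            best[model] = (quant, mem_mb)
    return best


if __name__ == "__main__":
    fastest = fastest_per_model(RESULTS)
    smallest = smallest_per_model(RESULTS)
    for model in fastest:
        f_quant, f_ms = fastest[model]
        s_quant, s_mb = smallest[model]
        print(f"{model}: fastest {f_quant} ({f_ms} ms), smallest {s_quant} ({s_mb} MB)")
```

On these numbers, q8_0 is the fastest option for tiny and q4_1 is the fastest for small, base, and large, while q4_0 has the smallest footprint for every model (tied with q4_1 for tiny). The measurements come from a single machine, so the ranking may differ elsewhere.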
