Then you can use `tiny-ggml-model-f16-quant.gguf` just like the model in F16.

### Results

Here are the benchmarks for the different models and quantizations on my machine. To estimate run times accurately, each benchmark was run 100 times.

| Model | Quantization | Speed (ms) | Mem (MB) |
| :---: | :----------: | :--------: | :------: |
| tiny  | q4_0 | 105  | 12  |
| tiny  | q4_1 | 97   | 12  |
| tiny  | q5_0 | 116  | 13  |
| tiny  | q5_1 | 112  | 13  |
| tiny  | q8_0 | 90   | 15  |
| small | q4_0 | 240  | 23  |
| small | q4_1 | 224  | 24  |
| small | q5_0 | 288  | 25  |
| small | q5_1 | 277  | 27  |
| small | q8_0 | 228  | 33  |
| base  | q4_0 | 704  | 61  |
| base  | q4_1 | 626  | 66  |
| base  | q5_0 | 851  | 71  |
| base  | q5_1 | 806  | 76  |
| base  | q8_0 | 659  | 102 |
| large | q4_0 | 2189 | 181 |
| large | q4_1 | 1919 | 199 |
| large | q5_0 | 2676 | 217 |
| large | q5_1 | 2547 | 235 |
| large | q8_0 | 1994 | 325 |
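The 100-run averaging described above can be scripted with a small timing harness. This is a minimal sketch, not the tool used for the table: the `bench` helper is hypothetical, and the command passed to it is a placeholder you would swap for the actual benchmark binary.

```python
import statistics
import subprocess
import sys
import time

def bench(cmd, runs=100):
    """Run `cmd` the given number of times and return the mean wall-clock time in ms."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output; we only care about elapsed time.
        subprocess.run(cmd, check=True, capture_output=True)
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(times)

# Placeholder command (a no-op Python invocation); replace with the real
# benchmark, e.g. the inference binary plus the quantized .gguf model path.
print(f"{bench([sys.executable, '-c', 'pass'], runs=5):.1f} ms")
```

Averaging over many runs smooths out scheduler noise and cache effects, which is why a single timing of a short-running model can be misleading.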

## To-Do List
