Commit 9d34231

llama-quant : default ftype param Q5_1 --> Q8_0 (#20828)
Change the default `ftype` in `llama_model_quantize_params` from `LLAMA_FTYPE_MOSTLY_Q5_1` to `LLAMA_FTYPE_MOSTLY_Q8_0`. In case some external program naively uses the default quantization params, we should probably default to a known-good type like Q8_0 rather than Q5_1, which is rather old.
1 parent: 8ea8fee

1 file changed: src/llama-quant.cpp (1 addition, 1 deletion)
@@ -1283,7 +1283,7 @@ static void llama_model_quantize_impl(const std::string & fname_inp, const std::
 llama_model_quantize_params llama_model_quantize_default_params() {
     llama_model_quantize_params result = {
         /*.nthread              =*/ 0,
-        /*.ftype                =*/ LLAMA_FTYPE_MOSTLY_Q5_1,
+        /*.ftype                =*/ LLAMA_FTYPE_MOSTLY_Q8_0,
         /*.output_tensor_type   =*/ GGML_TYPE_COUNT,
         /*.token_embedding_type =*/ GGML_TYPE_COUNT,
         /*.allow_requantize     =*/ false,
