Commit 10e5b14
llama-quant : correct `n_attention_wv` usage (ggml-org#20357)

* llama-quant : correct `n_attention_wv` usage

In ggml-org#19770, I introduced a regression in how the `quantize_state_impl` counter values were initialized. I was incrementing `n_attention_wv` and reading it in the same loop, so its value was still changing while it was being used. It should already hold its final value by the time tensor types are decided in `llama_tensor_get_type_impl` (for `use_more_bits`).

I never observed a difference in any of my tests (see my comment on ggml-org#19770); it was only after @bartowski kindly pointed this out that I realized it was incorrect. (Thanks!)

* simplify

1 parent 90b2731
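The hazard described above, reading a total while it is still being accumulated, can be illustrated with a toy C++ sketch. Everything in it is hypothetical: `QuantState`, `is_attn_v`, `decide_single_pass` and `decide_two_pass` are not llama.cpp code, and the `use_more_bits` below is only modeled on the llama.cpp helper of that name (assumed formula: extra bits for the first and last eighth of layers and every third layer in between). It is not the patch from this commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for quantize_state_impl: only the two counters matter here.
struct QuantState {
    int n_attention_wv = 0;  // total number of attn_v tensors (must be final before use)
    int i_attention_wv = 0;  // index of the attn_v tensor currently being typed
};

// Assumed heuristic, modeled on llama.cpp's use_more_bits(i_layer, n_layers):
// extra bits for the first and last eighth of layers, plus every third in between.
bool use_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers / 8 || i_layer >= 7 * n_layers / 8 ||
           (i_layer - n_layers / 8) % 3 == 2;
}

// Hypothetical tensor-name filter.
bool is_attn_v(const std::string & name) {
    return name.find("attn_v.weight") != std::string::npos;
}

// Buggy pattern: the total is incremented and read in the same loop,
// so each decision sees a partial count instead of the final one.
std::vector<bool> decide_single_pass(const std::vector<std::string> & names) {
    QuantState qs;
    std::vector<bool> more_bits;
    for (const auto & name : names) {
        if (!is_attn_v(name)) continue;
        ++qs.n_attention_wv;  // still growing while it is being used
        more_bits.push_back(use_more_bits(qs.i_attention_wv, qs.n_attention_wv));
        ++qs.i_attention_wv;
    }
    return more_bits;
}

// Fixed pattern: count every attn_v tensor first, then decide with the final total.
std::vector<bool> decide_two_pass(const std::vector<std::string> & names) {
    QuantState qs;
    for (const auto & name : names) {
        if (is_attn_v(name)) ++qs.n_attention_wv;
    }
    std::vector<bool> more_bits;
    for (const auto & name : names) {
        if (!is_attn_v(name)) continue;
        more_bits.push_back(use_more_bits(qs.i_attention_wv, qs.n_attention_wv));
        ++qs.i_attention_wv;
    }
    // After the second pass, the running index should have reached the fixed total.
    assert(qs.i_attention_wv == qs.n_attention_wv);
    return more_bits;
}
```

In this toy, the single-pass version sees `n_attention_wv == i_attention_wv + 1` at every step, so the "last eighth" test is always true and every attn_v tensor is flagged; with 16 layers the two-pass version flags only layers 0, 1, 4, 7, 10, 13, 14 and 15. The real regression was much subtler (no difference showed up in the author's tests), so treat this only as an illustration of why the count must be final before it is used.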
1 file changed
Lines changed: 16 additions & 13 deletions
| Hunk | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 870-878 | 870-875 | original lines 873-875 removed (3 lines) |
| 2 | 979-984 | 976-997 | new lines 979-994 added (16 lines) |
| 3 | 991-1006 | 1004-1009 | original lines 994-1003 removed (10 lines) |