Commit ef22b3e
docs: fix metrics endpoint description in server README (ggml-org#22879)
* docs: fix metrics endpoint description in server README
Required model query parameter for router mode described.
Removed metrics:
- llamacpp:kv_cache_usage_ratio
- llamacpp:kv_cache_tokens
Added metrics:
- llamacpp:prompt_seconds_total
- llamacpp:tokens_predicted_seconds_total
- llamacpp:n_decode_total
- llamacpp:n_busy_slots_per_decode
* server: fix metrics type for n_busy_slots_per_decode metric

1 parent 68e7ea3 commit ef22b3e
2 files changed
Lines changed: 21 additions & 14 deletions
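
For readers who scrape this endpoint, the metric changes listed in the commit message can be checked with a small script. The following is only a sketch under stated assumptions: the metric names and the `model` query parameter come from the commit message, while the `/metrics` path, the base URL, the model name, the helper names (`metrics_url`, `parse_metrics`), and the sample payload values are illustrative and not taken from the diff. It assumes unlabeled samples in the Prometheus text format (`name value`).

```python
from urllib.parse import urlencode

# Metric names copied from the commit message.
ADDED = (
    "llamacpp:prompt_seconds_total",
    "llamacpp:tokens_predicted_seconds_total",
    "llamacpp:n_decode_total",
    "llamacpp:n_busy_slots_per_decode",
)
REMOVED = (
    "llamacpp:kv_cache_usage_ratio",
    "llamacpp:kv_cache_tokens",
)


def metrics_url(base_url, model=None):
    """Build the metrics URL (hypothetical helper).

    In router mode the commit message says a `model` query parameter is
    required, so it is appended when given. The `/metrics` path is assumed.
    """
    url = base_url.rstrip("/") + "/metrics"
    if model is not None:
        url += "?" + urlencode({"model": model})
    return url


def parse_metrics(text):
    """Parse unlabeled Prometheus text-format samples into a dict.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        values[parts[0]] = float(parts[1])
    return values


# Illustrative payload: the values are made up for this example.
SAMPLE = """\
# HELP llamacpp:n_decode_total illustrative help text
llamacpp:prompt_seconds_total 1.5
llamacpp:tokens_predicted_seconds_total 12.25
llamacpp:n_decode_total 42
llamacpp:n_busy_slots_per_decode 0.5
"""

parsed = parse_metrics(SAMPLE)
missing = [name for name in ADDED if name not in parsed]
stale = [name for name in REMOVED if name in parsed]
print(metrics_url("http://localhost:8080", model="my-model"))
print("missing:", missing, "stale:", stale)
```

Against a live server, the text passed to `parse_metrics` would come from an HTTP GET on the URL built by `metrics_url`; a non-empty `stale` list would suggest the server predates this commit.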
| Original file lines | Diff lines | Change |
|---|---|---|
| 1043-1045 | 1043-1045 | context |
| 1046-1055 | | 10 lines removed |
| | 1046-1062 | 17 lines added |
| 1056-1058 | 1063-1065 | context |

| Original file lines | Diff lines | Change |
|---|---|---|
| 3622-3624 | 3622-3624 | context |
| 3625-3628 | | 4 lines removed |
| 3629-3631 | 3625-3627 | context |
| 3643-3645 | 3639-3641 | context |
| | 3642-3645 | 4 lines added |
| 3646-3648 | 3646-3648 | context |