Commit d735725
server: expose speculative decoding counters in Prometheus metrics
Adds two new counters to the /metrics endpoint:
- llamacpp:spec_tokens_drafted_total
- llamacpp:spec_tokens_accepted_total
Both counters are accumulated in server_metrics::on_prediction() from the
per-slot n_draft_total and n_draft_accepted fields that speculative decoding
already tracks. The acceptance rate can be derived as
spec_tokens_accepted_total / spec_tokens_drafted_total.

Parent: fcae601
3 files changed, 22 additions, 0 deletions
File 1 (+2 lines):
@@ -1064,6 +1064,8 @@
File 2 (+17 lines):
@@ -564,6 +564,9 @@
@@ -582,6 +585,9 @@
@@ -2001,6 +2007,9 @@
@@ -3713,6 +3722,14 @@
File 3 (+3 lines):
@@ -526,6 +526,9 @@