This tool runs llama.cpp in a remote Docker container with different runtime parameters and collects benchmark results from the logs. It is designed to measure how those parameters affect model performance, especially with speculative decoding.
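
As an illustration of the log-collection step, here is a minimal sketch of pulling throughput figures out of llama.cpp timing output. It is not this repository's implementation: the function name `parse_timings` and the regular expression are hypothetical, and the assumed line shape (`eval time = <ms> ms / <n> runs`) may differ between llama.cpp versions, so the pattern would need adjusting to the logs you actually get.

```python
import re

# Assumed shape of a timing line; llama.cpp's log format varies by version,
# so treat this pattern as a starting point rather than a fixed contract.
TIMING_RE = re.compile(
    r"(?P<stage>prompt eval|eval)\s+time\s*=\s*(?P<ms>[\d.]+)\s*ms"
    r"\s*/\s*(?P<count>\d+)\s*(?:tokens|runs)"
)


def parse_timings(log_text):
    """Return {stage: {"ms", "count", "tokens_per_second"}} for each timing line found."""
    results = {}
    for match in TIMING_RE.finditer(log_text):
        ms = float(match.group("ms"))
        count = int(match.group("count"))
        # Recompute throughput from the raw numbers instead of trusting the
        # pre-formatted value in the log line.
        tps = count * 1000 / ms if ms > 0 else None
        results[match.group("stage")] = {
            "ms": ms,
            "count": count,
            "tokens_per_second": tps,
        }
    return results


if __name__ == "__main__":
    # Hypothetical sample lines, written to match the assumed format above.
    sample = (
        "llama_print_timings: prompt eval time =     200.00 ms /    16 tokens\n"
        "llama_print_timings:        eval time =    2000.00 ms /   100 runs\n"
    )
    for stage, stats in parse_timings(sample).items():
        print(stage, stats)
```

Running the same parser over logs from runs with and without a draft model gives per-run generation throughput that can be compared side by side.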