This issue tracks progress on running Bamba on vLLM. Success for this issue implies the following:

- [ ] Running the model successfully from the HF checkpoint in vLLM (https://github.com/vllm-project/vllm/pull/10909) (a smoke-test sketch follows this list)
- [ ] Ensuring chunked prefill and tensor parallelism (TP) work in vLLM
- [ ] Closing the performance gap in vLLM with respect to Llama models of similar size
- [ ] Reporting the performance results in a blog post
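For reference, a minimal smoke-test sketch of the first two items, assuming the PR above is merged. The checkpoint id `ibm-ai-platform/Bamba-9B` is a placeholder assumption; `tensor_parallel_size` and `enable_chunked_prefill` are standard vLLM engine arguments used here to exercise TP and chunked prefill.

```python
# Minimal smoke test: load the HF checkpoint in vLLM and generate a completion.
# Assumes the Bamba support PR is merged; the checkpoint id below is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # placeholder HF checkpoint id
    tensor_parallel_size=2,            # exercises TP (second checklist item)
    enable_chunked_prefill=True,       # exercises chunked prefill
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Explain what a state-space model is."], params)
print(outputs[0].outputs[0].text)
```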
cc @raghukiran1224 @fabianlim @AdnanHoque