Commit 4fdf652
feat(vllm): Update to vllm v0.17.1, lmcache to v0.4.1, and switch vllm-tensorizer build to dedicated buildkit endpoints (#132)
* Update vllm version to v0.17.1.
* Update lmcache to v0.4.1. Also move the LMCache version parameter to the build config.
* Keep the same flashinfer version (v0.6.4). Checked openai's vllm image and vllm's v0.17.1 runtime requirements to confirm the version.
* The vllm upgrade caused the docker buildkit job to OOM consistently. Per the CBS team's suggestion, we switch vllm-tensorizer to dedicated buildkit endpoints, which launch pods with a much higher memory limit (~500G vs ~60G).
---------
Co-authored-by: rexwang8 <rexkingsbackyard@gmail.com>
1 parent c684e31 commit 4fdf652
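The version pins named in the commit message could be sanity-checked inside a built image with something like the sketch below. It is only an illustration, not part of this commit: the distribution names (in particular `flashinfer-python`) and the helper `find_mismatches` are assumptions, and the handling of local build suffixes such as `+cu128` is a guess at how the wheels might be versioned.

```python
from importlib import metadata

# Versions taken from the commit message; the distribution names are assumptions.
EXPECTED_PINS = {
    "vllm": "0.17.1",
    "lmcache": "0.4.1",
    "flashinfer-python": "0.6.4",
}


def find_mismatches(pins, lookup=metadata.version):
    """Return {name: (expected, found)} for every pin that is not satisfied.

    `lookup` is injectable so the check can be exercised without the
    packages being installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore a local build suffix such as "+cu128" when comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(EXPECTED_PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script in the image prints one line per package whose installed version differs from the pin, and prints nothing when all pins match.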
4 files changed
Lines changed: 13 additions & 4 deletions
File tree
- .github
  - configurations
  - workflows
- vllm-tensorizer
Diff hunks (changed line contents and file names not preserved):

| File | Original lines | New lines | Changes |
|---|---|---|---|
| 1 | 1-7 | 1-9 | line 2 replaced; new lines 5-6 added |
| 2 | 24-29 | 24-34 | new lines 27-31 added |
| 2 | 62-71 | 67-76 | lines 65 and 68 replaced (new lines 70 and 73) |
| 3 | 22-30 | 22-32 | new lines 25 and 30 added |
| 4 | 79-85 | 79-85 | line 82 replaced |