Commit daad723
[feat]Add Layerwise Connector (#656)
# Purpose
Introduce the Layerwise Connector. The LayerwiseConnector overlaps
computation with KV-cache load/dump operations, thereby speeding up the
prefill phase.
As soon as the attention computation for layer i finishes, its KV-cache
is dumped immediately; we no longer wait until every layer is done.
Before a forward pass begins, loading of all KV-caches is started
asynchronously, and layer i can start its attention computation as soon
as its own KV-cache becomes available, without waiting for the entire
load to complete.
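
The overlap described above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `layerwise_prefill`, `load_kv`, `compute_attention`, and `dump_kv` are hypothetical names, and plain Python threads stand in for whatever asynchronous transfer mechanism the real implementation uses.

```python
from concurrent.futures import ThreadPoolExecutor


def layerwise_prefill(num_layers, load_kv, compute_attention, dump_kv):
    """Hypothetical sketch: overlap per-layer KV load/dump with compute.

    load_kv(i) -> kv, compute_attention(i, kv) -> new_kv and
    dump_kv(i, new_kv) are caller-supplied stand-ins (assumptions).
    """
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as loader, \
         ThreadPoolExecutor(max_workers=1) as dumper:
        # Kick off loads for every layer before the forward pass begins.
        load_futures = [loader.submit(load_kv, i) for i in range(num_layers)]
        dump_futures = []
        for i in range(num_layers):
            # Block only until layer i's KV-cache is available.
            kv = load_futures[i].result()
            new_kv = compute_attention(i, kv)
            outputs.append(new_kv)
            # Dump layer i right away; later layers keep computing meanwhile.
            dump_futures.append(dumper.submit(dump_kv, i, new_kv))
        # Surface any dump errors before returning.
        for future in dump_futures:
            future.result()
    return outputs
```

In this sketch each side uses a single worker thread, so loads and dumps complete in layer order; a real connector may schedule transfers differently.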
> **:warning: Note**
> **Only PipelineStore supports layerwise mode; NfsStore does not
support it.**
# Modifications
# Test
## Performance
H20-QwQ32B-TP4-prefix cache-80%hit-TTFT
| input | output | parallel | recalculation | PipelineStore with layerwise | speedup | NfsStore | speedup |
|------|----------|--------|--------|---------------|--------|-------------|--------|
| 4000 | 1000 | 1 | 551.99 | 169.26 | 226.12%| 177.69 | 210.65%|
| 8000 | 1000 | 1 | 1102.31| 298.52 | 269.26%| 327.72 | 236.36%|
| 16000 | 1000 | 1 | 2356.01| 610.73 | 285.77%| 688.89 | 242.00%|
| 32000 | 1000 | 1 | 5341.1 | 1384.49 | 285.78%| 1544.76 | 245.76%|
| 4000 | 1000 | 8 | 2642.04| 981.39 | 169.21%| 1038.27 | 154.47%|
| 8000 | 1000 | 8 | 5031.3 | 1706.1 | 194.90%| 1858.99 | 170.65%|
| 16000 | 1000 | 8 | 10840.92|3250.35 | 233.53%| 3544.2 | 205.88%|
| 32000 | 1000 | 8 | 24709.55|6848.46 | 260.80%| 7958.29 | 210.49%|
| 4000 | 1000 | 16 | 4791.96| 1628.33 | 194.29%| 1747.01 | 174.29%|
| 8000 | 1000 | 16 | 9489.08| 3002.13 | 216.08%| 3269.28 | 190.25%|
| 16000 | 1000 | 16 | 20556.38|5677.6 | 262.06%| 6342.14 | 224.12%|
| 32000 | 1000 | 16 | 46992.56|12584.95 | 273.40%| 14296.63 | 228.70%|
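
The speedup columns appear to be the relative TTFT reduction measured against the cached run, i.e. `(recalculation - cached) / cached`. That formula is inferred from the table values rather than stated in the PR; `speedup_percent` is a hypothetical helper used here only to check it against one row.

```python
def speedup_percent(recalculation_ttft, cached_ttft):
    """Relative speedup in percent (assumed formula, inferred from the table)."""
    return (recalculation_ttft - cached_ttft) / cached_ttft * 100.0


# First row of the 80%-hit table: input 4000, parallel 1.
layerwise = round(speedup_percent(551.99, 169.26), 2)  # table reports 226.12%
nfs = round(speedup_percent(551.99, 177.69), 2)        # table reports 210.65%
```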
H20-QwQ32B-TP4-20%hit-TTFT
| input | output | parallel | recalculation | PipelineStore with layerwise | speedup | NfsStore | speedup |
|----------|----------|--------|----------|---------------|--------|-------------|--------|
| 4000 | 1000 | 1 | 551.99 | 473.26 | 16.64% | 483.74 | 14.11% |
| 8000 | 1000 | 1 | 1102.31 | 936.96 | 17.65% | 961.04 | 14.70% |
| 16000 | 1000 | 1 | 2356.01 | 1986.57 | 18.60% | 2044.30 | 15.25% |
| 32000 | 1000 | 1 | 5341.10 | 4615.83 | 15.71% | 4732.61 | 12.86% |
| 4000 | 1000 | 8 | 2642.04 | 2352.42 | 12.31% | 2507.12 | 5.38% |
| 8000 | 1000 | 8 | 5031.30 | 4481.31 | 12.27% | 4849.90 | 3.74% |
| 16000 | 1000 | 8 | 10840.92 | 9342.19 | 16.04% | 10010.21 | 8.30% |
| 32000 | 1000 | 8 | 24709.55 | 21135.45 | 16.91% | 22135.89 | 11.63% |
| 4000 | 1000 | 16 | 4791.96 | 4173.80 | 14.81% | 4498.91 | 6.51% |
| 8000 | 1000 | 16 | 9489.08 | 8264.28 | 14.82% | 9005.86 | 5.37% |
| 16000 | 1000 | 16 | 20556.38 | 17299.15 | 18.83% | 18669.48 | 10.11% |
| 32000 | 1000 | 16 | 46992.56 | 39894.73 | 17.79% | 41823.67 | 12.36% |

1 parent 6f90147 commit daad723
5 files changed
Lines changed: 236 additions & 44 deletions
File tree
- docs/source
- getting-started
- user-guide/prefix-cache
- ucm
- integration/vllm