Commit 3eca3e2
ssjia
Update base for Update on "[ET-VK][conv1d] Implement height-packed depthwise conv1d operator"
Implement a depthwise conv1d operator using a height-packed layout, where channels
are the packed dimension (WHCN dim 1). Depthwise conv applies a separate filter
to each channel independently (groups=C), so 4 channels can be processed in
parallel using element-wise vec4 FMA over kernel positions.
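The key property is that with groups=C, output channel c reads only input channel c and filter c, with no reduction across channels. That is what makes it safe to pack 4 channels into one vec4 and treat them as independent lanes. The scalar reference below is a minimal sketch of those semantics, assuming a single batch, stride 1 and no padding. The function name `depthwise_conv1d_valid` is hypothetical and not part of the operator.

```python
from typing import List


def depthwise_conv1d_valid(x: List[List[float]], w: List[List[float]]) -> List[List[float]]:
    """Depthwise conv1d, stride 1, no padding, single batch.

    x: input  [C][L_in]
    w: weight [C][K] (the [C, 1, K] weight with its singleton dim dropped)
    Returns   [C][L_in - K + 1]
    """
    channels, l_in, k_size = len(x), len(x[0]), len(w[0])
    l_out = l_in - k_size + 1
    out = []
    for c in range(channels):
        # Channel c only touches x[c] and w[c]: no cross-channel sum (groups=C).
        out.append([
            sum(x[c][l + k] * w[c][k] for k in range(k_size))
            for l in range(l_out)
        ])
    return out
```

For a cross-check against a framework implementation, the same result should come from `torch.nn.functional.conv1d(x, w, groups=C)` with `w` shaped `[C, 1, K]`.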
Thread mapping: X = ceil(C/4), Y = L_out, Z = N. Each thread computes one output texel
(4 channels at one spatial position). The inner loop iterates over the K kernel
positions, bounds-checking each input access so that positions in the padding
region contribute zero.
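The thread mapping and inner loop can be emulated on the CPU to make the indexing concrete. This is a sketch under assumptions, not the shader: tensors are modeled as nested lists of 4-float "texels", the input is indexed `[N][L_in][ceil(C/4)]` and the weight `[K][ceil(C/4)]`, and `dispatch_conv1d_dw` is a hypothetical name. The output length uses the standard conv1d formula `L_out = (L_in + 2*padding - dilation*(K-1) - 1) // stride + 1`.

```python
from typing import List, Optional

Vec4 = List[float]


def dispatch_conv1d_dw(
    x_texels: List[List[List[Vec4]]],   # input  [N][L_in][ceil(C/4)], 4 channels per texel
    w_texels: List[List[Vec4]],         # weight [K][ceil(C/4)], 4 channels per texel
    bias_texels: Optional[List[Vec4]],  # [ceil(C/4)] or None
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
) -> List[List[List[Vec4]]]:
    n_batch = len(x_texels)
    l_in = len(x_texels[0])
    c4 = len(x_texels[0][0])
    k_size = len(w_texels)
    l_out = (l_in + 2 * padding - dilation * (k_size - 1) - 1) // stride + 1

    out: List[List[List[Vec4]]] = [
        [[[0.0] * 4 for _ in range(c4)] for _ in range(l_out)] for _ in range(n_batch)
    ]
    # Emulate the dispatch grid: X = ceil(C/4), Y = L_out, Z = N.
    for gz in range(n_batch):
        for gy in range(l_out):
            for gx in range(c4):
                acc = list(bias_texels[gx]) if bias_texels is not None else [0.0] * 4
                for k in range(k_size):
                    pos = gy * stride - padding + k * dilation
                    if pos < 0 or pos >= l_in:
                        continue  # out-of-bounds tap falls in the padding region
                    inp = x_texels[gz][pos][gx]
                    wt = w_texels[k][gx]
                    # Element-wise vec4 FMA: acc = inp * wt + acc, per lane.
                    acc = [inp[i] * wt[i] + acc[i] for i in range(4)]
                out[gz][gy][gx] = acc
    return out
```

Each `(gx, gy, gz)` iteration plays the role of one invocation, and the four lanes never interact, which mirrors the depthwise independence described above.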
The [C,1,K] weight is prepacked with a channels-packed layout, so each vec4 load
yields 4 channels' weights at one kernel position. The operator supports both buffer
and texture3d storage, fp32 and fp16, optional bias, and arbitrary stride/padding/dilation.
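The prepacking step amounts to regrouping the weight so that the 4 channels of one texel share a kernel tap. The sketch below assumes a simplified `[K][ceil(C/4)]` texel indexing and zero-filling of the channels past C; both are assumptions for illustration, and `pack_weight_channels` is a hypothetical helper rather than the backend's prepack routine.

```python
from typing import List


def pack_weight_channels(w: List[List[List[float]]]) -> List[List[List[float]]]:
    """Pack a depthwise weight [C][1][K] into texels indexed [K][ceil(C/4)].

    Texel (k, g) holds the weights of channels 4*g .. 4*g+3 at kernel tap k,
    so a single vec4 load yields four channels' weights for the same tap.
    Lanes past C are zero-filled (assumption) so they add nothing in the FMA.
    """
    channels = len(w)
    k_size = len(w[0][0])
    c4 = (channels + 3) // 4
    packed = []
    for k in range(k_size):
        row = []
        for g in range(c4):
            texel = []
            for lane in range(4):
                c = 4 * g + lane
                texel.append(w[c][0][k] if c < channels else 0.0)
            row.append(texel)
        packed.append(row)
    return packed
```

With this grouping, the inner loop over K reads one texel per tap instead of four scalars.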
Registered as et_vk.conv1d_dw.default (standalone custom op).
Performance on Adreno 750 (S24):
- [1,128,4096] K=31 buffer f16: 231 GFLOP/s
- [1,128,4096] K=31 buffer f32: 155 GFLOP/s
- [1,512,2048] K=5 buffer f32: 66 GFLOP/s
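To relate the GFLOP/s figures to work per invocation, a common convention counts 2 FLOPs (one multiply, one add) per output element per kernel tap. The sketch below uses that convention; it is an assumption, since the commit does not state how FLOPs were counted, whether bias adds are included, or what L_out was (for example, L_out = L_in would require stride 1 with "same" padding). Both helper names are hypothetical.

```python
def conv1d_dw_flops(n: int, c: int, l_out: int, k: int) -> int:
    # 2 FLOPs (multiply + add) per output element per kernel tap; bias adds ignored.
    return 2 * n * c * l_out * k


def implied_latency_us(flops: int, gflops_per_s: float) -> float:
    # Time implied by a throughput figure, in microseconds.
    return flops / (gflops_per_s * 1e9) * 1e6
```

Under the L_out = L_in assumption, `conv1d_dw_flops(1, 128, 4096, 31)` gives 32,505,856 FLOPs and `conv1d_dw_flops(1, 512, 2048, 5)` gives 10,485,760 FLOPs. Passing either count to `implied_latency_us` with the reported throughput yields an approximate per-call time.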
Differential Revision: [D97344091](https://our.internmc.facebook.com/intern/diff/D97344091/)
[ghstack-poisoned]

1 parent b8ba505, commit 3eca3e2
2 files changed, 41 additions, 9 deletions