Commit a0e7458
ssjia
[ET-VK] Lower reduce_peak_memory threshold from 500 MB to 10 MB
During prepack, staging buffers accumulate in buffers_to_clear_ until
flush() is called. Previously, the reduce_peak_memory path (which calls
submit_and_wait + flush to free staging buffers incrementally) only
triggered when total constant data exceeded 500 MB. This meant models
with moderate weight sizes (e.g. 42 MB) never benefited from incremental
cleanup, causing all staging buffers to coexist in memory until the
final flush.
Lowering the threshold to 10 MB enables incremental staging buffer
cleanup for most models. On SceneX V9 FP16 (42 MB weights, Samsung S24
Adreno 750), this reduces transient VMA peak during prepack from 89.6 MB
to 57.3 MB (-36%) at a cost of ~15 ms additional load latency (+4.4%).
Steady-state memory and inference performance are unaffected.
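
The effect of the threshold can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the ExecuTorch code: `PrepackSim`, `simulate_prepack_peak`, and the `flush_every` interval are hypothetical names introduced here, and the model counts staging bytes only, so it does not reproduce the VMA figures quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for prepack bookkeeping; NOT the ExecuTorch API.
// Tracks how many staging bytes are alive and the highest value seen.
struct PrepackSim {
  std::size_t live_staging_bytes = 0;
  std::size_t peak_staging_bytes = 0;

  // A staging buffer is created and stays alive until the next flush().
  void stage(std::size_t nbytes) {
    live_staging_bytes += nbytes;
    peak_staging_bytes = std::max(peak_staging_bytes, live_staging_bytes);
  }

  // Models submit_and_wait + flush: all pending staging buffers are freed.
  void flush() { live_staging_bytes = 0; }
};

// Returns the peak number of staging bytes alive during prepack.
// `threshold` plays the role of the reduce_peak_memory cutoff; `flush_every`
// is an assumed interval between incremental flushes (illustrative only).
std::size_t simulate_prepack_peak(
    const std::vector<std::size_t>& weight_sizes,
    std::size_t threshold,
    std::size_t flush_every) {
  assert(flush_every > 0);

  std::size_t total = 0;
  for (std::size_t w : weight_sizes) {
    total += w;
  }
  // Incremental cleanup only runs when total constant data exceeds the cutoff.
  const bool reduce_peak_memory = total > threshold;

  PrepackSim sim;
  std::size_t staged_since_flush = 0;
  for (std::size_t w : weight_sizes) {
    sim.stage(w);
    staged_since_flush += w;
    if (reduce_peak_memory && staged_since_flush >= flush_every) {
      sim.flush();
      staged_since_flush = 0;
    }
  }
  sim.flush();  // final flush at the end of prepack
  return sim.peak_staging_bytes;
}
```

In this model, six 7 MB weights (42 MB total) never cross a 500 MB cutoff, so all staging buffers coexist until the final flush. With a 10 MB cutoff and an assumed 14 MB flush interval, at most 14 MB of staging data is alive at once.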
Authored with Claude.
Differential Revision: [D100332227](https://our.internmc.facebook.com/intern/diff/D100332227/)
[ghstack-poisoned]
1 parent 930ecfd commit a0e7458
1 file changed
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1134 | 1134 | |
| 1135 | 1135 | |
| 1136 | 1136 | |
| 1137 | | - |
| | 1137 | + |
| 1138 | 1138 | |
| | 1139 | + |
| 1139 | 1140 | |
| 1140 | 1141 | |
| 1141 | 1142 | |