Offload model to RAM and run expert and dflash model on the GPU