Narrow test_hf_vllm_export_offload to skip save_pretrained round-trip
Monkey-patch save_pretrained to a no-op so the test exercises only this
PR's new inplace_mem_efficient=True contribution (per-layer
enable_weight_access_and_writeback dispatch plus inplace fake-quant
writeback) without tripping transformers' load_offloaded_parameter on
SequentialHook, a pre-existing upstream limitation unrelated to this PR.
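A minimal sketch of the monkey-patch technique the test uses; the model
class, method body, and path here are illustrative stand-ins, not the
actual test code:

```python
# Hypothetical stand-in for the HF model used in the test: its real
# save_pretrained round-trip would hit load_offloaded_parameter on
# SequentialHook and fail for reasons unrelated to this PR.
class DummyModel:
    def save_pretrained(self, path):
        raise RuntimeError("round-trip unsupported under offload hooks")

model = DummyModel()

# Monkey-patch the method on the instance to a no-op; the rest of the
# test body runs unchanged and the problematic round-trip is skipped.
model.save_pretrained = lambda path: None

model.save_pretrained("/tmp/export")  # no-op, raises nothing
```

In the real test this would typically be done with pytest's `monkeypatch`
fixture so the patch is undone automatically after the test.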
Broaden the folded-weights assertion to cover all decoder layers (not
just the offloaded layer 0) so regressions in the on-GPU inplace path
are also caught. The vllm_fq_modelopt_state.pth contents are still
asserted since torch.save happens before save_pretrained.
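The broadened assertion can be sketched as a loop over every decoder
layer rather than layer 0 alone; `layers` and `reference` below are toy
stand-ins for the model's decoder layers and the pre-computed folded
weights from the test setup:

```python
import torch

# Toy stand-ins: a few "decoder layers" and their expected folded weights.
layers = [torch.nn.Linear(4, 4) for _ in range(3)]
reference = [layer.weight.detach().clone() for layer in layers]

# Assert the folded-weights invariant on every layer, not just the
# offloaded layer 0, so on-GPU inplace regressions are caught too.
for idx, (layer, ref) in enumerate(zip(layers, reference)):
    assert torch.allclose(layer.weight, ref), f"layer {idx} weight mismatch"
```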
Signed-off-by: realAsma <akuriparambi@nvidia.com>