Use mlx-vlm-style predicate chaining for quantization

Keep the local nn.quantize call, but build its class_predicate the way
mlx-vlm does: chain the default skip-vision / group-size predicate with
the model's own quant_predicate. Record any per-layer dict results so
the load path re-quantizes the same way.
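
The chaining described above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the code from this commit:
FakeLinear, make_default_predicate, chain_predicates, model_predicate and
the "vision" substring check are all hypothetical names and heuristics,
and the sketch deliberately avoids importing mlx so it runs on its own.

```python
from typing import Any, Callable, Dict, Optional, Union

# A predicate returns False (skip), True (quantize with the defaults),
# or a dict of per-layer quantization parameters.
PredicateResult = Union[bool, Dict[str, Any]]
Predicate = Callable[[str, Any], PredicateResult]


class FakeLinear:
    """Hypothetical stand-in for a quantizable layer (not an mlx class)."""

    def __init__(self, in_features: int) -> None:
        self.in_features = in_features

    def to_quantized(self, **params: Any) -> "FakeLinear":
        return self


def make_default_predicate(group_size: int) -> Predicate:
    """Assumed default: skip vision modules and incompatible shapes."""

    def default_predicate(path: str, module: Any) -> bool:
        if "vision" in path:  # assumption: vision layers share this prefix
            return False
        if not hasattr(module, "to_quantized"):
            return False
        return module.in_features % group_size == 0

    return default_predicate


def chain_predicates(
    default: Predicate,
    model_predicate: Optional[Predicate],
    recorded: Dict[str, Dict[str, Any]],
) -> Predicate:
    """Run the default first, then defer to the model's own predicate."""

    def chained(path: str, module: Any) -> PredicateResult:
        if not default(path, module):
            return False
        if model_predicate is None:
            return True
        result = model_predicate(path, module)
        if isinstance(result, dict):
            # Remember per-layer overrides so loading can replay them.
            recorded[path] = result
        return result

    return chained


def model_predicate(path: str, module: Any) -> PredicateResult:
    """Hypothetical model-specific rule: keep the output head at 8 bits."""
    if path.endswith("lm_head"):
        return {"group_size": 64, "bits": 8}
    return True


recorded: Dict[str, Dict[str, Any]] = {}
predicate = chain_predicates(make_default_predicate(64), model_predicate, recorded)

layers = {
    "vision_tower.proj": FakeLinear(128),
    "language_model.q_proj": FakeLinear(128),
    "language_model.lm_head": FakeLinear(128),
    "language_model.odd": FakeLinear(100),
}
decisions = {path: predicate(path, module) for path, module in layers.items()}
```

With real MLX modules, the chained callable would presumably be passed
as class_predicate to nn.quantize, and the recorded dict would be saved
alongside the quantization config so the load path can reapply it.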
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>