chore: add support for quantized versions of CV models (CLIP, Style Transfer, EfficientNetV2, SSDLite) #940
Conversation
I will run the new models later today to see if they work. I think you should also benchmark them and add the results to our docs. You can ask @IgorSwat for tips about benchmarking ;D

I've added the profiling results to the corresponding READMEs in the internal exports GitLab. They all look fine to me (>80% delegated ops), but you can also take a look to make sure everything is correct.
NorbertKlockiewicz
left a comment
Please change examples in demo apps to use quantized models by default.
Please also add to the benchmark section how memory usage was measured.
```ts
    ? `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/coreml/efficientnet_v2_s_coreml_fp32.pte`
    : `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/xnnpack/efficientnet_v2_s_xnnpack_fp32.pte`;
const EFFICIENTNET_V2_S_QUANTIZED_MODEL =
  Platform.OS === 'ios'
    ? `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/coreml/efficientnet_v2_s_coreml_fp16.pte`
    : `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/xnnpack/efficientnet_v2_s_xnnpack_int8.pte`;
```
I'm wondering whether we should silently run CoreML here, or let the user handle it. Some users might prefer consistency across platforms over performance. cc @NorbertKlockiewicz
I think we should discuss this and plan accordingly. The simplest way would be to have a constant for both the xnnpack and the coreml model, but with quantized models that's already 4 constants. Maybe we can figure out a better way so users can easily switch between models delegated to different backends. To answer your question: we should definitely let users select it.
I guess we can add constants like EFFICIENTNET_V2_S_XNNPACK_FP32 that unambiguously signal both the backend and the precision for "power users" who want control, and additionally have simple constants like EFFICIENTNET_V2_S that resolve to the fastest variant of the model on the given platform.
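A minimal sketch of that scheme (the URL prefix, version tag, and the `platformOS` stand-in for react-native's `Platform.OS` are illustrative placeholders, not the library's real values):

```typescript
// Sketch of the proposed constant layout. URL_PREFIX and NEXT_VERSION_TAG are
// placeholders, and platformOS stands in for react-native's Platform.OS so the
// snippet runs outside React Native.
const URL_PREFIX = 'https://example.com/react-native-executorch';
const NEXT_VERSION_TAG = 'v1.1.0';
const platformOS: 'ios' | 'android' = 'ios';

// Explicit constants for power users: backend and precision are unambiguous.
const EFFICIENTNET_V2_S_XNNPACK_FP32 = `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/xnnpack/efficientnet_v2_s_xnnpack_fp32.pte`;
const EFFICIENTNET_V2_S_COREML_FP32 = `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/coreml/efficientnet_v2_s_coreml_fp32.pte`;

// Simple alias: resolves to the fastest variant on the current platform.
const EFFICIENTNET_V2_S =
  platformOS === 'ios' ? EFFICIENTNET_V2_S_COREML_FP32 : EFFICIENTNET_V2_S_XNNPACK_FP32;

console.log(EFFICIENTNET_V2_S);
```

The explicit names keep the per-backend constants discoverable, while the short alias preserves the current one-line experience for users who don't care which backend runs.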
Let's keep it as it is now and do it properly in another PR.
Just a question: why is there only an fp32 version of react-native-executorch-ssdlite320-mobilenet-v3-large for xnnpack? And is there any reason there is no CoreML export for react-native-executorch-clip-vit-base-patch32? If there were problems with the exports, I think we should create issues on the ExecuTorch repo for them.
@msluszniak I didn't export the CLIP Vision model to the CoreML backend because the xnnpack variant is already extremely fast on iOS (<20 ms).
The question is whether they are also super-fast on low-tier devices. The fact that the models are super-fast on top devices doesn't mean we can't speed them up further for slower devices ;) but of course you are in a better position right now to decide whether this is worth trying.
msluszniak
left a comment
There was a problem hiding this comment.
Ok, I tested all the models, and everything works correctly 🚀
Description
Adds support for quantized versions of the CV models CLIP, Style Transfer, EfficientNetV2, and SSDLite, and updates the paths to non-quantized models exported with ExecuTorch v1.1.0.
Introduces a breaking change?
Type of change
Tested on
Testing instructions
- SSDLITE_320_MOBILENET_V3_LARGE
- EFFICIENTNET_V2_S, EFFICIENTNET_V2_S_QUANTIZED
- STYLE_TRANSFER_CANDY, STYLE_TRANSFER_MOSAIC, STYLE_TRANSFER_UDNIE, STYLE_TRANSFER_RAIN_PRINCESS, STYLE_TRANSFER_CANDY_QUANTIZED, STYLE_TRANSFER_MOSAIC_QUANTIZED, STYLE_TRANSFER_UDNIE_QUANTIZED, STYLE_TRANSFER_RAIN_PRINCESS_QUANTIZED
- CLIP_VIT_BASE_PATCH32_IMAGE, CLIP_VIT_BASE_PATCH32_IMAGE_QUANTIZED

Screenshots
Related issues
Closes #719
Checklist
Additional notes