Converted Core ML Model Zoo.
Core ML is Apple's machine learning framework. If you are an iOS developer, you can easily use machine learning models in your Xcode project.
Take a look at this model zoo. If you find the CoreML model you want, download it from the Google Drive link and bundle it in your project. If the model has a sample project link, try the sample to see how the model is used in an app. It's up to you.
If you like this repository, please give it a star so I can keep doing my best.
- Stable Diffusion: text2image
- Face Manipulation: NEW
- Image Harmonization: NEW
- Image Deblurring: NEW
- Text-to-Speech: NEW
- Multilingual OCR: NEW
You can get the models converted to CoreML format from the Google Drive links. See the usage section below for how to use them in Xcode. Each model's license follows the license of its original project.
| Google Drive Link | Size | Dataset | Original Project | License |
|---|---|---|---|---|
| Efficientnetb0 | 22.7 MB | ImageNet | TensorFlowHub | Apache2.0 |
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| Efficientnetv2 | 85.8 MB | ImageNet | Google/autoML | Apache2.0 | 2021 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| VisionTransformer-B16 | 347.5 MB | ImageNet | google-research/vision_transformer | Apache2.0 | 2021 |
Local Features Coupling Global Representations for Visual Recognition.
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| Conformer-tiny-p16 | 94.1 MB | ImageNet | pengzhiliang/Conformer | Apache2.0 | 2021 |
Data-efficient Image Transformers
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| DeiT-base384 | 350.5 MB | ImageNet | facebookresearch/deit | Apache2.0 | 2021 |
Making VGG-style ConvNets Great Again
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| RepVGG-A0 | 33.3 MB | ImageNet | DingXiaoH/RepVGG | MIT | 2021 |
Designing Network Design Spaces
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| regnet_y_400mf | 16.5 MB | ImageNet | TORCHVISION.MODELS | MIT | 2020 |
CVNets: A library for training computer vision networks
| Google Drive Link | Size | Dataset | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|---|
| MobileViTv2 | 18.8 MB | ImageNet | apple/ml-cvnets | apple | 2022 | |
Lightweight classification model optimized for mobile devices. Ultra-fast inference with 67.7% top-1 accuracy.
| Google Drive Link | Size | Dataset | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| MobileNetV3-Small (TBD) | 4.9 MB | ImageNet | pytorch/vision | BSD-3 | 2019 | MobileNetV3SmallDemo |
A ConvNet for the 2020s. Pure CNN architecture that competes with Vision Transformers. 82.5% top-1 accuracy.
| Google Drive Link | Size | Dataset | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| ConvNeXt-Tiny (TBD) | 54.6 MB | ImageNet | facebookresearch/ConvNeXt | MIT | 2022 | ConvNeXtTinyDemo |
FastViT: official CoreML model and sample app available:
- CoreML Model: apple/coreml-FastViT-T8
- iOS Sample: huggingface/coreml-examples/FastViTSample
- Source: apple/ml-fastvit
MobileOne: official CoreML model and benchmark app available:
- CoreML Model + iOS App: apple/ml-mobileone
Rethinking Vision Transformers for MobileNet Size and Speed. Lightweight ViT for mobile. 76.2% top-1 accuracy.
| Google Drive Link | Size | Dataset | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| EfficientFormerV2-S0 (TBD) | 7.2 MB | ImageNet | snap-research/EfficientFormer | Apache2.0 | 2023 | EfficientFormerV2Demo |
GhostNetV2: Enhance Cheap Operation with Long-Range Attention. Ghost module with DFC attention. 75.3% top-1 accuracy.
| Google Drive Link | Size | Dataset | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| GhostNetV2-100 (TBD) | 11.9 MB | ImageNet | huawei-noah/Efficient-AI-Backbones | Apache2.0 | 2022 | GhostNetV2Demo |
MetaFormer is Actually What You Need for Vision. Uses simple pooling instead of attention. 77.2% top-1 accuracy.
| Google Drive Link | Size | Dataset | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| PoolFormer-S12 (TBD) | 22.9 MB | ImageNet | sail-sg/poolformer | Apache2.0 | 2022 | PoolFormerDemo |
LeViT: A Vision Transformer in ConvNet's Clothing. Fast hybrid CNN-Transformer. 76.6% top-1 accuracy.
| Google Drive Link | Size | Dataset | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| LeViT-128S (TBD) | 16.0 MB | ImageNet | facebookresearch/LeViT | Apache2.0 | 2021 | LeViTDemo |
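Through Vision, these classifiers return VNClassificationObservation results that are typically already sorted by confidence. If you call a model directly and it outputs raw logits (whether a given conversion includes a final softmax is an assumption to check in Xcode's model preview), a minimal sketch of turning the scores into top-k labels looks like this; the label list is hypothetical:

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical 4-class example; the ImageNet models above have 1000 classes.
labels = ["cat", "dog", "fox", "owl"]
print(top_k([2.0, 1.0, 0.1, -1.0], labels, k=2))
```

Subtracting the maximum logit does not change the result but keeps `math.exp` from overflowing on large scores.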
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| YOLOv5s | 29.3MB | Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | ultralytics/yolov5 | GNU | Non Maximum Suppression has been added. | CoreML-YOLOv5 |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|
| YOLOv7 | 147.9MB | Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | WongKinYiu/yolov7 | GNU | Non Maximum Suppression has been added. | CoreML-YOLOv5 |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| YOLOv8s | 45.1MB | Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | ultralytics/ultralytics | GNU | Non Maximum Suppression has been added. | CoreML-YOLOv5 |
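The YOLO models above return one Confidence row (80 class scores) and one Coordinates row (4 values) per detection. A minimal sketch of turning a row pair into a labeled pixel box, assuming each Coordinates row is (center x, center y, width, height) normalized to [0, 1] with a top-left origin; that is the usual layout for these NMS pipelines but worth confirming in Xcode's model preview. Note that when you go through Vision instead, VNRecognizedObjectObservation.boundingBox uses a lower-left origin. The function names are illustrative:

```python
def _clamp(value, upper):
    """Keep a coordinate inside [0, upper]."""
    return max(0.0, min(value, upper))


def to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized (center x, center y, width, height) box into
    pixel corners (x_min, y_min, x_max, y_max) with a top-left origin."""
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return (_clamp(x_min, img_w), _clamp(y_min, img_h),
            _clamp(x_max, img_w), _clamp(y_max, img_h))


def best_class(confidences, labels):
    """Pick the highest-scoring class from one row of the Confidence output."""
    idx = max(range(len(confidences)), key=confidences.__getitem__)
    return labels[idx], confidences[idx]


# One hypothetical detection on a 640x480 image.
print(to_pixel_box(0.5, 0.5, 0.2, 0.4, 640, 480))
print(best_class([0.1, 0.7, 0.2], ["person", "bicycle", "car"]))
```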
| Google Drive Link | Size | Output | Original Project | License |
|---|---|---|---|---|
| U2Net | 175.9 MB | Image(GRAYSCALE 320 × 320) | xuebinqin/U-2-Net | Apache |
| U2Netp | 4.6 MB | Image(GRAYSCALE 320 × 320) | xuebinqin/U-2-Net | Apache |
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| IS-Net | 176.1 MB | Image(GRAYSCALE 1024 × 1024) | xuebinqin/DIS | Apache | 2022 | |
| IS-Net-General-Use | 176.1 MB | Image(GRAYSCALE 1024 × 1024) | xuebinqin/DIS | Apache | 2022 |
RMBG-1.4: IS-Net enhanced with BRIA AI's own training scheme and proprietary dataset.
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| RMBG.mlpackage/RMBG.mlmodel | 176 MB | Image(GrayScale 1024x1024) | briaai/RMBG-1.4 | Creative Commons | 2024 |
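U2Net, IS-Net, and RMBG output a grayscale mask rather than a cut-out image. A minimal sketch of using such a mask to place the foreground over a solid background, assuming the mask has already been resized to the photo's resolution and holds 0-255 values; the function name and the flat pixel-list representation are illustrative only:

```python
def composite(fg_pixels, mask, bg=(255, 255, 255)):
    """Blend RGB foreground pixels over a solid background color.

    fg_pixels: list of (r, g, b) tuples, 0-255
    mask:      list of grayscale mask values, 0-255 (255 = keep foreground)
    bg:        background color used where the mask is 0
    """
    if len(fg_pixels) != len(mask):
        raise ValueError("image and mask must have the same number of pixels")
    out = []
    for rgb, m in zip(fg_pixels, mask):
        alpha = m / 255.0
        out.append(tuple(round(alpha * c + (1.0 - alpha) * b)
                         for c, b in zip(rgb, bg)))
    return out


print(composite([(10, 20, 30), (10, 20, 30)], [255, 0]))
```

Using the mask as a soft alpha (instead of thresholding it) keeps fine edges such as hair looking natural.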
| Google Drive Link | Size | Output | Original Project | License | Sample Project |
|---|---|---|---|---|---|
| face-Parsing | 53.2 MB | MultiArray(1 x 512 × 512) | zllrunning/face-parsing.PyTorch | MIT | CoreML-face-parsing |
Simple and Efficient Design for Semantic Segmentation with Transformers
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| SegFormer_mit-b0_1024x1024_cityscapes | 14.9 MB | MultiArray(512 × 1024) | NVlabs/SegFormer | NVIDIA | 2021 |
Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| BiSeNetV2_1024x1024_cityscapes | 12.8 MB | MultiArray | ycszen/BiSeNet | Apache2.0 | 2021 |
Disentangled Non-Local Neural Networks
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| dnl_r50-d8_512x512_80k_ade20k | 190.8 MB | MultiArray[512x512] | ADE20K | yinmh17/DNL-Semantic-Segmentation | Apache2.0 | 2020 |
Interlaced Sparse Self-Attention for Semantic Segmentation
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| isanet_r50-d8_512x512_80k_ade20k | 141.5 MB | MultiArray[512x512] | ADE20K | openseg-group/openseg.pytorch | MIT | ArXiv'2019/IJCV'2021 |
Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| fastfcn_r50-d32_jpu_aspp_512x512_80k_ade20k | 326.2 MB | MultiArray[512x512] | ADE20K | wuhuikai/FastFCN | MIT | ArXiv'2019 |
Non-local Networks Meet Squeeze-Excitation Networks and Beyond
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| gcnet_r50-d8_512x512_20k_voc12aug | 189 MB | MultiArray[512x512] | PascalVOC | xvjiarui/GCNet | Apache License 2.0 | ICCVW'2019/TPAMI'2020 |
Dual Attention Network for Scene Segmentation (CVPR2019)
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| danet_r50-d8_512x1024_40k_cityscapes | 189.7 MB | MultiArray[512x1024] | CityScapes | junfu1115/DANet | MIT | CVPR2019 |
Panoptic Feature Pyramid Networks
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| fpn_r50_512x1024_80k_cityscapes | 108.6 MB | MultiArray[512x1024] | CityScapes | facebookresearch/detectron2 | Apache License 2.0 | 2019 |
Code for binary segmentation of various clothes.
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| clothSegmentation | 50.1 MB | Image(GrayScale 640x960) | fashion-2019-FGVC6 | facebookresearch/detectron2 | MIT | 2020 |
EasyPortrait - Face Parsing and Portrait Segmentation Dataset.
| Google Drive Link | Size | Output | Original Project | License | year | Swift sample | Conversion Script |
|---|---|---|---|---|---|---|---|
| easyportrait-segformer512-fp | 7.6 MB | Image(GrayScale 512x512) * 9 | hukenovs/easyportrait | Creative Commons | 2023 | easyportrait-coreml |
DeepLabV3 with MobileNetV3-Large backbone. 21-class PASCAL VOC semantic segmentation (person, car, cat, dog, etc.).
| Google Drive Link | Size | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| DeepLabV3-MobileNetV3 (TBD) | 21.1 MB | MultiArray (1x21x512x512) | pytorch/vision | BSD-3 | 2019 | DeepLabV3Demo |
Lite R-ASPP with MobileNetV3-Large backbone. Ultra-lightweight 21-class semantic segmentation (57.9 mIoU). Only 6.3 MB.
| Google Drive Link | Size | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| LRASPP-MobileNetV3 (TBD) | 6.3 MB | MultiArray (1x21x512x512) | pytorch/vision | BSD-3 | 2019 | LRASPPDemo |
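DeepLabV3 and LR-ASPP output a (1, 21, 512, 512) MultiArray: one score map per PASCAL VOC class, with class 0 being background. A class label for each pixel comes from taking the argmax over the class axis. A minimal sketch in plain Python, assuming the channel-first layout shown in the tables and nested lists in place of an MLMultiArray:

```python
def label_map(scores):
    """Per-pixel argmax over classes.

    scores[c][y][x] holds the score of class c at pixel (x, y), i.e. the
    (21, H, W) array left after dropping the leading batch dimension.
    Returns labels[y][x] with the winning class index (0 = background in VOC).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            labels[y][x] = max(range(num_classes), key=lambda c: scores[c][y][x])
    return labels


# Hypothetical 3-class scores for a 1x2 image.
scores = [
    [[0.1, 0.9]],   # class 0 (background)
    [[0.8, 0.05]],  # class 1
    [[0.1, 0.05]],  # class 2
]
print(label_map(scores))  # [[1, 0]]
```

With NumPy, the same result for the full output array is `output[0].argmax(axis=0)`.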
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| Real ESRGAN4x | 66.9 MB | Image(RGB 2048x2048) | xinntao/Real-ESRGAN | BSD 3-Clause License | 2021 |
| Real ESRGAN Anime4x | 66.9 MB | Image(RGB 2048x2048) | xinntao/Real-ESRGAN | BSD 3-Clause License | 2021 |
Towards Real-World Blind Face Restoration with Generative Facial Prior
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| GFPGAN | 337.4 MB | Image(RGB 512x512) | TencentARC/GFPGAN | Apache2.0 | 2021 |
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| BSRGAN | 66.9 MB | Image(RGB 2048x2048) | cszn/BSRGAN | | 2021 |
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| A-ESRGAN | 63.8 MB | Image(RGB 1024x1024) | aesrgan/A-ESRGAN | BSD 3-Clause License | 2021 |
Best-Buddy GANs for Highly Detailed Image Super-Resolution
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| Beby-GAN | 66.9 MB | Image(RGB 2048x2048) | dvlab-research/Simple-SR | MIT | 2021 |
The Residual in Residual Dense Network for image super-scaling.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| RRDN | 16.8 MB | Image(RGB 2048x2048) | idealo/image-super-resolution | Apache2.0 | 2018 |
Fast-SRGAN.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| Fast-SRGAN | 628 KB | Image(RGB 1024x1024) | HasnainRaz/Fast-SRGAN | MIT | 2019 |
Enhanced-SRGAN.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| ESRGAN | 66.9 MB | Image(RGB 2048x2048) | xinntao/ESRGAN | Apache 2.0 | 2018 |
Pretrained: 4xESRGAN
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| UltraSharp | 34 MB | Image(RGB 1024x1024) | Kim2019/ | CC-BY-NC-SA-4.0 | 2021 |
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| SRGAN | 6.1 MB | Image(RGB 2048x2048) | dongheehand/SRGAN-PyTorch | | 2017 |
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| SRResNet | 6.1 MB | Image(RGB 2048x2048) | dongheehand/SRGAN-PyTorch | | 2017 |
Lightweight Image Super-Resolution with Enhanced CNN.
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| LESRCNN | 4.3 MB | Image(RGB 512x512) | hellloxiaotian/LESRCNN | | 2020 | |
Metric Learning based Interactive Modulation for Real-World Super-Resolution
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| MMRealSRGAN | 104.6 MB | Image(RGB 1024x1024) | TencentARC/MM-RealSR | BSD 3-Clause | 2022 | |
| MMRealSRNet | 104.6 MB | Image(RGB 1024x1024) | TencentARC/MM-RealSR | BSD 3-Clause | 2022 |
Pytorch implementation of "Unsupervised Degradation Representation Learning for Blind Super-Resolution", CVPR 2021
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| DASR | 12.1 MB | Image(RGB 1024x1024) | The-Learning-And-Vision-Atelier-LAVA/DASR | MIT | 2022 |
Learning Temporal Consistency for Low Light Video Enhancement from Single Images.
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| StableLLVE | 17.3 MB | Image(RGB 512x512) | zkawfanx/StableLLVE | MIT | 2021 |
Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| Zero-DCE | 320KB | Image(RGB 512x512) | Li-Chongyi/Zero-DCE | See Repo | 2021 |
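Zero-DCE does not generate the enhanced image directly; the network predicts curve parameters (8 iterations times 3 channels, per pixel) and applies the quadratic curve LE(x) = x + a·x·(1 − x) repeatedly. The exported model is assumed to apply the curves internally, since its output is already an image, so this sketch only illustrates the idea on a single intensity with hand-picked parameters:

```python
def enhance(pixel, alphas):
    """Apply the Zero-DCE light-enhancement curve LE(x) = x + a * x * (1 - x)
    once per curve parameter.

    pixel:  normalized intensity in [0, 1]
    alphas: curve parameters in [-1, 1], one per iteration (the network
            predicts these per pixel and channel; here they are hand-picked)
    """
    x = pixel
    for a in alphas:
        x = x + a * x * (1.0 - x)
    return x


print(enhance(0.2, [1.0]))       # one iteration brightens 0.2 to about 0.36
print(enhance(0.2, [0.3] * 8))   # eight gentler iterations
```

For a in [-1, 1] the curve maps [0, 1] onto itself and keeps 0 and 1 fixed, which is why it can be applied repeatedly without clipping.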
Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| ZRetinexformer FiveK | 3.4MB | Image(RGB 512x512) | caiyuanhao1998/Retinexformer | MIT | 2023 | |
| ZRetinexformer NTIRE | 3.4MB | Image(RGB 512x512) | caiyuanhao1998/Retinexformer | MIT | 2023 |
Multi-Stage Progressive Image Restoration.
- Deblurring
- Denoising
- Deraining
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| MPRNetDebluring | 137.1 MB | Image(RGB 512x512) | swz30/MPRNet | MIT | 2021 |
| MPRNetDeNoising | 108 MB | Image(RGB 512x512) | swz30/MPRNet | MIT | 2021 |
| MPRNetDeraining | 24.5 MB | Image(RGB 512x512) | swz30/MPRNet | MIT | 2021 |
Learning Enriched Features for Fast Image Restoration and Enhancement.
- Denoising
- Super Resolution
- Contrast Enhancement
- Low Light Enhancement
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| MIRNetv2Denoising | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 | |
| MIRNetv2SuperResolution | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 | |
| MIRNetv2ContrastEnhancement | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 | |
| MIRNetv2LowLightEnhancement | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 |
| Google Drive Link | Size | Output | Original Project | License | Sample Project |
|---|---|---|---|---|---|
| MobileStyleGAN | 38.6MB | Image(Color 1024 × 1024) | bes-dev/MobileStyleGAN.pytorch | Nvidia Source Code License-NC | CoreML-StyleGAN |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| DCGAN | 9.2MB | MultiArray | TensorFlowCore |
| Google Drive Link | Size | Output | Original Project | License | Usage |
|---|---|---|---|---|---|
| Anime2Sketch | 217.7MB | Image(Color 512 × 512) | Mukosame/Anime2Sketch | MIT | Drop an image to preview |
| Google Drive Link | Size | Output | Original Project | Conversion Script |
|---|---|---|---|---|
| AnimeGAN2Face_Paint_512_v2 | 8.6MB | Image(Color 512 × 512) | bryandlee/animegan2-pytorch |
| Google Drive Link | Size | Output | Original Project | License | Note |
|---|---|---|---|---|---|
| Photo2Cartoon | 15.2 MB | Image(Color 256 × 256) | minivision-ai/photo2cartoon | MIT | The output differs slightly from the original model because some operations were replaced manually during conversion. |
| Google Drive Link | Size | Output | Original Project | Sample |
|---|---|---|---|---|
| AnimeGANv2_Hayao | 8.7MB | Image(256 x 256) | TachibanaYoshino/AnimeGANv2 | AnimeGANv2-iOS |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| AnimeGANv2_Paprika | 8.7MB | Image(256 x 256) | TachibanaYoshino/AnimeGANv2 |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| WarpGAN Caricature | 35.5MB | Image(256 x 256) | seasonSH/WarpGAN |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| UGATIT_selfie2anime | 266.2MB(quantized) | Image(256x256) | taki0112/UGATIT |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| CartoonGAN_Shinkai | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| CartoonGAN_Hayao | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| CartoonGAN_Hosoda | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| CartoonGAN_Paprika | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| fast-neural-style-transfer-cuphead | 6.4MB | Image(RGB 960x640) | eriklindernoren/Fast-Neural-Style-Transfer | MIT | 2019 |
| fast-neural-style-transfer-starry-night | 6.4MB | Image(RGB 960x640) | eriklindernoren/Fast-Neural-Style-Transfer | MIT | 2019 |
| fast-neural-style-transfer-mosaic | 6.4MB | Image(RGB 960x640) | eriklindernoren/Fast-Neural-Style-Transfer | MIT | 2019 |
Learning to Cartoonize Using White-box Cartoon Representations
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| White_box_Cartoonization | 5.9MB | Image(1536x1536) | SystemErrorWang/White-box-Cartoonization | creativecommons | CVPR2020 |
White-box facial image cartoonization
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| FacialCartoonization | 8.4MB | Image(256x256) | SystemErrorWang/FacialCartoonization | creativecommons | 2020 |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| AOT-GAN-for-Inpainting | 60.8MB | MLMultiArray(3,512,512) | researchmm/AOT-GAN-for-Inpainting | Apache2.0 | To use see sample. | john-rocky/Inpainting-CoreML |
| Google Drive Link | Size | Input | Output | Original Project | License | Note | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| Lama | 216.6MB | Image (Color 800 × 800), Image (GrayScale 800 × 800) | Image (Color 800 × 800) | advimman/lama | Apache2.0 | To use see sample. | john-rocky/lama-cleaner-iOS | mallman/CoreMLaMa |
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| MiDaS_Small | 66.3MB | MultiArray(1x256x256) | isl-org/MiDaS | MIT | 2022 |
| Google Drive Link | Original Model | Original Project | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|---|
| stable-diffusion-v1-5 | runwayml/stable-diffusion-v1-5 | runwayml/stable-diffusion | Open RAIL M license | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2022 |
Pastel Mix: a stylized latent diffusion model. This model is intended to produce high-quality, highly detailed anime-style images from just a few prompts.
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| pastelMixStylizedAnime_pastelMixPrunedFP16 | andite/pastel-mix | Fantasy.ai | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| AOM3_orangemixs | WarriorMama777/OrangeMixs | CreativeML OpenRAIL-M | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| Counterfeit-V2.5 | gsdf/Counterfeit-V2.5 | - | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| anything-v4.5 | andite/anything-v4.0 | Fantasy.ai | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| Openjourney | prompthero/openjourney | - | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| dreamlike-photoreal-2.0 | dreamlike-art/dreamlike-photoreal-2.0 | CreativeML OpenRAIL-M | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
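Option 1: Implement a Vision request.

```swift
import Vision
import CoreML

// `modelname` stands for the class Xcode generates from the .mlmodel you added.
lazy var coreMLRequest: VNCoreMLRequest = {
    let model = try! VNCoreMLModel(for: modelname(configuration: MLModelConfiguration()).model)
    let request = VNCoreMLRequest(model: model, completionHandler: self.coreMLCompletionHandler)
    return request
}()

// Run the request off the main thread; `ciImage` is your input image.
let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
DispatchQueue.global(qos: .userInitiated).async {
    try? handler.perform([self.coreMLRequest])
}
```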
If the model has Image type output:
```swift
// Inside the completion handler
let result = request?.results?.first as! VNPixelBufferObservation
let uiimage = UIImage(ciImage: CIImage(cvPixelBuffer: result.pixelBuffer))
```

If the model has MultiArray type output:
To visualize a MultiArray as an image, hollance's CoreML Helpers are very convenient.
Converting from MultiArray to Image with CoreML Helpers:
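```swift
func coreMLCompletionHandler(request: VNRequest?, error: Error?) {
    guard let result = request?.results?.first as? VNCoreMLFeatureValueObservation else { return }
    let multiArray = result.featureValue.multiArrayValue
    // cgImage(min:max:channel:) comes from CoreML Helpers; min/max are the
    // range of the model's output values (here -1...1)
    let cgimage = multiArray?.cgImage(min: -1, max: 1, channel: nil)
}
```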
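Conceptually, passing min: -1 and max: 1 linearly maps the model's output range onto 0-255 pixel values. A rough Python sketch of that mapping, assuming simple linear scaling with clamping and truncation (an assumption about what the helper does internally, not a copy of its code):

```python
def to_pixels(values, vmin=-1.0, vmax=1.0):
    """Linearly rescale model outputs in [vmin, vmax] to 8-bit values,
    clamping anything outside the range and truncating to int."""
    scale = 255.0 / (vmax - vmin)
    pixels = []
    for v in values:
        p = (v - vmin) * scale
        pixels.append(int(max(0.0, min(255.0, p))))
    return pixels


print(to_pixels([-1.0, 0.0, 1.0, 2.0]))  # [0, 127, 255, 255]
```

If the output looks washed out or saturated, the min/max you pass probably do not match the model's real output range (for example 0...1 or 0...255).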
Option 2: Use CoreGANContainer. You can try the models by dragging and dropping them into the container project.
You can make a model smaller with quantization if you want: https://coremltools.readme.io/docs/quantization
The fewer the bits, the greater the chance of degrading the model's accuracy. How much accuracy is lost varies by model.
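As a rough guide to the savings listed in the nbits comment of the coremltools snippet, weight storage scales with bits per weight relative to 32-bit floats. A minimal sketch of that arithmetic for a hypothetical 100 MB full-precision model; it ignores metadata and lookup tables, so real files come out somewhat larger:

```python
def quantized_size_mb(fp32_size_mb, nbits):
    """Estimate weight storage after quantizing 32-bit float weights to nbits.
    Ignores metadata and lookup tables, so real files are slightly larger."""
    return fp32_size_mb * nbits / 32


for bits in (16, 8, 4):
    print(bits, quantized_size_mb(100.0, bits))
# 16 50.0
# 8 25.0
# 4 12.5
```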
```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load the full-precision model
model_fp32 = ct.models.MLModel('model.mlmodel')
# nbits can be 16 (half-size model), 8 (1/4), 4 (1/8), 2, or 1
model_fp16 = quantization_utils.quantize_weights(model_fp32, nbits=16)
# On macOS, quantize_weights returns an MLModel that can be saved directly
model_fp16.save('model_fp16.mlmodel')
```

Portrait Animation (Kuaishou, 2024). Animate any portrait photo with expression transfer from a driving video. Multi-model pipeline.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [LivePortrait_MotionExtractor (TBD)] | 54 MB | 256x256 image | keypoints, pose, expression | KwaiVGI/LivePortrait | MIT | 2024 | LivePortraitDemo |
| [LivePortrait_AppearanceExtractor (TBD)] | 1.6 MB | 256x256 image | 3D feature volume | ||||
| [LivePortrait_WarpingNetwork (TBD)] | 91 MB | features + keypoints | warped features | ||||
| [LivePortrait_SPADEGenerator (TBD)] | 106 MB | warped features | 512x512 output |
First Order Motion Model. Face reenactment -- transfer facial expressions and head pose from one person to another.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [FOMM_KPDetector (TBD)] | 27 MB | 256x256 image | 10 keypoints + Jacobians | AliaksandrSiarohin/first-order-model | MIT | 2019 | FOMMDemo |
| [FOMM_Generator (TBD)] | 87 MB | source + keypoint pairs | 256x256 output |
Audio-Driven Talking Head. Make any portrait speak from audio input.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [Wav2Lip (TBD)] | 69 MB | face(6ch,96x96) + mel(1,1,80,16) | lip-synced face(96x96) | Rudrabha/Wav2Lip | See repo | 2020 | Wav2LipDemo |
Face Swap. Transfer face identity between photos using ArcFace embeddings + generator.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [SimSwap_ArcFace (TBD)] | 100 MB | 112x112 face | 512-d identity embedding | neuralchen/SimSwap | See repo | 2020 | SimSwapDemo |
| [SimSwap_Generator (TBD)] | 105 MB | 224x224 target + 512-d id | 224x224 swapped face |
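SimSwap conditions its generator on the 512-d ArcFace identity embedding. A minimal sketch of two common operations on such embeddings, assuming plain Python lists: L2-normalizing the vector (the original inference code normalizes the embedding before passing it to the generator; whether the converted model expects this is an assumption to confirm with the sample) and comparing two identities by cosine similarity:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings (1.0 = same direction)."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


# Toy 3-d vectors stand in for real 512-d ArcFace embeddings.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # about 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```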
3D Dense Face Alignment. Reconstruct 3D face mesh from single photo using MobileNet backbone (only 6.3 MB).
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [3DDFA_V2 (TBD)] | 6.3 MB | 120x120 face | 62 3DMM params (pose+shape+expression) | cleardusk/3DDFA_V2 | MIT | 2020 | Face3DDemo |
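The 62 parameters are assumed here to follow the layout used in the original 3DDFA_V2 repository: a 3x4 pose matrix (12 values, rotation/scale plus translation), then 40 shape and 10 expression coefficients. The original code also de-normalizes the raw output with a stored mean and standard deviation before use. A minimal sketch of splitting the vector under that assumed layout:

```python
def parse_3dmm_params(params):
    """Split a 62-value parameter vector (assumed layout: 12 pose values as a
    row-major 3x4 matrix, 40 shape coefficients, 10 expression coefficients)."""
    if len(params) != 62:
        raise ValueError(f"expected 62 parameters, got {len(params)}")
    pose = [params[0:4], params[4:8], params[8:12]]  # 3x4 matrix, row-major
    rotation_scale = [row[:3] for row in pose]       # left 3x3 block
    translation = [row[3] for row in pose]           # last column
    shape = params[12:52]
    expression = params[52:62]
    return rotation_scale, translation, shape, expression


r, t, shp, exp = parse_3dmm_params(list(range(62)))
print(r, t, len(shp), len(exp))
```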
Deep Portrait Relighting. Change lighting direction in portraits using Spherical Harmonics.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [DPR_Relighting (TBD)] | 1.4 MB | 512x512 luminance + 9 SH coefficients | relit portrait | zhhoper/DPR | See repo | 2019 | RelightDemo |
Color-Dual-Transformer Network. Make composited foreground objects blend naturally with the background.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [CDTNet_Harmonization (TBD)] | 5.4 MB | 256x256 composite + mask | harmonized image | bcmi/CDTNet | See repo | 2022 | CDTNetDemo |
Hybrid Transformer Demucs by Meta. Separate music into 4 stems: vocals, drums, bass, other.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [HTDemucs (TBD)] | 100 MB | STFT freq(1,8,2049,336) + waveform(1,2,343980) | 4 separated stems | facebookresearch/demucs | MIT | 2023 | DemucsDemo |
Note: STFT/iSTFT must be performed app-side using Accelerate/vDSP. See sample app for integration details.
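The input shapes in the table are consistent with a one-sided STFT using n_fft = 4096 (4096 / 2 + 1 = 2049 frequency bins) and a hop of 1024 samples (ceil(343980 / 1024) = 336 frames); at an assumed 44.1 kHz sample rate, 343980 samples is a 7.8 s segment. These parameters are inferred from the shapes, not read from the model or sample app, so check them against the sample before writing the vDSP side. A small sketch of the shape arithmetic:

```python
import math


def stft_shape(num_samples, n_fft=4096, hop=1024):
    """Return (frequency_bins, frames) for a one-sided STFT.

    frames = ceil(num_samples / hop), which maps the table's waveform
    length to its 336 frames; the n_fft/hop defaults are assumptions
    inferred from the table, not read from the model."""
    bins = n_fft // 2 + 1
    frames = math.ceil(num_samples / hop)
    return bins, frames


print(stft_shape(343980))  # (2049, 336)
print(343980 / 44100)      # 7.8 (seconds, assuming 44.1 kHz audio)
```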
Swin Transformer Based Video Motion Magnification. Amplify invisible micro-motions in video (e.g., visualize heartbeat, structural vibrations).
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [STB_VMM (TBD)] | 65 MB | 2 frames(384x384) + magnification factor | magnified frame(384x384) | RLado/STB-VMM | GPL-3.0 | 2023 | MotionMagDemo |
Nonlinear Activation Free Network. State-of-the-art image deblurring without nonlinear activation functions.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [NAFNet_Deblur (TBD)] | 130 MB | 256x256 blurry image | 256x256 deblurred image | megvii-research/NAFNet | MIT | 2022 | NAFNetDemo |
Depth Anything V2 (HKU and TikTok, 2024). State-of-the-art monocular depth estimation.
Official CoreML model and iOS sample app available:
- CoreML Model: apple/coreml-depth-anything-v2-small
- iOS Sample: huggingface/coreml-examples/depth-anything-example
YOLOv10 Nano (Tsinghua, 2024). NMS-free real-time object detection. Consistent dual assignments during training eliminate the need for Non-Maximum Suppression, reducing latency. The Nano variant is only about 8 MB.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [YOLOv10N (TBD)] | 8 MB | 640x640 image | bounding boxes + class scores (80 COCO classes) | THU-MIG/yolov10 | AGPL-3.0 | 2024 | YOLOv10Demo |
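For comparison with the YOLOv5/v7/v8 models above, whose Core ML pipelines have Non Maximum Suppression added, this is roughly the post-processing step YOLOv10 avoids. A minimal sketch of class-agnostic greedy NMS, with boxes as (x_min, y_min, x_max, y_max) and an assumed IoU threshold of 0.45; per-class NMS runs the same loop separately for each class:

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best-scoring box, drop the remaining boxes that
    overlap it by more than iou_threshold, repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The loop is sequential and data-dependent, which is part of why removing it cuts latency on device.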
Bilateral Reference Network (2024). State-of-the-art dichotomous image segmentation for high-quality background removal. Excels at fine details like hair, fur, and transparent objects.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [BiRefNet (TBD)] | 80 MB | 1024x1024 image | 1024x1024 alpha mask | ZhengPeng7/BiRefNet | MIT | 2024 | BiRefNetDemo |
OpenAI Whisper (OpenAI, 2023). Multilingual speech-to-text model supporting 99+ languages.
Full CoreML implementation available:
- argmaxinc/WhisperKit — Optimized CoreML models (Tiny to Large) with full encoder+decoder pipeline, Swift Package, MIT license
- CoreML Models: argmaxinc/whisperkit-coreml
Kokoro-82M (2025). #1 on TTS Arena. Ultra-lightweight text-to-speech model with only 82M parameters, supporting 54 voices across 8 languages (EN, JP, FR, ES, IT, PT, HI, ZH). Runs 3.3x real-time on iPhone 13 Pro. CoreML conversion and iOS Swift package already available.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [Kokoro82M (TBD)] | 80 MB (quantized) | phoneme tokens + voice style | 24kHz audio waveform | hexgrad/Kokoro-82M | Apache 2.0 | 2025 | KokoroDemo |
Note: Pre-converted CoreML model available at FluidInference/kokoro-82m-coreml. iOS Swift package at mlalma/kokoro-ios.
SmolVLM2-500M (HuggingFace, 2025). The world's smallest video-language model. Describe images, answer visual questions, read text (OCR), and understand video — all on-device. Only 500M parameters, runs on iPhone via MLX Swift.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [SmolVLM2_VisionEncoder (TBD)] | 245 MB (Q8) | 384x384 image + text tokens | text response | HuggingFaceTB/SmolVLM2-500M-Video-Instruct | Apache 2.0 | 2025 | SmolVLMDemo |
Note: GGUF models for llama.cpp available at ggml-org/SmolVLM2-500M-Video-Instruct-GGUF.
YOLOE-S (Tsinghua, ICCV 2025). Real-time open-vocabulary object detection and segmentation. Detect any object by text description, visual reference, or in prompt-free mode. +3.5 AP over YOLO-World with 1.4x faster inference. Zero overhead compared to closed-set YOLOs.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [YOLOE_S (TBD)] | 50 MB | 640x640 image + text prompt | bounding boxes + segmentation masks | THU-MIG/yoloe | AGPL-3.0 | 2025 | YOLOEDemo |
Built into Apple's Vision framework:
- VNDetectHumanBodyPoseRequest: 19 body keypoints, no model download needed
- VNDetectHumanBodyPose3DRequest: 3D pose estimation (iOS 17+)
- For more keypoints (hands, face), see also VNDetectHumanHandPoseRequest
PP-OCRv5 (Baidu, 2025). Ultra-lightweight multilingual OCR supporting 100+ languages. Two-stage pipeline: text detection + text recognition. Total model size under 20 MB. Handles scene text, handwriting, documents, and more.
| Model | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| [PPOCRv5_Det (TBD)] | 10 MB | 640x640 image | text region heatmap | PaddlePaddle/PaddleOCR | Apache 2.0 | 2025 | PPOCRv5Demo |
| [PPOCRv5_Rec (TBD)] | 10 MB | 48x320 text crop | character sequence |
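The recognition model scores every character class at each horizontal position of the 48x320 crop, and the string is recovered by decoding those scores. A minimal sketch of greedy CTC decoding, assuming (as in PaddleOCR's recognizers) that class 0 is the CTC blank and the remaining indices follow the character dictionary shipped with the original project; the tiny label list below is hypothetical:

```python
def ctc_greedy_decode(probs, labels, blank=0):
    """Greedy CTC decoding.

    probs:  per-timestep class scores, shape [T][num_classes]
    labels: one string per class index; the entry at `blank` is ignored
    Takes the best class per timestep, merges consecutive repeats,
    then drops blanks.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    chars = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


labels = ["<blank>", "a", "b"]
# Best classes per timestep: 1, 1, 0, 1, 2, 2, 0
probs = [
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05], [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7], [0.2, 0.1, 0.7], [0.8, 0.1, 0.1],
]
print(ctc_greedy_decode(probs, labels))  # aab
```

The blank between the two "a" steps is what lets CTC emit a genuinely doubled character.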
Cover image was taken from Ghibli free images.
For the YOLOv5 conversion, dbsystel/yolov5-coreml-tools gave me a super intelligent conversion script.
And all of the original projects.
Daisuke Majima: freelance engineer (iOS / machine learning / AR). I can work on mobile ML and AR projects. Feel free to contact me: rockyshikoku@gmail.com