CoreML-Models

Converted Core ML Model Zoo.

Core ML is Apple's machine learning framework. If you are an iOS developer, you can easily use machine learning models in your Xcode project.

How to use

Browse this model zoo and, if you find a Core ML model you want, download it from the Google Drive link and bundle it in your project. If the model has a sample project link, try it to see how the model is used in a project. Whether you use them is up to you.

If you like this repository, please give me a star so I can do my best.

Section Link

Image Classifier
- Efficientnetb0
- Efficientnetv2
- VisionTransformer
- Conformer
- DeiT
- RepVGG
- RegNet
- MobileViTv2
- MobileNetV3-Small
- ConvNeXt-Tiny
- FastViT-T8
- MobileOne-S0
- EfficientFormerV2-S0
- GhostNetV2-100
- PoolFormer-S12
- LeViT-128S
Object Detection
- YOLOv5s
- YOLOv7
- YOLOv8
Segmentation
- U2Net
- IS-Net
- RMBG1.4
- face-parsing
- Segformer
- BiSeNetV2
- DNL
- ISANet
- FastFCN
- GCNet
- DANet
- Semantic FPN
- cloths_segmentation
- easyportrait
- DeepLabV3-MobileNetV3
- LRASPP-MobileNetV3
Super Resolution
- Real ESRGAN
- GFPGAN
- BSRGAN
- A-ESRGAN
- Beby-GAN
- RRDN
- Fast-SRGAN
- ESRGAN
- UltraSharp
- SRGAN
- SRResNet
- LESRCNN
- MMRealSR
- DASR
Low Light Enhancement
- StableLLVE
- Zero-DCE
- Retinexformer
Image Restoration
- MPRNet
- MIRNetv2
Image Generation
- MobileStyleGAN
- DCGAN
Image2Image
- Anime2Sketch
- AnimeGAN2Face_Paint_512_v2
- Photo2Cartoon
- AnimeGANv2_Hayao
- AnimeGANv2_Paprika
- WarpGAN Caricature
- UGATIT_selfie2anime
- Fast-Neural-Style-Transfer
- White_box_Cartoonization
- FacialCartoonization
Inpainting
- AOT-GAN-for-Inpainting
- Lama
Monocular Depth Estimation
- MiDaS
Stable Diffusion :text2image
- stable-diffusion-v1-5
- pastel-mix
- Orange Mix
- Counterfeit-V2.5
- anything-v4.5
- Openjourney
- dreamlike-photoreal-2.0
Face Manipulation :NEW
- LivePortrait
- FOMM
- Wav2Lip
- SimSwap
- 3DDFA_V2
- DPR Portrait Relighting
Image Harmonization :NEW
- CDTNet
Audio Source Separation :NEW
- HTDemucs
Video Motion Magnification :NEW
- STB-VMM
Image Deblurring :NEW
- NAFNet
Monocular Depth Estimation (Next-Gen) — Official CoreML
Object Detection (Next-Gen) :NEW
- YOLOv10-N
Background Removal (SOTA) :NEW
- BiRefNet
Speech Recognition — WhisperKit
Text-to-Speech :NEW
- Kokoro-82M
Vision-Language Model :NEW
- SmolVLM2-500M
Open-Vocabulary Detection :NEW
- YOLOE-S
Pose Estimation — Apple Vision API
Multilingual OCR :NEW
- PP-OCRv5

How to get the model

You can get the models converted to Core ML format from the Google Drive links. See the section below for how to use them in Xcode. The license for each model conforms to the license of the original project.

Image Classifier

Efficientnet

Google Drive Link	Size	Dataset	Original Project	License
Efficientnetb0	22.7 MB	ImageNet	TensorFlowHub	Apache2.0

Efficientnetv2

Google Drive Link	Size	Dataset	Original Project	License	Year
Efficientnetv2	85.8 MB	ImageNet	Google/autoML	Apache2.0	2021

VisionTransformer

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Google Drive Link	Size	Dataset	Original Project	License	Year
VisionTransformer-B16	347.5 MB	ImageNet	google-research/vision_transformer	Apache2.0	2021

Conformer

Local Features Coupling Global Representations for Visual Recognition.

Google Drive Link	Size	Dataset	Original Project	License	Year
Conformer-tiny-p16	94.1 MB	ImageNet	pengzhiliang/Conformer	Apache2.0	2021

DeiT

Data-efficient Image Transformers

Google Drive Link	Size	Dataset	Original Project	License	Year
DeiT-base384	350.5 MB	ImageNet	facebookresearch/deit	Apache2.0	2021

RepVGG

Making VGG-style ConvNets Great Again

Google Drive Link	Size	Dataset	Original Project	License	Year
RepVGG-A0	33.3 MB	ImageNet	DingXiaoH/RepVGG	MIT	2021

RegNet

Designing Network Design Spaces

Google Drive Link	Size	Dataset	Original Project	License	Year
regnet_y_400mf	16.5 MB	ImageNet	TORCHVISION.MODELS	MIT	2020

MobileViTv2

CVNets: A library for training computer vision networks

Google Drive Link	Size	Dataset	Original Project	License	Year	Conversion Script
MobileViTv2	18.8 MB	ImageNet	apple/ml-cvnets	apple	2022

MobileNetV3-Small

Lightweight classification model optimized for mobile devices. Ultra-fast inference with 67.7% top-1 accuracy.

Google Drive Link	Size	Dataset	Original Project	License	Year	Sample Project
MobileNetV3-Small (TBD)	4.9 MB	ImageNet	pytorch/vision	BSD-3	2019	MobileNetV3SmallDemo

ConvNeXt-Tiny

A ConvNet for the 2020s. Pure CNN architecture that competes with Vision Transformers. 82.5% top-1 accuracy.

Google Drive Link	Size	Dataset	Original Project	License	Year	Sample Project
ConvNeXt-Tiny (TBD)	54.6 MB	ImageNet	facebookresearch/ConvNeXt	MIT	2022	ConvNeXtTinyDemo

FastViT-T8

Official CoreML model and sample app available:

CoreML Model: apple/coreml-FastViT-T8

iOS Sample: huggingface/coreml-examples/FastViTSample

Source: apple/ml-fastvit

MobileOne-S0

Official CoreML model and benchmark app available:

CoreML Model + iOS App: apple/ml-mobileone

EfficientFormerV2-S0

Rethinking Vision Transformers for MobileNet Size and Speed. Lightweight ViT for mobile. 76.2% top-1 accuracy.

Google Drive Link	Size	Dataset	Original Project	License	Year	Sample Project
EfficientFormerV2-S0 (TBD)	7.2 MB	ImageNet	snap-research/EfficientFormer	Apache2.0	2023	EfficientFormerV2Demo

GhostNetV2-100

GhostNetV2: Enhance Cheap Operation with Long-Range Attention. Ghost module with DFC attention. 75.3% top-1 accuracy.

Google Drive Link	Size	Dataset	Original Project	License	Year	Sample Project
GhostNetV2-100 (TBD)	11.9 MB	ImageNet	huawei-noah/Efficient-AI-Backbones	Apache2.0	2022	GhostNetV2Demo

PoolFormer-S12

MetaFormer is Actually What You Need for Vision. Uses simple pooling instead of attention. 77.2% top-1 accuracy.

Google Drive Link	Size	Dataset	Original Project	License	Year	Sample Project
PoolFormer-S12 (TBD)	22.9 MB	ImageNet	sail-sg/poolformer	Apache2.0	2022	PoolFormerDemo

LeViT-128S

LeViT: A Vision Transformer in ConvNet's Clothing. Fast hybrid CNN-Transformer. 76.6% top-1 accuracy.

Google Drive Link	Size	Dataset	Original Project	License	Year	Sample Project
LeViT-128S (TBD)	16.0 MB	ImageNet	facebookresearch/LeViT	Apache2.0	2021	LeViTDemo
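Several entries above quote top-1 accuracy. As a reminder of what that metric measures, here is a minimal Python sketch. The function name and the toy scores are illustrative only and are not taken from any of the listed projects.

```python
def top1_accuracy(score_rows, true_labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if not score_rows:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(score_rows, true_labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(score_rows)


# Toy example: three samples, three classes.
scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
]
labels = [1, 0, 0]
accuracy = top1_accuracy(scores, labels)  # 2 of 3 predictions match
```

The published figures are measured on the ImageNet validation set by the original projects; accuracy after conversion may differ slightly.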

Object Detection

YOLOv5s

Google Drive Link	Size	Output	Original Project	License	Note	Sample Project
YOLOv5s	29.3MB	Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4))	ultralytics/yolov5	GNU	Non Maximum Suppression has been added.	CoreML-YOLOv5

YOLOv7

Google Drive Link	Size	Output	Original Project	License	Note	Sample Project	Conversion Script
YOLOv7	147.9MB	Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4))	WongKinYiu/yolov7	GNU	Non Maximum Suppression has been added.	CoreML-YOLOv5

YOLOv8

Google Drive Link	Size	Output	Original Project	License	Note	Sample Project
YOLOv8s	45.1MB	Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4))	ultralytics/ultralytics	GNU	Non Maximum Suppression has been added.	CoreML-YOLOv5
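The YOLO tables list two outputs: Confidence (N × 80) and Coordinates (N × 4), with one row per box that survived Non Maximum Suppression. The Python sketch below shows one way such a pair could be read. It assumes each coordinate row is a normalized [center x, center y, width, height]; that layout, the threshold value, and the function name are assumptions, so check the sample project for the actual handling.

```python
def decode_detections(confidence, coordinates, image_width, image_height,
                      threshold=0.25):
    """Pair each box with its best class and convert it to pixel units.

    confidence:  N rows of per-class scores (80 per row for COCO models).
    coordinates: N rows of [cx, cy, w, h], assumed normalized to 0...1.
    """
    detections = []
    for scores, (cx, cy, w, h) in zip(confidence, coordinates):
        class_index = max(range(len(scores)), key=scores.__getitem__)
        score = scores[class_index]
        if score < threshold:
            continue  # drop low-confidence boxes
        left = (cx - w / 2) * image_width
        top = (cy - h / 2) * image_height
        detections.append({
            "class_index": class_index,
            "score": score,
            "box": (left, top, w * image_width, h * image_height),
        })
    return detections


# Toy example with 3 classes instead of 80.
found = decode_detections(
    confidence=[[0.1, 0.9, 0.0], [0.05, 0.1, 0.02]],
    coordinates=[[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]],
    image_width=100,
    image_height=200,
)
```

In the toy example only the first box passes the threshold.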

Segmentation

U2Net

Google Drive Link	Size	Output	Original Project	License
U2Net	175.9 MB	Image(GRAYSCALE 320 × 320)	xuebinqin/U-2-Net	Apache
U2Netp	4.6 MB	Image(GRAYSCALE 320 × 320)	xuebinqin/U-2-Net	Apache
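The U2Net models output a grayscale image. A common use of such a mask is cutting the subject out of a photo. The Python sketch below shows the blending arithmetic only. It assumes bright mask pixels mark the subject and that the mask has already been resized to the photo's dimensions; the function name is hypothetical.

```python
def composite_with_mask(foreground, background, mask):
    """Blend two RGB images using a grayscale mask.

    foreground, background: rows of (r, g, b) tuples, values 0...255.
    mask: rows of grayscale values, 0 = keep background, 255 = keep foreground.
    All three must have the same width and height.
    """
    output = []
    for fg_row, bg_row, mask_row in zip(foreground, background, mask):
        out_row = []
        for fg, bg, m in zip(fg_row, bg_row, mask_row):
            alpha = m / 255.0
            out_row.append(tuple(
                round(f * alpha + b * (1.0 - alpha)) for f, b in zip(fg, bg)
            ))
        output.append(out_row)
    return output


# 1 x 2 toy image: the left pixel is subject, the right pixel is background.
photo = [[(255, 0, 0), (255, 0, 0)]]
backdrop = [[(0, 0, 255), (0, 0, 255)]]
mask = [[255, 0]]
result = composite_with_mask(photo, backdrop, mask)
```

In an app the same blend is normally done on the GPU, for example with Core Image, rather than pixel by pixel.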

IS-Net

Google Drive Link	Size	Output	Original Project	License	Year	Conversion Script
IS-Net	176.1 MB	Image(GRAYSCALE 1024 × 1024)	xuebinqin/DIS	Apache	2022
IS-Net-General-Use	176.1 MB	Image(GRAYSCALE 1024 × 1024)	xuebinqin/DIS	Apache	2022

RMBG1.4

RMBG1.4 - The IS-Net enhanced with our unique training scheme and proprietary dataset.

Google Drive Link	Size	Output	Original Project	License	year	Conversion Script
RMBG.mlpackage/RMBG.mlmodel	176 MB	Image(GrayScale 1024x1024)	briaai/RMBG-1.4	Creative Commons	2024

face-Parsing

Google Drive Link	Size	Output	Original Project	License	Sample Project
face-Parsing	53.2 MB	MultiArray(1 x 512 × 512)	zllrunning/face-parsing.PyTorch	MIT	CoreML-face-parsing

Segformer

Simple and Efficient Design for Semantic Segmentation with Transformers

Google Drive Link	Size	Output	Original Project	License	year
SegFormer_mit-b0_1024x1024_cityscapes	14.9 MB	MultiArray(512 × 1024)	NVlabs/SegFormer	NVIDIA	2021

BiSeNetV2

Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Google Drive Link	Size	Output	Original Project	License	year
BiSeNetV2_1024x1024_cityscapes	12.8 MB	MultiArray	ycszen/BiSeNet	Apache2.0	2021

DNL

Disentangled Non-Local Neural Networks

Google Drive Link	Size	Output	Dataset	Original Project	License	year
dnl_r50-d8_512x512_80k_ade20k	190.8 MB	MultiArray[512x512]	ADE20K	yinmh17/DNL-Semantic-Segmentation	Apache2.0	2020

ISANet

Interlaced Sparse Self-Attention for Semantic Segmentation

Google Drive Link	Size	Output	Dataset	Original Project	License	year
isanet_r50-d8_512x512_80k_ade20k	141.5 MB	MultiArray[512x512]	ADE20K	openseg-group/openseg.pytorch	MIT	ArXiv'2019/IJCV'2021

FastFCN

Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

Google Drive Link	Size	Output	Dataset	Original Project	License	year
fastfcn_r50-d32_jpu_aspp_512x512_80k_ade20k	326.2 MB	MultiArray[512x512]	ADE20K	wuhuikai/FastFCN	MIT	ArXiv'2019

GCNet

Non-local Networks Meet Squeeze-Excitation Networks and Beyond

Google Drive Link	Size	Output	Dataset	Original Project	License	year
gcnet_r50-d8_512x512_20k_voc12aug	189 MB	MultiArray[512x512]	PascalVOC	xvjiarui/GCNet	Apache License 2.0	ICCVW'2019/TPAMI'2020

DANet

Dual Attention Network for Scene Segmentation(CVPR2019)

Google Drive Link	Size	Output	Dataset	Original Project	License	year
danet_r50-d8_512x1024_40k_cityscapes	189.7 MB	MultiArray[512x1024]	CityScapes	junfu1115/DANet	MIT	CVPR2019

Semantic-FPN

Panoptic Feature Pyramid Networks

Google Drive Link	Size	Output	Dataset	Original Project	License	year
fpn_r50_512x1024_80k_cityscapes	108.6 MB	MultiArray[512x1024]	CityScapes	facebookresearch/detectron2	Apache License 2.0	2019

cloths_segmentation

Code for binary segmentation of various cloths.

Google Drive Link	Size	Output	Dataset	Original Project	License	year
clothSegmentation	50.1 MB	Image(GrayScale 640x960)	fashion-2019-FGVC6	facebookresearch/detectron2	MIT	2020

easyportrait

EasyPortrait - Face Parsing and Portrait Segmentation Dataset.

Google Drive Link	Size	Output	Original Project	License	year	Swift sample	Conversion Script
easyportrait-segformer512-fp	7.6 MB	Image(GrayScale 512x512) * 9	hukenovs/easyportrait	Creative Commons	2023	easyportrait-coreml

DeepLabV3-MobileNetV3

DeepLabV3 with MobileNetV3-Large backbone. 21-class PASCAL VOC semantic segmentation (person, car, cat, dog, etc.).

Google Drive Link	Size	Output	Original Project	License	Year	Sample Project
DeepLabV3-MobileNetV3 (TBD)	21.1 MB	MultiArray (1x21x512x512)	pytorch/vision	BSD-3	2019	DeepLabV3Demo

LRASPP-MobileNetV3

Lite R-ASPP with MobileNetV3-Large backbone. Ultra-lightweight 21-class semantic segmentation (57.9 mIoU). Only 6.3 MB.

Google Drive Link	Size	Output	Original Project	License	Year	Sample Project
LRASPP-MobileNetV3 (TBD)	6.3 MB	MultiArray (1x21x512x512)	pytorch/vision	BSD-3	2019	LRASPPDemo
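The two models above return a MultiArray of shape 1x21x512x512, that is, one score map per class. To get a label image, take the class with the highest score at each pixel. The Python sketch below illustrates this on a tiny array; it assumes the batch dimension has already been removed, and the function name is hypothetical.

```python
def class_map_from_scores(scores):
    """Reduce [classes][height][width] scores to [height][width] class indices."""
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy example: 3 classes on a 2 x 2 image (the real models use 21 x 512 x 512).
scores = [
    [[0.9, 0.1], [0.2, 0.1]],  # class 0
    [[0.0, 0.8], [0.1, 0.3]],  # class 1
    [[0.1, 0.1], [0.7, 0.6]],  # class 2
]
labels = class_map_from_scores(scores)
```

Each index can then be mapped to a color to draw the segmentation overlay.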

Super Resolution

Real ESRGAN

Google Drive Link	Size	Output	Original Project	License	year
Real ESRGAN4x	66.9 MB	Image(RGB 2048x2048)	xinntao/Real-ESRGAN	BSD 3-Clause License	2021
Real ESRGAN Anime4x	66.9 MB	Image(RGB 2048x2048)	xinntao/Real-ESRGAN	BSD 3-Clause License	2021

GFPGAN

Towards Real-World Blind Face Restoration with Generative Facial Prior

Google Drive Link	Size	Output	Original Project	License	year
GFPGAN	337.4 MB	Image(RGB 512x512)	TencentARC/GFPGAN	Apache2.0	2021

BSRGAN

Google Drive Link	Size	Output	Original Project	License	year
BSRGAN	66.9 MB	Image(RGB 2048x2048)	cszn/BSRGAN		2021

A-ESRGAN

Google Drive Link	Size	Output	Original Project	License	year	Conversion Script
A-ESRGAN	63.8 MB	Image(RGB 1024x1024)	aesrgan/A-ESRGAN	BSD 3-Clause License	2021

Beby-GAN

Best-Buddy GANs for Highly Detailed Image Super-Resolution

Google Drive Link	Size	Output	Original Project	License	year
Beby-GAN	66.9 MB	Image(RGB 2048x2048)	dvlab-research/Simple-SR	MIT	2021

RRDN

The Residual in Residual Dense Network for image super-scaling.

Google Drive Link	Size	Output	Original Project	License	year
RRDN	16.8 MB	Image(RGB 2048x2048)	idealo/image-super-resolution	Apache2.0	2018

Fast-SRGAN

Fast-SRGAN.

Google Drive Link	Size	Output	Original Project	License	year
Fast-SRGAN	628 KB	Image(RGB 1024x1024)	HasnainRaz/Fast-SRGAN	MIT	2019

ESRGAN

Enhanced-SRGAN.

Google Drive Link	Size	Output	Original Project	License	year
ESRGAN	66.9 MB	Image(RGB 2048x2048)	xinntao/ESRGAN	Apache 2.0	2018

UltraSharp

Pretrained: 4xESRGAN

Google Drive Link	Size	Output	Original Project	License	year
UltraSharp	34 MB	Image(RGB 1024x1024)	Kim2019/	CC-BY-NC-SA-4.0	2021

SRGAN

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

Google Drive Link	Size	Output	Original Project	License	year
SRGAN	6.1 MB	Image(RGB 2048x2048)	dongheehand/SRGAN-PyTorch		2017

SRResNet

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

Google Drive Link	Size	Output	Original Project	License	year
SRResNet	6.1 MB	Image(RGB 2048x2048)	dongheehand/SRGAN-PyTorch		2017

LESRCNN

Lightweight Image Super-Resolution with Enhanced CNN.

Google Drive Link	Size	Output	Original Project	License	year	Conversion Script
LESRCNN	4.3 MB	Image(RGB 512x512)	hellloxiaotian/LESRCNN		2020

MMRealSR

Metric Learning based Interactive Modulation for Real-World Super-Resolution

Google Drive Link	Size	Output	Original Project	License	year	Conversion Script
MMRealSRGAN	104.6 MB	Image(RGB 1024x1024)	TencentARC/MM-RealSR	BSD 3-Clause	2022
MMRealSRNet	104.6 MB	Image(RGB 1024x1024)	TencentARC/MM-RealSR	BSD 3-Clause	2022

DASR

Pytorch implementation of "Unsupervised Degradation Representation Learning for Blind Super-Resolution", CVPR 2021

Google Drive Link	Size	Output	Original Project	License	year
DASR	12.1 MB	Image(RGB 1024x1024)	The-Learning-And-Vision-Atelier-LAVA/DASR	MIT	2022

Low Light Enhancement

StableLLVE

Learning Temporal Consistency for Low Light Video Enhancement from Single Images.

Google Drive Link	Size	Output	Original Project	License	Year
StableLLVE	17.3 MB	Image(RGB 512x512)	zkawfanx/StableLLVE	MIT	2021

Zero-DCE

Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

Google Drive Link	Size	Output	Original Project	License	Year	Conversion Script
Zero-DCE	320KB	Image(RGB 512x512)	Li-Chongyi/Zero-DCE	See Repo	2021

Retinexformer

Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement

Google Drive Link	Size	Output	Original Project	License	Year	Conversion Script
ZRetinexformer FiveK	3.4MB	Image(RGB 512x512)	caiyuanhao1998/Retinexformer	MIT	2023
ZRetinexformer NTIRE	3.4MB	Image(RGB 512x512)	caiyuanhao1998/Retinexformer	MIT	2023

Image Restoration

MPRNet

Multi-Stage Progressive Image Restoration.

Deblurring

Denoising

Deraining

Google Drive Link	Size	Output	Original Project	License	Year
MPRNetDebluring	137.1 MB	Image(RGB 512x512)	swz30/MPRNet	MIT	2021
MPRNetDeNoising	108 MB	Image(RGB 512x512)	swz30/MPRNet	MIT	2021
MPRNetDeraining	24.5 MB	Image(RGB 512x512)	swz30/MPRNet	MIT	2021

MIRNetv2

Learning Enriched Features for Fast Image Restoration and Enhancement.

Denoising

Super Resolution

Contrast Enhancement

Low Light Enhancement

Google Drive Link	Size	Output	Original Project	License	Year
MIRNetv2Denoising	42.5 MB	Image(RGB 512x512)	swz30/MIRNetv2	ACADEMIC PUBLIC LICENSE	2022
MIRNetv2SuperResolution	42.5 MB	Image(RGB 512x512)	swz30/MIRNetv2	ACADEMIC PUBLIC LICENSE	2022
MIRNetv2ContrastEnhancement	42.5 MB	Image(RGB 512x512)	swz30/MIRNetv2	ACADEMIC PUBLIC LICENSE	2022
MIRNetv2LowLightEnhancement	42.5 MB	Image(RGB 512x512)	swz30/MIRNetv2	ACADEMIC PUBLIC LICENSE	2022

Image Generation

MobileStyleGAN

Google Drive Link	Size	Output	Original Project	License	Sample Project
MobileStyleGAN	38.6MB	Image(Color 1024 × 1024)	bes-dev/MobileStyleGAN.pytorch	Nvidia Source Code License-NC	CoreML-StyleGAN

DCGAN

Google Drive Link	Size	Output	Original Project
DCGAN	9.2MB	MultiArray	TensorFlowCore

Image2Image

Anime2Sketch

Google Drive Link	Size	Output	Original Project	License	Usage
Anime2Sketch	217.7MB	Image(Color 512 × 512)	Mukosame/Anime2Sketch	MIT	Drop an image to preview

AnimeGAN2Face_Paint_512_v2

Google Drive Link	Size	Output	Original Project	Conversion Script
AnimeGAN2Face_Paint_512_v2	8.6MB	Image(Color 512 × 512)	bryandlee/animegan2-pytorch

Photo2Cartoon

Google Drive Link	Size	Output	Original Project	License	Note
Photo2Cartoon	15.2 MB	Image(Color 256 × 256)	minivision-ai/photo2cartoon	MIT	The output differs slightly from the original model because some operations were replaced manually during conversion.

AnimeGANv2_Hayao

Google Drive Link	Size	Output	Original Project	Sample
AnimeGANv2_Hayao	8.7MB	Image(256 x 256)	TachibanaYoshino/AnimeGANv2	AnimeGANv2-iOS

AnimeGANv2_Paprika

Google Drive Link	Size	Output	Original Project
AnimeGANv2_Paprika	8.7MB	Image(256 x 256)	TachibanaYoshino/AnimeGANv2

WarpGAN Caricature

Google Drive Link	Size	Output	Original Project
WarpGAN Caricature	35.5MB	Image(256 x 256)	seasonSH/WarpGAN

UGATIT_selfie2anime

Google Drive Link	Size	Output	Original Project
UGATIT_selfie2anime	266.2MB(quantized)	Image(256x256)	taki0112/UGATIT

CartoonGAN

Google Drive Link	Size	Output	Original Project
CartoonGAN_Shinkai	44.6MB	MultiArray	mnicnc404/CartoonGan-tensorflow
CartoonGAN_Hayao	44.6MB	MultiArray	mnicnc404/CartoonGan-tensorflow
CartoonGAN_Hosoda	44.6MB	MultiArray	mnicnc404/CartoonGan-tensorflow
CartoonGAN_Paprika	44.6MB	MultiArray	mnicnc404/CartoonGan-tensorflow

Fast-Neural-Style-Transfer

Google Drive Link	Size	Output	Original Project	License	Year
fast-neural-style-transfer-cuphead	6.4MB	Image(RGB 960x640)	eriklindernoren/Fast-Neural-Style-Transfer	MIT	2019
fast-neural-style-transfer-starry-night	6.4MB	Image(RGB 960x640)	eriklindernoren/Fast-Neural-Style-Transfer	MIT	2019
fast-neural-style-transfer-mosaic	6.4MB	Image(RGB 960x640)	eriklindernoren/Fast-Neural-Style-Transfer	MIT	2019

White_box_Cartoonization

Learning to Cartoonize Using White-box Cartoon Representations

Google Drive Link	Size	Output	Original Project	License	Year
White_box_Cartoonization	5.9MB	Image(1536x1536)	SystemErrorWang/White-box-Cartoonization	creativecommons	CVPR2020

FacialCartoonization

White-box facial image cartoonization

Google Drive Link	Size	Output	Original Project	License	Year
FacialCartoonization	8.4MB	Image(256x256)	SystemErrorWang/FacialCartoonization	creativecommons	2020

Inpainting

AOT-GAN-for-Inpainting

Google Drive Link	Size	Output	Original Project	License	Note	Sample Project
AOT-GAN-for-Inpainting	60.8MB	MLMultiArray(3,512,512)	researchmm/AOT-GAN-for-Inpainting	Apache2.0	To use see sample.	john-rocky/Inpainting-CoreML

Lama

Google Drive Link	Size	Input	Output	Original Project	License	Note	Sample Project	Conversion Script
Lama	216.6MB	Image (Color 800 × 800), Image (GrayScale 800 × 800)	Image (Color 800 × 800)	advimman/lama	Apache2.0	To use see sample.	john-rocky/lama-cleaner-iOS	mallman/CoreMLaMa

Monocular Depth Estimation

MiDaS

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

Google Drive Link	Size	Output	Original Project	License	Year	Conversion Script
MiDaS_Small	66.3MB	MultiArray(1x256x256)	isl-org/MiDaS	MIT	2022
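MiDaS_Small returns a 1x256x256 MultiArray of relative values with no fixed range, so it has to be normalized before it can be shown as an image. The Python sketch below uses per-image min-max scaling, which is one common choice and not necessarily what any sample app does. The function name is hypothetical.

```python
def depth_to_grayscale(depth):
    """Scale a 2D list of relative depth values to integers in 0...255."""
    flat = [value for row in depth for value in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    if span == 0:
        # A constant map carries no depth information; return black.
        return [[0 for _ in row] for row in depth]
    return [
        [round((value - lowest) / span * 255) for value in row]
        for row in depth
    ]


# Toy 2 x 2 depth map (the real output is 256 x 256).
gray = depth_to_grayscale([[0.0, 4.0], [10.0, 2.0]])
```

Because the scaling is per image, brightness is not comparable between frames.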

Stable Diffusion

stable-diffusion-v1-5

Google Drive Link	Original Model	Original Project	License	Run on mac	Conversion Script	Year
stable-diffusion-v1-5	runwayml/stable-diffusion-v1-5	runwayml/stable-diffusion	Open RAIL M license	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2022

pastel-mix

Pastel Mix - a stylized latent diffusion model. This model is intended to produce high-quality, highly detailed anime style with just a few prompts.

Google Drive Link	Original Model	License	Run on mac	Conversion Script	Year
pastelMixStylizedAnime_pastelMixPrunedFP16	andite/pastel-mix	Fantasy.ai	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2023

Orange Mix

Google Drive Link	Original Model	License	Run on mac	Conversion Script	Year
AOM3_orangemixs	WarriorMama777/OrangeMixs	CreativeML OpenRAIL-M	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2023

Counterfeit

Google Drive Link	Original Model	License	Run on mac	Conversion Script	Year
Counterfeit-V2.5	gsdf/Counterfeit-V2.5	-	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2023

anything-v4

Google Drive Link	Original Model	License	Run on mac	Conversion Script	Year
anything-v4.5	andite/anything-v4.0	Fantasy.ai	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2023

Openjourney

Google Drive Link	Original Model	License	Run on mac	Conversion Script	Year
Openjourney	prompthero/openjourney	-	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2023

dreamlike-photoreal-2

Google Drive Link	Original Model	License	Run on mac	Conversion Script	Year
dreamlike-photoreal-2.0	dreamlike-art/dreamlike-photoreal-2.0	CreativeML OpenRAIL-M	godly-devotion/MochiDiffusion	godly-devotion/MochiDiffusion	2023

Models converted by someone other than me.

Stable Diffusion

apple/ml-stable-diffusion

How to use in an Xcode project.

Option 1: Implement a Vision request.


import Vision

// `modelname` stands for the class Xcode generates from the bundled model file.
lazy var coreMLRequest: VNCoreMLRequest = {
    let model = try! VNCoreMLModel(for: modelname().model)
    let request = VNCoreMLRequest(model: model, completionHandler: self.coreMLCompletionHandler)
    return request
}()

// Run the request off the main thread so the UI stays responsive.
let handler = VNImageRequestHandler(ciImage: ciimage, options: [:])
DispatchQueue.global(qos: .userInitiated).async {
    try? handler.perform([self.coreMLRequest])
}

If the model has Image type output:

let result = request?.results?.first as! VNPixelBufferObservation
let uiimage = UIImage(ciImage: CIImage(cvPixelBuffer: result.pixelBuffer))

If the model has MultiArray type output:

To visualize a MultiArray as an image, Mr. Hollance's CoreML Helpers are very convenient.

Converting from MultiArray to Image with CoreML Helpers.

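func coreMLCompletionHandler(request: VNRequest?, error: Error?) {
    // Bail out if the request failed or returned an unexpected observation type.
    guard let result = request?.results?.first as? VNCoreMLFeatureValueObservation,
          let multiArray = result.featureValue.multiArrayValue else { return }
    // cgImage(min:max:channel:) comes from CoreML Helpers.
    let cgimage = multiArray.cgImage(min: -1, max: 1, channel: nil)
}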
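The min and max arguments tell the helper which value range the model output uses, so it can be stretched to 0...255 pixel values. The Python sketch below shows that arithmetic for the -1...1 range used above. It illustrates the idea and is not the CoreML Helpers source; the function name is hypothetical.

```python
def scale_to_byte(value, minimum=-1.0, maximum=1.0):
    """Map a value from [minimum, maximum] to an integer pixel value in 0...255."""
    # Clamp first so out-of-range model outputs do not overflow.
    clamped = min(max(value, minimum), maximum)
    return round((clamped - minimum) / (maximum - minimum) * 255)


darkest = scale_to_byte(-1.0)   # lower bound of the range
brightest = scale_to_byte(1.0)  # upper bound of the range
```

If a model emits values in 0...1 instead, pass min: 0 and max: 1, otherwise the image will look washed out.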

Option 2: Use CoreGANContainer. You can use models by dragging and dropping them into the container project.

Make the model lighter

You can reduce the model size with quantization if you want. https://coremltools.readme.io/docs/quantization

The lower the number of bits, the greater the chance of degrading the model's accuracy. The loss in accuracy varies with the model.

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# load full precision model
model_fp32 = ct.models.MLModel('model.mlmodel')

model_fp16 = quantization_utils.quantize_weights(model_fp32, nbits=16)
# nbits can be 16(half size model), 8(1/4), 4(1/8), 2, 1
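The size ratios in the comment above follow from storing each weight in nbits instead of 32 bits. The Python sketch below illustrates the trade-off with plain linear quantization on a list of numbers. It is an illustration of the idea, not how coremltools implements quantize_weights, and all helper names are hypothetical.

```python
def quantize_linear(weights, nbits):
    """Map floats to integer codes in 0...(2**nbits - 1)."""
    levels = (1 << nbits) - 1
    lowest, highest = min(weights), max(weights)
    scale = (highest - lowest) / levels if highest > lowest else 1.0
    codes = [round((w - lowest) / scale) for w in weights]
    return codes, scale, lowest


def dequantize_linear(codes, scale, lowest):
    """Recover approximate floats from integer codes."""
    return [lowest + code * scale for code in codes]


def size_fraction(nbits, original_bits=32):
    """Approximate weight storage relative to the full-precision model."""
    return nbits / original_bits


weights = [-0.73, 0.12, 0.5, 0.99]
codes, scale, lowest = quantize_linear(weights, nbits=4)
restored = dequantize_linear(codes, scale, lowest)
# Each restored weight is within half a quantization step of the original.
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Fewer bits mean a larger step between representable values, which is where the accuracy loss comes from.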

quantized sample (U2Net)

InputImage / nbits=32(original) / nbits=16 / nbits=8 / nbits=4

Face Manipulation

LivePortrait

Portrait Animation (Kuaishou, 2024). Animate any portrait photo with expression transfer from a driving video. Multi-model pipeline.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[LivePortrait_MotionExtractor (TBD)]	54 MB	256x256 image	keypoints, pose, expression	KwaiVGI/LivePortrait	MIT	2024	LivePortraitDemo
[LivePortrait_AppearanceExtractor (TBD)]	1.6 MB	256x256 image	3D feature volume
[LivePortrait_WarpingNetwork (TBD)]	91 MB	features + keypoints	warped features
[LivePortrait_SPADEGenerator (TBD)]	106 MB	warped features	512x512 output

FOMM

First Order Motion Model. Face reenactment -- transfer facial expressions and head pose from one person to another.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[FOMM_KPDetector (TBD)]	27 MB	256x256 image	10 keypoints + Jacobians	AliaksandrSiarohin/first-order-model	MIT	2019	FOMMDemo
[FOMM_Generator (TBD)]	87 MB	source + keypoint pairs	256x256 output

Wav2Lip

Audio-Driven Talking Head. Make any portrait speak from audio input.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[Wav2Lip (TBD)]	69 MB	face(6ch,96x96) + mel(1,1,80,16)	lip-synced face(96x96)	Rudrabha/Wav2Lip	See repo	2020	Wav2LipDemo

SimSwap

Face Swap. Transfer face identity between photos using ArcFace embeddings + generator.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[SimSwap_ArcFace (TBD)]	100 MB	112x112 face	512-d identity embedding	neuralchen/SimSwap	See repo	2020	SimSwapDemo
[SimSwap_Generator (TBD)]	105 MB	224x224 target + 512-d id	224x224 swapped face

3DDFA_V2

3D Dense Face Alignment. Reconstruct 3D face mesh from single photo using MobileNet backbone (only 6.3 MB).

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[3DDFA_V2 (TBD)]	6.3 MB	120x120 face	62 3DMM params (pose+shape+expression)	cleardusk/3DDFA_V2	MIT	2020	Face3DDemo
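The table describes the output as 62 3DMM parameters covering pose, shape and expression. The Python sketch below splits such a vector, assuming the order is 12 pose values (a 3x4 matrix), then 40 shape values, then 10 expression values. That layout is an assumption to verify against the original project, and the function name is hypothetical.

```python
POSE_COUNT = 12        # assumed: flattened 3 x 4 pose matrix
SHAPE_COUNT = 40       # assumed: shape coefficients
EXPRESSION_COUNT = 10  # assumed: expression coefficients


def split_3dmm_params(params):
    """Split a 62-value vector into pose matrix, shape and expression parts."""
    expected = POSE_COUNT + SHAPE_COUNT + EXPRESSION_COUNT
    if len(params) != expected:
        raise ValueError(f"expected {expected} values, got {len(params)}")
    pose = params[:POSE_COUNT]
    shape = params[POSE_COUNT:POSE_COUNT + SHAPE_COUNT]
    expression = params[POSE_COUNT + SHAPE_COUNT:]
    # Reshape the 12 pose values into 3 rows of 4.
    pose_matrix = [pose[row * 4:(row + 1) * 4] for row in range(3)]
    return pose_matrix, shape, expression


pose_matrix, shape, expression = split_3dmm_params(list(range(62)))
```

The original project may also de-normalize these values with stored statistics before building the mesh; that step is not shown here.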

DPR Portrait Relighting

Deep Portrait Relighting. Change lighting direction in portraits using Spherical Harmonics.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[DPR_Relighting (TBD)]	1.4 MB	512x512 luminance + 9 SH coefficients	relit portrait	zhhoper/DPR	See repo	2019	RelightDemo

Image Harmonization

CDTNet

Color-Dual-Transformer Network. Make composited foreground objects blend naturally with the background.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[CDTNet_Harmonization (TBD)]	5.4 MB	256x256 composite + mask	harmonized image	bcmi/CDTNet	See repo	2022	CDTNetDemo

Audio Source Separation

HTDemucs

Hybrid Transformer Demucs by Meta. Separate music into 4 stems: vocals, drums, bass, other.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[HTDemucs (TBD)]	100 MB	STFT freq(1,8,2049,336) + waveform(1,2,343980)	4 separated stems	facebookresearch/demucs	MIT	2023	DemucsDemo

Note: STFT/iSTFT must be performed app-side using Accelerate/vDSP. See sample app for integration details.
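The input shapes in the table constrain the app-side STFT. The Python sketch below shows parameters that reproduce them: a 4096-point FFT, a hop of 1024 samples and 44.1 kHz audio. These three values are assumptions chosen because they match the table, so confirm them against the sample app before relying on them.

```python
import math

SAMPLE_RATE = 44_100  # assumption: samples per second
N_FFT = 4096          # assumption: FFT size
HOP_LENGTH = 1024     # assumption: samples between successive frames


def stft_shape(num_samples, n_fft=N_FFT, hop_length=HOP_LENGTH):
    """Return (frequency_bins, frames) for a one-sided STFT."""
    frequency_bins = n_fft // 2 + 1
    frames = math.ceil(num_samples / hop_length)
    return frequency_bins, frames


WAVEFORM_SAMPLES = 343_980  # from the waveform input shape (1, 2, 343980)
bins, frames = stft_shape(WAVEFORM_SAMPLES)
segment_seconds = WAVEFORM_SAMPLES / SAMPLE_RATE
```

Under these assumptions the model processes audio in segments of about 7.8 seconds, so longer tracks have to be split into chunks and the separated stems joined afterwards.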

Video Motion Magnification

STB-VMM

Swin Transformer Based Video Motion Magnification. Amplify invisible micro-motions in video (e.g., visualize heartbeat, structural vibrations).

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[STB_VMM (TBD)]	65 MB	2 frames(384x384) + magnification factor	magnified frame(384x384)	RLado/STB-VMM	GPL-3.0	2023	MotionMagDemo

Image Deblurring

NAFNet

Nonlinear Activation Free Network. State-of-the-art image deblurring without nonlinear activation functions.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[NAFNet_Deblur (TBD)]	130 MB	256x256 blurry image	256x256 deblurred image	megvii-research/NAFNet	MIT	2022	NAFNetDemo

Monocular Depth Estimation (Next-Gen)

Depth Anything V2 Small

Depth Anything V2 (HKU and TikTok, 2024). State-of-the-art monocular depth estimation.

Official CoreML model and iOS sample app available:

CoreML Model: apple/coreml-depth-anything-v2-small

iOS Sample: huggingface/coreml-examples/depth-anything-example

Object Detection (Next-Gen)

YOLOv10-N

YOLOv10 Nano (Tsinghua, 2024). NMS-free real-time object detection. Consistent dual assignments during training eliminate the need for Non-Maximum Suppression, reducing latency. The Nano variant is only ~8 MB.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[YOLOv10N (TBD)]	8 MB	640x640 image	bounding boxes + class scores (80 COCO classes)	THU-MIG/yolov10	AGPL-3.0	2024	YOLOv10Demo

Background Removal (SOTA)

BiRefNet

Bilateral Reference Network (2024). State-of-the-art dichotomous image segmentation for high-quality background removal. Excels at fine details like hair, fur, and transparent objects.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[BiRefNet (TBD)]	80 MB	1024x1024 image	1024x1024 alpha mask	ZhengPeng7/BiRefNet	MIT	2024	BiRefNetDemo

Speech Recognition

Whisper

OpenAI Whisper (OpenAI, 2023). Multilingual speech-to-text model supporting 99+ languages.

Full CoreML implementation available:

argmaxinc/WhisperKit — Optimized CoreML models (Tiny to Large) with full encoder+decoder pipeline, Swift Package, MIT license

CoreML Models: argmaxinc/whisperkit-coreml

Text-to-Speech

Kokoro-82M

Kokoro-82M (2025). #1 on TTS Arena. Ultra-lightweight text-to-speech model with only 82M parameters, supporting 54 voices across 8 languages (EN, JP, FR, ES, IT, PT, HI, ZH). Runs 3.3x real-time on iPhone 13 Pro. CoreML conversion and iOS Swift package already available.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[Kokoro82M (TBD)]	80 MB (quantized)	phoneme tokens + voice style	24kHz audio waveform	hexgrad/Kokoro-82M	Apache 2.0	2025	KokoroDemo

Note: Pre-converted CoreML model available at FluidInference/kokoro-82m-coreml. iOS Swift package at mlalma/kokoro-ios.

Vision-Language Model

SmolVLM2-500M

SmolVLM2-500M (HuggingFace, 2025). The world's smallest video-language model. Describe images, answer visual questions, read text (OCR), and understand video — all on-device. Only 500M parameters, runs on iPhone via MLX Swift.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[SmolVLM2_VisionEncoder (TBD)]	245 MB (Q8)	384x384 image + text tokens	text response	HuggingFaceTB/SmolVLM2-500M-Video-Instruct	Apache 2.0	2025	SmolVLMDemo

Note: GGUF models for llama.cpp available at ggml-org/SmolVLM2-500M-Video-Instruct-GGUF.

Open-Vocabulary Detection

YOLOE-S

YOLOE-S (Tsinghua, ICCV 2025). Real-time open-vocabulary object detection and segmentation. Detect any object by text description, visual reference, or in prompt-free mode. +3.5 AP over YOLO-World with 1.4x faster inference. Zero overhead compared to closed-set YOLOs.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[YOLOE_S (TBD)]	50 MB	640x640 image + text prompt	bounding boxes + segmentation masks	THU-MIG/yoloe	AGPL-3.0	2025	YOLOEDemo

Pose Estimation

Human Body Pose

Built-in to Apple Vision framework:

VNDetectHumanBodyPoseRequest — 19 body keypoints, no model download needed

VNDetectHumanBodyPose3DRequest — 3D pose estimation (iOS 17+)

For more keypoints (hands, face), see also VNDetectHumanHandPoseRequest

Multilingual OCR

PP-OCRv5

PP-OCRv5 (Baidu, 2025). Ultra-lightweight multilingual OCR supporting 100+ languages. Two-stage pipeline: text detection + text recognition. Total model size under 20 MB. Handles scene text, handwriting, documents, and more.

Model	Size	Input	Output	Original Project	License	Year	Sample Project
[PPOCRv5_Det (TBD)]	10 MB	640x640 image	text region heatmap	PaddlePaddle/PaddleOCR	Apache 2.0	2025	PPOCRv5Demo
[PPOCRv5_Rec (TBD)]	10 MB	48x320 text crop	character sequence
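The recognition stage turns a text crop into a character sequence. If the converted model returns per-timestep class scores with a CTC blank at index 0, which is an assumption about the export and not something the table states, the scores can be decoded greedily as in the Python sketch below. The function name and the three-letter alphabet are illustrative.

```python
def ctc_greedy_decode(timestep_scores, alphabet, blank=0):
    """Collapse repeated predictions and drop blanks.

    timestep_scores: one list of class scores per timestep.
    alphabet: characters for class indices 1, 2, 3, ... (index 0 is the blank).
    """
    characters = []
    previous = blank
    for scores in timestep_scores:
        index = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat.
        if index != blank and index != previous:
            characters.append(alphabet[index - 1])
        previous = index
    return "".join(characters)


def one_hot(index, size=4):
    """Helper for the toy example: a score row that peaks at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Predicted indices per timestep: a, a, blank, a, b, b, blank
steps = [one_hot(i) for i in (1, 1, 0, 1, 2, 2, 0)]
text = ctc_greedy_decode(steps, alphabet="abc")
```

The blank between the second and third "a" is what allows a doubled letter to survive the collapse.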

Thanks

Cover image was taken from Ghibli free images.

For the YOLOv5 conversion, dbsystel/yolov5-coreml-tools gave me a very smart conversion script.

And all of the original projects.

Author

Daisuke Majima. Freelance engineer: iOS, machine learning, AR. I can work on mobile ML and AR projects. Feel free to contact: rockyshikoku@gmail.com

GitHub Twitter Medium
