New model categories:
- Face Manipulation: LivePortrait, FOMM, Wav2Lip, SimSwap, 3DDFA_V2, DPR
- Image Harmonization: CDTNet
- Audio Source Separation: HTDemucs
- Video Motion Magnification: STB-VMM
- Image Deblurring: NAFNet
- Image Classifiers: MobileNetV3, ConvNeXt, FastViT, MobileOne, etc.
- Semantic Segmentation: DeepLabV3, LRASPP

Includes 20 SwiftUI sample apps (creative_apps/ and sample_apps/). Model files (.mlpackage) are excluded - download from Google Drive.
New models: Depth Anything V2, YOLOv10-N, BiRefNet, Whisper Tiny, Depth Pro, Kokoro-82M TTS, SmolVLM2-500M, YOLOE-S, DWPose, PP-OCRv5. Covers new categories: speech recognition, TTS, VLM, open-vocab detection, pose estimation, and multilingual OCR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models are distributed via GitHub Releases (not in the repo). Download the .mlpackage files and place them in the app directory to build.
Depth Pro requires 1536x1536 fixed input (~1.2GB model). Added RAM requirement warning (iPhone 15 Pro+ / 6GB RAM).
ANE fails to compile the large DepthPro model. Switch to CPU+GPU compute units.
- Model input is MLMultiArray (not pixelBuffer): convert BGRA→RGB Float16
- Output name is 'var_4563' (auto-generated), with fallback to first output
- Handle Float16 output with vImage conversion
- Add Accelerate import for vImage
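The BGRA→RGB reordering above can be sketched in pure Python. This is only an illustration of the channel swap and [0, 1] normalization a Float16 model input expects; the app itself presumably does this with vImage on the CVPixelBuffer:

```python
def bgra_to_rgb(pixels):
    """Reorder a flat BGRA byte buffer into RGB, dropping alpha."""
    rgb = bytearray()
    for i in range(0, len(pixels), 4):
        b, g, r, _a = pixels[i:i + 4]   # iterating bytes yields ints
        rgb += bytes((r, g, b))
    return bytes(rgb)

# One BGRA pixel; normalize to [0, 1] floats for the model input.
pixel = bytes([20, 120, 250, 255])
rgb = bgra_to_rgb(pixel)
floats = [c / 255.0 for c in rgb]
```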
Also handle Float16 output in the fallback path with vImage conversion.
- Depth Pro (1.2GB, 1536x1536 fixed) crashes on all iPhones due to memory
- Removed DepthProDemo app, conversion script, and README entry
- BiRefNet: reduced input from 1024x1024 to 512x512 to fit iPhone memory
- BiRefNet: switched from ANE to cpuAndGPU (ANE compilation fails)
Model uses Float16, not Float32. Reading Float32 from a Float16 buffer produced garbage → NaN → UInt8 crash.
- Input: write as Float16 via vImage conversion
- Output: read as Float16 and convert to Float32 via vImage
- Add NaN guard in mask-to-image conversion
- Add Accelerate import for vImage
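The misread bug is easy to reproduce in Python with struct's half-precision `'e'` format (an illustration of the byte-level mistake, not the app's Swift code):

```python
import struct

# Two Float16 values, packed the way the model's buffer actually stores them.
buf = struct.pack('<2e', 0.5, 0.25)   # 2 bytes each → 4 bytes total

# Bug: interpreting the same 4 bytes as one Float32 yields garbage.
misread = struct.unpack('<f', buf)[0]

# Fix: read as Float16, then widen to Float32.
values = list(struct.unpack('<2e', buf))
```

The garbage value is a tiny denormal-range float rather than 0.5, which is exactly the kind of output that cascades into NaN and then crashes a UInt8 conversion.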
UIImage values from PhotosPicker can carry rotation metadata (imageOrientation). CGImage ignores this, causing a 90-degree mismatch between the mask and the cutout. Normalize to .up orientation before extracting CGImage pixels.
…trait pipeline
- Remove DepthAnythingV2Demo (Apple official CoreML model available)
- Remove WhisperDemo (WhisperKit provides a full implementation)
- Remove DWPoseDemo (Apple Vision API has built-in pose detection)
- Remove corresponding conversion scripts
- Update README to reference official implementations
- Fix YOLOv10Demo to parse raw MultiArray output [1,300,6]
- Implement full LivePortrait animation pipeline with 4-model inference
- Add AppIcon.appiconset to YOLOv10Demo
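Parsing the raw [1,300,6] output can be sketched as below. The per-row layout (x1, y1, x2, y2, score, class) is an assumption based on common YOLOv10 exports, not confirmed by this repo; since YOLOv10 is NMS-free, a score threshold is the only filtering step needed:

```python
def parse_yolov10(rows, score_threshold=0.5):
    """Filter raw [300, 6] detections; each row is (x1, y1, x2, y2, score, cls)."""
    detections = []
    for x1, y1, x2, y2, score, cls in rows:
        if score >= score_threshold:
            detections.append({
                "box": (x1, y1, x2, y2),
                "score": score,
                "class": int(cls),
            })
    return detections

# Synthetic output: one confident detection, one low-confidence row.
raw = [(10.0, 20.0, 110.0, 220.0, 0.92, 0.0),
       (0.0, 0.0, 5.0, 5.0, 0.12, 3.0)]
dets = parse_yolov10(raw)
```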
- Add Accelerate-based STFT/iSTFT signal processing (vDSP FFT)
- Implement stride-aware MLMultiArray extraction for Float16 outputs
- Use time-domain output only (freq branch overflows Float16 → ±inf)
- Audio loading with format conversion and resampling to 44.1kHz stereo
- Per-stem WAV export and playback
- Fix SwiftUI type-checker timeout in WaveformView

Known issue: freq_output produces ±inf due to Float16 overflow in the model's frequency branch. Reconverting the model with Float32 outputs should enable freq+time reconstruction for better separation quality.
Draft conversion script to reconvert HTDemucs with Float32 precision. The current Float16 model causes overflow (±inf) in the frequency branch.
- freq_output overflows the Float16 range (±inf) for real STFT data, even with Float32 internal computation (the output tensor is Float16)
- Use time-domain output only for stem reconstruction
- Add F32 model to Xcode project (compute_precision=FLOAT32)
- Simplify conversion script: ONNX-based, end-to-end model
- Source order confirmed: drums, bass, other, vocals

Known issue: full freq+time reconstruction requires Float32 output tensors in the CoreML model. The current model spec forces Float16 output, which cannot represent large STFT values (>65504). Time-only provides decent separation quality.
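The 65504 ceiling follows directly from the Float16 format, and an unnormalized spectral transform accumulates up to N terms per bin. A quick arithmetic sketch (the frame length below is illustrative, not the model's actual FFT size):

```python
# Largest finite Float16 value: (2 - 2**-10) * 2**15.
FLOAT16_MAX = (2 - 2**-10) * 2**15   # 65504.0

# An unnormalized DFT bin sums N terms; for a constant full-scale
# signal the DC bin equals N, so large frames (or stacked gains)
# push magnitudes past the Float16 range.
N = 1 << 17                           # hypothetical frame length
dc_bin = sum(1.0 for _ in range(N))   # = 131072.0
overflows = dc_bin > FLOAT16_MAX      # stored as +inf once cast to Float16
```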
- Add computeSpectralInput: STFT with Python _spec padding, CaC channel format
- Add inverseSpec: iSTFT matching Python _ispec (time padding + trim)
- Feed actual STFT data to spectral_magnitude (was zeros, disabling the freq branch)
- Normalize STFT input (÷√N) to match Python torch.stft(normalized=True)
- Compensate iSTFT output (×√N) for correct freq+time addition
- Add stride-aware MLMultiArray fallback for non-contiguous layouts
- Fix stem order: vocals=3, other=2 (matching Python model.sources)
- Generalize forwardSTFT/inverseSTFT for variable frame counts
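The ÷√N forward / ×√N inverse pairing can be checked with a toy pure-Python DFT. This is a sketch of the normalization convention torch.stft(normalized=True) uses, not the vDSP implementation:

```python
import cmath
import math

def dft(x):
    """Forward DFT scaled by 1/sqrt(N), matching normalized=True."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT: the plain inverse divides by N, so a forward
    scaled by 1/sqrt(N) needs a sqrt(N)/N = 1/sqrt(N) factor here."""
    n = len(spec)
    scale = math.sqrt(n) / n
    return [(scale * sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                         for k in range(n))).real
            for t in range(n)]

signal = [0.0, 1.0, 0.0, -1.0, 0.5, -0.5, 0.25, -0.25]
roundtrip = idft(dft(signal))   # recovers the signal to float precision
```

Forgetting the ×√N compensation on the inverse leaves the freq-branch contribution scaled down by √N relative to the time branch, which silently skews the freq+time sum.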
Summary
New Models (conversion_scripts/)
Model Distribution
CoreML models (.mlpackage) will be distributed via GitHub Releases to avoid repo size limits. Users download models and place them in the app directory to build.
Test plan
🤖 Generated with Claude Code