Most mobile image APIs expose decoded pixels as interleaved (`HWC`) rows, while most PyTorch vision models expect channels-first (`NCHW`) tensors. If preprocessing stays in the app, explicitly pack pixels into the model's expected layout.

ExecuTorch ships a C++ `ImageProcessor` (`extension/image`) that resizes, color-converts, and normalizes pixels into a channels-first `float32` tensor (a `Tensor<Float>` in Swift), with Swift and Objective-C bindings on iOS. Prefer it where available. The per-platform helpers below show the manual packing path for cases where you do not use it.
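When preprocessing stays in app code, the packing step itself is small. The following is a minimal C++ sketch. It assumes an 8-bit `RGBA` source that has already been resized to the model input size, plus ImageNet-style mean/std normalization. `Normalization`, `kImageNet`, and `pack_rgba_to_nchw` are hypothetical names for illustration, not ExecuTorch APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-channel normalization: out = (pixel / 255 - mean) / stddev.
struct Normalization {
  float mean[3];
  float stddev[3];
};

// ImageNet statistics used by many torchvision classifiers.
constexpr Normalization kImageNet{{0.485f, 0.456f, 0.406f},
                                  {0.229f, 0.224f, 0.225f}};

// Hypothetical helper: packs an interleaved 8-bit RGBA image (HWC, row-major)
// that is already at the model input size into planar NCHW float32 data for a
// [1, 3, height, width] tensor. Alpha is dropped. row_stride is in bytes, so
// padded rows from platform image APIs are handled.
void pack_rgba_to_nchw(const std::uint8_t* rgba, int width, int height,
                       std::size_t row_stride, const Normalization& norm,
                       std::vector<float>& out) {
  assert(rgba != nullptr && width > 0 && height > 0);
  assert(row_stride >= static_cast<std::size_t>(width) * 4);
  const std::size_t plane = static_cast<std::size_t>(width) * height;
  out.resize(3 * plane);  // no reallocation when the size already matches
  for (int y = 0; y < height; ++y) {
    const std::uint8_t* row = rgba + y * row_stride;
    for (int x = 0; x < width; ++x) {
      const std::size_t i = static_cast<std::size_t>(y) * width + x;
      for (int c = 0; c < 3; ++c) {
        const float v = row[x * 4 + c] / 255.0f;
        // Each channel fills its own contiguous plane of width * height floats.
        out[c * plane + i] = (v - norm.mean[c]) / norm.stddev[c];
      }
    }
  }
}
```

Writing into a caller-owned `std::vector<float>` lets a video loop allocate the buffer once and reuse it for every frame.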
### C++
For native runners and JNI code, call the C++ `ImageProcessor` directly. Decode the image yourself (for example with `stb_image`) into an 8-bit `RGBA` or `BGRA` buffer. `ImageProcessor` then resizes, converts to RGB, and normalizes into a `[1, 3, target_height, target_width]` `float32` tensor. Link against `extension_image`.

- **YUV camera frames:** call `process_yuv(...)` with `YUVFormat::NV12` or `NV21`.
- **Video:** preallocate a contiguous `[1, 3, target_height, target_width]` `float32` tensor and call `process_into(...)` to reuse it across frames and avoid per-frame allocations.
- **Rotated source:** pass `Orientation::DOWN`, `RIGHT`, or `LEFT`.

See `examples/models/dinov2/main.cpp` for a complete runner.
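The `NV12` and `NV21` formats named in the YUV bullet are both semi-planar 4:2:0 layouts. Each has a full-resolution Y plane followed by one interleaved chroma plane at half resolution in each direction. They differ only in whether U or V comes first. The following is a minimal C++ sketch of sampling one pixel from either layout. It assumes full-range BT.601 (JFIF) coefficients, but camera pipelines often deliver video-range data or use BT.709, which need different constants. `YuvLayout`, `Rgb8`, and `sample_yuv420sp` are hypothetical names. This is not the `process_yuv(...)` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tag for the two semi-planar 4:2:0 chroma orders.
enum class YuvLayout { NV12, NV21 };  // NV12: U,V,U,V...  NV21: V,U,V,U...

struct Rgb8 {
  std::uint8_t r, g, b;
};

// Converts the pixel at (x, y) to RGB. Each 2x2 block of luma samples shares
// one interleaved chroma pair in the half-resolution UV plane.
Rgb8 sample_yuv420sp(const std::uint8_t* y_plane, std::size_t y_stride,
                     const std::uint8_t* uv_plane, std::size_t uv_stride,
                     YuvLayout layout, int x, int y) {
  assert(x >= 0 && y >= 0);
  const float luma = y_plane[y * y_stride + x];
  const std::uint8_t* pair = uv_plane + (y / 2) * uv_stride + (x / 2) * 2;
  const bool nv12 = layout == YuvLayout::NV12;
  const float u = (nv12 ? pair[0] : pair[1]) - 128.0f;
  const float v = (nv12 ? pair[1] : pair[0]) - 128.0f;
  // Round to nearest and clamp to the 8-bit range.
  auto to_u8 = [](float f) {
    return static_cast<std::uint8_t>(std::clamp(f + 0.5f, 0.0f, 255.0f));
  };
  // Full-range BT.601 (JFIF) YCbCr -> RGB.
  return {to_u8(luma + 1.402f * v),
          to_u8(luma - 0.344136f * u - 0.714136f * v),
          to_u8(luma + 1.772f * u)};
}
```

Reading the same chroma bytes with the wrong layout swaps U and V, which typically shows up as red and blue tints being exchanged. This is why the format must be passed explicitly.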
### Android
For production Android preprocessing, handle decoding, EXIF orientation, and camera-specific transforms before packing pixels into the input tensor. There is no Java or Kotlin binding for the C++ `ImageProcessor` yet, so on Android either call it through JNI or pack the tensor in app code. The following Kotlin helper keeps the layout conversion explicit: it resizes a `Bitmap`, reads RGB pixels, applies ImageNet-style normalization, and packs the result as `NCHW` `float32` data for `Tensor.fromBlob`.
```kotlin
import android.graphics.Bitmap
// ...
```
### iOS
For production iOS preprocessing from a `CVPixelBuffer`, prefer the `ImageProcessor` included in the ExecuTorch iOS framework. It resizes, color-converts, and normalizes the buffer into a channels-first `Tensor<Float>` without hand-written pixel packing. This makes it a good fit for camera frames and other hot paths.
```swift
import ExecuTorch

// Configure once and reuse across frames.
let config = ImageProcessorConfig(
    targetWidth: 224,
    targetHeight: 224,
    normalization: .imagenet()
)
let processor = ImageProcessor(config: config)

// Process a CVPixelBuffer (BGRA, RGBA, 8-bit NV12, or 10-bit P010).
// `pixelBuffer` comes from your capture or decode path.
let input: Tensor<Float> = try processor.process(pixelBuffer)
```
- Normalization: `.zeroToOne()`, `.imagenet()`, or a custom `ImageNormalization(scaleFactor:mean:standardDeviation:)` for models such as CLIP or detection/segmentation backbones.
- Resize: `.stretch` (default) or `.letterbox` (with `letterboxAnchor` and `padValue`). Use `computeLetterboxPadding(inputWidth:inputHeight:)` to map outputs back to source coordinates.
- Orientation: pass `orientation:` when the source buffer is rotated, for example based on capture metadata.
- Video: for sustained capture, reuse an output tensor across frames to avoid per-frame allocations:

  ```swift
  let output = Tensor<Float>.zeros(shape: [1, 3, 224, 224])
  try processor.process(pixelBuffer, into: output)
  ```
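The `.letterbox` resize option and `computeLetterboxPadding(inputWidth:inputHeight:)` exist to undo a scale-and-pad transform on model outputs. The following is a minimal C++ sketch of that arithmetic. It assumes a centered letterbox with uniform scaling. `LetterboxTransform`, `Point`, `make_letterbox`, and `to_source` are hypothetical names, and ExecuTorch's anchor and rounding conventions may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical description of a centered letterbox: uniform scale plus padding.
struct LetterboxTransform {
  float scale;  // model pixels per source pixel
  float pad_x;  // left padding, in model pixels
  float pad_y;  // top padding, in model pixels
};

struct Point {
  float x;
  float y;
};

// Fits a src_w x src_h image inside dst_w x dst_h without changing its aspect
// ratio: scale by the limiting dimension, then split leftover space evenly.
LetterboxTransform make_letterbox(int src_w, int src_h, int dst_w, int dst_h) {
  assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
  const float scale = std::min(static_cast<float>(dst_w) / src_w,
                               static_cast<float>(dst_h) / src_h);
  const float scaled_w = std::round(src_w * scale);
  const float scaled_h = std::round(src_h * scale);
  return {scale, (dst_w - scaled_w) / 2.0f, (dst_h - scaled_h) / 2.0f};
}

// Maps a point from model-input coordinates back to source-image coordinates,
// for example a box corner predicted on the letterboxed input.
Point to_source(const LetterboxTransform& t, Point model) {
  return {(model.x - t.pad_x) / t.scale, (model.y - t.pad_y) / t.scale};
}
```

For example, a 640×480 frame letterboxed into 224×224 is scaled by 0.35 with 28 rows of padding above and below. The model-space point (112, 112) therefore maps back to (320, 240) in the source.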
An `ImageProcessor` instance is not thread-safe. Use one instance per concurrent caller.

You can still use `ImageProcessor` with a `UIImage` or `CGImage`. Render it into a `CVPixelBuffer` (draw the `CGImage` into a `CGContext` backed by a BGRA buffer), then call `process(_:)`. This keeps preprocessing identical to the camera path. (The C++ `ImageProcessor::process(...)` accepts a raw RGBA/BGRA buffer directly, but only the `CVPixelBuffer` entry points are exposed to Swift and Objective-C today.)

`ImageProcessor` is tuned for performance. It handles common pixel formats (BGRA, RGBA, and semi-planar YUV) and chooses between CPU and GPU based on image size. Matching its throughput by hand is hard, so reach for manual packing only when you need full control over the conversion or behavior that `ImageProcessor` does not provide.

The Swift helper below shows that manual path. It draws a `UIImage` into a fixed-size RGB buffer, normalizes the RGB channels with vDSP, and creates a channels-first `Tensor<Float>`. Keeping the layout conversion explicit makes the tensor contract easy to inspect.