Commit ed9ffa5

Example/doc-update (pytorch#20121) (pytorch#20121)
Summary: Pull Request resolved: pytorch#20121

Differential Revision: D107922134
1 parent a9d5674 commit ed9ffa5

4 files changed

Lines changed: 130 additions & 37 deletions

Makefile

Lines changed: 4 additions & 2 deletions
@@ -261,7 +261,8 @@ parakeet-vulkan:
 
 dinov2-cuda:
 	@echo "==> Building and installing ExecuTorch with CUDA..."
-	cmake --workflow --preset llm-release-cuda
+	cmake --preset llm-release-cuda -DEXECUTORCH_BUILD_EXTENSION_IMAGE=ON
+	cmake --build --preset llm-release-cuda-install
 	@echo "==> Building DINOv2 runner with CUDA..."
 	cd examples/models/dinov2 && cmake --workflow --preset dinov2-cuda
 	@echo ""
@@ -270,7 +271,8 @@ dinov2-cuda:
 
 dinov2-cuda-debug:
 	@echo "==> Building and installing ExecuTorch with CUDA (debug mode)..."
-	cmake --workflow --preset llm-debug-cuda
+	cmake --preset llm-debug-cuda -DEXECUTORCH_BUILD_EXTENSION_IMAGE=ON
+	cmake --build --preset llm-debug-cuda-install
 	@echo "==> Building DINOv2 runner with CUDA (debug mode)..."
 	cd examples/models/dinov2 && cmake --workflow --preset dinov2-cuda-debug
 	@echo ""

docs/source/working-with-cv-models.md

Lines changed: 85 additions & 2 deletions
@@ -56,9 +56,58 @@ If the model expects a crop after resizing, keep that policy in exactly one plac
 
 Most mobile image APIs expose decoded pixels as interleaved rows. Most PyTorch vision models expect channels-first tensors. If preprocessing stays in the app, explicitly pack pixels into the model's expected layout.
 
+ExecuTorch ships a C++ `ImageProcessor` (`extension/image`) that resizes, color-converts, and normalizes pixels into a channels-first `Tensor<Float>`, with a Swift and Objective-C binding on iOS. Prefer it where available; the per-platform helpers below show the manual packing path for when you are not using it.
+
+### C++
+
+For native runners and JNI code, call the C++ `ImageProcessor` directly. Decode the image yourself (for example with `stb_image`) into an 8-bit `RGBA` or `BGRA` buffer; `ImageProcessor` then resizes, converts to RGB, and normalizes into a `[1, 3, target_height, target_width]` `float32` tensor. Link against `extension_image`.
+
+```cpp
+#include <executorch/extension/image/image_processor.h>
+#include <executorch/extension/module/module.h>
+
+using executorch::extension::Module;
+using executorch::extension::image::ColorFormat;
+using executorch::extension::image::ImageProcessor;
+using executorch::extension::image::ImageProcessorConfig;
+using executorch::extension::image::Normalization;
+
+// Decode to interleaved 8-bit RGBA (alpha is ignored). ImageProcessor does not
+// decode JPEG/PNG; bring your own decoder.
+int width = 0, height = 0, channels = 0;
+uint8_t* rgba = stbi_load(path, &width, &height, &channels, /*req_comp=*/4);
+
+ImageProcessorConfig config;
+config.target_width = 224;
+config.target_height = 224;
+config.normalization = Normalization::imagenet(); // or zeroToOne(), or custom
+// config.resize_mode = ResizeMode::LETTERBOX; // default: STRETCH
+
+ImageProcessor processor(config);
+
+// Resize + RGB conversion + normalization -> [1, 3, 224, 224] float32 tensor.
+auto result = processor.process(
+    rgba, width, height, /*stride_bytes=*/width * 4, ColorFormat::RGBA);
+if (!result.ok()) {
+  // Inspect result.error() and bail out.
+}
+auto input = result.get(); // TensorPtr, shape [1, 3, 224, 224], float32, RGB
+
+Module module("model.pte");
+const auto outputs = module.forward(*input);
+```
+
+The same processor covers a few related cases:
+
+- **YUV camera frames:** call `process_yuv(...)` with `YUVFormat::NV12` or `NV21`.
+- **Video:** preallocate a contiguous `[1, 3, target_height, target_width]` `float32` tensor and call `process_into(...)` to reuse it across frames and avoid per-frame allocations.
+- **Rotated source:** pass `Orientation::DOWN`, `RIGHT`, or `LEFT`.
+
+See `examples/models/dinov2/main.cpp` for a complete runner.
+
 ### Android
 
-For production Android preprocessing, handle decoding, EXIF orientation, and camera-specific transforms before packing pixels into the input tensor. The following Kotlin helper keeps the layout conversion explicit: it resizes a `Bitmap`, reads RGB pixels, applies ImageNet-style normalization, and packs the result as `NCHW` `float32` data for `Tensor.fromBlob`.
+For production Android preprocessing, handle decoding, EXIF orientation, and camera-specific transforms before packing pixels into the input tensor. There is no Java or Kotlin binding for the C++ `ImageProcessor` yet, so on Android either call it through JNI or pack the tensor in app code. The following Kotlin helper keeps the layout conversion explicit: it resizes a `Bitmap`, reads RGB pixels, applies ImageNet-style normalization, and packs the result as `NCHW` `float32` data for `Tensor.fromBlob`.
 
 ```kotlin
 import android.graphics.Bitmap
@@ -104,7 +153,41 @@ val inputTensor = Tensor.fromBlobUnsigned(
 
 ### iOS
 
-For production iOS preprocessing, prefer platform image APIs and Accelerate, such as vImage for resizing and color conversion and vDSP for normalization, especially for camera frames or other hot paths. The following Swift helper keeps the layout conversion explicit so the tensor contract is easy to inspect: it draws a `UIImage` into a fixed-size RGB buffer, uses vDSP to normalize RGB channels, and creates a channels-first `Tensor<Float>`.
+For production iOS preprocessing from a `CVPixelBuffer`, prefer the `ImageProcessor` included in the ExecuTorch iOS framework. It handles resize, color conversion, and normalization from a `CVPixelBuffer` to a channels-first `Tensor<Float>`, so you avoid hand-written pixel packing. This is a good fit for camera frames and other hot paths.
+
+```swift
+import ExecuTorch
+
+// Configure once and reuse across frames.
+let config = ImageProcessorConfig(
+  targetWidth: 224,
+  targetHeight: 224,
+  normalization: .imagenet()
+)
+let processor = ImageProcessor(config: config)
+
+// Process a CVPixelBuffer (BGRA, RGBA, 8-bit NV12, or 10-bit P010).
+let input: Tensor<Float> = try processor.process(pixelBuffer)
+// input shape: [1, 3, 224, 224], RGB, channels-first
+```
+
+- Normalization: `.zeroToOne()`, `.imagenet()`, or a custom `ImageNormalization(scaleFactor:mean:standardDeviation:)` for models such as CLIP or detection/segmentation backbones.
+- Resize: `.stretch` (default) or `.letterbox` (with `letterboxAnchor` and `padValue`); use `computeLetterboxPadding(inputWidth:inputHeight:)` to map outputs back to source coordinates.
+- Pass `orientation:` when the source buffer is rotated, for example from capture metadata.
+- For sustained video, reuse an output tensor to avoid per-frame allocations:
+
+```swift
+let output = Tensor<Float>.zeros(shape: [1, 3, 224, 224])
+try processor.process(pixelBuffer, into: output)
+```
+
+An `ImageProcessor` instance is not thread-safe; use one instance per concurrent caller.
+
+You can still use `ImageProcessor` with a `UIImage` or `CGImage`: render it into a `CVPixelBuffer` (draw the `CGImage` into a `CGContext` backed by a BGRA buffer), then call `process(_:)`. This keeps preprocessing identical to the camera path. (The C++ `ImageProcessor::process(...)` accepts a raw RGBA/BGRA buffer directly, but only the `CVPixelBuffer` entry points are exposed to Swift and Objective-C today.)
+
+`ImageProcessor` is tuned for performance: it handles common pixel formats (BGRA, RGBA, and semi-planar YUV) and picks CPU or GPU based on image size. Matching its throughput by hand is hard, so reach for manual packing only when you need full control of the conversion, or behavior `ImageProcessor` does not provide.
+
+The Swift helper below shows that manual path. It draws a `UIImage` into a fixed-size RGB buffer, normalizes the RGB channels with vDSP, and creates a channels-first `Tensor<Float>`, keeping the layout conversion explicit so the tensor contract is easy to inspect.
 
 ```swift
 import Accelerate
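
Note on the letterbox option referenced in this doc change: letterboxing scales the source to fit inside the target while preserving aspect ratio, then pads the remainder, so model outputs have to be shifted and rescaled to get back to source coordinates. The following is a minimal sketch of that geometry, not the `ImageProcessor` implementation. `LetterboxGeometry`, `compute_letterbox`, `to_source_x`, and `to_source_y` are hypothetical names, and the sketch assumes centered padding and round-to-nearest sizing; the real anchor and rounding policy may differ.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical description of where the scaled image sits inside the target.
struct LetterboxGeometry {
  float scale;        // uniform source -> target scale factor
  int resized_width;  // width of the scaled image inside the target
  int resized_height; // height of the scaled image inside the target
  int pad_left;       // padding columns left of the scaled image
  int pad_top;        // padding rows above the scaled image
};

// Assumption: centered padding, sizes rounded to the nearest pixel.
LetterboxGeometry compute_letterbox(
    int src_w, int src_h, int dst_w, int dst_h) {
  const float scale = std::min(
      static_cast<float>(dst_w) / src_w, static_cast<float>(dst_h) / src_h);
  const int resized_w =
      std::min(dst_w, static_cast<int>(std::lround(src_w * scale)));
  const int resized_h =
      std::min(dst_h, static_cast<int>(std::lround(src_h * scale)));
  return {
      scale,
      resized_w,
      resized_h,
      (dst_w - resized_w) / 2,
      (dst_h - resized_h) / 2};
}

// Map a coordinate in model-input space back to source-image space.
float to_source_x(const LetterboxGeometry& g, float x) {
  return (x - g.pad_left) / g.scale;
}

float to_source_y(const LetterboxGeometry& g, float y) {
  return (y - g.pad_top) / g.scale;
}
```

With these assumptions, a 640x480 frame letterboxed into 224x224 is scaled to 224x168 with 28 rows of padding above and below, so a detection at input row 112 maps back to source row 240.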

examples/models/dinov2/CMakeLists.txt

Lines changed: 10 additions & 3 deletions
@@ -41,11 +41,18 @@ if(TARGET optimized_native_cpu_ops_lib)
 endif()
 
 # Add the required ExecuTorch extensions
-list(APPEND link_libraries extension_module extension_data_loader
-     extension_tensor extension_flat_tensor
+list(
+  APPEND
+  link_libraries
+  extension_module
+  extension_data_loader
+  extension_tensor
+  extension_flat_tensor
+  extension_image
 )
 
-# stb_image: lightweight library to load and resize images
+# stb_image: lightweight header-only library used to decode the input image
+# (ImageProcessor handles resize and normalization).
 include(FetchContent)
 FetchContent_Declare(
   stb

examples/models/dinov2/main.cpp

Lines changed: 31 additions & 30 deletions
@@ -25,11 +25,10 @@
 
 #define STB_IMAGE_IMPLEMENTATION
 #include <stb_image.h>
-#define STB_IMAGE_RESIZE_IMPLEMENTATION
-#include <stb_image_resize.h>
 
 #include <gflags/gflags.h>
 
+#include <executorch/extension/image/image_processor.h>
 #include <executorch/extension/module/module.h>
 #include <executorch/extension/tensor/tensor_ptr.h>
 #include <executorch/extension/tensor/tensor_ptr_maker.h>
@@ -56,47 +55,49 @@ DEFINE_bool(
 
 using ::executorch::extension::from_blob;
 using ::executorch::extension::Module;
+using ::executorch::extension::image::ColorFormat;
+using ::executorch::extension::image::ImageProcessor;
+using ::executorch::extension::image::ImageProcessorConfig;
+using ::executorch::extension::image::Normalization;
 using ::executorch::runtime::Error;
 using ::executorch::runtime::EValue;
 
 namespace {
 
-// ImageNet normalization constants
-constexpr float kImageNetMean[] = {0.485f, 0.456f, 0.406f};
-constexpr float kImageNetStd[] = {0.229f, 0.224f, 0.225f};
-
 /**
- * Load an image file, resize to target_size x target_size, and apply
- * ImageNet normalization. Returns CHW float data.
+ * Load an image file, then resize to target_size x target_size and apply
+ * ImageNet normalization with ImageProcessor. Returns CHW float data.
  */
 std::vector<float> load_image(const std::string& path, int target_size) {
-  int width, height, channels;
-  unsigned char* raw = stbi_load(path.c_str(), &width, &height, &channels, 3);
-  if (!raw) {
+  int width = 0, height = 0, channels = 0;
+  // Decode as RGBA; ImageProcessor accepts BGRA/RGBA and discards alpha.
+  unsigned char* rgba = stbi_load(path.c_str(), &width, &height, &channels, 4);
+  if (!rgba) {
     ET_LOG(Error, "Failed to load image: %s", path.c_str());
     return {};
   }
 
-  // Resize to target_size x target_size
-  std::vector<unsigned char> resized(target_size * target_size * 3);
-  stbir_resize_uint8(
-      raw, width, height, 0, resized.data(), target_size, target_size, 0, 3);
-  stbi_image_free(raw);
-
-  // Convert to CHW float with ImageNet normalization
-  size_t spatial = target_size * target_size;
-  std::vector<float> chw_data(3 * spatial);
-  for (int h = 0; h < target_size; ++h) {
-    for (int w = 0; w < target_size; ++w) {
-      int hwc_idx = (h * target_size + w) * 3;
-      for (int c = 0; c < 3; ++c) {
-        float pixel = static_cast<float>(resized[hwc_idx + c]) / 255.0f;
-        chw_data[c * spatial + h * target_size + w] =
-            (pixel - kImageNetMean[c]) / kImageNetStd[c];
-      }
-    }
+  ImageProcessorConfig config;
+  config.target_width = target_size;
+  config.target_height = target_size;
+  config.normalization = Normalization::imagenet();
+
+  ImageProcessor processor(config);
+  auto result = processor.process(
+      rgba, width, height, /*stride_bytes=*/width * 4, ColorFormat::RGBA);
+  stbi_image_free(rgba);
+  if (!result.ok()) {
+    ET_LOG(
+        Error,
+        "Failed to preprocess image: %d",
+        static_cast<int>(result.error()));
+    return {};
   }
-  return chw_data;
+
+  // Copy the [1, 3, target_size, target_size] float output into a CHW vector.
+  const auto tensor = result.get();
+  const float* data = tensor->const_data_ptr<float>();
+  return std::vector<float>(data, data + tensor->numel());
 }
 
 /**
