Commit 7b592c2

feat(computer-vision)!: TextToImage returns file URI instead of base64 (#1180)
## Description

Closes #888.

Moves PNG encoding for `TextToImage` from JS (`pngjs`) to the native side via the existing `image_processing::saveToTempFile` helper, mirroring how `StyleTransfer`'s `url` output mode already works. `TextToImageModule.forward` (and the `useTextToImage` `generate` hook) now resolves to a `file://` URI pointing to a PNG on disk instead of a base64-encoded payload.

Also tightens the `StyleTransferModule.forward` JSDoc to document the `'pixelData'` / `'url'` output modes, which was the documentation half of #888.

The `pngjs` dependency is no longer needed and is dropped from `packages/react-native-executorch/package.json` and the `apps/computer-vision` example.

### Introduces a breaking change?

- [x] Yes
- [ ] No

`TextToImageModule.forward` / `useTextToImage.generate` now resolves to a `file://` URI instead of a base64-encoded PNG string. Callers should stop wrapping the result in `data:image/png;base64,${image}` and pass the URI directly:

```tsx
<Image source={{ uri: image }} />
```

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [x] Android

### Testing instructions

- Run the computer-vision example app, open the Text to Image screen, and generate an image with each supported model. The image should render correctly from the returned URI.
- Verify that interrupting mid-generation still works (the returned URI string is empty).
- Verify that Style Transfer's `forward(..., 'url')` mode still returns a `file://` URI (no behavior change; this part is doc-only).
### Screenshots

<!-- Add screenshots here, if applicable -->

### Related issues

Closes #888

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

### Additional notes

The C++ integration tests covering `generate` are still `GTEST_SKIP`-ed pending the existing UNet emulator issue, but I refreshed them so they exercise the new URI return shape when re-enabled.
1 parent 5022de3 commit 7b592c2

11 files changed

Lines changed: 62 additions & 91 deletions


apps/computer-vision/app/text_to_image/index.tsx

Lines changed: 1 addition & 1 deletion
```diff
@@ -143,7 +143,7 @@ export default function TextToImageScreen() {
     <Image
       style={styles.image}
       resizeMode="contain"
-      source={{ uri: `data:image/png;base64,${image}` }}
+      source={{ uri: image }}
     />
   ) : (
     <View style={styles.infoContainer}>
```
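The example app only needed the one-line change above. Code that has to run against library versions from both before and after this commit could normalize the value first. A minimal sketch, assuming pre-change values are always bare base64 PNG payloads and post-change values always start with `file://`; `toDisplayUri` is a hypothetical helper, not part of react-native-executorch:

```typescript
// Hypothetical migration shim; not part of react-native-executorch.
// Assumes values from before this change are bare base64 PNG payloads and
// values from after it are `file://` URIs. Both versions resolve to '' on interrupt.
function toDisplayUri(result: string): string | null {
  if (result === '') {
    // Interrupted generation: nothing to display.
    return null;
  }
  if (result.startsWith('file://')) {
    // New behavior: the PNG is already on disk, so the URI is usable as-is.
    return result;
  }
  // Legacy behavior: wrap the base64 payload the way the example app used to.
  return `data:image/png;base64,${result}`;
}
```

Rendering stays conditional, as in the screen above: show `<Image source={{ uri }} />` only when the helper returns a non-null string.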

apps/computer-vision/package.json

Lines changed: 0 additions & 1 deletion
```diff
@@ -45,7 +45,6 @@
   },
   "devDependencies": {
     "@babel/core": "^7.29.0",
-    "@types/pngjs": "^6.0.5",
     "@types/react": "~19.2.0",
     "@types/react-refresh": "^0",
     "babel-preset-expo": "~55.0.16",
```

docs/docs/03-hooks/02-computer-vision/useTextToImage.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -82,7 +82,8 @@ function App() {
   }
   //...
 
-  return <Image source={{ uri: `data:image/png;base64,${image}` }} />;
+  // `generate` returns a `file://` URI to the PNG saved on disk.
+  return <Image source={{ uri: image }} />;
 }
 ```
 
````
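The documentation snippet renders `image` unconditionally. Because an interrupted run resolves to an empty string rather than rejecting, callers may want to branch on that case explicitly. A minimal TypeScript sketch; `GenerationOutcome` and `classifyGenerateResult` are hypothetical names, not library exports:

```typescript
// Hypothetical result type for the string resolved by `generate` / `forward`.
type GenerationOutcome =
  | { kind: 'image'; uri: string }
  | { kind: 'interrupted' };

function classifyGenerateResult(result: string): GenerationOutcome {
  // Per the updated JSDoc, an interrupted run resolves to '' instead of rejecting.
  return result === ''
    ? { kind: 'interrupted' }
    : { kind: 'image', uri: result };
}
```

A component can then render `<Image source={{ uri: outcome.uri }} />` only when `outcome.kind === 'image'`, and show a placeholder otherwise.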

packages/react-native-executorch/common/rnexecutorch/models/text_to_image/TextToImage.cpp

Lines changed: 19 additions & 16 deletions
```diff
@@ -5,8 +5,10 @@
 #include <span>
 
 #include <executorch/extension/tensor/tensor.h>
+#include <opencv2/opencv.hpp>
 
 #include <rnexecutorch/Log.h>
+#include <rnexecutorch/data_processing/ImageProcessing.h>
 #include <rnexecutorch/models/text_to_image/Constants.h>
 
 #include <rnexecutorch/Error.h>
@@ -54,10 +56,9 @@ void TextToImage::setSeed(int32_t &seed) {
     seed = rd();
   }
 }
 
-std::shared_ptr<OwningArrayBuffer>
-TextToImage::generate(std::string input, int32_t imageSize,
-                      size_t numInferenceSteps, int32_t seed,
-                      std::shared_ptr<jsi::Function> callback) {
+std::string TextToImage::generate(std::string input, int32_t imageSize,
+                                  size_t numInferenceSteps, int32_t seed,
+                                  std::shared_ptr<jsi::Function> callback) {
   std::scoped_lock lock(inference_mutex_);
   setImageSize(imageSize);
   setSeed(seed);
@@ -105,7 +106,7 @@ TextToImage::generate(std::string input, int32_t imageSize,
   }
   if (interrupted) {
     interrupted = false;
-    return std::make_shared<OwningArrayBuffer>(0);
+    return "";
   }
 
   for (auto &val : latents) {
@@ -116,18 +117,20 @@ TextToImage::generate(std::string input, int32_t imageSize,
   return postprocess(output);
 }
 
-std::shared_ptr<OwningArrayBuffer>
-TextToImage::postprocess(const std::vector<float> &output) const {
-  // Convert RGB to RGBA
-  int32_t imagePixelCount = imageSize * imageSize;
-  std::vector<uint8_t> outputRgba(imagePixelCount * 4);
-  for (int32_t i = 0; i < imagePixelCount; i++) {
-    outputRgba[i * 4 + 0] = output[i * 3 + 0];
-    outputRgba[i * 4 + 1] = output[i * 3 + 1];
-    outputRgba[i * 4 + 2] = output[i * 3 + 2];
-    outputRgba[i * 4 + 3] = 255;
+std::string TextToImage::postprocess(const std::vector<float> &output) const {
+  // Decoder output is HWC float RGB (values already in [0..255]). cv::imwrite
+  // expects a BGR matrix, so pack the channels in BGR order here.
+  cv::Mat bgr(imageSize, imageSize, CV_8UC3);
+  for (int32_t y = 0; y < imageSize; ++y) {
+    auto *row = bgr.ptr<cv::Vec3b>(y);
+    for (int32_t x = 0; x < imageSize; ++x) {
+      const int32_t idx = (y * imageSize + x) * 3;
+      row[x] = cv::Vec3b(static_cast<uint8_t>(output[idx + 2]),
+                         static_cast<uint8_t>(output[idx + 1]),
+                         static_cast<uint8_t>(output[idx + 0]));
+    }
   }
-  return std::make_shared<OwningArrayBuffer>(outputRgba);
+  return image_processing::saveToTempFile(bgr);
 }
 
 void TextToImage::interrupt() noexcept { interrupted = true; }
```
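To make the index arithmetic in the new `postprocess` concrete, here is a TypeScript mirror of its loop over a flat buffer instead of a `cv::Mat`. It is an illustration only: `packHwcRgbToBgr` is a hypothetical name, and it assumes, as the C++ comment states, that the decoder output is HWC float RGB already in [0, 255]. Assigning a number to a `Uint8Array` element truncates the fraction, which matches `static_cast<uint8_t>` for in-range values.

```typescript
// Mirrors the loop in TextToImage::postprocess using flat typed arrays.
// `packHwcRgbToBgr` is a hypothetical name; the real code builds a cv::Mat.
// Assumes `output` is HWC float RGB with values already in [0, 255].
function packHwcRgbToBgr(output: Float32Array, imageSize: number): Uint8Array {
  const bgr = new Uint8Array(imageSize * imageSize * 3);
  for (let y = 0; y < imageSize; y++) {
    for (let x = 0; x < imageSize; x++) {
      // Same offset as the C++: pixel (x, y) starts at (y * imageSize + x) * 3.
      const idx = (y * imageSize + x) * 3;
      // Write channels in BGR order (what cv::imwrite expects):
      // blue is RGB index 2, green index 1, red index 0.
      bgr[idx + 0] = output[idx + 2];
      bgr[idx + 1] = output[idx + 1];
      bgr[idx + 2] = output[idx + 0];
    }
  }
  return bgr;
}
```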

packages/react-native-executorch/common/rnexecutorch/models/text_to_image/TextToImage.h

Lines changed: 4 additions & 6 deletions
```diff
@@ -8,7 +8,6 @@
 #include <ReactCommon/CallInvoker.h>
 #include <jsi/jsi.h>
 
-#include <rnexecutorch/jsi/OwningArrayBuffer.h>
 #include <rnexecutorch/metaprogramming/ConstructorHelpers.h>
 
 #include <rnexecutorch/models/text_to_image/Decoder.h>
@@ -30,18 +29,17 @@ class TextToImage final {
               int32_t schedulerNumTrainTimesteps,
               int32_t schedulerStepsOffset,
               std::shared_ptr<react::CallInvoker> callInvoker);
-  std::shared_ptr<OwningArrayBuffer>
-  generate(std::string input, int32_t imageSize, size_t numInferenceSteps,
-           int32_t seed, std::shared_ptr<jsi::Function> callback);
+  std::string generate(std::string input, int32_t imageSize,
+                       size_t numInferenceSteps, int32_t seed,
+                       std::shared_ptr<jsi::Function> callback);
   void interrupt() noexcept;
   size_t getMemoryLowerBound() const noexcept;
   void unload() noexcept;
 
 private:
   void setImageSize(int32_t imageSize);
   void setSeed(int32_t &seed);
-  std::shared_ptr<OwningArrayBuffer>
-  postprocess(const std::vector<float> &output) const;
+  std::string postprocess(const std::vector<float> &output) const;
 
   size_t memorySizeLowerBound;
   int32_t imageSize;
```

packages/react-native-executorch/common/rnexecutorch/tests/integration/TextToImageTest.cpp

Lines changed: 21 additions & 28 deletions
```diff
@@ -1,8 +1,10 @@
 #include "BaseModelTests.h"
 #include <gtest/gtest.h>
+#include <opencv2/opencv.hpp>
 #include <rnexecutorch/Error.h>
 #include <rnexecutorch/models/text_to_image/TextToImage.h>
 #include <string>
+#include <string_view>
 
 using namespace rnexecutorch;
 using namespace rnexecutorch::models::text_to_image;
@@ -111,7 +113,7 @@ TEST(TextToImageGenerateTests, ZeroStepsThrows) {
                RnExecutorchError);
 }
 
-TEST(TextToImageGenerateTests, GenerateReturnsNonNull) {
+TEST(TextToImageGenerateTests, GenerateReturnsFileUri) {
   // TODO: Investigate source of the issue
   GTEST_SKIP() << "Skipping TextToImage generation test in emulator "
                   "environment due to UNet forward call throwing error no. 1";
@@ -120,22 +122,8 @@ TEST(TextToImageGenerateTests, GenerateReturnsNonNull) {
                     kSchedulerNumTrainTimesteps, kSchedulerStepsOffset,
                     createMockCallInvoker());
   auto result = model.generate("a cat", 128, 1, 42, nullptr);
-  EXPECT_NE(result, nullptr);
-}
-
-TEST(TextToImageGenerateTests, GenerateReturnsCorrectSize) {
-  // TODO: Investigate source of the issue
-  GTEST_SKIP() << "Skipping TextToImage generation test in emulator "
-                  "environment due to UNet forward call throwing error no. 1";
-  TextToImage model(kValidTokenizerPath, kValidEncoderPath, kValidUnetPath,
-                    kValidDecoderPath, kSchedulerBetaStart, kSchedulerBetaEnd,
-                    kSchedulerNumTrainTimesteps, kSchedulerStepsOffset,
-                    createMockCallInvoker());
-  int32_t imageSize = 128;
-  auto result = model.generate("a cat", imageSize, 1, 42, nullptr);
-  ASSERT_NE(result, nullptr);
-  size_t expectedSize = imageSize * imageSize * 4;
-  EXPECT_EQ(result->size(), expectedSize);
+  EXPECT_FALSE(result.empty());
+  EXPECT_TRUE(result.starts_with("file://"));
 }
 
 TEST(TextToImageGenerateTests, SameSeedProducesSameResult) {
@@ -146,15 +134,20 @@ TEST(TextToImageGenerateTests, SameSeedProducesSameResult) {
                     kValidDecoderPath, kSchedulerBetaStart, kSchedulerBetaEnd,
                     kSchedulerNumTrainTimesteps, kSchedulerStepsOffset,
                     createMockCallInvoker());
-  auto result1 = model.generate("a cat", 128, 1, 42, nullptr);
-  auto result2 = model.generate("a cat", 128, 1, 42, nullptr);
-  ASSERT_NE(result1, nullptr);
-  ASSERT_NE(result2, nullptr);
-  ASSERT_EQ(result1->size(), result2->size());
-
-  auto data1 = static_cast<uint8_t *>(result1->data());
-  auto data2 = static_cast<uint8_t *>(result2->data());
-  for (size_t i = 0; i < result1->size(); i++) {
-    EXPECT_EQ(data1[i], data2[i]) << "at index: " << i;
-  }
+  auto path1 = model.generate("a cat", 128, 1, 42, nullptr);
+  auto path2 = model.generate("a cat", 128, 1, 42, nullptr);
+  ASSERT_FALSE(path1.empty());
+  ASSERT_FALSE(path2.empty());
+
+  const std::string kScheme = "file://";
+  auto stripScheme = [&kScheme](const std::string &uri) {
+    return uri.starts_with(kScheme) ? uri.substr(kScheme.size()) : uri;
+  };
+  cv::Mat img1 = cv::imread(stripScheme(path1), cv::IMREAD_UNCHANGED);
+  cv::Mat img2 = cv::imread(stripScheme(path2), cv::IMREAD_UNCHANGED);
+  ASSERT_FALSE(img1.empty());
+  ASSERT_FALSE(img2.empty());
+  ASSERT_EQ(img1.size(), img2.size());
+  ASSERT_EQ(img1.type(), img2.type());
+  EXPECT_EQ(cv::countNonZero(img1 != img2), 0);
 }
```
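The test's `stripScheme` lambda converts the returned URI back into a path that `cv::imread` can open. JS code that needs a bare filesystem path rather than a URI can apply the same rule; `stripFileScheme` below is a hypothetical helper, not a library export, and like the lambda it performs no percent-decoding.

```typescript
// Hypothetical helper mirroring the test's `stripScheme` lambda: converts the
// `file://` URI resolved by `generate` back into a plain filesystem path.
// Like the lambda, it does not percent-decode and leaves other strings unchanged.
function stripFileScheme(uri: string): string {
  const scheme = 'file://';
  return uri.startsWith(scheme) ? uri.slice(scheme.length) : uri;
}
```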

packages/react-native-executorch/package.json

Lines changed: 0 additions & 1 deletion
```diff
@@ -124,7 +124,6 @@
     "@huggingface/jinja": "^0.5.0",
     "jsonrepair": "^3.12.0",
     "jsonschema": "^1.5.0",
-    "pngjs": "^7.0.0",
     "zod": "^4.3.6"
   }
 }
```

packages/react-native-executorch/src/modules/computer_vision/StyleTransferModule.ts

Lines changed: 10 additions & 0 deletions
```diff
@@ -64,6 +64,16 @@ export class StyleTransferModule extends VisionModule<PixelData | string> {
     );
   }
 
+  /**
+   * Executes style transfer on the provided image.
+   * @param input - Image source (string path/URI or `PixelData` from a frame library).
+   * @param outputType - Controls the output format. Defaults to `'pixelData'`, which
+   * returns raw RGBA pixels suitable for direct rendering. Pass `'url'` to
+   * have the stylized image saved to a temporary PNG on the device and
+   * receive a `file://` URI string instead.
+   * @returns A Promise resolving to either a `PixelData` object or a `file://` URI string,
+   * depending on `outputType`.
+   */
   async forward<O extends 'pixelData' | 'url' = 'pixelData'>(
     input: string | PixelData,
     outputType?: O
```
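The hunk cuts off before `forward`'s declared return type, so the mapping below is only an assumed way to express what the new JSDoc documents: `'url'` yields a `file://` URI string, anything else yields `PixelData`. The `PixelData` interface here is a hypothetical stand-in with made-up fields; the library's real type may differ. The type guard narrows the `PixelData | string` union that appears as the `VisionModule` type parameter.

```typescript
// Hypothetical stand-in for the library's PixelData type; the real type's
// fields may differ. Only width/height are read below.
interface PixelData {
  data: Uint8Array;
  width: number;
  height: number;
}

type StyleTransferOutputType = 'pixelData' | 'url';

// Assumed mapping from `outputType` to the resolved value, following the JSDoc:
// 'url' -> file:// URI string, otherwise PixelData.
type StyleTransferResult<O extends StyleTransferOutputType> = O extends 'url'
  ? string
  : PixelData;

// Narrows the `PixelData | string` union so callers can branch on the output mode.
function isFileUri(result: PixelData | string): result is string {
  return typeof result === 'string';
}

function describeResult(result: PixelData | string): string {
  return isFileUri(result)
    ? `file URI: ${result}`
    : `pixels: ${result.width}x${result.height}`;
}
```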

packages/react-native-executorch/src/modules/computer_vision/TextToImageModule.ts

Lines changed: 4 additions & 18 deletions
```diff
@@ -3,7 +3,6 @@ import { ResourceSource } from '../../types/common';
 import { TextToImageModelName } from '../../types/tti';
 import { BaseModule } from '../BaseModule';
 
-import { PNG } from 'pngjs/browser';
 import { RnExecutorchErrorCode } from '../../errors/ErrorCodes';
 import { parseUnknownError, RnExecutorchError } from '../../errors/errorUtils';
 import { Logger } from '../../common/Logger';
@@ -147,40 +146,27 @@ export class TextToImageModule extends BaseModule {
 
   /**
    * Runs the model to generate an image described by `input`, and conditioned by `seed`, performing `numSteps` inference steps.
-   * The resulting image, with dimensions `imageSize`×`imageSize` pixels, is returned as a base64-encoded string.
+   * The resulting image, with dimensions `imageSize`×`imageSize` pixels, is saved as a PNG on the device and returned as a `file://` URI.
+   * If generation is interrupted before completion, an empty string is returned.
    * @param input - The text prompt to generate the image from.
    * @param imageSize - The desired width and height of the output image in pixels.
    * @param numSteps - The number of inference steps to perform.
    * @param seed - An optional seed for random number generation to ensure reproducibility.
-   * @returns A Base64-encoded string representing the generated PNG image.
+   * @returns A `file://` URI pointing to the generated PNG, or an empty string if generation was interrupted.
    */
   async forward(
     input: string,
     imageSize: number = 512,
     numSteps: number = 5,
     seed?: number
   ): Promise<string> {
-    const output = await this.nativeModule.generate(
+    return await this.nativeModule.generate(
       input,
       imageSize,
       numSteps,
       seed ? seed : -1,
       this.inferenceCallback
     );
-    const outputArray = new Uint8Array(output);
-    if (!outputArray.length) {
-      return '';
-    }
-    const png = new PNG({ width: imageSize, height: imageSize });
-    png.data = outputArray as unknown as Buffer;
-    const pngBuffer = PNG.sync.write(png, { colorType: 6 });
-    const pngArray = new Uint8Array(pngBuffer as unknown as ArrayBufferLike);
-    let binary = '';
-    const chunkSize = 8192;
-    for (let i = 0; i < pngArray.length; i += chunkSize) {
-      binary += String.fromCharCode(...pngArray.subarray(i, i + chunkSize));
-    }
-    return btoa(binary);
   }
 
   /**
```

packages/react-native-executorch/src/types/tti.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -81,7 +81,7 @@ export interface TextToImageType {
    * @param [imageSize] - Optional. The target width and height of the generated image (e.g., 512 for 512x512). Defaults to the model's standard size if omitted.
    * @param [numSteps] - Optional. The number of denoising steps for the diffusion process. More steps generally yield higher quality at the cost of generation time.
    * @param [seed] - Optional. A random seed for reproducible generation. Should be a positive integer.
-   * @returns A Promise that resolves to a string representing the generated image (e.g., base64 string or file URI).
+   * @returns A Promise that resolves to a `file://` URI pointing to the generated PNG on the device, or an empty string if generation was interrupted.
    * @throws {RnExecutorchError} If the model is not loaded or is currently generating another image.
    */
   generate: (
```
