
feat: Add text to image pipeline#586

Merged
a-szymanska merged 75 commits into main from @aszymanska/text-to-image
Sep 22, 2025

Conversation

@a-szymanska
Contributor

Description

Introducing support for text-to-image tasks following the Diffusion Pipeline (https://huggingface.co/docs/diffusers/en/using-diffusers/write_own_pipeline#deconstruct-the-stable-diffusion-pipeline). Adding the TextToImageModule and the useTextToImage hook to access the models.

Introduces a breaking change?

  • [ ] Yes
  • [x] No

Type of change

  • [ ] Bug fix (change which fixes an issue)
  • [x] New feature (change which adds functionality)
  • [ ] Documentation update (improves or adds clarity to existing documentation)
  • [ ] Other (chores, tests, code style improvements etc.)

Tested on

  • [ ] iOS
  • [x] Android

Testing instructions

Run the computer vision app to test image generation with the BK-SDM-Tiny model (https://huggingface.co/aszymanska/bk-sdm-tiny-vpred) for 256×256 or 512×512 outputs.

⚠️ Testing the model requires a phone with a reasonably large amount of RAM (preferably at least 8 GB for the 256×256 model).

Screenshots

Related issues

Closes #585

Checklist

  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have updated the documentation accordingly
  • [x] My changes generate no new warnings

Additional notes

Member

@msluszniak msluszniak left a comment


First batch of comments, I'll send more in a while.

Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread packages/react-native-executorch/src/modules/computer_vision/TextToImageModule.ts Outdated
@msluszniak

This comment was marked as resolved.

Comment thread apps/computer-vision/app/text_to_image/utils.ts Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/index.ts Outdated
Comment thread packages/react-native-executorch/src/modules/computer_vision/TextToImageModule.ts Outdated
Comment on lines +91 to +93

  async forward(input: string, numSteps: number): Promise<Float32Array> {
    return new Float32Array(await this.nativeModule.generate(input, numSteps));
  }
}
Collaborator


You don't have to do this, but FYI you can also return JSTensorViewOut from C++, similarly to what is done in forwardJS in BaseModel. This maps to the following JS type:

export interface TensorPtr {
  dataPtr: TensorBuffer;
  sizes: number[];
  scalarType: ScalarType;
}
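As a hedged illustration of how such a returned tensor could be consumed (the real TensorBuffer and ScalarType definitions live in react-native-executorch; the ArrayBuffer-based stand-ins below are assumptions, not the library's actual types):

```typescript
// Simplified stand-in for the library's TensorPtr; the real TensorBuffer
// and ScalarType types in react-native-executorch may differ.
interface TensorPtrLike {
  dataPtr: ArrayBuffer; // assumption: a raw buffer backs the tensor data
  sizes: number[];      // e.g. [1, 3, 256, 256]
  scalarType: 'float32';
}

// View the tensor data as a Float32Array without copying it.
function toFloat32Array(t: TensorPtrLike): Float32Array {
  const numel = t.sizes.reduce((a, b) => a * b, 1);
  return new Float32Array(t.dataPtr, 0, numel);
}
```

The upside over returning a flat array is that shape information (`sizes`) travels with the data instead of being implied by the caller.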

Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
: BaseModel(modelSource, callInvoker), modelImageSize(modelImageSize),
numChannels(numChannels) {}

std::vector<float> UNet::generate(std::vector<float> &latents, int32_t timestep,
Collaborator


Why not a const ref for latents and embeddings? Is it because make_tensor_ptr accepts non-const pointers?

Member


Exactly, that's the reason

Member

@msluszniak msluszniak Sep 11, 2025


But the creation of these tensors might still be improved. @a-szymanska look at the available overloads of make_tensor_ptr: https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h. Note that, e.g., there is a templated overload for creating a tensor from a vector, so instead of this:

auto latentsTensor =
      make_tensor_ptr(latentsShape, latents.data(), ScalarType::Float);

you can probably do something like this (this overload: https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h#L178):

auto latentsTensor = make_tensor_ptr<float>(latentsShape, latents);

Also, if the tensor is one-dimensional, you can pass just the vector, without the shape, so again:

std::vector<int32_t> timestepShape = {1};
std::vector<int64_t> timestepData = {static_cast<int64_t>(timestep)};
auto timestepTensor =
    make_tensor_ptr(timestepShape, timestepData.data(), ScalarType::Long);

might be this (https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h#L255):

auto timestepTensor = make_tensor_ptr<int64_t>({static_cast<int64_t>(timestep)});

Member

@msluszniak msluszniak Sep 11, 2025


Hmm, but generally I see that these template functions take the values by value, so we can probably mark these vector arguments const as well. Previously the better template candidate was the function taking a void* pointer, which blocked const std::vector. @a-szymanska could you analyse these as well and mark the vector arguments const if that works with the mentioned make_tensor_ptr overloads?

Collaborator


I would advise against using the vector overloads here, because they create another copy of the data. Actually, the cleanest way would be to move the creation of latentsConcat inside the UNet class; then we can pass a const reference but still use the proper make_tensor_ptr overload.

Member

@msluszniak msluszniak Sep 11, 2025


But there is no latentsConcat in UNet, I guess. Other than that, the second case is extremely cheap to copy (one element) and it's probably faster than passing by reference. In the first case, indeed, as I proposed it there would be a copy, but this object is not used anymore, so we can move it. Ok, it's not so obvious, I need to check it. I don't understand what you wanted to do here, can you elaborate?

Oh, I see: move creating the latents that are now passed from TextToImage (here named latentsConcat) into the UNet class. OK, that makes sense.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scalar-value case I don't really care; you are right, it is cheap.
For the latents argument we can move this:

    std::vector<float> latentsConcat;
    latentsConcat.reserve(2 * latentsSize);
    latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
    latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());

inside UNet.cpp and then pass a const reference to UNet::generate. As for embeddingConcat, since it is actually const, we could call make_tensor_ptr outside the for (size_t t = 0; t < numInferenceSteps + 1; t++) loop and pass that pointer to UNet.
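A minimal, library-free sketch of the duplication step that would move into UNet.cpp (the helper name concatLatents is made up for illustration; the surrounding UNet class and the make_tensor_ptr call on the result are omitted):

```cpp
#include <vector>

// Sketch: build the doubled latents buffer inside UNet instead of in
// TextToImage, so the caller can pass `latents` by const reference.
// Two copies are concatenated, exactly as in the snippet above.
std::vector<float> concatLatents(const std::vector<float> &latents) {
  std::vector<float> latentsConcat;
  latentsConcat.reserve(2 * latents.size());
  latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
  latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
  // A make_tensor_ptr(...) over latentsConcat.data() would follow here.
  return latentsConcat;
}
```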

Member


Yep, that's the way we should do it.

Collaborator


There is one more concern I have with these non-data-owning tensors: I am not sure they guarantee that the underlying data stays unmodified, but so far it seems like it does.

Member

@msluszniak msluszniak Sep 11, 2025


Ok, I educated myself and it seems that:

  1. The data is not modified unless we modify the buffer ourselves
  2. The overloads with vector specializations are designed to serve as the data-owning implementation (std::move(vector_data) is passed into make_tensor_ptr)
  3. The void* overload is the non-owning one, and that is where all pointers, references etc. go

Reference: https://docs.pytorch.org/executorch/stable/extension-tensor.html#introducing-tensorptr and implementation of make_tensor_ptr
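As a hedged, library-free illustration of that owning vs. non-owning split (plain C++ stand-ins; the actual behaviour lives in executorch's make_tensor_ptr, and these struct names are invented for the sketch):

```cpp
#include <utility>
#include <vector>

// Analogue of the vector overload: the wrapper takes ownership by move,
// so the data's lifetime is tied to the tensor object.
struct OwningTensor {
  std::vector<float> data;
  explicit OwningTensor(std::vector<float> v) : data(std::move(v)) {}
};

// Analogue of the void* overload: the wrapper only stores a pointer, so
// the caller must keep the buffer alive, and any mutation of the buffer
// is visible through the tensor view.
struct TensorView {
  float *data;
  explicit TensorView(float *p) : data(p) {}
};
```

This is why the concern above matters only for the non-owning case: the view shares the caller's buffer rather than copying it.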

@chmjkb
Collaborator

chmjkb commented Sep 11, 2025

Also, please change the name of this PR before merging :D

@a-szymanska a-szymanska force-pushed the @aszymanska/text-to-image branch from 54e53f7 to 8b49aae September 15, 2025 09:12
@a-szymanska a-szymanska changed the title @aszymanska/text to image Add text to image pipeline Sep 15, 2025
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
@mkopcins mkopcins changed the title Add text to image pipeline feat: Add text to image pipeline Sep 16, 2025
Member

@msluszniak msluszniak left a comment


Just resolve the conflicts and I'm okay with the PR, great work! 😁 I don't have enough RAM to test the whole demo app, so it would be great if somebody did some sanity testing.

@a-szymanska a-szymanska force-pushed the @aszymanska/text-to-image branch from a66f226 to 3e681ce September 22, 2025 06:00
@a-szymanska a-szymanska enabled auto-merge (squash) September 22, 2025 09:42
@a-szymanska a-szymanska merged commit 82c41df into main Sep 22, 2025
3 checks passed
@a-szymanska a-szymanska deleted the @aszymanska/text-to-image branch September 22, 2025 10:28
a-szymanska added a commit that referenced this pull request Sep 22, 2025
@tsnguyenducphuong

Can I already test text to image with release 0.5.6?

Looks like the import in the demo app doesn't work:
import { useTextToImage, BK_SDM_TINY_VPRED_256 } from 'react-native-executorch';

@jakmro
Contributor

jakmro commented Sep 23, 2025

Hi @tsnguyenducphuong, unfortunately you can't - this feature will be introduced in v0.6.0

@tsnguyenducphuong

Noted, thanks @jakmro

@mkopcins
Collaborator

@tsnguyenducphuong if you are in a hurry you can build a package directly from main and use that in your app :)

KnextKoder pushed a commit to Synkhiv/react-native-executorch that referenced this pull request Nov 7, 2025


Development

Successfully merging this pull request may close these issues.

Image generation

6 participants