
feat: Add text to image pipeline#586

Merged
a-szymanska merged 75 commits into main from @aszymanska/text-to-image
Sep 22, 2025

Conversation

@a-szymanska
Contributor

Description

Introducing support for text-to-image tasks following the Diffusion Pipeline (https://huggingface.co/docs/diffusers/en/using-diffusers/write_own_pipeline#deconstruct-the-stable-diffusion-pipeline). Adding the TextToImageModule and the useTextToImage hook to access the models.

Introduces a breaking change?

  • [ ] Yes
  • [x] No

Type of change

  • [ ] Bug fix (change which fixes an issue)
  • [x] New feature (change which adds functionality)
  • [ ] Documentation update (improves or adds clarity to existing documentation)
  • [ ] Other (chores, tests, code style improvements etc.)

Tested on

  • [ ] iOS
  • [x] Android

Testing instructions

Run the computer vision app to test image generation with the BK-SDM-Tiny model (https://huggingface.co/aszymanska/bk-sdm-tiny-vpred) for 256×256 or 512×512 outputs.

⚠️ Testing the model requires a phone with a reasonably large amount of RAM (preferably at least 8 GB for the 256×256 model).

Screenshots

Related issues

Closes #585

Checklist

  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have updated the documentation accordingly
  • [x] My changes generate no new warnings

Additional notes

Member

@msluszniak msluszniak left a comment


First batch of comments, I'll send more in a while.

Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread apps/computer-vision/app/text_to_image/index.tsx Outdated
Comment thread packages/react-native-executorch/src/modules/computer_vision/TextToImageModule.ts Outdated
@msluszniak

This comment was marked as resolved.

Comment thread apps/computer-vision/app/text_to_image/utils.ts Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/index.ts Outdated
Comment thread packages/react-native-executorch/src/modules/computer_vision/TextToImageModule.ts Outdated
Comment on lines +91 to +93

  async forward(input: string, numSteps: number): Promise<Float32Array> {
    return new Float32Array(await this.nativeModule.generate(input, numSteps));
  }
}
Collaborator


You don't have to do this, but FYI you can also return JSTensorViewOut from C++, similarly to what is done in forwardJS in BaseModel. This maps to the following JS type:

export interface TensorPtr {
  dataPtr: TensorBuffer;
  sizes: number[];
  scalarType: ScalarType;
}
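As a hedged illustration of how such a returned tensor could be consumed (the real TensorBuffer and ScalarType definitions live in react-native-executorch; the ArrayBuffer-based stand-ins below are assumptions, not the library's actual types):

```typescript
// Simplified stand-in for the library's TensorPtr; the real TensorBuffer
// and ScalarType types in react-native-executorch may differ.
interface TensorPtrLike {
  dataPtr: ArrayBuffer; // assumption: a raw buffer backs the tensor data
  sizes: number[];      // e.g. [1, 3, 256, 256]
  scalarType: 'float32';
}

// View the tensor data as a Float32Array without copying it.
function toFloat32Array(t: TensorPtrLike): Float32Array {
  const numel = t.sizes.reduce((a, b) => a * b, 1);
  return new Float32Array(t.dataPtr, 0, numel);
}
```

The upside over returning a flat array is that shape information (`sizes`) travels with the data instead of being implied by the caller.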

Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
: BaseModel(modelSource, callInvoker), modelImageSize(modelImageSize),
numChannels(numChannels) {}

std::vector<float> UNet::generate(std::vector<float> &latents, int32_t timestep,
Collaborator


Why not a const ref for latents and embeddings? Is it because make_tensor_ptr accepts non-const pointers?

Member


Exactly, that's the reason

Member

@msluszniak msluszniak Sep 11, 2025


But the creation of these tensors might still be improved. @a-szymanska look at the available overloads of make_tensor_ptr: https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h. Note that, e.g., there is a templated overload for creating a tensor from a vector, so instead of this:

auto latentsTensor =
      make_tensor_ptr(latentsShape, latents.data(), ScalarType::Float);

you can probably do something like this (this overload: https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h#L178):

auto latentsTensor = make_tensor_ptr<float>(latentsShape, latents);

Also, if the tensor is one-dimensional, you can pass just the vector, without the shape, so again:

std::vector<int32_t> timestepShape = {1};
std::vector<int64_t> timestepData = {static_cast<int64_t>(timestep)};
auto timestepTensor =
    make_tensor_ptr(timestepShape, timestepData.data(), ScalarType::Long);

might be this (https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h#L255):

auto timestepTensor = make_tensor_ptr<int64_t>({static_cast<int64_t>(timestep)});

Member

@msluszniak msluszniak Sep 11, 2025


Hmm, but generally I see that these template functions take the values by value, so we can probably mark these vector arguments const as well. Previously the better template candidate was the function taking a void* pointer, which blocked const std::vector. @a-szymanska could you analyse these as well and mark the vector arguments const if that works with the mentioned make_tensor_ptr overloads?

Collaborator


I would advise against using the vector overloads here, because they create another copy of the data. Actually, the cleanest way would be to move the creation of latentsConcat inside the UNet class; then we can pass a const reference but still use the proper make_tensor_ptr overload.

Member

@msluszniak msluszniak Sep 11, 2025


But there is no latentsConcat in UNet, I guess. Other than that, the second case is extremely cheap to copy (one element) and it's probably faster than passing by reference. In the first case, indeed, as I proposed it there would be a copy, but this object is not used anymore, so we can move it. Ok, it's not so obvious, I need to check it. I don't understand what you wanted to do here, can you elaborate?

Oh, I see: move creating the latents that are now passed from TextToImage (here named latentsConcat) into the UNet class. OK, that makes sense.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scalar-value case I don't really care; you are right, it is cheap.
For the latents argument we can move this:

    std::vector<float> latentsConcat;
    latentsConcat.reserve(2 * latentsSize);
    latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
    latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());

inside UNet.cpp and then pass a const reference to UNet::generate. As for embeddingConcat, since it is actually const, we could call make_tensor_ptr outside the for (size_t t = 0; t < numInferenceSteps + 1; t++) loop and pass that pointer to UNet.
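A minimal, library-free sketch of the duplication step that would move into UNet.cpp (the helper name concatLatents is made up for illustration; the surrounding UNet class and the make_tensor_ptr call on the result are omitted):

```cpp
#include <vector>

// Sketch: build the doubled latents buffer inside UNet instead of in
// TextToImage, so the caller can pass `latents` by const reference.
// Two copies are concatenated, exactly as in the snippet above.
std::vector<float> concatLatents(const std::vector<float> &latents) {
  std::vector<float> latentsConcat;
  latentsConcat.reserve(2 * latents.size());
  latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
  latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
  // A make_tensor_ptr(...) over latentsConcat.data() would follow here.
  return latentsConcat;
}
```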

Member


Yep, that's the way we should do it.

Collaborator


There is one more concern I have with these non-data-owning tensors: I am not sure they guarantee that the underlying data stays unmodified, but so far it seems like it does.

Member

@msluszniak msluszniak Sep 11, 2025


Ok, I educated myself and it seems that:

  1. The data is not modified unless we modify the buffer ourselves
  2. The overloads with vector specializations are designed to serve as the data-owning implementation (std::move(vector_data) is passed into make_tensor_ptr)
  3. The void* overload is the non-owning one, and that is where all pointers, references etc. go

Reference: https://docs.pytorch.org/executorch/stable/extension-tensor.html#introducing-tensorptr and implementation of make_tensor_ptr
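As a hedged, library-free illustration of that owning vs. non-owning split (plain C++ stand-ins; the actual behaviour lives in executorch's make_tensor_ptr, and these struct names are invented for the sketch):

```cpp
#include <utility>
#include <vector>

// Analogue of the vector overload: the wrapper takes ownership by move,
// so the data's lifetime is tied to the tensor object.
struct OwningTensor {
  std::vector<float> data;
  explicit OwningTensor(std::vector<float> v) : data(std::move(v)) {}
};

// Analogue of the void* overload: the wrapper only stores a pointer, so
// the caller must keep the buffer alive, and any mutation of the buffer
// is visible through the tensor view.
struct TensorView {
  float *data;
  explicit TensorView(float *p) : data(p) {}
};
```

This is why the concern above matters only for the non-owning case: the view shares the caller's buffer rather than copying it.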

@chmjkb
Collaborator

chmjkb commented Sep 11, 2025

Also, please change the name of this PR before merging :D

@a-szymanska a-szymanska force-pushed the @aszymanska/text-to-image branch from 54e53f7 to 8b49aae September 15, 2025 09:12
@a-szymanska a-szymanska changed the title @aszymanska/text to image Add text to image pipeline Sep 15, 2025
Comment thread packages/react-native-executorch/common/rnexecutorch/models/text_to_image/UNet.h Outdated
@mkopcins mkopcins changed the title Add text to image pipeline feat: Add text to image pipeline Sep 16, 2025
Member

@msluszniak msluszniak left a comment


Just resolve the conflicts and I'm okay with the PR, great work! 😁 I don't have enough RAM to test the whole demo app, so it would be great if somebody did some sanity testing.

@a-szymanska a-szymanska force-pushed the @aszymanska/text-to-image branch from a66f226 to 3e681ce September 22, 2025 06:00
@a-szymanska a-szymanska enabled auto-merge (squash) September 22, 2025 09:42
@a-szymanska a-szymanska merged commit 82c41df into main Sep 22, 2025
3 checks passed
@a-szymanska a-szymanska deleted the @aszymanska/text-to-image branch September 22, 2025 10:28
a-szymanska added a commit that referenced this pull request Sep 22, 2025
@tsnguyenducphuong

Can I already test text to image with release 0.5.6?

Looks like the import in the demo app doesn't work:
import { useTextToImage, BK_SDM_TINY_VPRED_256 } from 'react-native-executorch';

@jakmro
Contributor

jakmro commented Sep 23, 2025

Hi @tsnguyenducphuong, unfortunately you can't - this feature will be introduced in v0.6.0

@tsnguyenducphuong

Noted, thanks @jakmro

@mkopcins
Collaborator

@tsnguyenducphuong if you are in a hurry you can build a package directly from main and use that in your app :)

KnextKoder pushed a commit to Synkhiv/react-native-executorch that referenced this pull request Nov 7, 2025


Development

Successfully merging this pull request may close these issues.

Image generation

6 participants