msluszniak left a comment
First batch of comments, I'll send more in a while.
```ts
async forward(input: string, numSteps: number): Promise<Float32Array> {
  return new Float32Array(await this.nativeModule.generate(input, numSteps));
}
```
You don't have to do this, but FYI you can also return JSTensorViewOut from C++, similarly to what is done in forwardJS in BaseModel. This maps to the following JS type:

```ts
export interface TensorPtr {
  dataPtr: TensorBuffer;
  sizes: number[];
  scalarType: ScalarType;
}
```

```cpp
    : BaseModel(modelSource, callInvoker), modelImageSize(modelImageSize),
      numChannels(numChannels) {}

std::vector<float> UNet::generate(std::vector<float> &latents, int32_t timestep,
```
Why not a const ref for latents and embeddings? Is it because make_tensor_ptr accepts non-const pointers?
But the creation of these tensors can still be improved. @a-szymanska, look at the available overloads of make_tensor_ptr: https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h. Note that there is, e.g., a templated overload for creating a tensor from a vector, so instead of this:

```cpp
auto latentsTensor =
    make_tensor_ptr(latentsShape, latents.data(), ScalarType::Float);
```

you can probably do something like this (this overload: https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h#L178):

```cpp
auto latentsTensor = make_tensor_ptr<float>(latentsShape, latents);
```

Also, if the tensor is one-dimensional, you can pass only the vector, without a shape, so again this:

```cpp
std::vector<int32_t> timestepShape = {1};
std::vector<int64_t> timestepData = {static_cast<int64_t>(timestep)};
auto timestepTensor =
    make_tensor_ptr(timestepShape, timestepData.data(), ScalarType::Long);
```

might become this (https://github.com/pytorch/executorch/blob/6ed10e5728e1d244f9200e0088a2e2d404355d02/extension/tensor/tensor_ptr.h#L255):

```cpp
auto timestepTensor = make_tensor_ptr<int64_t>({static_cast<int64_t>(timestep)});
```
Hmm, but generally I see that these values are taken by value by these template functions, so we can probably mark the vector arguments const as well. Previously the better template candidate was the overload taking a void* pointer, which blocked const std::vector. @a-szymanska, could you analyse this as well and mark the vector arguments const if they work with the mentioned make_tensor_ptr overloads?
I would advise against using the vector overloads here, because they create another copy of the data. Actually, the cleanest approach would be to move the creation of latentsConcat inside the UNet class; then we can pass a const reference but still use the proper make_tensor_ptr overload.
But there is no latentsConcat in UNet, I guess. Other than that, the second case is extremely cheap to copy (one element), and it's probably faster than passing by reference. In the first case, as I proposed it, there would indeed be a copy, but since the object is not used afterwards we could move it. OK, it's not so obvious, I need to check it. I don't understand what you wanted to do here, can you elaborate?

Oh, I see: move creating the latents that are now passed from TextToImage (here named latentsConcat) into the UNet class. OK, that makes sense.
In the scalar value case I don't really care; you are right, it is cheap.
For the latents argument we can move this:

```cpp
std::vector<float> latentsConcat;
latentsConcat.reserve(2 * latentsSize);
latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
latentsConcat.insert(latentsConcat.end(), latents.begin(), latents.end());
```

inside UNet.cpp, and then we can pass a const reference to UNet::generate. As for embeddingConcat, since it is actually const, we could call make_tensor_ptr outside the `for (size_t t = 0; t < numInferenceSteps + 1; t++)` loop and pass that pointer to UNet.
Yep, that's the way we should do it.
There is one more concern I have with their non-data-owning tensors: I am not sure they guarantee that the underlying data stays unmodified, but so far it seems like it does.
OK, I educated myself and it seems that:

- The data is not modified unless we modify the buffer ourselves
- The vector-specialization overloads are designed to serve as the data-owning implementation (passing `std::move(vector_data)` to `make_tensor_ptr`)
- The `void*` overload is the non-owning one, and all raw pointers, references, etc. go through it

Reference: https://docs.pytorch.org/executorch/stable/extension-tensor.html#introducing-tensorptr and the implementation of make_tensor_ptr
Also, please change the name of this PR before merging :D
## Description

Introducing support for text-to-image tasks following the [Diffusion Pipeline](https://huggingface.co/docs/diffusers/en/using-diffusers/write_own_pipeline#deconstruct-the-stable-diffusion-pipeline). Adding the TextToImageModule and the useTextToImage hook to access the models.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [x] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [x] Android

### Testing instructions

Run the computer vision app to test image generation with the [BK-SDM-Tiny](https://huggingface.co/aszymanska/bk-sdm-tiny-vpred) model for 256×256 or 512×512 outputs.

⚠️ Testing the model requires a phone with a reasonably large amount of RAM (preferably at least 8 GB for the 256 model).

### Related issues

Closes #585

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [x] My changes generate no new warnings

Co-authored-by: Mateusz Sluszniak <56299341+msluszniak@users.noreply.github.com>
Co-authored-by: Mateusz Kopcinski <120639731+mkopcins@users.noreply.github.com>
Can I already test text-to-image with release 0.5.6? Looks like the import in the demo app doesn't work:
Hi @tsnguyenducphuong, unfortunately you can't - this feature will be introduced in v0.6.0
Noted, thanks @jakmro
@tsnguyenducphuong if you are in a hurry, you can build a package directly from main and use that in your app :)