I have a small question about the per-box generation process.
I'm curious why it isn't possible to generate multiple objects simultaneously.
For example, let's say we have two boxes, A and B, in an image. Couldn't we allow box A to attend to object a and box B to attend to object b?
Is there a significant difference in quality compared to the suggested "per-box generation" approach?
If we could generate all objects simultaneously, we wouldn't even need to perform the DDIM inversion process.
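To make the question concrete, the "box A attends to object a, box B attends to object b" idea amounts to region-masked cross-attention. Below is a minimal NumPy sketch that assumes a single attention head, box coordinates normalized to [0, 1], and a known mapping from each box to the prompt-token indices of its object. `box_token_mask` and `masked_cross_attention` are hypothetical helpers written for illustration. They are not taken from the paper's code.

```python
import numpy as np


def box_token_mask(h, w, boxes, box_tokens, num_tokens):
    """Return a (h*w, num_tokens) bool mask: True where a latent pixel may attend to a text token.

    boxes: list of (x0, y0, x1, y1) in normalized [0, 1] image coordinates (assumed convention).
    box_tokens: one list of prompt-token indices per box (hypothetical mapping, e.g. tokens of "object a").
    Tokens not assigned to any box are treated as shared and are visible to every pixel.
    """
    assigned = {t for toks in box_tokens for t in toks}
    shared = [t for t in range(num_tokens) if t not in assigned]

    mask = np.zeros((h, w, num_tokens), dtype=bool)
    mask[:, :, shared] = True  # e.g. start token or background words

    ys = (np.arange(h) + 0.5) / h  # normalized pixel-center coordinates
    xs = (np.arange(w) + 0.5) / w
    for (x0, y0, x1, y1), toks in zip(boxes, box_tokens):
        inside = ((ys >= y0) & (ys < y1))[:, None] & ((xs >= x0) & (xs < x1))[None, :]
        for t in toks:
            mask[:, :, t] |= inside  # pixels in this box may see their own object's tokens

    mask = mask.reshape(h * w, num_tokens)  # row-major: pixel (y, x) -> row y*w + x
    # Fallback: a pixel with no visible token would produce a NaN softmax, so let it see all tokens.
    mask[~mask.any(axis=1)] = True
    return mask


def masked_cross_attention(q, k, v, mask):
    """Single-head cross-attention; q: (P, d) pixel queries, k/v: (T, d) text keys/values, mask: (P, T)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # disallowed pixel-token pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Box A = left half, box B = right half; token 0 is shared, token 1 is "a", token 2 is "b".
    m = box_token_mask(4, 4, [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)], [[1], [2]], 3)
    q = rng.standard_normal((16, 8))
    k = rng.standard_normal((3, 8))
    v = rng.standard_normal((3, 8))
    out, w = masked_cross_attention(q, k, v, m)
    print(out.shape, w[0].round(3), w[3].round(3))
```

Note that such a mask only constrains cross-attention. The self-attention layers of a diffusion UNet still let pixels inside box A exchange information with pixels inside box B, and overlapping boxes give the shared pixels access to both objects' tokens. Those are places where attributes could still leak between objects when everything is generated in a single pass.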