Really cool project.
I'm working on something similar (structurally at least), a manga-to-anime pipeline.
It involves a lot of different steps/models, similar to this project:
- Preprocessing (alignment, upscaling, colorization).
- Separating pages into panels.
- Ordering the panels in the right reading order (took so much more effort than I thought...)
- Segmentation (using segment-anything)
- Extracting bubbles, bubble tails (and the direction they point in), faces, bodies, and backgrounds. Most of that required training custom models.
- Assigning a character identity to each face/body.
- Making a naive association between faces and bubbles.
- Reading the text of bubbles.
- I feed all that data to GPT-4V and ask it to "read" each panel: I tell it what happened in previous panels, which bubble is associated with which face, etc., and ask it to "understand" what is happening in the panel and to "deduce" some associations between the items, the tone of voice, etc. I tried "just" asking GPT-4V to read manga pages without all the steps above, and it was terrible at it. But with all the provided info (which easily adds up to 10k-token prompts, just for the text), it gets much better at it. It's sort of "pre-chewing" the work for it.
- That's where I am now. The next steps are generating voices (what I'm working on at the moment, with bark/whisper/other models) and sound effects, then generating animation and special effects, and finally assembling all of that into a video.
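
As a rough illustration of the panel-ordering and naive face/bubble association steps in the list above, here is a simplified sketch. It is not the actual pipeline code: the `Box` type, the `row_tolerance` value, and both function names are made up for the example, and it assumes a plain right-to-left, top-to-bottom grid layout.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page pixels; (x, y) is the top-left corner (assumed format)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def order_panels(panels, row_tolerance=0.5):
    """Naive manga reading order: rows top to bottom, right to left within a row.

    A panel joins the current row when its top edge is within
    row_tolerance * (height of the row's first panel) of that panel's top edge.
    """
    rows = []
    for panel in sorted(panels, key=lambda p: p.y):
        if rows and abs(panel.y - rows[-1][0].y) <= row_tolerance * rows[-1][0].h:
            rows[-1].append(panel)
        else:
            rows.append([panel])
    ordered = []
    for row in rows:
        # Rightmost panel first: sort by the right edge, descending.
        ordered.extend(sorted(row, key=lambda p: -(p.x + p.w)))
    return ordered


def associate_bubbles(bubbles, faces):
    """Map each bubble index to the index of the nearest face (center-to-center distance).

    Returns None for a bubble when there are no faces at all.
    """
    result = {}
    for i, bubble in enumerate(bubbles):
        bx, by = bubble.center
        result[i] = min(
            range(len(faces)),
            key=lambda j: hypot(faces[j].center[0] - bx, faces[j].center[1] - by),
            default=None,
        )
    return result
```

A grid heuristic like this breaks on panels that span several rows, overlap, or sit at an angle, which is probably part of why ordering takes more effort than expected. Nearest-center matching is likewise only a first guess; the bubble tail direction and the later GPT-4V pass are what would correct it.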
I'll be looking closer at your project, in particular how it's organized. Thanks a lot for sharing.
I'd be curious if you have any insights on how you'd approach manga reading if you had to.
Cheers!


prompt.json
prompt.txt
reading.json
response.txt
result.json

