
Small fix in inference cli: Do final multiplication in Pytorch instead #578

Open
sognetic wants to merge 1 commit into numz:main from sognetic:main


sognetic commented May 3, 2026


Hi, first of all: Thanks for the nice software and especially the great docs, running the model is really a breeze with this tooling.

I've run this in standalone mode on an RTX 5090 server node to process a >1h video file in ~10-minute chunks and noticed a significant slowdown (more than 10 minutes) after each chunk, with only a single core out of 192 actually doing anything. I think the culprit is doing the final transformation in NumPy rather than in PyTorch: multiplying first in PyTorch and then casting to NumPy removed the bottleneck. The clamp is just there to make the value range of the transformation a bit more obvious.
I've only tested this with a video file in a single-GPU setup on that specific machine, and there might be better ways to do this (e.g. multiplying entirely on the GPU), but I think this already solves the issue sufficiently.
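
For illustration, a minimal sketch of the pattern described above. This is not the actual diff: the function names are hypothetical, and it assumes the frames arrive as a float tensor with values nominally in [0, 1] that should end up as a `uint8` NumPy array.

```python
import numpy as np
import torch


def frames_to_uint8_numpy(frames: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: scale float frames to uint8 in PyTorch, then convert to NumPy."""
    # Clamp so the expected [0, 1] value range is explicit, then scale in PyTorch.
    # The multiplication runs on whatever device the tensor currently lives on.
    scaled = frames.clamp(0.0, 1.0).mul(255.0)
    # Cast to uint8 while still in PyTorch; move to the CPU and NumPy only at the end.
    return scaled.to(torch.uint8).cpu().numpy()


def frames_to_uint8_numpy_slow(frames: torch.Tensor) -> np.ndarray:
    """Assumed shape of the slow path: convert to NumPy first, then multiply there."""
    return (frames.cpu().numpy() * 255.0).astype(np.uint8)
```

For inputs already in [0, 1] both functions should give the same result; the first one only changes where the arithmetic happens. If `frames` is still on the GPU when the helper is called, the multiplication and cast happen there too, which is the "entirely on the GPU" variant mentioned above.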
Let me know what you think and thanks again for the great project!
