Hi, I'm trying to generate novel multi-view videos with SV4D 2.0 for scenes from the DAVIS dataset, and then run per-frame 3D reconstruction on the generated views. However, the results aren't great.
Do you have any tips on how to get better results with SV4D?
I'm using the GT masks from DAVIS to create RGBA frames, and then running the SV4D preprocessing script, which extracts a single square bounding box for the whole video and resizes it to 576x576:
generative-models/scripts/demo/sv4d_helpers.py, line 167 (commit e8cd657)
The results are not great when the object moves across the frame, since the single crop has to cover the object's entire trajectory, leaving it small and off-center in many frames.
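For context, the whole-video square crop I'm describing works roughly like this (a minimal sketch, not the actual `sv4d_helpers.py` code; the function name, padding, and resampling choice are my own assumptions, and frames are assumed to be RGBA PIL images):

```python
import numpy as np
from PIL import Image

def whole_video_square_crop(frames, size=576):
    """Crop all frames with ONE square bbox covering the object in every frame."""
    # Union bounding box of the alpha mask across the whole clip.
    x0 = y0 = np.inf
    x1 = y1 = -np.inf
    for f in frames:
        alpha = np.array(f)[:, :, 3]
        ys, xs = np.nonzero(alpha > 0)
        if xs.size == 0:
            continue
        x0, y0 = min(x0, xs.min()), min(y0, ys.min())
        x1, y1 = max(x1, xs.max()), max(y1, ys.max())
    # Expand the union box to a square centered on it.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = max(x1 - x0, y1 - y0) / 2
    box = (int(cx - half), int(cy - half), int(cx + half), int(cy + half))
    out = []
    for f in frames:
        crop = f.crop(box)
        # Composite onto white and resize to the model's 576x576 input.
        white = Image.new("RGBA", crop.size, (255, 255, 255, 255))
        rgb = Image.alpha_composite(white, crop).convert("RGB")
        out.append(rgb.resize((size, size), Image.LANCZOS))
    return out
```

Because the square box is shared by every frame, a fast-moving object occupies only a small slice of each 576x576 input, which seems to match the failure mode I'm seeing.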
Would it be better to center each frame separately so the object stays roughly in the center throughout the sequence?
Is converting to a 576x576 square input with white background still the best option?
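The per-frame centering I have in mind would look something like this (again a hedged sketch, assuming RGBA PIL frames; `per_frame_center_crop` and the `pad` margin are hypothetical names, not anything from the repo). It fixes the crop side length to the object's largest extent over the clip so the apparent scale stays constant, while each frame's crop is centered on the object:

```python
import numpy as np
from PIL import Image

def per_frame_center_crop(frames, size=576, pad=1.1):
    """Center the object in every frame; keep a fixed crop size for stable scale."""
    # First pass: per-frame object centers and the largest extent over the clip.
    side = 0
    centers = []
    for f in frames:
        alpha = np.array(f)[:, :, 3]
        ys, xs = np.nonzero(alpha > 0)
        centers.append(((xs.min() + xs.max()) / 2, (ys.min() + ys.max()) / 2))
        side = max(side, xs.max() - xs.min(), ys.max() - ys.min())
    half = side * pad / 2  # small margin around the object
    out = []
    for f, (cx, cy) in zip(frames, centers):
        box = (int(cx - half), int(cy - half), int(cx + half), int(cy + half))
        crop = f.crop(box)  # PIL zero-pads regions outside the frame
        white = Image.new("RGBA", crop.size, (255, 255, 255, 255))
        rgb = Image.alpha_composite(white, crop).convert("RGB")
        out.append(rgb.resize((size, size), Image.LANCZOS))
    return out
```

One consideration with this approach: recentering every frame removes the object's global translation from the input video, so the model only sees the object's deformation, not its motion through the scene; that may or may not matter for per-frame reconstruction afterwards.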