Hi, I'm trying to generate novel multi-view videos with SV4D 2.0 for scenes from the DAVIS dataset, and then run per-frame 3D reconstruction on the generated views. However, the results aren't great.
Do you have any tips on how to get better results with SV4D?
I'm using the GT masks from DAVIS to create RGBA frames, and then running the SV4D preprocessing script, which extracts a single square bounding box for the whole video and resizes it to 576x576:
generative-models/scripts/demo/sv4d_helpers.py, line 167 (commit e8cd657)
The results are not great when the object moves across the frame, since the single crop has to cover the object's entire trajectory, leaving it small and off-center in many frames.
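For context, the whole-video square crop I'm describing works roughly like this (a minimal sketch, not the actual `sv4d_helpers.py` code; the function name, padding, and resampling choice are my own assumptions, and frames are assumed to be RGBA PIL images):

```python
import numpy as np
from PIL import Image

def whole_video_square_crop(frames, size=576):
    """Crop all frames with ONE square bbox covering the object in every frame."""
    # Union bounding box of the alpha mask across the whole clip.
    x0 = y0 = np.inf
    x1 = y1 = -np.inf
    for f in frames:
        alpha = np.array(f)[:, :, 3]
        ys, xs = np.nonzero(alpha > 0)
        if xs.size == 0:
            continue
        x0, y0 = min(x0, xs.min()), min(y0, ys.min())
        x1, y1 = max(x1, xs.max()), max(y1, ys.max())
    # Expand the union box to a square centered on it.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = max(x1 - x0, y1 - y0) / 2
    box = (int(cx - half), int(cy - half), int(cx + half), int(cy + half))
    out = []
    for f in frames:
        crop = f.crop(box)
        # Composite onto white and resize to the model's 576x576 input.
        white = Image.new("RGBA", crop.size, (255, 255, 255, 255))
        rgb = Image.alpha_composite(white, crop).convert("RGB")
        out.append(rgb.resize((size, size), Image.LANCZOS))
    return out
```

Because the square box is shared by every frame, a fast-moving object occupies only a small slice of each 576x576 input, which seems to match the failure mode I'm seeing.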
Would it be better to center each frame separately so the object stays roughly in the center throughout the sequence?
Is converting to a 576x576 square input with white background still the best option?
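The per-frame centering I have in mind would look something like this (again a hedged sketch, assuming RGBA PIL frames; `per_frame_center_crop` and the `pad` margin are hypothetical names, not anything from the repo). It fixes the crop side length to the object's largest extent over the clip so the apparent scale stays constant, while each frame's crop is centered on the object:

```python
import numpy as np
from PIL import Image

def per_frame_center_crop(frames, size=576, pad=1.1):
    """Center the object in every frame; keep a fixed crop size for stable scale."""
    # First pass: per-frame object centers and the largest extent over the clip.
    side = 0
    centers = []
    for f in frames:
        alpha = np.array(f)[:, :, 3]
        ys, xs = np.nonzero(alpha > 0)
        centers.append(((xs.min() + xs.max()) / 2, (ys.min() + ys.max()) / 2))
        side = max(side, xs.max() - xs.min(), ys.max() - ys.min())
    half = side * pad / 2  # small margin around the object
    out = []
    for f, (cx, cy) in zip(frames, centers):
        box = (int(cx - half), int(cy - half), int(cx + half), int(cy + half))
        crop = f.crop(box)  # PIL zero-pads regions outside the frame
        white = Image.new("RGBA", crop.size, (255, 255, 255, 255))
        rgb = Image.alpha_composite(white, crop).convert("RGB")
        out.append(rgb.resize((size, size), Image.LANCZOS))
    return out
```

One consideration with this approach: recentering every frame removes the object's global translation from the input video, so the model only sees the object's deformation, not its motion through the scene; that may or may not matter for per-frame reconstruction afterwards.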