Quality of scene-level consistency #21

Hi, thank you for open-sourcing the great work. I really enjoyed reading it.


I have a question about scene-level consistency under occlusion. In the demo figures in your paper, most objects appear cleanly visible, but occlusions are common in real-world scenes. For instance, in the example below, where the chair is heavily occluded by the toy bear, it is not clear how well the method handles such cases. (The picture is from SAM3D.)
<img width="1425" height="950" alt="Image" src="https://github.com/user-attachments/assets/74a58c84-bc65-41eb-8520-cc959b546490" />

<img width="523" height="456" alt="Image" src="https://github.com/user-attachments/assets/031773b6-2d0e-4267-9215-f69d9c028a29" />

I was wondering whether this limitation might be related to the underlying assumptions in methods like VGGT and TRELLIS, since they do not explicitly address amodal segmentation (i.e., reasoning about the full extent of partially occluded objects).
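
To make the terminology concrete, here is a toy sketch of the difference between a visible (modal) mask and an amodal mask. The masks, the pixel counts, and the `occlusion_ratio` helper are all made up for illustration; none of this comes from VGGT, TRELLIS, or SAM3D.

```python
# Toy 2D example: 1 marks a pixel belonging to the mask, 0 marks background.
# All names and values here are hypothetical and only illustrate the terms.

def occlusion_ratio(amodal_mask, visible_mask):
    """Fraction of the object's full (amodal) extent that is hidden."""
    amodal_area = sum(sum(row) for row in amodal_mask)
    if amodal_area == 0:
        raise ValueError("amodal mask is empty")
    # Count pixels that are both part of the object and actually visible.
    visible_area = sum(
        a & v
        for a_row, v_row in zip(amodal_mask, visible_mask)
        for a, v in zip(a_row, v_row)
    )
    return 1.0 - visible_area / amodal_area

# Full extent of the chair (amodal mask): a 3x4 block, 12 pixels.
chair_amodal = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
# Pixels of the chair still visible behind the bear (modal mask): 3 pixels.
chair_visible = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

ratio = occlusion_ratio(chair_amodal, chair_visible)
print(f"occluded fraction: {ratio:.2f}")
```

A pipeline that only ever sees `chair_visible` has to infer the remaining pixels from priors, which is the part I am unsure about.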

I’m not raising this as criticism. Instead, I’m genuinely interested in understanding the current best practices. Specifically:

1. What are some recommended approaches for multi-view 3D reconstruction with strong scene-level consistency under occlusion?
2. Do you think recent works such as multi-view SAM3D ([arXiv paper](https://arxiv.org/pdf/2603.11633)) move in this direction, or is this still an open challenge?

I would really appreciate any insights or pointers you could share. Thanks again for your great work!


