diff --git a/docs/index.md b/docs/index.md
index f43ec385a1..c17f37535e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -154,6 +154,7 @@ This documentation is organized into 3 parts:
 - [SIGNeRF](nerfology/methods/signerf.md): Controlled Generative Editing of NeRF Scenes
 - [K-Planes](nerfology/methods/kplanes.md): Unified 3D and 4D Radiance Fields
 - [LERF](nerfology/methods/lerf.md): Language Embedded Radiance Fields
+- [LiveScene](nerfology/methods/livescene.md): Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
 - [Feature Splatting](nerfology/methods/feature_splatting.md): Gaussian Feature Splatting based on GSplats
 - [Nerfbusters](nerfology/methods/nerfbusters.md): Removing Ghostly Artifacts from Casually Captured NeRFs
 - [NeRFPlayer](nerfology/methods/nerfplayer.md): 4D Radiance Fields by Streaming Feature Channels
@@ -161,7 +162,7 @@ This documentation is organized into 3 parts:
 - [PyNeRF](nerfology/methods/pynerf.md): Pyramidal Neural Radiance Fields
 - [SeaThru-NeRF](nerfology/methods/seathru_nerf.md): Neural Radiance Field for subsea scenes
 - [Zip-NeRF](nerfology/methods/zipnerf.md): Anti-Aliased Grid-Based Neural Radiance Fields
-- [NeRFtoGSandBack](nerfology/methods/nerf2gs2nerf.md): Converting back and forth between NeRF and GS to get the best of both approaches.
+- [NeRFtoGSandBack](nerfology/methods/nerf2gs2nerf.md): Converting back and forth between NeRF and GS to get the best of both approaches
 - [OpenNeRF](nerfology/methods/opennerf.md): OpenSet 3D Neural Scene Segmentation
 
 **Eager to contribute a method?** We'd love to see you use nerfstudio in implementing new (or even existing) methods! Please view our {ref}`guide` for more details about how to add to this list!
diff --git a/docs/nerfology/methods/index.md b/docs/nerfology/methods/index.md
index 320d6ae97f..98c1367a03 100644
--- a/docs/nerfology/methods/index.md
+++ b/docs/nerfology/methods/index.md
@@ -34,6 +34,7 @@ The following methods are supported in nerfstudio:
 SIGNeRF
 K-Planes
 LERF
+LiveScene
 Feature-Splatting
 Mip-NeRF
 NeRF
diff --git a/docs/nerfology/methods/livescene.md b/docs/nerfology/methods/livescene.md
new file mode 100644
index 0000000000..3a7f03f045
--- /dev/null
+++ b/docs/nerfology/methods/livescene.md
@@ -0,0 +1,101 @@
+# LiveScene
+
+Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
+
+```{button-link} https://tavish9.github.io/livescene//
+:color: primary
+:outline:
+Paper Website
+```
+
+```{button-link} https://github.com/Tavish9/livescene/
+:color: primary
+:outline:
+Code
+```
+
+**The first scene-level language-embedded interactive radiance field, which efficiently reconstructs and controls complex physical scenes, enabling manipulation of multiple articulated objects and language-based interaction.**
+
+## Installation
+
+First install the nerfstudio dependencies. Then run:
+
+```bash
+pip install git+https://github.com/Tavish9/livescene
+```
+
+## Running LiveScene
+
+Details for running LiveScene (built with Nerfstudio!) can be found [here](https://github.com/Tavish9/livescene).
+Once installed, run:
+
+```bash
+ns-train livescene --help
+```
+
+Only one default configuration is provided, but it can be run on different datasets.
+
+The default configuration provided is:
+
+| Method      | Description                                     | Memory | Quality |
+| ----------- | ----------------------------------------------- | ------ | ------- |
+| `livescene` | LiveScene with OpenCLIP ViT-B/16, used in paper | ~8 GB  | Good    |
+
+There are two new dataparsers provided for LiveScene:
+
+| Method           | Description                     | Scene type        |
+| ---------------- | ------------------------------- | ----------------- |
+| `livescene-sim`  | OmniSim dataset for LiveScene   | Synthetic dataset |
+| `livescene-real` | InterReal dataset for LiveScene | Real dataset      |
+
+## Method
+
+LiveScene proposes an efficient factorization that decomposes the interactive scene into multiple local deformable fields that separately reconstruct individual interactive objects, achieving the first accurate and independent control of multiple interactive objects in a complex scene.
+Moreover, LiveScene introduces an interaction-aware language embedding method that generates varying language embeddings to localize individual interactive objects under different interactive states, enabling arbitrary control of interactive objects using natural language.
+
+### Overview
+
+Given a camera view and a control variable $\boldsymbol{\kappa}$ for one specific interactive object, a series of 3D points is sampled in a local deformable field that models the interactive motions of that object, and the object in its novel interaction state is then generated via volume rendering. Moreover, an interaction-aware language embedding is utilized to localize and control individual interactive objects using natural language.
+
+### Multi-scale Interaction Space Factorization
+
+LiveScene maintains multiple local deformable fields $\left\{\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_\alpha\right\}$, one per interactive object in the 4D space, and projects high-dimensional interaction features into a compact multi-scale 4D space. During training, LiveScene introduces a feature repulsion loss to amplify the feature differences between distinct deformable scenes, which relieves boundary ray sampling and feature storage conflicts.
+
+### Interaction-Aware Language Embedding
+
+LiveScene leverages the proposed multi-scale interaction space factorization to efficiently store language features in lightweight planes, indexed by maximum-probability sampling, instead of the 3D fields used in LERF. For any sampling point $\mathbf{p}$, it retrieves the local language feature group and performs bilinear interpolation to obtain a language embedding that adapts to interactive variable changes from the surrounding CLIP features.
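The plane lookup described above can be sketched in isolation. The snippet below is an illustrative stand-in, not LiveScene's actual API: `bilinear_lookup` is a hypothetical helper and the 2x2 plane is a toy example, showing only how a continuous sample coordinate blends a feature vector from the four surrounding grid cells.

```python
import numpy as np

def bilinear_lookup(plane: np.ndarray, x: float, y: float) -> np.ndarray:
    """Bilinearly interpolate a feature vector from a 2D feature plane.

    plane: (H, W, C) grid of feature vectors; (x, y) are continuous
    coordinates in [0, W-1] x [0, H-1].
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, plane.shape[1] - 1)
    y1 = min(y0 + 1, plane.shape[0] - 1)
    tx, ty = x - x0, y - y0
    # Blend horizontally along the top and bottom rows, then vertically.
    top = (1 - tx) * plane[y0, x0] + tx * plane[y0, x1]
    bot = (1 - tx) * plane[y1, x0] + tx * plane[y1, x1]
    return (1 - ty) * top + ty * bot

# Toy 2x2 plane with 1-D "features": sampling at the center blends
# all four corner features equally.
plane = np.arange(4, dtype=np.float64).reshape(2, 2, 1)
center = bilinear_lookup(plane, 0.5, 0.5)
```

In the method itself the grid cells would hold compressed language features and the interpolated vector would be compared against CLIP text embeddings; here the mechanics of the lookup are all that is shown.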
+
+## Dataset
+
+To our knowledge, existing view synthesis datasets for interactive scene rendering are primarily limited to a few interactive objects, making it impractical to scale up to real scenarios involving multi-object interactions. To bridge this gap, we construct two scene-level, high-quality annotated datasets to advance research progress in reconstructing and understanding interactive scenes: OmniSim and InterReal, containing 28 subsets and 70 interactive objects with 2 million samples, providing RGB-D images, camera trajectories, interactive object masks, prompt captions, and corresponding object state quantities at each time step.
+
+## Interaction
+
+For more interaction with the viewer, please see [here](https://github.com/Tavish9/livescene?tab=readme-ov-file#3-interact-with-viewer).
+
+## BibTeX
+
+If you find our work helpful for your research, please consider citing:
+
+```none
+@misc{livescene2024,
+      title={LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control},
+      author={Delin Qu and Qizhi Chen and Pingrui Zhang and Xianqiang Gao and Bin Zhao and Zhigang Wang and Dong Wang and Xuelong Li},
+      year={2024},
+      eprint={2406.16038},
+      archivePrefix={arXiv},
+}
+```
diff --git a/nerfstudio/configs/external_methods.py b/nerfstudio/configs/external_methods.py
index 002b3299b6..fdf1be7429 100644
--- a/nerfstudio/configs/external_methods.py
+++ b/nerfstudio/configs/external_methods.py
@@ -93,6 +93,21 @@ class ExternalMethod:
     )
 )
 
+# LiveScene
+external_methods.append(
+    ExternalMethod(
+        """[bold yellow]LiveScene[/bold yellow]
+For more information visit: https://docs.nerf.studio/nerfology/methods/livescene.html
+
+To enable LiveScene, you must install it first by running:
+  [grey]pip install git+https://github.com/Tavish9/livescene[/grey]""",
+        configurations=[
+            ("livescene", "LiveScene with OpenCLIP ViT-B/16, used in paper"),
+        ],
+        pip_package="git+https://github.com/Tavish9/livescene",
+    )
+)
+
 # Feature Splatting
 external_methods.append(
     ExternalMethod(
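The feature repulsion loss named in the new livescene.md is not specified anywhere in this diff. As a rough sketch of the idea only — a hinge-style pairwise formulation that I am guessing at, not the paper's actual loss — one could push per-field feature representatives apart until their pairwise distances exceed a margin, so distinct local deformable fields stay separable:

```python
import numpy as np

def feature_repulsion_loss(field_features: list, margin: float = 1.0) -> float:
    """Hinge-style repulsion: penalize pairs of per-field feature vectors
    that are closer than `margin`, pushing distinct fields apart.

    field_features: one 1-D feature vector (np.ndarray) per deformable field.
    """
    loss, pairs = 0.0, 0
    for i in range(len(field_features)):
        for j in range(i + 1, len(field_features)):
            # Squared hinge on the distance shortfall below the margin.
            d = np.linalg.norm(field_features[i] - field_features[j])
            loss += max(0.0, margin - d) ** 2
            pairs += 1
    return loss / max(pairs, 1)
```

Identical features incur the full margin penalty, while features already separated by more than the margin contribute nothing, which matches the stated goal of amplifying feature differences between distinct deformable scenes.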