We generate multiple independent, interactive 3D objects in a coarse-to-fine manner. First, we render views of the input text-to-3D NeRF for Deep Concept Mining (DCM), which yields a fine-tuned T2I diffusion model and the corresponding text embeddings. We then use the mined embeddings and the fine-tuned T2I diffusion model to train the neural category field (NeCF) via category score distillation sampling (CSDS). After disentangling the input NeRF into sub-NeRFs, we convert them into DMTets and fine-tune them for further refinement. Finally, we export independent surface meshes with improved geometry and textures. A code-level sketch of these stages is given below.
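The sketch below lays out the three stages as PyTorch-style pseudocode, assuming a generic SDS-style training setup. All names in it (deep_concept_mining, NeuralCategoryField, csds_step, nerf_to_dmtet, finetune_dmtet) are hypothetical placeholders used for illustration, not the released implementation.

```python
import torch

# Stage 1: Deep Concept Mining (DCM).
# Fine-tune the T2I diffusion model and per-concept text embeddings on views
# rendered from the input text-to-3D NeRF (hypothetical helper).
diffusion, concept_embeddings = deep_concept_mining(input_nerf, t2i_diffusion, prompt)

# Stage 2: train the neural category field (NeCF) with category SDS (CSDS),
# disentangling the input NeRF into independent sub-NeRFs.
necf = NeuralCategoryField(num_categories=len(concept_embeddings))
optimizer = torch.optim.Adam(necf.parameters(), lr=1e-3)
for _ in range(num_iterations):
    # One CSDS term per mined concept: render the masked sub-NeRF for category k
    # and distill gradients from its mined embedding through the fine-tuned model.
    loss = sum(
        csds_step(necf, input_nerf, diffusion, emb, category=k)
        for k, emb in enumerate(concept_embeddings)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Stage 3: convert each sub-NeRF to a DMTet, fine-tune it, and export a mesh.
meshes = []
for k, emb in enumerate(concept_embeddings):
    dmtet = nerf_to_dmtet(input_nerf, necf, category=k)  # hypothetical converter
    finetune_dmtet(dmtet, diffusion, emb)                # SDS-style geometry/texture refinement
    meshes.append(dmtet.extract_mesh())                  # independent textured surface mesh
```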
Several excellent works were introduced around the same time as ours:
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
@inproceedings{yan2024dreamdissector,
  author    = {Yan, Zizheng and Zhou, Jiapeng and Meng, Fanpeng and Wu, Yushuang and Qiu, Lingteng and Ye, Zisheng and Cui, Shuguang and Chen, Guanying and Han, Xiaoguang},
  title     = {DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors},
  booktitle = {ECCV},
  year      = {2024},
}