A Survey On Text-to-3D Contents Generation In The Wild

3D content creation plays a vital role in various applications, such as gaming, robotics simulation, and virtual reality. However, the process is labor-intensive and time-consuming, requiring skilled designers to invest considerable effort in creating a single 3D asset. To address this challenge, text-to-3D generation technologies have emerged as a promising solution for automating 3D creation. Leveraging the success of large vision-language models, these techniques aim to generate 3D content from textual descriptions. Despite recent advancements in this area, existing solutions still face significant limitations in generation quality and efficiency. In this survey, we conduct an in-depth investigation of the latest text-to-3D creation methods. We provide a comprehensive background on text-to-3D creation, including discussions of the datasets employed in training and the evaluation metrics used to assess the quality of generated 3D models. We then delve into the various 3D representations that serve as the foundation for the 3D generation process. Furthermore, we present a thorough comparison of the rapidly growing literature on generative pipelines, categorizing them into feedforward generators, optimization-based generation, and view reconstruction approaches. By examining the strengths and weaknesses of these methods, we aim to shed light on their respective capabilities and limitations. Lastly, we point out several promising avenues for future research. With this survey, we hope to further inspire researchers to explore the potential of open-vocabulary text-conditioned 3D content creation.