Magic3D is a new text-to-3D content creation tool that
creates 3D mesh models with unprecedented quality.
Together with image conditioning techniques and a
prompt-based editing approach, we provide users with new ways to control
3D synthesis, opening up new avenues for various creative applications.
High-Resolution 3D Meshes
Magic3D can create high-quality 3D textured mesh models from input text prompts.
It uses a coarse-to-fine strategy that leverages both low- and high-resolution diffusion priors to learn the 3D representation of the target content.
Magic3D synthesizes 3D content with 8× higher-resolution supervision than DreamFusion while also being 2× faster.
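To make the two factors concrete, here is a small worked calculation. It is only a sketch: the 64-pixel baseline side length is an assumption (it is not stated on this page), and the 8× factor is read as applying to each image side. The function names are hypothetical and exist only for this illustration.

```python
# Assumption: the baseline supervises renders at 64 x 64 pixels.
# This number is NOT given in the text above; it is used for illustration.
BASELINE_SIDE = 64
RESOLUTION_FACTOR = 8  # "8x higher-resolution supervision" (from the text)
SPEEDUP = 2            # "2x faster" (from the text)


def supervision_side(baseline_side, factor):
    """Side length of the supervised render, reading the factor per side."""
    return baseline_side * factor


def pixel_ratio(factor):
    """How many times more pixels are supervised per render."""
    return factor ** 2


def relative_runtime(speedup):
    """Runtime as a fraction of the baseline runtime."""
    return 1.0 / speedup


print(supervision_side(BASELINE_SIDE, RESOLUTION_FACTOR))  # 512
print(pixel_ratio(RESOLUTION_FACTOR))                      # 64
print(relative_runtime(SPEEDUP))                           # 0.5
```

Under these assumptions, each supervised render has 64 times as many pixels as the baseline while the overall optimization takes half the time.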
A leading [...] marks a helper caption, e.g. "A DSLR photo of", that was added to the prompt to improve quality.
[...] a silver candelabra sitting on a red velvet tablecloth, only one candle is lit.
[...] Sydney opera house, aerial view.
Michelangelo style statue of an astronaut.
Prompt-based Editing
Given a coarse model generated with a base text prompt, we can modify parts of the text in the prompt, and then fine-tune the NeRF and 3D mesh models to obtain an edited high-resolution 3D mesh.
A squirrel wearing a leather jacket riding a motorcycle.
A bunny riding a scooter.
A fairy riding a bike.
A steampunk squirrel riding a horse.
A baby bunny sitting on top of a stack of pancakes.
A lego bunny sitting on top of a stack of books.
A metal bunny sitting on top of a stack of broccoli.
A metal bunny sitting on top of a stack of chocolate cookies.
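The editing workflow described in this section can be sketched as a prompt substitution followed by a stage schedule. This is a minimal illustration, not the authors' code: `edit_prompt` and `editing_schedule` are hypothetical helpers, and the stage labels are just strings. The sketch only shows which prompt drives which stage; no 3D model is optimized here.

```python
def edit_prompt(base_prompt, replacements):
    """Return base_prompt with each old phrase swapped for its new phrase."""
    edited = base_prompt
    for old, new in replacements.items():
        if old not in edited:
            raise ValueError(f"phrase not found in prompt: {old!r}")
        edited = edited.replace(old, new)
    return edited


def editing_schedule(base_prompt, edited_prompt):
    """Pair each stage with the prompt that drives it (labels are hypothetical)."""
    return [
        ("coarse model", base_prompt),         # generated with the base prompt
        ("fine-tune NeRF", edited_prompt),     # fine-tuned with the edited prompt
        ("fine-tune 3D mesh", edited_prompt),  # yields the edited high-resolution mesh
    ]


base = "A baby bunny sitting on top of a stack of pancakes."
edited = edit_prompt(base, {"baby bunny": "lego bunny", "pancakes": "books"})
print(edited)  # A lego bunny sitting on top of a stack of books.

for stage, prompt in editing_schedule(base, edited):
    print(f"{stage}: {prompt}")
```

Raising an error when a phrase is missing keeps a typo in the replacement table from silently producing an unedited prompt.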
Other Editing Capabilities
Given input images of a subject instance, we can fine-tune the diffusion models with DreamBooth and optimize the 3D models with the given prompts.
The identity of the subject is well preserved in the 3D models.
We can also condition the diffusion model (eDiff-I) on an input image to transfer its style to the output 3D model.
Method
We utilize a two-stage coarse-to-fine optimization framework for fast and high-quality text-to-3D content creation.
In the first stage, we obtain a coarse model using a low-resolution diffusion prior and accelerate the optimization with a hash grid and a sparse acceleration structure.
In the second stage, we use a textured mesh model initialized from the coarse neural representation, allowing optimization with an efficient differentiable renderer interacting with a high-resolution latent diffusion model.
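The coarse-to-fine idea can be illustrated with a toy analogue. This is not the Magic3D pipeline: there is no diffusion prior, NeRF, hash grid, mesh, or renderer. A 1D signal stands in for the 3D content, and direct squared-error targets stand in for the low- and high-resolution priors. All names are hypothetical. The sketch only shows the structure of the two stages: fit a small representation at low resolution, then initialize the high-resolution representation from it and refine.

```python
import math


def downsample(signal, factor):
    """Average non-overlapping blocks of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def upsample(signal, factor):
    """Repeat every sample `factor` times (nearest neighbour)."""
    return [v for v in signal for _ in range(factor)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit(params, target, steps, lr):
    """Gradient descent on half the summed squared error to `target`."""
    params = list(params)
    for _ in range(steps):
        # Gradient of 0.5 * (p - t)^2 with respect to p is (p - t).
        params = [p - lr * (p - t) for p, t in zip(params, target)]
    return params


# High-resolution "content" (16 samples) and its low-resolution view (4 samples).
target = [math.sin(2 * math.pi * i / 16) for i in range(16)]
low_res_target = downsample(target, 4)

# Stage 1: fit a coarse 4-parameter model against the low-resolution target.
coarse = fit([0.0] * 4, low_res_target, steps=50, lr=0.2)

# Stage 2: initialize the 16-parameter model from the coarse one, then refine
# against the high-resolution target with a small step budget.
fine = fit(upsample(coarse, 4), target, steps=5, lr=0.2)

# Reference: the same high-resolution budget without the coarse initialization.
cold = fit([0.0] * 16, target, steps=5, lr=0.2)

print("error with coarse init:", mse(fine, target))
print("error from scratch:   ", mse(cold, target))
```

In this toy setting, the run initialized from the coarse fit ends with a lower error than the run started from zeros for the same high-resolution step budget.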
Citation
@article{lin2022magic3d,
title={Magic3D: High-Resolution Text-to-3D Content Creation},
author={Lin, Chen-Hsuan and Gao, Jun and Tang, Luming and Takikawa, Towaki and Zeng, Xiaohui and Huang, Xun and Kreis, Karsten and Fidler, Sanja and Liu, Ming-Yu and Lin, Tsung-Yi},
journal={arXiv preprint arXiv:2211.10440},
year={2022}
}