deepseek-ai/DreamCraft3D
Description: [ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Language: Python
License: MIT
Stars: 3006
Forks: 357
Open issues: 35
Created: 2023-10-23T07:40:20Z
Pushed: 2025-04-22T11:09:39Z
Default branch: main
Fork: no
Archived: no
README:
DreamCraft3D
**Paper** | **Project Page** | **Youtube video** | **Replicate demo**
Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, Yebin Liu
Abstract: *We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.*
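The alternating optimization described in the abstract can be pictured as a simple control loop. The sketch below is illustrative only and is not the repository's implementation: `bootstrapped_optimization` and its three callables (`render_views`, `finetune_prior`, `scene_step`) are hypothetical names, and the round and step counts are arbitrary. It shows only the ordering: the current scene is rendered, the scene-specific diffusion prior is fine-tuned on those renderings, and the updated prior then guides further scene updates.

```python
from typing import Callable, List


def bootstrapped_optimization(
    num_rounds: int,
    scene_steps_per_round: int,
    render_views: Callable[[], List[object]],
    finetune_prior: Callable[[List[object]], None],
    scene_step: Callable[[], None],
) -> List[str]:
    """Alternate between updating the diffusion prior and the 3D scene.

    All callables are hypothetical placeholders supplied by the caller.
    Returns a log of which component was updated at each step.
    """
    log: List[str] = []
    for _ in range(num_rounds):
        # 1) Render the scene as it currently stands (augmentation is left
        #    to the caller-supplied function).
        views = render_views()
        # 2) Fine-tune the scene-specific prior on those renderings.
        finetune_prior(views)
        log.append("prior")
        # 3) Optimize the scene under guidance from the updated prior.
        for _ in range(scene_steps_per_round):
            scene_step()
            log.append("scene")
    return log
```

Each round starts from a scene improved by the previous round, which is the sense in which the abstract calls the optimization bootstrapped.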
News
- 2024.10: We release DreamCraft3D++, featuring significantly enhanced 3D generation quality and efficiency.
Method Overview
Installation
Install threestudio
This part is the same as in the original threestudio. Skip it if you have already installed the environment.
See [installation.md](docs/installation.md) for additional information, including installation via Docker.
- You must have an NVIDIA graphics card with at least 20GB VRAM and have CUDA installed.
- Install Python >= 3.8.
- (Optional, Recommended) Create a virtual environment:

      python3 -m virtualenv venv
      . venv/bin/activate

      # Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
      # For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
      python3 -m pip install --upgrade pip

- Install PyTorch >= 1.12. We have tested on `torch1.12.1+cu113` and `torch2.0.0+cu118`, but other versions should also work fine.

      # torch1.12.1+cu113
      pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
      # or torch2.0.0+cu118
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

- (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:

      pip install ninja

- Install dependencies:

      pip install -r requirements.txt
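The minimums stated in this list (Python 3.8, PyTorch 1.12, 20GB VRAM) can be checked with a short script before starting a run. This is a hedged sketch, not something shipped with the repository: `parse_version`, `check_environment`, and `probe` are hypothetical helper names, and the thresholds are simply copied from the list above.

```python
import re
import sys
from typing import List, Optional, Sequence, Tuple

MIN_PYTHON = (3, 8)    # "Python >= 3.8"
MIN_TORCH = (1, 12)    # "PyTorch >= 1.12"
MIN_VRAM_GB = 20.0     # "at least 20GB VRAM"


def parse_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.12.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(
    python_version: Sequence[int],
    torch_version: Optional[str],
    vram_gb: Optional[float],
) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python is older than 3.8")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append("PyTorch is older than 1.12")
    if vram_gb is None:
        problems.append("no CUDA device detected")
    elif vram_gb < MIN_VRAM_GB:
        problems.append(f"GPU reports {vram_gb:.1f} GB, below 20 GB")
    return problems


def probe() -> List[str]:
    """Inspect the running interpreter; PyTorch is queried only if importable."""
    try:
        import torch
    except ImportError:
        return check_environment(sys.version_info, None, None)
    vram_gb = None
    if torch.cuda.is_available():
        # total_memory is in bytes and is often slightly below the nominal size.
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return check_environment(sys.version_info, torch.__version__, vram_gb)
```

Running `print(probe() or "environment looks OK")` inside the activated virtual environment prints any unmet requirement. Because the reported memory can fall a little short of the advertised figure, a near-miss on VRAM is better treated as a warning than as a hard failure.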
Download pre-trained models
- Zero123. We use the newest `stable-zero123.ckpt` by default. You can download it here into `load/zero123/`. In the paper we use `zero123-xl.ckpt`, which you can download with:

      cd load/zero123
      bash download.sh

- Omnidata. We use Omnidata for depth and normal prediction in `preprocess_image.py` (copied from stable-dreamfusion).

      cd load/omnidata
      gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt
      gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt
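Before moving on, it can help to confirm that the downloads landed where the default setup expects them. The sketch below is a hypothetical helper (`missing_checkpoints` is not part of the repository). It assumes the default `stable-zero123.ckpt` is used and that the Omnidata files keep the names given in the comments above.

```python
from pathlib import Path
from typing import List

# Paths relative to the repository root; file names are taken from the
# download instructions and are assumed to be unchanged after download.
EXPECTED_CHECKPOINTS = (
    "load/zero123/stable-zero123.ckpt",
    "load/omnidata/omnidata_dpt_depth_v2.ckpt",
    "load/omnidata/omnidata_dpt_normal_v2.ckpt",
)


def missing_checkpoints(repo_root: str = ".") -> List[str]:
    """Return the expected checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]
```

Calling `missing_checkpoints()` from the repository root returns an empty list once all three files are in place. If you use `zero123-xl.ckpt` instead, adjust the first entry accordingly.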
Quickstart
Preprocess the input image to remove the background and obtain its depth and normal images.

    python preprocess_image.py /path/to/image.png --recenter
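To preprocess a whole folder of reference images, the same command can be generated once per file. This is a minimal sketch with a hypothetical helper name (`preprocess_commands`). It assumes the script writes its results next to the input with `_rgba`, `_depth`, and `_normal` suffixes, which the `mushroom_log_rgba.png` example path suggests but which should be verified against `preprocess_image.py`.

```python
import sys
from pathlib import Path
from typing import List

# Assumed output suffixes; files with these suffixes are treated as results
# of an earlier preprocessing run and are skipped.
OUTPUT_SUFFIXES = ("_rgba", "_depth", "_normal")


def preprocess_commands(image_dir: str, recenter: bool = True) -> List[List[str]]:
    """Build one preprocess_image.py command per input PNG in image_dir."""
    commands: List[List[str]] = []
    for image in sorted(Path(image_dir).glob("*.png")):
        if image.stem.endswith(OUTPUT_SUFFIXES):
            continue
        cmd = [sys.executable, "preprocess_image.py", str(image)]
        if recenter:
            cmd.append("--recenter")
        commands.append(cmd)
    return commands
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` from the repository root.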
Our model is trained in multiple stages. You can run them with:

    prompt="a brightly colored mushroom growing on a log"
    image_path="load/images/mushroom_log_rgba.png"

    # --------- Stage 1 (NeRF & NeuS) --------- #
    python launch.py --config configs/dreamcraft3d-coarse-nerf.yaml --train system.prompt_processor.prompt="$prompt" data.image_path="$image_path"

    ckpt=outputs/dreamcraft3d-coarse-nerf/$prompt@LAST/ckpts/last.ckpt
    python launch.py --config configs/dreamcraft3d-coarse-neus.yaml --train system.prompt_processor.prompt="$prompt" data.image_path="$image_path" system.weights="$ckpt"

    # --------- Stage 2 (Geometry Refinement) --------- #
    ckpt=outputs/dreamcraft3d-coarse-neus/$prompt@LAST/ckpts/last.ckpt
    python launch.py --config configs/dreamcraft3d-geometry.yaml --train system.prompt_processor.prompt="$prompt" data.image_path="$image_path" system.geometry_convert_from="$ckpt"

    # --------- Stage 3 (Texture Refinement) --------- #
    ckpt=outputs/dreamcraft3d-geometry/$prompt@LAST/ckpts/last.ckpt
    python launch.py --config configs/dreamcraft3d-texture.yaml --train system.prompt_processor.prompt="$prompt" data.image_path="$image_path" system.geometry_convert_from="$ckpt"
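The stages are chained through checkpoints: every stage after the first receives the `last.ckpt` written by the stage before it. The sketch below reproduces that chaining in Python. `STAGES` and `stage_commands` are hypothetical names. The config names, override keys, and output path pattern are copied from the commands above, including the use of the prompt verbatim in the checkpoint path.

```python
from typing import List, Optional, Tuple

# (config name, override key that receives the previous stage's checkpoint)
STAGES: Tuple[Tuple[str, Optional[str]], ...] = (
    ("dreamcraft3d-coarse-nerf", None),
    ("dreamcraft3d-coarse-neus", "system.weights"),
    ("dreamcraft3d-geometry", "system.geometry_convert_from"),
    ("dreamcraft3d-texture", "system.geometry_convert_from"),
)


def stage_commands(prompt: str, image_path: str) -> List[List[str]]:
    """Build the launch.py argument list for each stage, in order."""
    commands: List[List[str]] = []
    previous: Optional[str] = None
    for name, ckpt_key in STAGES:
        cmd = [
            "python", "launch.py",
            "--config", f"configs/{name}.yaml",
            "--train",
            f"system.prompt_processor.prompt={prompt}",
            f"data.image_path={image_path}",
        ]
        if ckpt_key is not None:
            # Point this stage at the checkpoint produced by the previous one.
            cmd.append(f"{ckpt_key}=outputs/{previous}/{prompt}@LAST/ckpts/last.ckpt")
        commands.append(cmd)
        previous = name
    return commands
```

Passing each list to `subprocess.run(cmd, check=True)` in order avoids shell quoting issues with prompts that contain spaces, and `check=True` stops the pipeline at the first failed stage, so a later stage never starts from a missing checkpoint.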
[Optional] If the "Janus problem" arises in Stage 1, consider training a custom Text2Image model.
First, generate multi-view images from a single reference image using Zero123++.
python threestudio/scripts/img_to_mv.py --image_path 'load/mushroom.png' --save_path '.cache/temp' --prompt 'a photo of mushroom' --superres
Train a personalized DeepFloyd…