Era3D

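Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

This is the official implementation of Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.

Project Page | arXiv | Weights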

https://github.com/pengHTYX/Era3D/assets/38601831/5f927a1d-c6a9-44ef-92d0-563c26a2ce75

Teaser

Create your digital portrait from a single image

https://github.com/pengHTYX/Era3D/assets/38601831/e663005c-f8df-485e-9047-285c46b3d602

https://github.com/pengHTYX/Era3D/assets/38601831/1dbe75e6-f54a-4321-927d-3234d7568aab

Installation

conda create -n Era3D python=3.9
conda activate Era3D

# torch
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# install xformers; download the wheel from https://download.pytorch.org/whl/cu118
pip install xformers-0.0.23.post1-cp39-cp39-manylinux2014_x86_64.whl

# for reconstruction
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/NVlabs/nvdiffrast

# other dependencies
pip install -r requirements.txt

Weights

You can download the model directly from Hugging Face, or fetch it from a Python script:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="pengHTYX/MacLab-Era3D-512-6view", local_dir="./pengHTYX/MacLab-Era3D-512-6view/")

Inference

Run test_mvdiffusion_unclip.py to generate multiview color and normal images. For example:

python test_mvdiffusion_unclip.py --config configs/test_unclip-512-6view.yaml \
    pretrained_model_name_or_path='pengHTYX/MacLab-Era3D-512-6view' \
    validation_dataset.crop_size=420 \
    validation_dataset.root_dir=examples \
    seed=600 \
    save_dir='mv_res'  \
    save_mode='rgb'

You can adjust crop_size (400 or 420) and seed (42 or 600) to get the best results in some cases.
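To compare the suggested settings side by side, the inference command above can be generated once per combination of crop_size and seed. The following is a minimal sketch, not part of the repository: the helper name build_commands and the per-run save_dir naming scheme are assumptions made for illustration, while the script name, config path, and option keys are copied from the example command above. It only prints the commands, so nothing runs until you execute them yourself.

```python
import itertools
import shlex


def build_commands(crop_sizes=(400, 420), seeds=(42, 600), root_dir="examples"):
    """Return one argv list per (crop_size, seed) combination.

    build_commands is a hypothetical helper, not part of the Era3D codebase.
    """
    commands = []
    for crop_size, seed in itertools.product(crop_sizes, seeds):
        # Assumed naming scheme so that runs do not overwrite each other.
        save_dir = f"mv_res_crop{crop_size}_seed{seed}"
        commands.append([
            "python", "test_mvdiffusion_unclip.py",
            "--config", "configs/test_unclip-512-6view.yaml",
            "pretrained_model_name_or_path=pengHTYX/MacLab-Era3D-512-6view",
            f"validation_dataset.crop_size={crop_size}",
            f"validation_dataset.root_dir={root_dir}",
            f"seed={seed}",
            f"save_dir={save_dir}",
            "save_mode=rgb",
        ])
    return commands


if __name__ == "__main__":
    # Print shell-ready commands; run them manually or via subprocess.
    for argv in build_commands():
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell from the repository root; comparing the outputs in the separate save directories shows which setting suits a given input image.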

Typically, we use rembg to predict the alpha channel. If the result has artifacts, try removing the background with Clipdrop instead.

Instant-NSR Mesh Extraction

cd instant-nsr-pl
bash run.sh $GPU $CASE $OUTPUT_DIR

For example:

bash run.sh 0 A_bulldog_with_a_black_pirate_hat_rgba  recon

The textured mesh will be saved in $OUTPUT_DIR.

Gradio Demo for Multiview Generation

Following previous work, we use a pretrained SAM to remove the background interactively.

mkdir sam_pt && cd sam_pt
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cd ..

Then run the local Gradio demo.

python app.py

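License

This project is licensed under AGPL-3.0, so any downstream solution or product that includes our code or pretrained models must be open-sourced to comply with the AGPL terms. If you have any questions about using Era3D, please feel free to contact us.

Citation

If you find this codebase useful, please consider citing our work.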

@article{li2024era3d,
  title={Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention},
  author={Li, Peng and Liu, Yuan and Long, Xiaoxiao and Zhang, Feihu and Lin, Cheng and Li, Mengfei and Qi, Xingqun and Zhang, Shanghang and Luo, Wenhan and Tan, Ping and others},
  journal={arXiv preprint arXiv:2405.11616},
  year={2024}
}