# Carve3D

Code for the CVPR 2024 paper, *Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning*, by Desai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, and Arie E. Kaufman. This work is a collaboration between Adobe Research, Stony Brook University, TTIC, and Kiel University.

Project Website | arXiv Paper

This repository does not contain the implementation of the SDXL-based multi-view diffusion model or the sparse-view large reconstruction model proposed in Instant3D. To run the full Carve3D pipeline, an open-source sparse-view reconstruction model is needed, such as OpenLRM, GRM, or LGM. Pull requests are welcome!

## Release TODOs

- training and testing text prompt dataset
- SDXL LoRA training code adapted from diffusers and DDPO
  - needs a multi-view diffusion model (e.g., MVDream) to replace Instant3D's finetuned SDXL; pull requests are welcome!
- Multi-view Reconstruction Consistency (MRC) metric
  - needs an open-source multi-view/sparse-view reconstruction model; pull requests are welcome!

## Installation

```bash
cd ddpo
pip install -e .
```

*Method overview figure.*

## Training

```bash
bash carve3d_train.sh
```
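The script presumably wraps an `accelerate launch` of `carve3d_train.py`, following ddpo-pytorch. A hypothetical single-node invocation might look like the following (the `--config file.py:function` flag is the ml_collections convention that ddpo-pytorch uses; check the shipped script for the exact command):

```bash
# Hypothetical launch command; the actual carve3d_train.sh may differ.
accelerate launch --num_processes 8 carve3d_train.py \
    --config config/dgx.py:carve3d_train
```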

### Configurations

`carve3d_train.py` uses configuration files. The base configuration file is `config/base.py`. The final training config reported in the paper is in `carve3d_train()` in `config/dgx.py`.
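As a rough sketch of how such a config function typically layers on top of `config/base.py`, assuming the `ml_collections`-style configs that ddpo-pytorch uses (the overrides shown are only the KL flags documented below, not the full paper config):

```python
# config/dgx.py (sketch) -- assumes ml_collections-style configs as in ddpo-pytorch.
import importlib.util
import os

# Load the shared base config by file path (ddpo-pytorch does this with imp.load_source).
_spec = importlib.util.spec_from_file_location(
    "base", os.path.join(os.path.dirname(__file__), "base.py")
)
base = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(base)


def carve3d_train():
    config = base.get_config()
    # Illustrative overrides only; the paper's final values live in the real config/dgx.py.
    config.kl_penalty = True
    config.kl_in_reward = True
    config.kl_per_prompt_stat_tracking = True
    config.kl_normalized_coeff = 0.2
    return config
```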

## Improvements on DDPO (Section 4.2 of the paper)

The training code `carve3d_train.py`, the configurations in `config/`, and DDPO's patches to the `diffusers` library in `diffusers_patch/` are based on ddpo-pytorch; `carve3d_train.py` also draws on the `diffusers` SDXL LoRA training example. Our improvements to DDPO include pure on-policy training and KL-divergence regularization.

### Pure On-policy Training

In the original DDPO, the total sample batch size, `sample.batch_size * sample.num_batches_per_epoch * num_nodes * 8` (where 8 is the per-node GPU count), is 2x the total train batch size, `train.batch_size * train.gradient_accumulation_steps * num_nodes * 8`, so each epoch of sampling is followed by 2 updates, each on half of the samples.

Our modification is to simply set them equal, so each epoch of sampling is followed by a single update that uses all of the samples. We did not remove the importance sampling ratio, but removing it should make no difference: with a single update per epoch, the training policy is identical to the sampling policy, so the ratio is always 1.
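To make this concrete, here is an illustrative sketch (not the repository's implementation) of DDPO's clipped surrogate objective and why the ratio becomes inert in the pure on-policy case:

```python
import torch


def ppo_loss(logp_new, logp_old, advantages, clip_range=1e-4):
    """Clipped PPO surrogate as used in DDPO (schematic)."""
    ratio = torch.exp(logp_new - logp_old)  # importance sampling ratio
    unclipped = -advantages * ratio
    clipped = -advantages * torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return torch.mean(torch.max(unclipped, clipped))


# Pure on-policy: with one update per sampling epoch, logp_new is computed
# with the same weights that produced the samples, so ratio == 1 everywhere,
# the clipping never triggers, and the gradient reduces to the plain
# policy-gradient (REINFORCE) estimator; the ratio is harmless but redundant.
```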

### KL-divergence Regularization

Our KL-divergence regularization is enabled by default via `config.kl_penalty = True`, `config.kl_in_reward = True`, `config.kl_per_prompt_stat_tracking = True`, and `config.kl_normalized_coeff = 0.2`. The coefficient can be tuned as a trade-off between optimizing for higher reward and keeping the KL divergence from the base model low.
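The following is a minimal sketch of the KL-in-reward idea, not the repository's implementation (the batch-level normalization shown is a simple stand-in for the per-prompt tracking enabled by `config.kl_per_prompt_stat_tracking`; variable names are ours):

```python
import torch


def kl_adjusted_reward(reward, kl, coeff=0.2, eps=1e-8):
    """Fold a normalized KL penalty into the task reward (schematic).

    `kl` is the per-sample KL divergence between the finetuned model and the
    frozen base model; `coeff` plays the role of config.kl_normalized_coeff.
    """
    # Normalize so the coefficient trades off reward vs. KL on a stable scale,
    # regardless of the raw magnitude of the KL term.
    kl_normalized = kl / (kl.std() + eps)
    return reward - coeff * kl_normalized
```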

### Hyperparameter Tuning

The main hyperparameters to tune are

- the KL-divergence regularization coefficient `config.kl_normalized_coeff`,
- the training data size `config.train_size`,
- the total sample batch size, `sample.batch_size * sample.num_batches_per_epoch * num_nodes * 8`,
- and the total train batch size, `train.batch_size * train.gradient_accumulation_steps * num_nodes * 8`.

Keep the total sample batch size and the total train batch size equal to use our pure on-policy training; otherwise, set the total sample batch size to 2x (or 3x, etc.) the total train batch size to use PPO's multi-round update. Tune the total batch size and the training data size according to Section 4.3 of the paper; a worked sizing example follows.
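For example, with hypothetical values on 2 nodes of 8 GPUs each:

```python
# Hypothetical sizing check for pure on-policy training (illustrative values).
num_nodes, gpus_per_node = 2, 8

# sample.batch_size=8, sample.num_batches_per_epoch=4 -> 8 * 4 * 2 * 8 = 512
total_sample_batch = 8 * 4 * num_nodes * gpus_per_node

# train.batch_size=4, train.gradient_accumulation_steps=16 -> 4 * 16 * 2 * 8 = 512
total_train_batch = 4 * 16 * num_nodes * gpus_per_node

# Equal sizes -> one update per epoch (pure on-policy); halving the
# accumulation steps to 8 would give 256 and the 2x DDPO-style schedule.
assert total_sample_batch == total_train_batch
```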

## Citation

If you find this code useful, please consider citing:

```bibtex
@inproceedings{xie2024carve3d,
  title={{Carve3D}: Improving Multi-view Reconstruction Consistency for Diffusion Models with {RL} Finetuning},
  author={Xie, Desai and Li, Jiahao and Tan, Hao and Sun, Xin and Shu, Zhixin and Zhou, Yi and Bi, Sai and Pirk, S{\"o}ren and Kaufman, Arie E.},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6369--6379},
  year={2024}
}

@inproceedings{black2024training,
  title={Training Diffusion Models with Reinforcement Learning},
  author={Black, Kevin and Janner, Michael and Du, Yilun and Kostrikov, Ilya and Levine, Sergey},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=YCWjhGrJFD}
}
```