MIMO-unofficial

Unofficial implementation of MIMO (MImicking anyone anywhere with complex Motions and Object interactions)

My blog post: Medium
Original paper: arXiv

🎯 Overview

This repository offers a complete training and inference pipeline for transforming character appearance and motion in videos. As a video-to-video generation framework, it modifies the character in a video from optional inputs: an avatar photo and/or a 3D animation.

Demo video: demo.webm

⚒️ Installation and Setup

Tested with:

CUDA 12.2
Python 3.10.12
torch 2.4.1

Clone the repository and install requirements

git clone git@github.com:antoinedelplace/MIMO-unofficial.git
cd MIMO-unofficial/
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision torchaudio
pip install -r requirements.txt
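
A quick sanity check against the tested versions above can be run after the install. This is a minimal sketch: it only prints what it finds so you can compare with the versions listed.

```python
# check_env.py -- print the installed versions to compare with the tested setup
import sys
import torch

print(f"Python: {sys.version.split()[0]}")            # tested: 3.10.12
print(f"torch:  {torch.__version__}")                 # tested: 2.4.1
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA runtime: {torch.version.cuda}")      # tested: 12.2
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```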

Folder architecture

The expected layout is defined in configs/paths.py:

├── code
│   ├── MIMO-unofficial
│   ├── AnimateAnyone
│   ├── Depth-Anything-V2
│   ├── 4D-Humans
│   ├── PHALP
│   ├── detectron2
│   ├── sam2
│   ├── nvdiffrast
│   └── ProPainter
├── checkpoints
└── data
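
As an illustration, configs/paths.py might centralize this layout roughly as follows. This is a sketch, not the repo's actual file: the variable names are assumptions, and it assumes the file lives at MIMO-unofficial/mimo/configs/paths.py.

```python
# configs/paths.py (illustrative sketch -- variable names are assumptions;
# see the actual file in the repo for the real ones)
import os

# mimo/configs/paths.py -> three levels up is the code/ folder
CODE_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..", ".."))
ROOT_DIR = os.path.dirname(CODE_DIR)  # parent holding code/, checkpoints/, data/

CHECKPOINTS_DIR = os.path.join(ROOT_DIR, "checkpoints")
DATA_DIR = os.path.join(ROOT_DIR, "data")

# Sibling repositories cloned next to MIMO-unofficial
ANIMATE_ANYONE_DIR = os.path.join(CODE_DIR, "AnimateAnyone")
DEPTH_ANYTHING_DIR = os.path.join(CODE_DIR, "Depth-Anything-V2")
FOUR_D_HUMANS_DIR  = os.path.join(CODE_DIR, "4D-Humans")
PHALP_DIR          = os.path.join(CODE_DIR, "PHALP")
DETECTRON2_DIR     = os.path.join(CODE_DIR, "detectron2")
SAM2_DIR           = os.path.join(CODE_DIR, "sam2")
NVDIFFRAST_DIR     = os.path.join(CODE_DIR, "nvdiffrast")
PROPAINTER_DIR     = os.path.join(CODE_DIR, "ProPainter")
```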

Clone external repositories and install requirements

Download checkpoints

Here are the checkpoints to download:

Some checkpoints are automatically downloaded from 🤗 Hugging Face but require manual acceptance of the terms and conditions. You can accept these terms with your 🤗 Hugging Face account, then log in to the server using huggingface-cli login. Here are the gated models:
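
Alternatively to huggingface-cli login, the login and the download of a gated checkpoint can be scripted with huggingface_hub. In this sketch, the repo_id and the target folder are placeholders, not the actual gated models:

```python
# download_gated.py -- fetch a gated checkpoint after accepting its terms on the Hub
from huggingface_hub import login, snapshot_download

login()  # prompts for an access token; same effect as `huggingface-cli login`

snapshot_download(
    repo_id="org/gated-model",                # placeholder: one of the gated models
    local_dir="../checkpoints/gated-model",   # placeholder target inside checkpoints/
)
```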

Extra steps

The external repositories need a few one-line edits before the scripts can run. A helper sketch that applies the single-line edits appears after this list.

  • For mimo/dataset_preprocessing/pose_estimation_4DH.py

    pip uninstall phalp hmr2
    • The renderer needs to be disabled to avoid OpenGL errors. In 4D-Humans/hmr2/models/__init__.py line 84, pass init_renderer=False:
    model = HMR2.load_from_checkpoint(checkpoint_path, strict=False, cfg=model_cfg, init_renderer=False)
    • To speed up inference, disable the automatic saving of result files. In PHALP/phalp/trackers/PHALP.py line 264, remove:
    joblib.dump(final_visuals_dic, pkl_path, compress=3)
  • For mimo/dataset_preprocessing/get_apose_ref.py

    • If you need DWPose to extract the 2D pose from an image:
    pip install -U openmim
    mim install mmengine
    mim install "mmcv>=2.0.1"
    mim install "mmdet>=3.1.0"
    mim install "mmpose>=1.1.0"
    • In AnimateAnyone/src/models/unet_2d_blocks.py line 9, update the import to:
    from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel
  • For mimo/training/main.py

    • In AnimateAnyone/src/models/mutual_self_attention.py line 48:
        self.register_reference_hooks(
            mode,
            do_classifier_free_guidance,
            attention_auto_machine_weight,
            gn_auto_machine_weight,
            style_fidelity,
            reference_attn,
            reference_adain,
            batch_size=batch_size,
            fusion_blocks=fusion_blocks,
        )
    • In AnimateAnyone/src/models/unet_2d_blocks.py line 9, update the import to:
    from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel
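
To avoid doing these edits by hand, here is a minimal sketch that applies the three single-line edits above. It assumes unmodified checkouts of the external repositories (so the line numbers still match) and that it is run once from the code/ folder; the multi-line mutual_self_attention.py change is easiest to do manually.

```python
# apply_patches.py -- sketch applying the single-line edits listed above.
# Assumes unmodified checkouts (line numbers must still match) and a single
# run from the code/ folder. The multi-line mutual_self_attention.py change
# is not covered here. The 4-space indentation of the HMR2 line is an
# assumption about the surrounding file.
from pathlib import Path

# (file, 1-based line number, replacement text or None to delete the line)
PATCHES = [
    ("4D-Humans/hmr2/models/__init__.py", 84,
     "    model = HMR2.load_from_checkpoint(checkpoint_path, strict=False, cfg=model_cfg, init_renderer=False)"),
    ("PHALP/phalp/trackers/PHALP.py", 264, None),
    ("AnimateAnyone/src/models/unet_2d_blocks.py", 9,
     "from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel"),
]

for filename, lineno, replacement in PATCHES:
    path = Path(filename)
    lines = path.read_text().splitlines()
    idx = lineno - 1
    if replacement is None:
        print(f"{filename}:{lineno} removing: {lines[idx].strip()}")
        del lines[idx]
    else:
        print(f"{filename}:{lineno} replacing: {lines[idx].strip()}")
        lines[idx] = replacement
    path.write_text("\n".join(lines) + "\n")
```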

🚀 Run scripts

Dataset Preprocessing

Run the preprocessing scripts in order (a driver sketch follows the list):

  1. python mimo/dataset_preprocessing/video_sampling_resizing.py
  2. python mimo/dataset_preprocessing/remove_duplicate_videos.py
  3. python mimo/dataset_preprocessing/human_detection_detectron2.py
  4. python mimo/dataset_preprocessing/depth_estimation.py
  5. python mimo/dataset_preprocessing/video_tracking_sam2.py
  6. python mimo/dataset_preprocessing/video_inpainting.py
  7. python mimo/dataset_preprocessing/get_apose_ref.py
  8. python mimo/dataset_preprocessing/upscale_apose_ref.py
  9. python mimo/dataset_preprocessing/vae_encoding.py
  10. python mimo/dataset_preprocessing/clip_embedding.py
  11. python mimo/dataset_preprocessing/pose_estimation_4DH.py
  12. python mimo/dataset_preprocessing/rasterizer_2d_joints.py
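
The twelve steps above can also be chained with a small driver. This is a sketch assuming each script runs with its default arguments and that it is launched from the MIMO-unofficial/ folder:

```python
# run_preprocessing.py -- run the dataset preprocessing steps in order,
# stopping at the first failure (assumes default arguments suffice)
import subprocess
import sys

STEPS = [
    "mimo/dataset_preprocessing/video_sampling_resizing.py",
    "mimo/dataset_preprocessing/remove_duplicate_videos.py",
    "mimo/dataset_preprocessing/human_detection_detectron2.py",
    "mimo/dataset_preprocessing/depth_estimation.py",
    "mimo/dataset_preprocessing/video_tracking_sam2.py",
    "mimo/dataset_preprocessing/video_inpainting.py",
    "mimo/dataset_preprocessing/get_apose_ref.py",
    "mimo/dataset_preprocessing/upscale_apose_ref.py",
    "mimo/dataset_preprocessing/vae_encoding.py",
    "mimo/dataset_preprocessing/clip_embedding.py",
    "mimo/dataset_preprocessing/pose_estimation_4DH.py",
    "mimo/dataset_preprocessing/rasterizer_2d_joints.py",
]

for step in STEPS:
    print(f"=== {step} ===")
    result = subprocess.run([sys.executable, step])
    if result.returncode != 0:
        sys.exit(f"Step failed: {step}")
```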

Inference

Run accelerate config and select:
    - no distributed training
    - NUMA efficiency
    - fp16 mixed precision

Then run:

accelerate launch mimo/inference/main.py -i input_video.mp4

Training

Run accelerate config and select:
    - multi-GPU
    - NUMA efficiency
    - fp16 mixed precision

Then launch the two training phases:

accelerate launch mimo/training/main.py -c 1540 -t ./mimo/configs/training/cfg_phase1.yaml
accelerate launch mimo/training/main.py -c 1540 -t ./mimo/configs/training/cfg_phase2.yaml

🙏🏻 Acknowledgements

This project is based on novitalabs/AnimateAnyone and MooreThreads/Moore-AnimateAnyone, which are licensed under the Apache License 2.0. We thank the authors of MIMO, novitalabs/AnimateAnyone, and MooreThreads/Moore-AnimateAnyone for their open research and exploration.