Unofficial implementation of MIMO (MImicking anyone anywhere with complex Motions and Object interactions)
This repository offers a full training and inference pipeline for transforming character appearance and motion in videos. It is a video-to-video generation framework: the character in a video can be modified dynamically using optional inputs such as an avatar photo and/or 3D animations.
demo.webm
Tests were made using:
- CUDA 12.2
- Python 3.10.12
- torch 2.4.1
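A quick way to confirm your environment matches the tested versions (a minimal sketch assuming a standard PyTorch install):

```python
# check_env.py -- print the versions this repository was tested against
import sys
import torch

print("Python:", sys.version.split()[0])          # tested with 3.10.12
print("torch:", torch.__version__)                # tested with 2.4.1
print("CUDA (torch build):", torch.version.cuda)  # tested with CUDA 12.2
print("CUDA available:", torch.cuda.is_available())
```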
```bash
git clone git@github.com:antoinedelplace/MIMO-unofficial.git
cd MIMO-unofficial/
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision torchaudio
pip install -r requirements.txt
```
See `configs/paths.py`. The expected folder structure is:

```
├── code
│   ├── MIMO-unofficial
│   ├── AnimateAnyone
│   ├── Depth-Anything-V2
│   ├── 4D-Humans
│   ├── PHALP
│   ├── detectron2
│   ├── sam2
│   ├── nvdiffrast
│   └── ProPainter
├── checkpoints
└── data
```
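For reference, here is a minimal sketch of what the path configuration could look like (the constant names below are assumptions for illustration, not the repository's actual variables):

```python
# configs/paths.py -- hypothetical sketch; adapt the names to the actual file
import os

BASE_DIR = os.path.expanduser("~")                       # parent of code/, checkpoints/ and data/
CODE_DIR = os.path.join(BASE_DIR, "code")                # cloned repositories
CHECKPOINTS_DIR = os.path.join(BASE_DIR, "checkpoints")  # downloaded model weights
DATA_DIR = os.path.join(BASE_DIR, "data")                # videos and preprocessing outputs

ANIMATE_ANYONE_DIR = os.path.join(CODE_DIR, "AnimateAnyone")
DEPTH_ANYTHING_DIR = os.path.join(CODE_DIR, "Depth-Anything-V2")
```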
Here are the checkpoints to download:
Some checkpoints are automatically downloaded from 🤗 Hugging Face but require manual acceptance of the terms and conditions. Accept the terms with your 🤗 Hugging Face account, then log in on the server with `huggingface-cli login`. Here are the gated models:
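Alternatively, you can authenticate from Python with the `huggingface_hub` package (a sketch; it assumes you have already created an access token on the Hugging Face website):

```python
# login_hf.py -- programmatic equivalent of `huggingface-cli login`
from huggingface_hub import login

# Paste a token created at https://huggingface.co/settings/tokens,
# or call login() with no arguments for an interactive prompt.
login(token="hf_xxx")  # placeholder token, do not commit a real one
```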
For `mimo/dataset_preprocessing/pose_estimation_4DH.py`:
- Download the SMPL model `basicModel_neutral_lbs_10_207_0_v1.0.0.pkl` and put it in `MIMO-unofficial` and in `MIMO-unofficial/data` (see the sanity-check sketch after this list).
- Uninstall the `phalp` and `hmr2` pip packages so that the cloned repositories are used instead: `pip uninstall phalp hmr2`
- The renderer needs to be disabled to avoid OpenGL errors: in `4D-Humans/hmr2/models/__init__.py`, line 84, use `model = HMR2.load_from_checkpoint(checkpoint_path, strict=False, cfg=model_cfg, init_renderer=False)`
- Remove automatic file saving to speed up inference: in `PHALP/phalp/trackers/PHALP.py`, line 264, remove `joblib.dump(final_visuals_dic, pkl_path, compress=3)`
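A minimal sanity-check sketch (assuming the folder layout above and run from the `code` directory) to verify that the SMPL model is in both expected locations:

```python
# check_smpl.py -- hypothetical helper; adjust the paths to your setup
import os

SMPL_FILE = "basicModel_neutral_lbs_10_207_0_v1.0.0.pkl"
for folder in ("MIMO-unofficial", "MIMO-unofficial/data"):
    path = os.path.join(folder, SMPL_FILE)
    status = "found" if os.path.isfile(path) else "MISSING"
    print(f"{path}: {status}")
```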
For `mimo/dataset_preprocessing/get_apose_ref.py`:
- If you need DWPose to extract the 2D pose from an image:
  ```bash
  pip install -U openmim
  mim install mmengine
  mim install "mmcv>=2.0.1"
  mim install "mmdet>=3.1.0"
  mim install "mmpose>=1.1.0"
  ```
- In `AnimateAnyone/src/models/unet_2d_blocks.py`, line 9, use `from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel` (a version-tolerant variant is sketched right after this list).
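The `DualTransformer2DModel` import path has moved between diffusers releases; if you need to support both layouts, a version-tolerant variant of the line-9 patch (an assumption about which diffusers versions you may encounter) is:

```python
# AnimateAnyone/src/models/unet_2d_blocks.py, line 9 -- version-tolerant import
try:
    # newer diffusers releases expose the module under models/transformers/
    from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel
except ImportError:
    # fall back to the older module layout
    from diffusers.models.dual_transformer_2d import DualTransformer2DModel
```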
For `mimo/training/main.py`:
- In `AnimateAnyone/src/models/mutual_self_attention.py`, line 48:
  ```python
  self.register_reference_hooks(
      mode, do_classifier_free_guidance, attention_auto_machine_weight,
      gn_auto_machine_weight, style_fidelity, reference_attn, reference_adain,
      batch_size=batch_size, fusion_blocks=fusion_blocks,
  )
  ```
- In `AnimateAnyone/src/models/unet_2d_blocks.py`, line 9, use `from diffusers.models.transformers.dual_transformer_2d import DualTransformer2DModel` (same fix as above).
To preprocess the training data, run the following scripts:

```bash
python mimo/dataset_preprocessing/video_sampling_resizing.py
python mimo/dataset_preprocessing/remove_duplicate_videos.py
python mimo/dataset_preprocessing/human_detection_detectron2.py
python mimo/dataset_preprocessing/depth_estimation.py
python mimo/dataset_preprocessing/video_tracking_sam2.py
python mimo/dataset_preprocessing/video_inpainting.py
python mimo/dataset_preprocessing/get_apose_ref.py
python mimo/dataset_preprocessing/upscale_apose_ref.py
python mimo/dataset_preprocessing/vae_encoding.py
python mimo/dataset_preprocessing/clip_embedding.py
python mimo/dataset_preprocessing/pose_estimation_4DH.py
python mimo/dataset_preprocessing/rasterizer_2d_joints.py
```
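To run the whole preprocessing chain unattended, a minimal driver sketch (assuming the scripts are meant to be executed in the order listed above and take no extra command-line arguments) could look like:

```python
# run_preprocessing.py -- hypothetical driver; stops at the first failing step
import subprocess

STEPS = [
    "mimo/dataset_preprocessing/video_sampling_resizing.py",
    "mimo/dataset_preprocessing/remove_duplicate_videos.py",
    "mimo/dataset_preprocessing/human_detection_detectron2.py",
    "mimo/dataset_preprocessing/depth_estimation.py",
    "mimo/dataset_preprocessing/video_tracking_sam2.py",
    "mimo/dataset_preprocessing/video_inpainting.py",
    "mimo/dataset_preprocessing/get_apose_ref.py",
    "mimo/dataset_preprocessing/upscale_apose_ref.py",
    "mimo/dataset_preprocessing/vae_encoding.py",
    "mimo/dataset_preprocessing/clip_embedding.py",
    "mimo/dataset_preprocessing/pose_estimation_4DH.py",
    "mimo/dataset_preprocessing/rasterizer_2d_joints.py",
]

for script in STEPS:
    print(f"=== Running {script} ===")
    subprocess.run(["python", script], check=True)  # raise if a step fails
```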
For inference, run `accelerate config` and select:
- no distributed training
- NUMA efficiency
- fp16

Then launch:
```bash
accelerate launch mimo/inference/main.py -i input_video.mp4
```
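To process several input videos in one go, a simple wrapper around the documented command (a sketch; the input folder name is an assumption) could be:

```python
# batch_inference.py -- hypothetical wrapper around `accelerate launch mimo/inference/main.py`
import glob
import subprocess

for video in sorted(glob.glob("input_videos/*.mp4")):  # assumed input folder
    print(f"=== Processing {video} ===")
    subprocess.run(
        ["accelerate", "launch", "mimo/inference/main.py", "-i", video],
        check=True,
    )
```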
For training, run `accelerate config` and select:
- multi-GPU
- NUMA efficiency
- fp16

Then launch the two training phases:
```bash
accelerate launch mimo/training/main.py -c 1540 -t ./mimo/configs/training/cfg_phase1.yaml
accelerate launch mimo/training/main.py -c 1540 -t ./mimo/configs/training/cfg_phase2.yaml
```
This project is based on novitalabs/AnimateAnyone and MooreThreads/Moore-AnimateAnyone, which are licensed under the Apache License 2.0. We thank the authors of MIMO, novitalabs/AnimateAnyone, and MooreThreads/Moore-AnimateAnyone for their open research and exploration.