diff --git a/README.md b/README.md
index 2f80e5970..94849fb7d 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,8 @@ This repository contains the source code for the paper [First Order Motion Model
The videos on the left show the driving videos. The first row on the right for each dataset shows the source videos. The bottom row contains the animated sequences with motion transferred from the driving video and object taken from the source image. We trained a separate network for each task.
+### VoxCeleb Dataset
+![Screenshot](sup-mat/vox-teaser.gif)
### Fashion Dataset
![Screenshot](sup-mat/fashion-teaser.gif)
### MGIF Dataset
@@ -28,13 +30,16 @@ There are several configuration (```config/dataset_name.yaml```) files one for e
Checkpoints can be found under the following link: [checkpoint](https://yadi.sk/d/lEw8uRm140L_eQ).
### Animation Demo
-
To run a demo, download a checkpoint and run the following command:
```
python demo.py --config config/dataset_name.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale
```
The result will be stored in ```result.mp4```.
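+
+The same pipeline can also be driven directly from Python using the functions the demo script exposes (```load_checkpoints``` and ```make_animation```). This is a minimal sketch; the checkpoint and file paths are placeholders:
+```
+import imageio
+from skimage.transform import resize
+from skimage import img_as_ubyte
+from demo import load_checkpoints, make_animation
+
+# Read and resize the inputs to 256x256, keeping only the RGB channels.
+source_image = resize(imageio.imread('path/to/source.png'), (256, 256))[..., :3]
+driving_video = [resize(frame, (256, 256))[..., :3]
+                 for frame in imageio.mimread('path/to/driving.mp4', memtest=False)]
+
+generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml',
+                                           checkpoint_path='path/to/vox-cpk.pth.tar')
+predictions = make_animation(source_image, driving_video, generator, kp_detector,
+                             relative=True, adapt_movement_scale=True)
+imageio.mimsave('result.mp4', [img_as_ubyte(frame) for frame in predictions])
+```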
+
+### Colab Demo
+We provide a special demo for Google Colab; see ```demo-colab.ipynb```. You can also check ```face-swap-demo.ipynb```.
+
### Training
**Note: It is important to use pytorch==1.0.0 for training. Higher versions of pytorch have strange bilinear warping behavior, which causes the model to diverge.**
To train a model on a specific dataset, run:
@@ -77,10 +82,15 @@ In this way there are no specific requirements for the driving video and source
However, this usually leads to poor performance, since irrelevant details such as shape are transferred.
Check animate parameters in ```taichi-256.yaml``` to enable this mode.
+
+
2) Animation using relative coordinates: from the driving video we first estimate the relative movement of each keypoint,
then we add this movement to the absolute position of keypoints in the source image.
These keypoints, along with the source image, are used for animation. This usually leads to better performance; however, it requires
-that the object in the first frame of the video and in the source image have the same pose.
+that the object in the first frame of the video and in the source image have the same pose (see the sketch below).
+
+
+
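+The two modes can be summarized with a small illustrative sketch (this is not the repository's exact code; names such as ```kp_source```, ```kp_driving``` and ```kp_driving_initial``` are placeholders, and the actual implementation lives in the repository's keypoint-normalization code):
+```
+def transfer_keypoints(kp_source, kp_driving, kp_driving_initial, use_relative=True):
+    # All arguments are assumed to be numpy arrays of shape (num_kp, 2).
+    if use_relative:
+        # Relative mode: shift the source keypoints by the displacement of the
+        # driving keypoints with respect to the first driving frame.
+        return kp_source + (kp_driving - kp_driving_initial)
+    # Absolute mode: use the driving keypoints directly, so object proportions
+    # are inherited from the driving video.
+    return kp_driving
+```
+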
### Datasets
@@ -90,13 +100,16 @@ that the object in the first frame of the video and in the source image have the
3) **Fashion**. Follow the instructions for downloading the dataset [here](https://vision.cs.ubc.ca/datasets/fashion/).
-4) **Taichi**. Follow the instructions in [data/taichi-loading](data/taichi-loading/README.md)
+4) **Taichi**. Follow the instructions in [data/taichi-loading](data/taichi-loading/README.md) or the instructions from https://github.com/AliaksandrSiarohin/video-preprocessing.
+5) **Nemo**. Please follow the [instructions](https://www.uva-nemo.org/) on how to download the dataset. Then preprocess the dataset using the scripts from https://github.com/AliaksandrSiarohin/video-preprocessing.
+6) **VoxCeleb**. Please follow the instructions from https://github.com/AliaksandrSiarohin/video-preprocessing.
### Training on your own dataset
-1) Resize all the videos to the same size e.g 256x256, the videos can be in '.gif' or '.mp4' format.
-But we recommend for each video to make a separate folder with all the frames in '.png' format, because this format is loss-less, and it has better i/o performance.
+1) Resize all the videos to the same size, e.g. 256x256; the videos can be in '.gif' or '.mp4' format, or given as a folder of images.
+We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance (see the sketch below).
2) Create a folder ```data/dataset_name``` with 2 subfolders ```train``` and ```test```; put training videos in ```train``` and testing videos in ```test```.
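+
+For step 1, a minimal conversion sketch (assuming ```imageio``` and ```scikit-image```, which the demo code already uses; paths are placeholders):
+```
+import os
+import imageio
+from skimage.transform import resize
+from skimage import img_as_ubyte
+
+def video_to_frames(video_path, out_dir, size=(256, 256)):
+    os.makedirs(out_dir, exist_ok=True)
+    reader = imageio.get_reader(video_path)
+    for i, frame in enumerate(reader):
+        # Resize each frame and store it as a lossless '.png' file.
+        imageio.imwrite(os.path.join(out_dir, '%07d.png' % i),
+                        img_as_ubyte(resize(frame, size)))
+    reader.close()
+
+video_to_frames('raw_videos/video0.mp4', 'data/dataset_name/train/video0')
+```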
diff --git a/config/nemo-256.yaml b/config/nemo-256.yaml
new file mode 100644
index 000000000..1bd44b957
--- /dev/null
+++ b/config/nemo-256.yaml
@@ -0,0 +1,76 @@
+dataset_params:
+ root_dir: data/nemo-png
+ frame_shape: [256, 256, 3]
+ id_sampling: False
+ augmentation_params:
+ flip_param:
+ horizontal_flip: True
+ time_flip: True
+
+model_params:
+ common_params:
+ num_kp: 10
+ num_channels: 3
+ estimate_jacobian: True
+ kp_detector_params:
+ temperature: 0.1
+ block_expansion: 32
+ max_features: 1024
+ scale_factor: 0.25
+ num_blocks: 5
+ generator_params:
+ block_expansion: 64
+ max_features: 512
+ num_down_blocks: 2
+ num_bottleneck_blocks: 6
+ estimate_occlusion_map: True
+ dense_motion_params:
+ block_expansion: 64
+ max_features: 1024
+ num_blocks: 5
+ scale_factor: 0.25
+ discriminator_params:
+ scales: [1]
+ block_expansion: 32
+ max_features: 512
+ num_blocks: 4
+ sn: True
+
+train_params:
+ num_epochs: 100
+ num_repeats: 8
+ epoch_milestones: [60, 90]
+ lr_generator: 2.0e-4
+ lr_discriminator: 2.0e-4
+ lr_kp_detector: 2.0e-4
+ batch_size: 36
+ scales: [1, 0.5, 0.25, 0.125]
+ checkpoint_freq: 50
+ transform_params:
+ sigma_affine: 0.05
+ sigma_tps: 0.005
+ points_tps: 5
+ loss_weights:
+ generator_gan: 1
+ discriminator_gan: 1
+ feature_matching: [10, 10, 10, 10]
+ perceptual: [10, 10, 10, 10, 10]
+ equivariance_value: 10
+ equivariance_jacobian: 10
+
+reconstruction_params:
+ num_videos: 1000
+ format: '.mp4'
+
+animate_params:
+ num_pairs: 50
+ format: '.mp4'
+ normalization_params:
+ adapt_movement_scale: False
+ use_relative_movement: True
+ use_relative_jacobian: True
+
+visualizer_params:
+ kp_size: 5
+ draw_border: True
+ colormap: 'gist_rainbow'
diff --git a/config/vox-256.yaml b/config/vox-256.yaml
new file mode 100644
index 000000000..abfe9a239
--- /dev/null
+++ b/config/vox-256.yaml
@@ -0,0 +1,83 @@
+dataset_params:
+ root_dir: data/vox-png
+ frame_shape: [256, 256, 3]
+ id_sampling: True
+ pairs_list: data/vox256.csv
+ augmentation_params:
+ flip_param:
+ horizontal_flip: True
+ time_flip: True
+ jitter_param:
+ brightness: 0.1
+ contrast: 0.1
+ saturation: 0.1
+ hue: 0.1
+
+
+model_params:
+ common_params:
+ num_kp: 10
+ num_channels: 3
+ estimate_jacobian: True
+ kp_detector_params:
+ temperature: 0.1
+ block_expansion: 32
+ max_features: 1024
+ scale_factor: 0.25
+ num_blocks: 5
+ generator_params:
+ block_expansion: 64
+ max_features: 512
+ num_down_blocks: 2
+ num_bottleneck_blocks: 6
+ estimate_occlusion_map: True
+ dense_motion_params:
+ block_expansion: 64
+ max_features: 1024
+ num_blocks: 5
+ scale_factor: 0.25
+ discriminator_params:
+ scales: [1]
+ block_expansion: 32
+ max_features: 512
+ num_blocks: 4
+ sn: True
+
+train_params:
+ num_epochs: 100
+ num_repeats: 75
+ epoch_milestones: [60, 90]
+ lr_generator: 2.0e-4
+ lr_discriminator: 2.0e-4
+ lr_kp_detector: 2.0e-4
+ batch_size: 40
+ scales: [1, 0.5, 0.25, 0.125]
+ checkpoint_freq: 50
+ transform_params:
+ sigma_affine: 0.05
+ sigma_tps: 0.005
+ points_tps: 5
+ loss_weights:
+ generator_gan: 0
+ discriminator_gan: 1
+ feature_matching: [10, 10, 10, 10]
+ perceptual: [10, 10, 10, 10, 10]
+ equivariance_value: 10
+ equivariance_jacobian: 10
+
+reconstruction_params:
+ num_videos: 1000
+ format: '.mp4'
+
+animate_params:
+ num_pairs: 50
+ format: '.mp4'
+ normalization_params:
+ adapt_movement_scale: False
+ use_relative_movement: True
+ use_relative_jacobian: True
+
+visualizer_params:
+ kp_size: 5
+ draw_border: True
+ colormap: 'gist_rainbow'
diff --git a/config/vox-adv-256.yaml b/config/vox-adv-256.yaml
new file mode 100644
index 000000000..ed89890c8
--- /dev/null
+++ b/config/vox-adv-256.yaml
@@ -0,0 +1,84 @@
+dataset_params:
+ root_dir: data/vox-png
+ frame_shape: [256, 256, 3]
+ id_sampling: True
+ pairs_list: data/vox256.csv
+ augmentation_params:
+ flip_param:
+ horizontal_flip: True
+ time_flip: True
+ jitter_param:
+ brightness: 0.1
+ contrast: 0.1
+ saturation: 0.1
+ hue: 0.1
+
+
+model_params:
+ common_params:
+ num_kp: 10
+ num_channels: 3
+ estimate_jacobian: True
+ kp_detector_params:
+ temperature: 0.1
+ block_expansion: 32
+ max_features: 1024
+ scale_factor: 0.25
+ num_blocks: 5
+ generator_params:
+ block_expansion: 64
+ max_features: 512
+ num_down_blocks: 2
+ num_bottleneck_blocks: 6
+ estimate_occlusion_map: True
+ dense_motion_params:
+ block_expansion: 64
+ max_features: 1024
+ num_blocks: 5
+ scale_factor: 0.25
+ discriminator_params:
+ scales: [1]
+ block_expansion: 32
+ max_features: 512
+ num_blocks: 4
+ use_kp: True
+
+
+train_params:
+ num_epochs: 150
+ num_repeats: 75
+ epoch_milestones: []
+ lr_generator: 2.0e-4
+ lr_discriminator: 2.0e-4
+ lr_kp_detector: 2.0e-4
+ batch_size: 36
+ scales: [1, 0.5, 0.25, 0.125]
+ checkpoint_freq: 50
+ transform_params:
+ sigma_affine: 0.05
+ sigma_tps: 0.005
+ points_tps: 5
+ loss_weights:
+ generator_gan: 1
+ discriminator_gan: 1
+ feature_matching: [10, 10, 10, 10]
+ perceptual: [10, 10, 10, 10, 10]
+ equivariance_value: 10
+ equivariance_jacobian: 10
+
+reconstruction_params:
+ num_videos: 1000
+ format: '.mp4'
+
+animate_params:
+ num_pairs: 50
+ format: '.mp4'
+ normalization_params:
+ adapt_movement_scale: False
+ use_relative_movement: True
+ use_relative_jacobian: True
+
+visualizer_params:
+ kp_size: 5
+ draw_border: True
+ colormap: 'gist_rainbow'
diff --git a/demo.ipynb b/demo.ipynb
new file mode 100644
index 000000000..30c8c996a
--- /dev/null
+++ b/demo.ipynb
@@ -0,0 +1,18315 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "first-order-model-demo.ipynb",
+ "version": "0.3.2",
+ "provenance": [],
+ "toc_visible": true,
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "accelerator": "GPU"
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cdO_RxQZLahB",
+ "colab_type": "text"
+ },
+ "source": [
+ "# Demo for paper \"First Order Motion Model for Image Animation\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "GCDNKsEGLtR6",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Clone repository and install all the requirments**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "UCMFMJV7K-ag",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 139
+ },
+ "outputId": "c9f6f763-aa0a-4032-fa83-fb5999422228"
+ },
+ "source": [
+ "!git clone https://github.com/AliaksandrSiarohin/first-order-model"
+ ],
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Cloning into 'first-order-model'...\n",
+ "remote: Enumerating objects: 324, done.\u001b[K\n",
+ "remote: Counting objects: 100% (324/324), done.\u001b[K\n",
+ "remote: Compressing objects: 100% (192/192), done.\u001b[K\n",
+ "remote: Total 9896 (delta 237), reused 207 (delta 132), pack-reused 9572\u001b[K\n",
+ "Receiving objects: 100% (9896/9896), 113.89 MiB | 34.77 MiB/s, done.\n",
+ "Resolving deltas: 100% (1359/1359), done.\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "EuiQoPrEBUn6",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "!pip install -r first-order-model/requirements.txt"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZaHs0zSPFxyr",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Restart Runtime**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "PBp6l_4bBYUL",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 34
+ },
+ "outputId": "f1e174b4-a27c-4255-dde0-3e61f32f69d6"
+ },
+ "source": [
+ "cd first-order-model"
+ ],
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "/content/first-order-model\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IcMX7ueZO0Oa",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Mount your Google drive folder on Colab**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tDbMA8R9OuUo",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 124
+ },
+ "outputId": "306bc643-2676-4a1f-be64-044a5932a5f7"
+ },
+ "source": [
+ "from google.colab import drive\n",
+ "drive.mount('/content/gdrive')"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code\n",
+ "\n",
+ "Enter your authorization code:\n",
+ "··········\n",
+ "Mounted at /content/gdrive\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VsgVK1EURXkd",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Add folder https://drive.google.com/drive/folders/1kZ1gCnpfU0BnpdU47pLM_TQ6RypDDqgw?usp=sharing to your google drive.**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rW-ipQXPOWUo",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Load driving video and source image**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Oxi6-riLOgnm",
+ "colab_type": "code",
+ "outputId": "cee603cf-da20-4a6b-ee01-55589f355a61",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 453
+ }
+ },
+ "source": [
+ "import imageio\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "import matplotlib.animation as animation\n",
+ "from skimage.transform import resize\n",
+ "from IPython.display import HTML\n",
+ "import warnings\n",
+ "warnings.filterwarnings(\"ignore\")\n",
+ "\n",
+ "source_image = imageio.imread('/content/gdrive/My Drive/first-order-motion-model/02.png')\n",
+ "driving_video = imageio.mimread('/content/gdrive/My Drive/first-order-motion-model/04.mp4')\n",
+ "\n",
+ "\n",
+ "#Resize image and video to 256x256\n",
+ "\n",
+ "source_image = resize(source_image, (256, 256))[..., :3]\n",
+ "driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]\n",
+ "\n",
+ "def display(source, driving, generated=None):\n",
+ " fig = plt.figure(figsize=(8 + 4 * (generated is not None), 6))\n",
+ "\n",
+ " ims = []\n",
+ " for i in range(len(driving)):\n",
+ " cols = [source]\n",
+ " cols.append(driving[i])\n",
+ " if generated is not None:\n",
+ " cols.append(generated[i])\n",
+ " im = plt.imshow(np.concatenate(cols, axis=1), animated=True)\n",
+ " plt.axis('off')\n",
+ " ims.append([im])\n",
+ "\n",
+ " ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=1000)\n",
+ " plt.close()\n",
+ " return ani\n",
+ " \n",
+ "\n",
+ "HTML(display(source_image, driving_video).to_html5_video())"
+ ],
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 3
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xjM7ubVfWrwT",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Create a model and load checkpoints**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3FQiXqQPWt5B",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "from demo import load_checkpoints\n",
+ "generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml', \n",
+ " checkpoint_path='/content/gdrive/My Drive/first-order-motion-model/vox-cpk.pth.tar')"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fdFdasHEj3t7",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Perfrorm image animation**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "SB12II11kF4c",
+ "colab_type": "code",
+ "outputId": "2fbbb7e9-150c-480c-ef39-e9be98aa69fc",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 471
+ }
+ },
+ "source": [
+ "from demo import make_animation\n",
+ "from skimage import img_as_ubyte\n",
+ "\n",
+ "predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True)\n",
+ "\n",
+ "#save resulting video\n",
+ "imageio.mimsave('../generated.mp4', [img_as_ubyte(frame) for frame in predictions])\n",
+ "#video can be downloaded from /content folder\n",
+ "\n",
+ "HTML(display(source_image, driving_video, predictions).to_html5_video())"
+ ],
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 211/211 [00:08<00:00, 25.28it/s]\n"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-tJN01xQCpqH",
+ "colab_type": "text"
+ },
+ "source": [
+ "**In the cell above we use relative keypoint displacement to animate the objects. We ca use absolute coordinates instead, but in this way all the object proporions will be inherited from the driving video. For example Putin haircut will be extended to match Trump haircut.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aOE_W_kfC9aX",
+ "colab_type": "code",
+ "outputId": "f558de70-d231-466d-dec9-3108405ecb7e",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 471
+ }
+ },
+ "source": [
+ "predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=False, adapt_movement_scale=True)\n",
+ "HTML(display(source_image, driving_video, predictions).to_html5_video())"
+ ],
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 211/211 [00:08<00:00, 24.81it/s]\n"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QnXrecuX6_Kw",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Running on your data\n",
+ "\n",
+ "**First we need to crop a face from both source image and video, while simple graphic editor like paint can be used for cropping from image. Cropping from video is more complicated. You can use ffpmeg for this.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "brJlA_5o72Xc",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "72c7eec3-5c7d-406e-a5d4-2a11f57b1272"
+ },
+ "source": [
+ "!ffmpeg -i /content/gdrive/My\\ Drive/first-order-motion-model/07.mkv -ss 00:08:57.50 -t 00:00:08 -filter:v \"crop=600:600:760:50\" -async 1 hinton.mp4"
+ ],
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "ffmpeg version 3.4.6-0ubuntu0.18.04.1 Copyright (c) 2000-2019 the FFmpeg developers\n",
+ " built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)\n",
+ " configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared\n",
+ " libavutil 55. 78.100 / 55. 78.100\n",
+ " libavcodec 57.107.100 / 57.107.100\n",
+ " libavformat 57. 83.100 / 57. 83.100\n",
+ " libavdevice 57. 10.100 / 57. 10.100\n",
+ " libavfilter 6.107.100 / 6.107.100\n",
+ " libavresample 3. 7. 0 / 3. 7. 0\n",
+ " libswscale 4. 8.100 / 4. 8.100\n",
+ " libswresample 2. 9.100 / 2. 9.100\n",
+ " libpostproc 54. 7.100 / 54. 7.100\n",
+ "Input #0, matroska,webm, from '/content/gdrive/My Drive/first-order-motion-model/07.mkv':\n",
+ " Metadata:\n",
+ " ENCODER : Lavf57.83.100\n",
+ " Duration: 00:14:59.73, start: 0.000000, bitrate: 2343 kb/s\n",
+ " Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, bt709), 1920x1080, SAR 1:1 DAR 16:9, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc (default)\n",
+ " Metadata:\n",
+ " DURATION : 00:14:59.665000000\n",
+ " Stream #0:1(eng): Audio: aac (LC), 44100 Hz, stereo, fltp (default)\n",
+ " Metadata:\n",
+ " HANDLER_NAME : SoundHandler\n",
+ " DURATION : 00:14:59.727000000\n",
+ "Stream mapping:\n",
+ " Stream #0:0 -> #0:0 (vp9 (native) -> h264 (libx264))\n",
+ " Stream #0:1 -> #0:1 (aac (native) -> aac (native))\n",
+ "Press [q] to stop, [?] for help\n",
+ "-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0musing SAR=1/1\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0musing cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mprofile High, level 3.1\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0m264 - core 152 r2854 e9a5903 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n",
+ "Output #0, mp4, to 'hinton.mp4':\n",
+ " Metadata:\n",
+ " encoder : Lavf57.83.100\n",
+ " Stream #0:0(eng): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 600x600 [SAR 1:1 DAR 1:1], q=-1--1, 29.97 fps, 30k tbn, 29.97 tbc (default)\n",
+ " Metadata:\n",
+ " DURATION : 00:14:59.665000000\n",
+ " encoder : Lavc57.107.100 libx264\n",
+ " Side data:\n",
+ " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1\n",
+ " Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)\n",
+ " Metadata:\n",
+ " HANDLER_NAME : SoundHandler\n",
+ " DURATION : 00:14:59.727000000\n",
+ " encoder : Lavc57.107.100 aac\n",
+ "frame= 240 fps=2.9 q=-1.0 Lsize= 1301kB time=00:00:08.01 bitrate=1330.6kbits/s speed=0.0984x \n",
+ "video:1166kB audio:125kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.761764%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mframe I:1 Avg QP:22.44 size: 28019\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mframe P:62 Avg QP:23.31 size: 12894\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mframe B:177 Avg QP:28.63 size: 2068\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mconsecutive B-frames: 0.8% 1.7% 2.5% 95.0%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mmb I I16..4: 12.7% 76.2% 11.1%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mmb P I16..4: 1.9% 8.9% 1.1% P16..4: 35.3% 21.3% 10.8% 0.0% 0.0% skip:20.7%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mmb B I16..4: 0.0% 0.1% 0.0% B16..8: 39.1% 5.4% 1.0% direct: 1.4% skip:52.9% L0:35.4% L1:48.5% BI:16.2%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0m8x8 transform intra:75.2% inter:77.3%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mcoded y,uvDC,uvAC intra: 61.9% 52.1% 5.8% inter: 15.2% 6.9% 0.0%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mi16 v,h,dc,p: 69% 8% 8% 15%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mi8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 10% 19% 5% 8% 11% 8% 9% 6%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mi4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 8% 11% 5% 12% 21% 7% 9% 4%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mi8c dc,h,v,p: 53% 20% 19% 8%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mWeighted P-Frames: Y:21.0% UV:1.6%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mref P L0: 57.9% 21.2% 14.0% 5.9% 1.1%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mref B L0: 93.5% 5.3% 1.2%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mref B L1: 97.4% 2.6%\n",
+ "\u001b[1;36m[libx264 @ 0x562e3209e800] \u001b[0mkb/s:1192.28\n",
+ "\u001b[1;36m[aac @ 0x562e3209f700] \u001b[0mQavg: 534.430\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NSHSxV8iGybI",
+ "colab_type": "text"
+ },
+ "source": [
+ "**Another posibility is to use some screen recording tool, or if you need to crop many images at ones use face detector(https://github.com/1adrianb/face-alignment) , see https://github.com/AliaksandrSiarohin/video-preprocessing for preprcessing of VoxCeleb.** "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "d8kQ3U7MHqh-",
+ "colab_type": "code",
+ "outputId": "2c2ba61f-5688-468f-8b69-9fb0c175064f",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 471
+ }
+ },
+ "source": [
+ "source_image = imageio.imread('/content/gdrive/My Drive/first-order-motion-model/09.png')\n",
+ "driving_video = imageio.mimread('hinton.mp4', memtest=False)\n",
+ "\n",
+ "\n",
+ "#Resize image and video to 256x256\n",
+ "\n",
+ "source_image = resize(source_image, (256, 256))[..., :3]\n",
+ "driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]\n",
+ "\n",
+ "predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True,\n",
+ " adapt_movement_scale=True)\n",
+ "\n",
+ "HTML(display(source_image, driving_video, predictions).to_html5_video())"
+ ],
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 240/240 [00:09<00:00, 24.80it/s]\n"
+ ],
+ "name": "stderr"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 9
+ }
+ ]
+ }
+ ]
+}
diff --git a/demo.py b/demo.py
index f47aaa50a..11af4aa14 100644
--- a/demo.py
+++ b/demo.py
@@ -64,10 +64,34 @@ def make_animation(source_image, driving_video, generator, kp_detector, relative
predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0])
return predictions
+def find_best_frame(source, driving):
+ import face_alignment
+
+ def normalize_kp(kp):
+ kp = kp - kp.mean(axis=0, keepdims=True)
+ area = ConvexHull(kp[:, :2]).volume
+ area = np.sqrt(area)
+ kp[:, :2] = kp[:, :2] / area
+ return kp
+
+ fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=True)
+ kp_source = fa.get_landmarks(255 * source)[0]
+ kp_source = normalize_kp(kp_source)
+ norm = float('inf')
+ frame_num = 0
+ for i, image in tqdm(enumerate(driving)):
+ kp_driving = fa.get_landmarks(255 * image)[0]
+ kp_driving = normalize_kp(kp_driving)
+ new_norm = (np.abs(kp_source - kp_driving) ** 2).sum()
+ if new_norm < norm:
+ norm = new_norm
+ frame_num = i
+ return frame_num
+
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--config", required=True, help="path to config")
- parser.add_argument("--checkpoint", default='taichi-cpk.pth.tar', help="path to checkpoint to restore")
+ parser.add_argument("--checkpoint", default='vox-cpk.pth.tar', help="path to checkpoint to restore")
parser.add_argument("--source_image", default='sup-mat/source.png', help="path to source image")
parser.add_argument("--driving_video", default='sup-mat/source.png', help="path to driving video")
@@ -76,6 +100,13 @@ def make_animation(source_image, driving_video, generator, kp_detector, relative
parser.add_argument("--relative", dest="relative", action="store_true", help="use relative or absolute keypoint coordinates")
parser.add_argument("--adapt_scale", dest="adapt_scale", action="store_true", help="adapt movement scale based on convex hull of keypoints")
+ parser.add_argument("--find_best_frame", dest="find_best_frame", action="store_true",
+ help="Generate from the frame that is the most alligned with source. (Only for faces, requires face_aligment lib)")
+
+ parser.add_argument("--best_frame", dest="best_frame", type=int, default=None,
+ help="Set frame to start from.")
+
+
parser.set_defaults(relative=False)
parser.set_defaults(adapt_scale=False)
@@ -91,6 +122,15 @@ def make_animation(source_image, driving_video, generator, kp_detector, relative
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint)
- predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale)
+ if opt.find_best_frame or opt.best_frame is not None:
+ i = opt.best_frame if opt.best_frame is not None else find_best_frame(source_image, driving_video)
+ print (i)
+ driving_forward = driving_video[i:]
+ driving_backward = driving_video[:(i+1)][::-1]
+ predictions_forward = make_animation(source_image, driving_forward, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale)
+ predictions_backward = make_animation(source_image, driving_backward, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale)
+ predictions = predictions_backward[::-1] + predictions_forward
+ else:
+ predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale)
imageio.mimsave(opt.result_video, predictions, fps=fps)
diff --git a/sup-mat/absolute-demo.gif b/sup-mat/absolute-demo.gif
new file mode 100644
index 000000000..113bd0099
Binary files /dev/null and b/sup-mat/absolute-demo.gif differ
diff --git a/sup-mat/relative-demo.gif b/sup-mat/relative-demo.gif
new file mode 100644
index 000000000..9a635cf2c
Binary files /dev/null and b/sup-mat/relative-demo.gif differ
diff --git a/sup-mat/vox-teaser.gif b/sup-mat/vox-teaser.gif
new file mode 100644
index 000000000..fc0ce3de2
Binary files /dev/null and b/sup-mat/vox-teaser.gif differ