Initial release of "mmv".

PiperOrigin-RevId: 346305536
Hiusam · Dec 8, 2020 · c146166
1 parent 7ed0b05

Showing 18 changed files with 3,015 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -24,6 +24,7 @@ https://deepmind.com/research/publications/
 
 ## Projects
 
+*   [Self-Supervised MultiModal Versatile Networks](mmv), NeurIPS 2020
 *   [ODE-GAN: Training GANs by Solving Ordinary Differential Equations](ode_gan), NeurIPS 2020
 *   [Algorithms for Causal Reasoning in Probability Trees](causal_reasoning)
 *   [Gated Linear Networks](gated_linear_networks), NeurIPS 2020

diff --git a/mmv/README.md b/mmv/README.md
@@ -0,0 +1,83 @@
+# Self-supervised Multimodal Versatile Networks
+
+This directory contains the code for the models described in MMV (https://arxiv.org/abs/2006.16228).
+
+<img src="./imgs/mmv_fig.png" width="50%">
+
+We also provide the code for linear evaluation of a pre-trained model on UCF101,
+as well as the JAX checkpoints for our best models.
+
+For UCF101, we use different video compression parameters from those used in
+`tensorflow_datasets`, so we provide our own code to download and preprocess
+the dataset. The `eval_ucf101.py` script reproduces the results we report in
+Table 2 of the paper, using the checkpoints provided below.
+
+Visual backbone | Training dataset | Linear UCF101 accuracy (%)
+------- | -------- | --------
+S3D-G | AudioSet + HowTo100M | 89.6
+Resnet-50 TSM | AudioSet + HowTo100M | 91.5
+Resnet-50 TSM x2 | AudioSet + HowTo100M | 91.8
+
+
+## Setup
+
+To set up a Python virtual environment with the required dependencies, run the following from the root of the repository:
+
+```shell
+python3 -m venv mmv_env
+source mmv_env/bin/activate
+pip install --upgrade pip setuptools wheel
+pip install -r mmv/requirements.txt
+```
+
+
+### Linear evaluation
+
+The linear evaluation on UCF101 can be run using:
+
+```shell
+python -m mmv.eval_ucf101 \
+  --checkpoint_path=</path/to/the/checkpointing/folder> \
+  --dataset_folder=</path/to/dataset/folder>
+```
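Linear evaluation trains only a linear classifier on top of features from the frozen pre-trained network; the backbone weights are not updated. The snippet below is a minimal sketch of that protocol in plain Python, assuming the clip embeddings have already been extracted as lists of floats. It uses softmax regression trained by full-batch gradient descent, which is one common choice of linear classifier and not necessarily the one used by `eval_ucf101.py`. `train_linear_probe` and `predict` are hypothetical helpers, not part of `mmv`.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _softmax(logits: Sequence[float]) -> List[float]:
  # Subtract the max logit for numerical stability.
  m = max(logits)
  exps = [math.exp(z - m) for z in logits]
  total = sum(exps)
  return [e / total for e in exps]


def _logits(weights: Matrix, bias: Sequence[float],
            x: Sequence[float]) -> List[float]:
  return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
          for w, b in zip(weights, bias)]


def train_linear_probe(features: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       num_classes: int,
                       lr: float = 0.5,
                       epochs: int = 200,
                       seed: int = 0) -> Tuple[Matrix, List[float]]:
  """Fits a softmax-regression classifier on frozen features (hypothetical)."""
  dim = len(features[0])
  rng = random.Random(seed)
  weights = [[rng.gauss(0.0, 0.01) for _ in range(dim)]
             for _ in range(num_classes)]
  bias = [0.0] * num_classes
  n = len(features)
  for _ in range(epochs):
    grad_w = [[0.0] * dim for _ in range(num_classes)]
    grad_b = [0.0] * num_classes
    for x, y in zip(features, labels):
      probs = _softmax(_logits(weights, bias, x))
      for c in range(num_classes):
        # Gradient of the cross-entropy loss w.r.t. the logit of class c.
        err = probs[c] - (1.0 if c == y else 0.0)
        grad_b[c] += err
        for d in range(dim):
          grad_w[c][d] += err * x[d]
    # Gradient step on the linear layer only; the features stay fixed.
    for c in range(num_classes):
      bias[c] -= lr * grad_b[c] / n
      for d in range(dim):
        weights[c][d] -= lr * grad_w[c][d] / n
  return weights, bias


def predict(weights: Matrix, bias: Sequence[float],
            x: Sequence[float]) -> int:
  logits = _logits(weights, bias, x)
  return max(range(len(logits)), key=logits.__getitem__)


# Toy usage with two linearly separable classes.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
w, b = train_linear_probe(feats, labels, num_classes=2)
accuracy = sum(predict(w, b, x) == y for x, y in zip(feats, labels)) / len(feats)
```

In practice the features would come from the frozen MMV visual backbone, and accuracy would be measured on a held-out split rather than on the training clips.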
+
+## Checkpoints
+
+We provide three checkpoints containing the best pre-trained weights for each
+of the visual backbones used in the paper, i.e., S3D-G, Resnet-50 TSM and
+Resnet-50 TSM x2.
+
+- [S3D-G](https://storage.googleapis.com/deepmind-research-mmv/mmv_s3d.pkl)
+- [Resnet-50 TSM](https://storage.googleapis.com/deepmind-research-mmv/mmv_tsm_resnet_x1.pkl)
+- [Resnet-50 TSM x2](https://storage.googleapis.com/deepmind-research-mmv/mmv_tsm_resnet_x2.pkl)
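The checkpoints are plain files on Google Cloud Storage, so any HTTP client can fetch them. The sketch below downloads one with Python's standard library and unpickles it, assuming the `.pkl` files are standard Python pickles (as the extension suggests). Their internal structure is not documented here, so the code simply returns the loaded object; loading the real files will likely need the packages from `mmv/requirements.txt` installed. `fetch_checkpoint` and `CHECKPOINT_URLS` are hypothetical names, not part of `mmv`. Unpickling can execute arbitrary code, so only load files from a source you trust.

```python
import os
import pickle
import urllib.request
from typing import Any

# Checkpoint URLs as listed in this README.
CHECKPOINT_URLS = {
    's3d': 'https://storage.googleapis.com/deepmind-research-mmv/mmv_s3d.pkl',
    'tsm_resnet_x1': 'https://storage.googleapis.com/deepmind-research-mmv/mmv_tsm_resnet_x1.pkl',
    'tsm_resnet_x2': 'https://storage.googleapis.com/deepmind-research-mmv/mmv_tsm_resnet_x2.pkl',
}


def fetch_checkpoint(url: str, dest_dir: str) -> Any:
  """Downloads a checkpoint if not cached, then unpickles it (hypothetical)."""
  os.makedirs(dest_dir, exist_ok=True)
  # Keep the original file name: mmv/config.py's get_model_config() picks the
  # visual backbone from substrings such as 's3d' or 'tsm_resnet_x2' in the
  # path it is given.
  path = os.path.join(dest_dir, os.path.basename(url))
  if not os.path.exists(path):
    urllib.request.urlretrieve(url, path)
  with open(path, 'rb') as f:
    return pickle.load(f)
```

For example, `fetch_checkpoint(CHECKPOINT_URLS['s3d'], '/tmp/mmv')` would store the file as `/tmp/mmv/mmv_s3d.pkl` and reuse it on later calls.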
+
+## References
+
+### Citing our work
+
+If you use this code for your research, please consider citing our paper:
+
+```bibtex
+@inproceedings{alayrac2020self,
+  title={{S}elf-{S}upervised {M}ulti{M}odal {V}ersatile {N}etworks},
+  author={Alayrac, Jean-Baptiste and Recasens, Adri{\`a} and Schneider, Rosalia and Arandjelovi{\'c}, Relja and Ramapuram, Jason and De Fauw, Jeffrey and Smaira, Lucas and Dieleman, Sander and Zisserman, Andrew},
+  booktitle={NeurIPS},
+  year={2020}
+}
+```
+
+### Models in TF
+
+You may also be interested in the TensorFlow versions of our models, released on TF-Hub:
+
+- [S3D-G](https://tfhub.dev/deepmind/mmv/s3d/1)
+- [Resnet-50 TSM](https://tfhub.dev/deepmind/mmv/tsm-resnet50/1)
+- [Resnet-50 TSM x2](https://tfhub.dev/deepmind/mmv/tsm-resnet50x2/1)
+
+## License
+
+While the code is licensed under the Apache 2.0 License, the checkpoint weights
+are made available for non-commercial use only under the terms of the
+Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
+license. You can find details at:
+https://creativecommons.org/licenses/by-nc/4.0/legalcode.
diff --git a/mmv/config.py b/mmv/config.py
@@ -0,0 +1,85 @@
+# Copyright 2020 DeepMind Technologies Limited.
+#
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Configuration parameters for MMV."""
+
+
+def get_model_config(ckpt_path):
+  """Returns the model configuration to be used with each checkpoint."""
+
+  config = {
+      'audio_backbone': 'resnet50',
+      'audio_model_kwargs': {
+          'bn_config': {
+              'create_offset': True,
+              'create_scale': True,
+              'decay_rate': 0.9,
+              'eps': 1.0e-5
+          }
+      },
+      'bn_config_proj': {
+          'create_offset': True,
+          'create_scale': True,
+          'decay_rate': 0.9,
+          'eps': 1.0e-5
+      },
+      'config_audio_text': {
+          'embedding_dim': 512,
+          'toaud_bn_after_proj': False,
+          'toaud_head_mode': 'linear',
+          'totxt_bn_after_proj': False,
+          'totxt_head_mode': 'linear'
+      },
+      'config_video_audio': {
+          'embedding_dim': 512,
+          'toaud_bn_after_proj': True,
+          'toaud_head_mode': 'mlp@512',
+          'tovid_bn_after_proj': False,
+          'tovid_head_mode': 'linear'
+      },
+      'config_video_text': {
+          'embedding_dim': 256,
+          'totxt_bn_after_proj': True,
+          'totxt_head_mode': 'linear',
+          'tovid_bn_after_proj': False,
+          'tovid_head_mode': 'linear'
+      },
+      'mm_embedding_graph': 'fac_relu',
+      'name': 'text_audio_video',
+      'sentence_dim': 2048,
+      'use_xreplica_bn': True,
+      'vision_model_kwargs': {
+          'bn_config': {
+              'create_offset': True,
+              'create_scale': True,
+              'decay_rate': 0.9,
+              'eps': 1.0e-5
+          },
+          'n_frames': 32,
+          'width_mult': 1,
+      },
+  }
+
+  if 's3d' in ckpt_path:
+    config['visual_backbone'] = 's3d'
+
+  if 'tsm_resnet_x1' in ckpt_path:
+    config['visual_backbone'] = 'resnet50tsm'
+
+  if 'tsm_resnet_x2' in ckpt_path:
+    config['visual_backbone'] = 'resnet50tsm'
+    config['vision_model_kwargs']['width_mult'] = 2
+
+  return config
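`get_model_config` chooses the visual backbone purely from substrings of the checkpoint path, so a renamed checkpoint file gets no `visual_backbone` entry at all. The hypothetical helper below mirrors only that selection logic (the `if ... in ckpt_path` checks at the end of the function) so it can be checked in isolation. The explicit `ValueError` for unrecognised paths is an addition of this sketch, not behaviour of `config.py`.

```python
from typing import Optional, Tuple


def select_visual_backbone(ckpt_path: str) -> Tuple[str, int]:
  """Mirrors the backbone selection in get_model_config (hypothetical)."""
  backbone: Optional[str] = None
  width_mult = 1  # Default 'width_mult' in 'vision_model_kwargs'.
  # Same substring checks, in the same order, as mmv/config.py.
  if 's3d' in ckpt_path:
    backbone = 's3d'
  if 'tsm_resnet_x1' in ckpt_path:
    backbone = 'resnet50tsm'
  if 'tsm_resnet_x2' in ckpt_path:
    backbone = 'resnet50tsm'
    width_mult = 2
  if backbone is None:
    # Added for this sketch: config.py instead leaves 'visual_backbone' unset.
    raise ValueError(f'Cannot infer the visual backbone from {ckpt_path!r}')
  return backbone, width_mult


for name in ('mmv_s3d.pkl', 'mmv_tsm_resnet_x1.pkl', 'mmv_tsm_resnet_x2.pkl'):
  print(name, select_visual_backbone(name))
# mmv_s3d.pkl ('s3d', 1)
# mmv_tsm_resnet_x1.pkl ('resnet50tsm', 1)
# mmv_tsm_resnet_x2.pkl ('resnet50tsm', 2)
```

Because only substrings are checked, full paths such as `/ckpts/mmv_tsm_resnet_x1.pkl` work as well, as long as the released file names are kept.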