Initial release of "mmv".
PiperOrigin-RevId: 346305536
derpson committed Dec 8, 2020
1 parent 7ed0b05 commit c146166
Showing 18 changed files with 3,015 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -24,6 +24,7 @@ https://deepmind.com/research/publications/

## Projects

* [Self-Supervised MultiModal Versatile Networks](mmv), NeurIPS 2020
* [ODE-GAN: Training GANs by Solving Ordinary Differential Equations](ode_gan), NeurIPS 2020
* [Algorithms for Causal Reasoning in Probability Trees](causal_reasoning)
* [Gated Linear Networks](gated_linear_networks), NeurIPS 2020
83 changes: 83 additions & 0 deletions mmv/README.md
@@ -0,0 +1,83 @@
# Self-supervised Multimodal Versatile Networks

This is the code for the models in the MMV paper:
https://arxiv.org/abs/2006.16228.

<img src="./imgs/mmv_fig.png" width="50%">

We also provide the code for linear evaluation of a pre-trained model on
UCF101, along with the JAX checkpoints for our best models.

We use different video compression parameters for UCF101 than the ones used
in `tensorflow_datasets`. We provide the code to download and preprocess the
dataset. Using the checkpoints provided below, the `eval_ucf101.py` script
reproduces the results we report in Table 2 of the paper.

Visual Backbone | Training Dataset | UCF101 linear evaluation accuracy (%)
------- | -------- | --------
S3D-G | AudioSet + HowTo | 89.6
Resnet-50 TSM | AudioSet + HowTo | 91.5
Resnet-50 TSM (x2) | AudioSet + HowTo | 91.8


## Setup

To set up a Python virtual environment with the required dependencies, run:

```shell
python3 -m venv mmv_env
source mmv_env/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r mmv/requirements.txt --use-feature=2020-resolver
```


### Linear evaluation

The linear evaluation on UCF101 can be run using:

```shell
python -m mmv.eval_ucf101 \
--checkpoint_path=</path/to/the/checkpointing/folder> \
--dataset_folder=</path/to/dataset/folder>
```

## Checkpoints

We provide three checkpoints containing the best pre-trained weights for each
of the visual backbones we use in the paper, i.e., S3D-G, Resnet-50 TSM,
and Resnet-50 TSM x2.

- [S3D-G](https://storage.googleapis.com/deepmind-research-mmv/mmv_s3d.pkl)
- [Resnet-50 TSM](https://storage.googleapis.com/deepmind-research-mmv/mmv_tsm_resnet_x1.pkl)
- [Resnet-50 TSMx2](https://storage.googleapis.com/deepmind-research-mmv/mmv_tsm_resnet_x2.pkl)
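
As a rough sketch, a checkpoint could be fetched and inspected as shown below.
This assumes the `.pkl` files are ordinary Python pickles. The helper names
`checkpoint_url`, `download_checkpoint` and `load_checkpoint` are hypothetical
and not part of this repository. The structure of the loaded object is not
described here, and unpickling may require the packages from
`mmv/requirements.txt` to be installed.

```python
import pickle
import urllib.request

# Base URL and file names copied from the checkpoint links above.
_BASE_URL = 'https://storage.googleapis.com/deepmind-research-mmv/'
_CHECKPOINT_FILES = {
    's3d': 'mmv_s3d.pkl',
    'tsm_resnet_x1': 'mmv_tsm_resnet_x1.pkl',
    'tsm_resnet_x2': 'mmv_tsm_resnet_x2.pkl',
}


def checkpoint_url(name):
  """Returns the download URL for one of the three released checkpoints."""
  return _BASE_URL + _CHECKPOINT_FILES[name]


def download_checkpoint(name, destination):
  """Downloads the checkpoint `name` to the local path `destination`."""
  urllib.request.urlretrieve(checkpoint_url(name), destination)


def load_checkpoint(path):
  """Unpickles a checkpoint file (assumption: it is a standard pickle)."""
  with open(path, 'rb') as f:
    return pickle.load(f)
```

For example, `download_checkpoint('s3d', 'mmv_s3d.pkl')` followed by
`load_checkpoint('mmv_s3d.pkl')`. Only unpickle files obtained from a source
you trust, since unpickling can execute arbitrary code.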

## References

### Citing our work

If you use this code for your research, please consider citing our paper:

```bibtex
@inproceedings{alayrac2020self,
title={{S}elf-{S}upervised {M}ulti{M}odal {V}ersatile {N}etworks},
author={Alayrac, Jean-Baptiste and Recasens, Adri{\`a} and Schneider, Rosalia and Arandjelovi{\'c}, Relja and Ramapuram, Jason and De Fauw, Jeffrey and Smaira, Lucas and Dieleman, Sander and Zisserman, Andrew},
booktitle={NeurIPS},
year={2020}
}
```

### Models in TF

You may also be interested in using our TF-Hub release models available at:

- [S3D-G](https://tfhub.dev/deepmind/mmv/s3d/1)
- [Resnet-50 TSM](https://tfhub.dev/deepmind/mmv/tsm-resnet50/1)
- [Resnet-50 TSMx2](https://tfhub.dev/deepmind/mmv/tsm-resnet50x2/1)

## License

While the code is licensed under the Apache 2.0 License, the checkpoint weights
are made available for non-commercial use only under the terms of the
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
license. You can find details at:
https://creativecommons.org/licenses/by-nc/4.0/legalcode.
85 changes: 85 additions & 0 deletions mmv/config.py
@@ -0,0 +1,85 @@
# Copyright 2020 DeepMind Technologies Limited.
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Configuration parameters for MMV."""


def get_model_config(ckpt_path):
  """Returns the model configuration to be used with each checkpoint."""

  config = {
      'audio_backbone': 'resnet50',
      'audio_model_kwargs': {
          'bn_config': {
              'create_offset': True,
              'create_scale': True,
              'decay_rate': 0.9,
              'eps': 1.0e-5
          }
      },
      'bn_config_proj': {
          'create_offset': True,
          'create_scale': True,
          'decay_rate': 0.9,
          'eps': 1.0e-5
      },
      'config_audio_text': {
          'embedding_dim': 512,
          'toaud_bn_after_proj': False,
          'toaud_head_mode': 'linear',
          'totxt_bn_after_proj': False,
          'totxt_head_mode': 'linear'
      },
      'config_video_audio': {
          'embedding_dim': 512,
          'toaud_bn_after_proj': True,
          'toaud_head_mode': 'mlp@512',
          'tovid_bn_after_proj': False,
          'tovid_head_mode': 'linear'
      },
      'config_video_text': {
          'embedding_dim': 256,
          'totxt_bn_after_proj': True,
          'totxt_head_mode': 'linear',
          'tovid_bn_after_proj': False,
          'tovid_head_mode': 'linear'
      },
      'mm_embedding_graph': 'fac_relu',
      'name': 'text_audio_video',
      'sentence_dim': 2048,
      'use_xreplica_bn': True,
      'vision_model_kwargs': {
          'bn_config': {
              'create_offset': True,
              'create_scale': True,
              'decay_rate': 0.9,
              'eps': 1.0e-5
          },
          'n_frames': 32,
          'width_mult': 1,
      },
  }

  if 's3d' in ckpt_path:
    config['visual_backbone'] = 's3d'

  if 'tsm_resnet_x1' in ckpt_path:
    config['visual_backbone'] = 'resnet50tsm'

  if 'tsm_resnet_x2' in ckpt_path:
    config['visual_backbone'] = 'resnet50tsm'
    config['vision_model_kwargs']['width_mult'] = 2

  return config
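
`get_model_config` selects the visual backbone by substring-matching the whole
checkpoint path, so a path containing none of `s3d`, `tsm_resnet_x1` or
`tsm_resnet_x2` yields a config with no `visual_backbone` key. The sketch
below shows one possible stricter variant of that selection step. The helper
`select_visual_backbone` is hypothetical and not part of this commit. It
matches on the file name only and raises an error when nothing matches.

```python
import os

# Hypothetical helper, not part of mmv/config.py. It uses the same substrings
# as get_model_config, but only looks at the checkpoint file name.
_BACKBONES = (
    # (substring, visual_backbone, width_mult)
    ('tsm_resnet_x2', 'resnet50tsm', 2),
    ('tsm_resnet_x1', 'resnet50tsm', 1),
    ('s3d', 's3d', 1),
)


def select_visual_backbone(ckpt_path):
  """Returns (visual_backbone, width_mult) inferred from the file name."""
  filename = os.path.basename(ckpt_path)
  for substring, backbone, width_mult in _BACKBONES:
    if substring in filename:
      return backbone, width_mult
  raise ValueError(f'Cannot infer the visual backbone from {ckpt_path!r}.')
```

Matching on the file name avoids picking up a backbone name from a parent
directory, and the explicit error surfaces a misnamed checkpoint immediately
rather than later as a missing dictionary key.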