added readme file for mlpmixer, perceiver io and vit (#94)

* added readme file for mlpmixer, perceiver io and vit
* updated readme files
1 parent 08c8828 · commit a6b9b06 · 3 changed files with 258 additions and 0 deletions

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
   :width: 100%
   :class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
   :width: 100%
   :class: only-dark

.. raw:: html

    <br/>
    <a href="https://pypi.org/project/ivy-models">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
    </a>
    <a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
    </a>
    <a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
    </a>
    <a href="https://discord.gg/G4aR9Q7DTN">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
    </a>
    <br clear="all" />

MLP-Mixer
=========

`MLP-Mixer <https://arxiv.org/abs/2105.01601>`_ is based entirely on multi-layer perceptrons (MLPs), i.e. stacks of linear layers separated by
non-linear activation functions.

The main idea behind MLP-Mixer is that MLPs alone can learn the spatial and channel mixing functions needed to extract features from images.
MLP-Mixer achieves this by alternating two types of layers: token (patch) mixing layers and channel mixing layers.
The token mixing layers apply an MLP along the patch dimension, treating each channel independently. This lets MLP-Mixer learn spatial mixing functions that
capture the relationships between different patches of the image.
The channel mixing layers, on the other hand, apply an MLP along the channel dimension, treating each patch independently. This lets MLP-Mixer learn channel mixing functions that
capture the relationships between different channels.
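
To make the split concrete, below is a minimal sketch of the two mixing steps, with single linear maps standing in for the MLPs and the activations, layer norms and skip connections omitted; all shapes and weights are illustrative stand-ins rather than the library's actual layers.

.. code-block:: python

    import ivy

    ivy.set_backend("numpy")

    # a patch table of shape (num_patches, channels); values are random stand-ins
    num_patches, channels = 196, 512
    x = ivy.random_normal(shape=(num_patches, channels))

    # token (patch) mixing: mix along the patch axis, each channel treated independently
    w_tokens = ivy.random_normal(shape=(num_patches, num_patches))
    token_mixed = ivy.matmul(w_tokens, x)

    # channel mixing: mix along the channel axis, each patch treated independently
    w_channels = ivy.random_normal(shape=(channels, channels))
    channel_mixed = ivy.matmul(token_mixed, w_channels)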

Getting started
---------------

.. code-block:: python

    !pip install huggingface_hub
    import ivy
    from ivy_models.mlpmixer import mlpmixer
    ivy.set_backend("torch")

    # Instantiate the mlpmixer model
    ivy_mlpmixer = mlpmixer(pretrained=True)

The pretrained mlpmixer model is now ready to be used, and is compatible with any other PyTorch code.
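
As a quick check, a forward pass can be run on a dummy batch. The sketch below assumes the model takes a batch of 224x224 RGB images in channels-last layout and returns class logits; check the model's spec if your build expects a different layout.

.. code-block:: python

    # hedged usage sketch: input resolution and layout are assumptions
    img = ivy.random_normal(shape=(1, 224, 224, 3))  # dummy batch of one image
    logits = ivy_mlpmixer(img)                       # forward pass
    print(ivy.argmax(logits, axis=-1))               # predicted class index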

Citation
--------

::

    @article{tolstikhin2021mlp,
        title={MLP-Mixer: An all-MLP Architecture for Vision},
        author={
            Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner,
            Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic and Alexey Dosovitskiy
        },
        journal={arXiv preprint arXiv:2105.01601},
        year={2021}
    }

    @article{lenton2021ivy,
        title={Ivy: Templated deep learning for inter-framework portability},
        author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
        journal={arXiv preprint arXiv:2102.02886},
        year={2021}
    }

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
   :width: 100%
   :class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
   :width: 100%
   :class: only-dark

.. raw:: html

    <br/>
    <a href="https://pypi.org/project/ivy-models">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
    </a>
    <a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
    </a>
    <a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
    </a>
    <a href="https://discord.gg/G4aR9Q7DTN">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
    </a>
    <br clear="all" />

Perceiver IO
============

`Perceiver IO <https://arxiv.org/abs/2107.14795>`_ is based on the Perceiver architecture, originally proposed by DeepMind in 2021. Perceiver IO extends the Perceiver
by adding a new module, the querying module, which allows it to produce outputs of arbitrary size and semantics,
making it a more general-purpose architecture than the Perceiver.

The Perceiver IO architecture consists of three main modules: the reading module, which takes the input data and encodes it into a latent space;
the processing module, which refines the latent representation learned by the reading module; and the querying module, which takes the latent
representation from the processing module and produces outputs of arbitrary size and semantics.

The querying module is the key innovation of Perceiver IO. It works by first constructing a query vector for each output element.
The query vector is a representation of the desired output element, and it is constructed from output-specific features.
The querying module then uses a cross-attention mechanism in which the query vectors attend to the latent representation, producing each output element by combining
the attended latent representation with its query vector.
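
The sketch below illustrates this decoding step in its simplest form: single-head cross-attention from the output queries to the latent array, with the learned projections, scaling and output MLP omitted. All sizes and tensors are illustrative stand-ins, not the library's actual parameters.

.. code-block:: python

    import ivy

    ivy.set_backend("numpy")

    num_latents, latent_dim = 256, 1024   # latent array from the processing module
    num_outputs = 10                      # one query per desired output element

    latents = ivy.random_normal(shape=(num_latents, latent_dim))
    queries = ivy.random_normal(shape=(num_outputs, latent_dim))

    # each query attends over the latents ...
    scores = ivy.matmul(queries, ivy.permute_dims(latents, axes=(1, 0)))
    attn = ivy.softmax(scores, axis=-1)

    # ... and the attended latents become the outputs, one row per output element
    outputs = ivy.matmul(attn, latents)   # shape (num_outputs, latent_dim)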

Getting started
---------------

.. code-block:: python

    import ivy
    from ivy_models.transformers.perceiver_io import (
        PerceiverIOSpec,
        perceiver_io_img_classification,
    )
    ivy.set_backend("torch")

    # params
    load_weights = True   # missing from the original snippet; True selects the full-depth configuration
    input_dim = 3
    num_input_axes = 2
    output_dim = 1000
    batch_shape = [1]
    queries_dim = 1024
    learn_query = True
    network_depth = 8 if load_weights else 1
    num_lat_att_per_layer = 6 if load_weights else 1

    spec = PerceiverIOSpec(
        input_dim=input_dim,
        num_input_axes=num_input_axes,
        output_dim=output_dim,
        queries_dim=queries_dim,
        network_depth=network_depth,
        learn_query=learn_query,
        query_shape=[1],
        num_fourier_freq_bands=64,
        num_lat_att_per_layer=num_lat_att_per_layer,
        device='cuda',
    )

    model = perceiver_io_img_classification(spec)

The pretrained perceiver_io_img_classification model is now ready to be used.
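
A forward pass can then be sketched as below; the input layout is an assumption (a batch of 224x224 RGB images, channels-last, matching ``num_input_axes=2`` and ``input_dim=3``), and the tensor is created on the same CUDA device named in the spec.

.. code-block:: python

    # hedged usage sketch: input resolution, layout and call signature are assumptions
    img = ivy.random_normal(shape=(1, 224, 224, 3), device='cuda')
    logits = model(img)                 # assumed call: with learn_query=True only the image is passed
    print(ivy.argmax(logits, axis=-1))  # predicted class index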

Citation
--------

::

    @article{jaegle2021perceiverio,
        title={Perceiver IO: A General Architecture for Structured Inputs & Outputs},
        author={
            Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding,
            Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick,
            Andrew Zisserman, Oriol Vinyals and João Carreira
        },
        journal={arXiv preprint arXiv:2107.14795},
        year={2021}
    }

    @article{lenton2021ivy,
        title={Ivy: Templated deep learning for inter-framework portability},
        author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
        journal={arXiv preprint arXiv:2102.02886},
        year={2021}
    }

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
   :width: 100%
   :class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
   :width: 100%
   :class: only-dark

.. raw:: html

    <br/>
    <a href="https://pypi.org/project/ivy-models">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
    </a>
    <a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
    </a>
    <a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
    </a>
    <a href="https://discord.gg/G4aR9Q7DTN">
        <img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
    </a>
    <br clear="all" />

ViT
===

Vision Transformer `(ViT) <https://arxiv.org/abs/2010.11929>`_ is a neural network architecture for image classification based on the Transformer architecture,
which was originally developed for natural language processing tasks. Instead of the convolution layers used in
convolutional neural networks (CNNs), ViT relies on self-attention layers.

The main idea behind ViT is that an image can be represented as a sequence of image patches, and that these patches can be processed by a Transformer
in the same way that words are processed in a natural language processing task.
To do this, ViT first divides the image into a grid of patches. Each patch is flattened into a vector,
and these vectors are stacked together to form a sequence. After a linear projection and the addition of position embeddings, this sequence is passed to a Transformer encoder,
which learns to attend to different patches of the image in order to classify it.
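
The patch-to-sequence step can be sketched as follows; the image size and patch size are illustrative (224x224 RGB with 16x16 patches), and the linear projection and position embeddings that would follow are omitted.

.. code-block:: python

    import ivy

    ivy.set_backend("numpy")

    image = ivy.random_normal(shape=(224, 224, 3))   # dummy RGB image
    patch = 16
    grid = 224 // patch                              # 14 patches per side

    # cut the image into a 14x14 grid of 16x16x3 patches ...
    x = ivy.reshape(image, (grid, patch, grid, patch, 3))
    x = ivy.permute_dims(x, axes=(0, 2, 1, 3, 4))

    # ... and flatten each patch into one vector, giving a sequence of 196 "visual words"
    patches = ivy.reshape(x, (grid * grid, patch * patch * 3))
    print(patches.shape)                             # (196, 768)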

Getting started
---------------

.. code-block:: python

    import ivy
    from ivy_models.vit import vit_h_14
    ivy.set_backend("torch")

    # Instantiate the vit_h_14 model
    ivy_vit_h_14 = vit_h_14(pretrained=True)

The pretrained vit_h_14 model is now ready to be used, and is compatible with any other PyTorch code.
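
As with the other models, a quick forward pass can be sketched on a dummy batch; the input resolution and channels-last layout below are assumptions, so check the model's spec for the resolution vit_h_14 actually expects.

.. code-block:: python

    # hedged usage sketch: input resolution and layout are assumptions
    img = ivy.random_normal(shape=(1, 224, 224, 3))  # dummy batch of one image
    logits = ivy_vit_h_14(img)                       # forward pass
    print(ivy.argmax(logits, axis=-1))               # predicted class index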

Citation
--------

::

    @article{dosovitskiy2021image,
        title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
        author={
            Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner,
            Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit and Neil Houlsby
        },
        journal={arXiv preprint arXiv:2010.11929},
        year={2021}
    }

    @article{lenton2021ivy,
        title={Ivy: Templated deep learning for inter-framework portability},
        author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
        journal={arXiv preprint arXiv:2102.02886},
        year={2021}
    }