added readme file for mlpmixer, perceiver io and vit (#94)
* added readme file for mlpmixer, perceiver io and vit

* updated readme files

* updated readme files
mosesdaudu001 authored Sep 4, 2023
1 parent 08c8828 commit a6b9b06
Showing 3 changed files with 258 additions and 0 deletions.
78 changes: 78 additions & 0 deletions ivy_models/mlpmixer/README.rst
@@ -0,0 +1,78 @@
.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
:width: 100%
:class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
:width: 100%
:class: only-dark


.. raw:: html

<br/>
<a href="https://pypi.org/project/ivy-models">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
</a>
<a href="https://discord.gg/G4aR9Q7DTN">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
</a>
<br clear="all" />

MLP-Mixer
===========

`MLP-Mixer <https://arxiv.org/abs/2105.01601>`_ is based entirely on multi-layer perceptrons (MLPs): neural networks consisting of stacked linear layers and
non-linear activation functions.

The main idea behind MLP-Mixer is that plain MLPs can learn both the spatial and the channel mixing functions needed to extract features from images.
MLP-Mixer achieves this by stacking two types of layers: patch (token) mixing layers and channel mixing layers.
The patch mixing layers apply an MLP across the patch dimension, independently for each channel. This lets MLP-Mixer learn spatial mixing functions that
capture the relationships between different patches in the image.
The channel mixing layers, on the other hand, apply an MLP across the channel dimension, independently for each patch. This lets MLP-Mixer learn channel mixing functions that
capture the relationships between different channels in the image.
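
To make the two mixing steps concrete, below is a minimal sketch of a single Mixer block in plain PyTorch. It is an illustration of the idea only, not the ivy_models implementation, and all dimensions are made up for the example.

.. code-block:: python

    import torch
    import torch.nn as nn

    class MixerBlock(nn.Module):
        """One MLP-Mixer block: token (patch) mixing followed by channel mixing."""

        def __init__(self, num_patches, channels, token_dim, channel_dim):
            super().__init__()
            self.norm1 = nn.LayerNorm(channels)
            # token-mixing MLP: mixes information across patches, per channel
            self.token_mlp = nn.Sequential(
                nn.Linear(num_patches, token_dim), nn.GELU(),
                nn.Linear(token_dim, num_patches),
            )
            self.norm2 = nn.LayerNorm(channels)
            # channel-mixing MLP: mixes information across channels, per patch
            self.channel_mlp = nn.Sequential(
                nn.Linear(channels, channel_dim), nn.GELU(),
                nn.Linear(channel_dim, channels),
            )

        def forward(self, x):  # x: (batch, num_patches, channels)
            y = self.norm1(x).transpose(1, 2)           # (batch, channels, num_patches)
            x = x + self.token_mlp(y).transpose(1, 2)   # token mixing + skip connection
            return x + self.channel_mlp(self.norm2(x))  # channel mixing + skip connection

    block = MixerBlock(num_patches=196, channels=512, token_dim=256, channel_dim=2048)
    print(block(torch.randn(1, 196, 512)).shape)  # torch.Size([1, 196, 512])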


Getting started
-----------------

.. code-block:: python

    !pip install huggingface_hub
    import ivy
    from ivy_models.mlpmixer import mlpmixer
    ivy.set_backend("torch")

    # Instantiate the MLP-Mixer model
    ivy_mlpmixer = mlpmixer(pretrained=True)

The pretrained MLP-Mixer model is now ready to be used, and is compatible with any other PyTorch code.
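
As a quick sanity check, you can run a forward pass on a random batch. The 224x224 NHWC input layout below is an assumption, not something this README guarantees; check the model spec for the exact shape your checkpoint expects.

.. code-block:: python

    # assumed input layout: (batch, height, width, channels)
    img = ivy.random_uniform(shape=(1, 224, 224, 3))
    logits = ivy_mlpmixer(img)
    print(logits.shape)  # one row of class scores per image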

Citation
--------

::

    @article{tolstikhin2021mlpmixer,
      title={MLP-Mixer: An all-MLP Architecture for Vision},
      author={Tolstikhin, Ilya and Houlsby, Neil and Kolesnikov, Alexander and Beyer, Lucas and
              Zhai, Xiaohua and Unterthiner, Thomas and Yung, Jessica and Steiner, Andreas and
              Keysers, Daniel and Uszkoreit, Jakob and Lucic, Mario and Dosovitskiy, Alexey},
      journal={arXiv preprint arXiv:2105.01601},
      year={2021}
    }

    @article{lenton2021ivy,
      title={Ivy: Templated deep learning for inter-framework portability},
      author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
      journal={arXiv preprint arXiv:2102.02886},
      year={2021}
    }
104 changes: 104 additions & 0 deletions ivy_models/transformers/README.rst
@@ -0,0 +1,104 @@
.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
:width: 100%
:class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
:width: 100%
:class: only-dark


.. raw:: html

<br/>
<a href="https://pypi.org/project/ivy-models">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
</a>
<a href="https://discord.gg/G4aR9Q7DTN">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
</a>
<br clear="all" />

Perceiver IO
============

`Perceiver IO <https://arxiv.org/abs/2107.14795>`_ is based on the Perceiver architecture, originally proposed by DeepMind in 2021. Perceiver IO extends the Perceiver
by adding a new module, the querying module, which allows Perceiver IO to produce outputs of arbitrary size and semantics,
making it a more general-purpose architecture than the Perceiver.

The Perceiver IO architecture consists of three main modules: a reading module, which takes the input data and encodes it into a latent space;
a processing module, which refines the latent representation learned by the reading module; and a querying module, which takes the latent
representation from the processing module and produces outputs of arbitrary size and semantics.

The querying module is the key innovation of Perceiver IO. It works by first constructing a query vector for each output element.
The query vector is a representation of the desired output element, and it is constructed from output-specific features.
The querying module then uses a cross-attention mechanism in which the queries attend to the latent representation, and it produces each output element by combining
the latent representation with the corresponding query vector.
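
The sketch below illustrates this decoding step with a single cross-attention call in plain PyTorch. It is not the ivy_models implementation; all names and dimensions are assumptions chosen for the example.

.. code-block:: python

    import torch
    import torch.nn as nn

    latent_dim, queries_dim, num_latents, num_outputs = 512, 1024, 256, 1000

    latents = torch.randn(1, num_latents, latent_dim)   # from the processing module
    queries = torch.randn(1, num_outputs, queries_dim)  # one query per output element

    # project latents to the query dimension so the attention shapes line up
    to_kv = nn.Linear(latent_dim, queries_dim)
    cross_attn = nn.MultiheadAttention(queries_dim, num_heads=8, batch_first=True)

    kv = to_kv(latents)
    outputs, _ = cross_attn(queries, kv, kv)  # queries attend to the latents
    print(outputs.shape)  # torch.Size([1, 1000, 1024]): one vector per output element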

Getting started
-----------------

.. code-block:: python

    import ivy
    from ivy_models.transformers.perceiver_io import (
        PerceiverIOSpec,
        perceiver_io_img_classification,
    )
    ivy.set_backend("torch")

    # params
    load_weights = True  # use the full architecture rather than a one-layer stub
    input_dim = 3
    num_input_axes = 2
    output_dim = 1000
    batch_shape = [1]
    queries_dim = 1024
    learn_query = True
    network_depth = 8 if load_weights else 1
    num_lat_att_per_layer = 6 if load_weights else 1

    spec = PerceiverIOSpec(
        input_dim=input_dim,
        num_input_axes=num_input_axes,
        output_dim=output_dim,
        queries_dim=queries_dim,
        network_depth=network_depth,
        learn_query=learn_query,
        query_shape=[1],
        num_fourier_freq_bands=64,
        num_lat_att_per_layer=num_lat_att_per_layer,
        device='cuda',
    )
    model = perceiver_io_img_classification(spec)

The perceiver_io_img_classification model is now ready to be used!
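
Once constructed, the model can be called like any other Ivy module. A hedged usage sketch follows; the input layout and the plain ``model(img)`` call are assumptions, so check the model's docstring for the exact interface.

.. code-block:: python

    # assumed input layout: (batch, height, width, channels), matching
    # num_input_axes=2 and input_dim=3 above
    img = ivy.random_uniform(shape=(1, 224, 224, 3), device='cuda')
    logits = model(img)  # with learn_query=True, the queries are learned internally
    print(logits.shape)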

Citation
--------

::

    @article{jaegle2021perceiverio,
      title={Perceiver IO: A General Architecture for Structured Inputs & Outputs},
      author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and
              Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and
              Shelhamer, Evan and Hénaff, Olivier and Botvinick, Matthew M. and Zisserman, Andrew and
              Vinyals, Oriol and Carreira, João},
      journal={arXiv preprint arXiv:2107.14795},
      year={2021}
    }

    @article{lenton2021ivy,
      title={Ivy: Templated deep learning for inter-framework portability},
      author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
      journal={arXiv preprint arXiv:2102.02886},
      year={2021}
    }
76 changes: 76 additions & 0 deletions ivy_models/vit/README.rst
@@ -0,0 +1,76 @@
.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo.png?raw=true#gh-light-mode-only
:width: 100%
:class: only-light

.. image:: https://github.com/unifyai/unifyai.github.io/blob/main/img/externally_linked/logo_dark.png?raw=true#gh-dark-mode-only
:width: 100%
:class: only-dark


.. raw:: html

<br/>
<a href="https://pypi.org/project/ivy-models">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://badge.fury.io/py/ivy-models.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Adocs">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/docs.yml/badge.svg">
</a>
<a href="https://github.com/unifyai/models/actions?query=workflow%3Anightly-tests">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://github.com/unifyai/models/actions/workflows/nightly-tests.yml/badge.svg">
</a>
<a href="https://discord.gg/G4aR9Q7DTN">
<img class="dark-light" style="float: left; padding-right: 4px; padding-bottom: 4px;" src="https://img.shields.io/discord/799879767196958751?color=blue&label=%20&logo=discord&logoColor=white">
</a>
<br clear="all" />

ViT
===========

Vision Transformer `(ViT) <https://arxiv.org/abs/2010.11929>`_ is a neural network architecture for image classification based on the Transformer architecture,
which was originally developed for natural language processing tasks. Unlike a convolutional neural network (CNN),
however, ViT relies on self-attention layers rather than convolution layers.

The main idea behind ViT is that an image can be represented as a sequence of image patches, and that these patches can be processed by a Transformer
in the same way that word tokens are processed in a natural language processing task.
To do this, ViT first divides the image into a grid of fixed-size patches. Each patch is flattened into a vector,
and these vectors are linearly projected and stacked together to form a sequence. This sequence is then passed to a Transformer,
which learns to attend to different patches in the image in order to classify it.
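
To make the patching step concrete, below is a minimal sketch of ViT-style patch embedding in plain PyTorch (an illustration only, not the ivy_models implementation).

.. code-block:: python

    import torch
    import torch.nn as nn

    image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
    patch_size, embed_dim = 16, 768

    # a strided convolution implements "split into patches, flatten, project"
    patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

    tokens = patch_embed(image)                 # (1, 768, 14, 14)
    tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): a sequence of patch tokens
    print(tokens.shape)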


Getting started
-----------------

.. code-block:: python

    import ivy
    from ivy_models.vit import vit_h_14
    ivy.set_backend("torch")

    # Instantiate the vit_h_14 model
    ivy_vit_h_14 = vit_h_14(pretrained=True)

The pretrained vit_h_14 model is now ready to be used, and is compatible with any other PyTorch code.
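
As a quick sanity check, you can run a forward pass on a random batch. The 224x224 NCHW input layout below is an assumption (it mirrors torchvision's convention); check the model spec for the exact shape your checkpoint expects.

.. code-block:: python

    # assumed input layout: (batch, channels, height, width)
    img = ivy.random_uniform(shape=(1, 3, 224, 224))
    logits = ivy_vit_h_14(img)
    print(logits.shape)  # one row of class scores per image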

Citation
--------

::

    @article{dosovitskiy2020vit,
      title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
      author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and
              Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and
              Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
      journal={arXiv preprint arXiv:2010.11929},
      year={2020}
    }

    @article{lenton2021ivy,
      title={Ivy: Templated deep learning for inter-framework portability},
      author={Lenton, Daniel and Pardo, Fabio and Falck, Fabian and James, Stephen and Clark, Ronald},
      journal={arXiv preprint arXiv:2102.02886},
      year={2021}
    }
