diff --git a/CHANGELOG.md b/CHANGELOG.md index 0441aed0d..46ed41604 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -17,6 +17,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Removed unused file transfer how to guides - Removed `pennylane` as a requirement from notebooks' requirements.txt as it comes with `covalent` +### Docs + +- Added voice cloning tutorial + ## [0.233.0-rc.0] - 2024-01-07 ### Authors diff --git a/doc/source/tutorials/tutorials.rst b/doc/source/tutorials/tutorials.rst index e426f1cca..d026a7174 100644 --- a/doc/source/tutorials/tutorials.rst +++ b/doc/source/tutorials/tutorials.rst @@ -75,6 +75,8 @@ Advanced - :doc:`Scalable API backends for LLM and generative AI <./0_ClassicalMachineLearning/genai/source>` * - Federated learning - :doc:`Federated learning <./federated_learning/source>` + * - Voice cloning + - :doc:`Voice cloning <./voice_cloning/source>` --------------------------------- diff --git a/doc/source/tutorials/voice_cloning/Dockerfile_gcp b/doc/source/tutorials/voice_cloning/Dockerfile_gcp new file mode 100644 index 000000000..01fc77aad --- /dev/null +++ b/doc/source/tutorials/voice_cloning/Dockerfile_gcp @@ -0,0 +1,43 @@ +# Copyright 2023 Agnostiq Inc. +# +# This file is part of Covalent. +# +# Licensed under the GNU Affero General Public License 3.0 (the "License"). +# A copy of the License may be obtained with this software package or at +# +# https://www.gnu.org/licenses/agpl-3.0.en.html +# +# Use of this file is prohibited except in compliance with the License. Any +# modifications or derivative works of this file must retain this copyright +# notice, and modified files must contain a notice indicating that they have +# been altered from the originals. +# +# Covalent is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the License for more details. +# +# Relief from the License may be granted by purchasing a commercial license. 
+
+ARG COVALENT_BASE_IMAGE=python:3.9-slim-buster
+FROM ${COVALENT_BASE_IMAGE}
+
+# Install dependencies
+ARG COVALENT_TASK_ROOT=/covalent
+WORKDIR ${COVALENT_TASK_ROOT}
+
+ARG COVALENT_PACKAGE_VERSION=0.229.0rc0
+ARG PRE_RELEASE=""
+
+COPY requirements.txt requirements.txt
+RUN apt-get update && \
+    apt-get install -y build-essential wget && \
+    pip install -r requirements.txt
+
+COPY covalent_gcpbatch_plugin/exec.py ${COVALENT_TASK_ROOT}
+
+ENV PYTHONPATH ${COVALENT_TASK_ROOT}:${PYTHONPATH}
+
+# Path where the storage bucket will be mounted inside the container
+ENV GCPBATCH_TASK_MOUNTPOINT /mnt/disks/covalent
+
+ENTRYPOINT [ "python", "exec.py" ]
diff --git a/doc/source/tutorials/voice_cloning/assets/streamlit_gcp.gif b/doc/source/tutorials/voice_cloning/assets/streamlit_gcp.gif
new file mode 100644
index 000000000..bc3691a7e
Binary files /dev/null and b/doc/source/tutorials/voice_cloning/assets/streamlit_gcp.gif differ
diff --git a/doc/source/tutorials/voice_cloning/assets/workflow.gif b/doc/source/tutorials/voice_cloning/assets/workflow.gif
new file mode 100644
index 000000000..283fb35e4
Binary files /dev/null and b/doc/source/tutorials/voice_cloning/assets/workflow.gif differ
diff --git a/doc/source/tutorials/voice_cloning/requirements.txt b/doc/source/tutorials/voice_cloning/requirements.txt
new file mode 100644
index 000000000..534f3b8e0
--- /dev/null
+++ b/doc/source/tutorials/voice_cloning/requirements.txt
@@ -0,0 +1,12 @@
+covalent==0.229.0rc0
+covalent-gcpbatch-plugin==0.11.0
+librosa==0.10.0
+pydub==0.25.1
+pytube==15.0.0
+scipy==1.11.3
+soundfile==0.12.1
+streamlit==1.28.1
+torch==2.1.0
+torchaudio==2.1.0
+transformers==4.33.3
+TTS==0.19.1
diff --git a/doc/source/tutorials/voice_cloning/source.ipynb b/doc/source/tutorials/voice_cloning/source.ipynb
new file mode 100644
index 000000000..754c998da
--- /dev/null
+++ b/doc/source/tutorials/voice_cloning/source.ipynb
@@ -0,0 +1,401 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# YouTube Video Summarization with Voice Cloning"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Overview\n",
+    "\n",
+    "Our script performs several tasks:\n",
+    "\n",
+    "1. downloads and processes a YouTube video,\n",
+    "2. transcribes the audio from the YouTube video,\n",
+    "3. summarizes the transcription, and\n",
+    "4. converts the summary to speech using the user's voice.\n",
+    "\n",
+    "The script leverages Covalent to execute these tasks, either locally or on a cloud platform like GCP.\n",
+    "\n",
+    "## Import Dependencies\n",
+    "\n",
+    "First, we import the necessary Python libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "import covalent as ct\n",
+    "import streamlit as st\n",
+    "\n",
+    "from pytube import YouTube\n",
+    "from pydub import AudioSegment\n",
+    "from transformers import pipeline\n",
+    "from TTS.api import TTS"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This section loads libraries for audio processing, machine learning models, and Covalent for workflow management.\n",
+    "\n",
+    "To provision the cloud resources used by the GCP Batch executor configured below, deploy them with the Covalent CLI:\n",
+    "\n",
+    "```bash\n",
+    "covalent deploy up gcpbatch\n",
+    "```"
+   ]
+  },
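+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Also make sure a Covalent server is running locally, so that workflows can be dispatched and monitored in the UI (served at http://localhost:48008 by default):\n",
+    "\n",
+    "```bash\n",
+    "covalent start\n",
+    "```"
+   ]
+  },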
\n", + "In the example below we utilize [Google Cloud Batch](https://cloud.google.com/batch) using our [gcp batch executor](https://docs.covalent.xyz/docs/user-documentation/api-reference/executors/gcp/). " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "audio_deps = [\n", + " \"transformers==4.33.3\", \"pydub==0.25.1\",\n", + " \"torchaudio==2.1.0\", \"librosa==0.10.0\",\n", + " \"torch==2.1.0\"\n", + "]\n", + "text_deps = [\"transformers==4.33.3\", \"torch==2.1.0\"]\n", + "tts_deps = audio_deps + [\"TTS==0.19.1\"]\n", + "\n", + "executor = ct.executor.GCPBatchExecutor(\n", + " container_image_uri=\"docker.io/filipbolt/covalent-gcp-0.229.0rc0\",\n", + " vcpus=4,\n", + " memory=8192,\n", + " time_limit=3000,\n", + " poll_freq=1,\n", + " retries=1\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you may use Covalent Cloud to execute this workflow by doing:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import covalent_cloud as cc\n", + "\n", + "cc_executor = cc.CloudExecutor(num_cpus=4, env=\"genai-env\", memory=8192, time_limit=3000)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define Covalent Tasks\n", + "Each step of our workflow is encapsulated in a Covalent task. Here's an example:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "@ct.electron\n", + "def download_video(url):\n", + " yt = YouTube(url)\n", + " # download file\n", + " out_file = yt.streams.filter(\n", + " only_audio=True, file_extension=\"mp4\"\n", + " ).first().download(\".\")\n", + "\n", + " # rename downloaded file\n", + " os.rename(out_file, \"audio.mp4\")\n", + " with open(\"audio.mp4\", \"rb\") as f:\n", + " file_content = f.read()\n", + " return file_content\n", + "\n", + "\n", + "@ct.electron\n", + "def load_audio(input_file_content):\n", + " input_path = os.path.join(os.getcwd(), \"file.mp4\")\n", + " # write to file\n", + " with open(input_path, \"wb\") as f:\n", + " f.write(input_file_content)\n", + "\n", + " audio_content = AudioSegment.from_file(input_path, format=\"mp4\")\n", + " return audio_content\n", + "\n", + "\n", + "@ct.electron(executor=executor, deps_pip=audio_deps)\n", + "def transcribe_audio(audio_content):\n", + " # Export the audio as a WAV file\n", + " audio_content.export(\"audio_file.wav\", format=\"wav\")\n", + "\n", + " pipe = pipeline(\n", + " task=\"automatic-speech-recognition\",\n", + " # model=\"openai/whisper-small\",\n", + " model=\"openai/whisper-large-v3\",\n", + " chunk_length_s=30, max_new_tokens=2048,\n", + " )\n", + " transcription = pipe(\"audio_file.wav\")\n", + " return transcription['text']\n", + "\n", + "\n", + "@ct.electron(executor=executor, deps_pip=text_deps)\n", + "def summarize_transcription(transcription):\n", + " summarizer = pipeline(\n", + " \"summarization\",\n", + " model=\"facebook/bart-large-cnn\",\n", + " )\n", + " summary = summarizer(\n", + " transcription, min_length=5, max_length=100,\n", + " do_sample=False, truncation=True\n", + " )[0][\"summary_text\"]\n", + " return summary\n", + "\n", + "\n", + "@ct.electron(executor=executor, deps_pip=tts_deps)\n", + "def text_to_speech_voice_clone(text, speaker_content, output_file):\n", + " with open(\"speaker.wav\", \"wb\") as f:\n", + " f.write(speaker_content)\n", + "\n", + " # agree to service agreement 
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Define Covalent Tasks\n",
+    "Each step of our workflow is encapsulated in a Covalent task. Here are the task definitions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@ct.electron\n",
+    "def download_video(url):\n",
+    "    yt = YouTube(url)\n",
+    "    # download file\n",
+    "    out_file = yt.streams.filter(\n",
+    "        only_audio=True, file_extension=\"mp4\"\n",
+    "    ).first().download(\".\")\n",
+    "\n",
+    "    # rename downloaded file\n",
+    "    os.rename(out_file, \"audio.mp4\")\n",
+    "    with open(\"audio.mp4\", \"rb\") as f:\n",
+    "        file_content = f.read()\n",
+    "    return file_content\n",
+    "\n",
+    "\n",
+    "@ct.electron\n",
+    "def load_audio(input_file_content):\n",
+    "    input_path = os.path.join(os.getcwd(), \"file.mp4\")\n",
+    "    # write to file\n",
+    "    with open(input_path, \"wb\") as f:\n",
+    "        f.write(input_file_content)\n",
+    "\n",
+    "    audio_content = AudioSegment.from_file(input_path, format=\"mp4\")\n",
+    "    return audio_content\n",
+    "\n",
+    "\n",
+    "@ct.electron(executor=executor, deps_pip=audio_deps)\n",
+    "def transcribe_audio(audio_content):\n",
+    "    # Export the audio as a WAV file\n",
+    "    audio_content.export(\"audio_file.wav\", format=\"wav\")\n",
+    "\n",
+    "    pipe = pipeline(\n",
+    "        task=\"automatic-speech-recognition\",\n",
+    "        # a smaller, faster alternative: model=\"openai/whisper-small\"\n",
+    "        model=\"openai/whisper-large-v3\",\n",
+    "        chunk_length_s=30, max_new_tokens=2048,\n",
+    "    )\n",
+    "    transcription = pipe(\"audio_file.wav\")\n",
+    "    return transcription['text']\n",
+    "\n",
+    "\n",
+    "@ct.electron(executor=executor, deps_pip=text_deps)\n",
+    "def summarize_transcription(transcription):\n",
+    "    summarizer = pipeline(\n",
+    "        \"summarization\",\n",
+    "        model=\"facebook/bart-large-cnn\",\n",
+    "    )\n",
+    "    summary = summarizer(\n",
+    "        transcription, min_length=5, max_length=100,\n",
+    "        do_sample=False, truncation=True\n",
+    "    )[0][\"summary_text\"]\n",
+    "    return summary\n",
+    "\n",
+    "\n",
+    "@ct.electron(executor=executor, deps_pip=tts_deps)\n",
+    "def text_to_speech_voice_clone(text, speaker_content, output_file):\n",
+    "    with open(\"speaker.wav\", \"wb\") as f:\n",
+    "        f.write(speaker_content)\n",
+    "\n",
+    "    # agree to the Coqui service agreement programmatically\n",
+    "    os.environ['COQUI_TOS_AGREED'] = \"1\"\n",
+    "\n",
+    "    tts = TTS(\"tts_models/multilingual/multi-dataset/xtts_v1\")\n",
+    "    tts.tts_to_file(\n",
+    "        text=text,\n",
+    "        file_path=output_file,\n",
+    "        speaker_wav=\"speaker.wav\",\n",
+    "        language=\"en\"\n",
+    "    )\n",
+    "    with open(output_file, \"rb\") as f:\n",
+    "        file_content = f.read()\n",
+    "    return file_content\n",
+    "\n",
+    "\n",
+    "@ct.electron\n",
+    "def load_wav_file(wav_file):\n",
+    "    with open(wav_file, \"rb\") as f:\n",
+    "        file_content = f.read()\n",
+    "    return file_content"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We use the `@ct.electron` decorator to define tasks like `download_video`, `load_audio`, `transcribe_audio`, etc."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Orchestrate the Workflow\n",
+    "The `@ct.lattice` decorator defines the workflow that orchestrates the entire process:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@ct.lattice\n",
+    "def workflow(url, user_voice_file, output_file):\n",
+    "    video_content = download_video(url)\n",
+    "    audio_content = load_audio(video_content)\n",
+    "\n",
+    "    user_voice_content = load_wav_file(user_voice_file)\n",
+    "\n",
+    "    # Use Google Cloud Batch to transcribe, summarize and re-voice\n",
+    "    transcription = transcribe_audio(audio_content)\n",
+    "    summary = summarize_transcription(transcription)\n",
+    "    output_file_content = text_to_speech_voice_clone(\n",
+    "        summary, user_voice_content, output_file\n",
+    "    )\n",
+    "    return summary, transcription, output_file_content"
+   ]
+  },
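+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before wiring up the web interface below, you can sanity-check the lattice by dispatching it directly and waiting for the result; the URL and file paths here are placeholders:\n",
+    "\n",
+    "```python\n",
+    "dispatch_id = ct.dispatch(workflow)(\n",
+    "    \"https://www.youtube.com/watch?v=...\",  # any YouTube URL\n",
+    "    \"speaker.wav\",  # a short WAV sample of the target voice\n",
+    "    \"summary.wav\",  # where the cloned-voice summary is written\n",
+    ")\n",
+    "result = ct.get_result(dispatch_id, wait=True)\n",
+    "summary, transcription, audio_bytes = result.result\n",
+    "```"
+   ]
+  },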
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Streamlit Interface\n",
+    "We use Streamlit to create an interactive web interface for the script:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2024-01-08 14:37:20.421 \n",
+      "  \u001b[33m\u001b[1mWarning:\u001b[0m to view this Streamlit app on a browser, run it with the following\n",
+      "  command:\n",
+      "\n",
+      "    streamlit run /home/filip/miniconda3/envs/google/lib/python3.9/site-packages/ipykernel_launcher.py [ARGUMENTS]\n",
+      "2024-01-08 14:37:20.422 Session state does not function when running a script without `streamlit run`\n"
+     ]
+    }
+   ],
+   "source": [
+    "import streamlit as st\n",
+    "\n",
+    "# Function to display results\n",
+    "def display_results(summary, transcription, audio_file_content):\n",
+    "    display_summary(summary)\n",
+    "    display_full_transcription(transcription)\n",
+    "    display_audio_summary(audio_file_content)\n",
+    "\n",
+    "# Function to display the summary\n",
+    "def display_summary(summary):\n",
+    "    st.subheader(\"YouTube transcription summary:\")\n",
+    "    st.text(summary)\n",
+    "\n",
+    "# Function to display the full transcription with a toggle\n",
+    "def display_full_transcription(transcription):\n",
+    "    st.subheader(\"YouTube full transcription\")\n",
+    "    if st.checkbox(\"Show/Hide\", False):\n",
+    "        st.text(transcription)\n",
+    "\n",
+    "# Function to display the audio summary\n",
+    "def display_audio_summary(audio_file_content):\n",
+    "    st.subheader(\"Summary in your own voice:\")\n",
+    "    st.audio(audio_file_content, format=\"audio/wav\")\n",
+    "\n",
+    "\n",
+    "# Streamlit app layout\n",
+    "def main():\n",
+    "    st.title(\"Summarize YouTube videos in your own voice using AI\")\n",
+    "    speaker_file, speaker_file_path = upload_speaker_file()\n",
+    "    youtube_url = st.text_input(\"Enter valid YouTube URL\")\n",
+    "\n",
+    "    if st.button(\"Process\"):\n",
+    "        process_input(speaker_file, speaker_file_path, youtube_url)\n",
+    "    elif \"transcription\" in st.session_state:\n",
+    "        display_results(\n",
+    "            st.session_state[\"summary\"],\n",
+    "            st.session_state[\"transcription\"],\n",
+    "            st.session_state[\"audio_file_content\"]\n",
+    "        )\n",
+    "\n",
+    "# Function to upload speaker file\n",
+    "def upload_speaker_file():\n",
+    "    speaker_file = st.file_uploader(\"Upload an audio file (WAV)\", type=[\"wav\"])\n",
+    "    if speaker_file:\n",
+    "        st.audio(speaker_file, format=\"audio/wav\")\n",
+    "        speaker_file_path = \"speaker.wav\"\n",
+    "        with open(speaker_file_path, \"wb\") as f:\n",
+    "            f.write(speaker_file.getbuffer())\n",
+    "        return speaker_file, speaker_file_path\n",
+    "    return None, None\n",
+    "\n",
+    "# Function to process the input\n",
+    "def process_input(speaker_file, speaker_file_path, youtube_url):\n",
+    "    if speaker_file and youtube_url:\n",
+    "        audio_file_full_path = os.path.join(os.getcwd(), \"audio.wav\")\n",
+    "        speaker_file_full_path = os.path.join(os.getcwd(), speaker_file_path)\n",
+    "\n",
+    "        dispatch_id = ct.dispatch(workflow)(\n",
+    "            youtube_url, speaker_file_full_path, audio_file_full_path\n",
+    "        )\n",
+    "        with st.spinner(f\"Processing... job dispatch id: {dispatch_id}\"):\n",
+    "            result = ct.get_result(dispatch_id, wait=True)\n",
+    "\n",
+    "        if result:\n",
+    "            summary, transcription, output_file_content = result.result\n",
+    "            st.session_state[\"transcription\"] = transcription\n",
+    "            st.session_state[\"summary\"] = summary\n",
+    "            st.session_state[\"audio_file_content\"] = output_file_content\n",
+    "            display_results(summary, transcription, output_file_content)\n",
+    "        else:\n",
+    "            st.error(\"Something went wrong. Please try again.\")\n",
+    "\n",
+    "main()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Running the Script\n",
+    "To run the script, execute it via Streamlit:\n",
+    "\n",
+    "```bash\n",
+    "streamlit run your_script.py\n",
+    "```\n",
+    "\n",
+    "In the Covalent UI, you should see a workflow like the following:\n",
+    "\n",
+    "![alt text](assets/workflow.gif)\n",
+    "\n",
+    "Using the Streamlit app then looks like this:\n",
+    "\n",
+    "![alt text](assets/streamlit_gcp.gif)\n",
+    "\n",
+    "### Customizing the Workflow\n",
+    "You can tailor this script to your specific needs:\n",
+    "\n",
+    "- Modify the Covalent task functions for different processing requirements, such as swapping out one of the models.\n",
+    "- Adjust the Covalent executor settings based on your cloud resource needs.\n",
+    "\n",
+    "### Conclusion\n",
+    "\n",
+    "This tutorial demonstrates using Covalent to orchestrate a YouTube transcription, summarization, and voice-cloning workflow. Covalent's cloud computing abstraction simplifies executing complex workflows, making it a powerful tool for developers and researchers in AI/ML fields."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.18"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}