T5 Speech-to-Speech Model

Overview

The T5 Speech-to-Speech Model leverages state-of-the-art pretrained models to convert spoken language in real-time. By utilizing a teacher-student training paradigm, this project enables the creation of efficient and accurate speech-to-speech systems suitable for various applications such as virtual assistants, transcription services, and more.

Features

Real-Time Speech Conversion: Convert spoken input into synthesized speech instantly.
Teacher-Student Training Pipeline: Utilize pretrained models as teachers to train a lightweight student model for efficient deployment.
Modular Architecture: Integrates multiple pretrained models including Whisper, ChatGPT, and a Text-to-Speech (TTS) model.
Super-Alignment Training: Advanced training scripts to ensure high-quality alignment between input and synthesized speech.
Extensible Design: Easily extendable to incorporate additional models or functionalities.

Architecture

The teacher model is built upon a three-model pipeline:

Whisper: Handles transcription of input speech.
ChatGPT: Generates textual responses based on transcriptions.
Text-to-Speech (TTS) Model: Synthesizes speech from the generated text.

A student model is trained using the outputs from these pretrained models to perform end-to-end speech-to-speech conversion efficiently.

Repository Structure

├── README.md
├── .gitignore
├── .env
├── llama_manager.py
├── main.py
├── t5_eval.py
├── t5_example.py
├── speech_manager.py
├── train.py
└── requirements.txt

llama_manager.py: Manages interactions with the LLAMA model.
main.py: Entry point for the Speech-to-Speech application.
t5_eval.py: Evaluation scripts for the T5 model.
model.py: Defines the student model architecture.
s2smodel.py: Speech-to-Speech model utilities.
speech_manager.py: Handles speech synthesis and processing.
superalignment_example.py: Example scripts for super-alignment training.
train.py: Training script for the student model.
requirements.txt: List of dependencies.

Installation

Prerequisites

Python 3.8 or higher
Pygame (for audio playback)
Git
Pip

Steps

Clone the Repository

git clone https://github.com/peytontolbert/T5-SpeechtoSpeech.git
cd T5-SpeechtoSpeech

Create a Virtual Environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Dependencies
```
pip install -r requirements.txt
```
Setup Environment Variables

Create a .env file in the root directory and add the following:
```
OPENAI_API_KEY=your_openai_api_key
# Add other necessary environment variables here
```

Usage

Running the Application

python main.py

This will start the Speech-to-Speech application, which listens for audio input, processes it, and provides a synthesized speech response in real-time.

Training the Student Model

To train the student model using the pretrained teacher models:

python train.py

Parameters:

learning_rate: Learning rate for the optimizer (default: 1e-4)
num_epochs: Number of training epochs (default: 10)
save_steps: Frequency of saving model checkpoints (default: 2)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

T5 Speech-to-Speech Model

Overview

Features

Architecture

Repository Structure

Installation

Prerequisites

Steps

Usage

Running the Application

Training the Student Model

Files

README.md

Latest commit

History

README.md

File metadata and controls

T5 Speech-to-Speech Model

Overview

Features

Architecture

Repository Structure

Installation

Prerequisites

Steps

Usage

Running the Application

Training the Student Model