Release Transcribe & Translate v1.0.0 - Initial Release · NotYuSheng/Transcribe-Translate

Release Date: 7 September 2024

Overview

This is the initial release of Transcribe & Translate, an open-source project that lets users:

- Transcribe and translate audio and video files.
- Detect the language of uploaded media automatically.
- Export transcriptions and translations in multiple formats (TXT, JSON, SRT, VTT).
- View transcriptions and translations with timestamps.

Version 1.0.0 uses Whisper models for transcription and translation, with a React frontend and a FastAPI backend. The application is fully containerized with Docker, making it easy to deploy.

Key Features

Transcription

Supported Media Types: Audio (MP3, WAV) and Video (MP4, MKV, AVI).
Model Selection: Choose from multiple preloaded Whisper models (base, base.en, large) to handle various transcription and translation needs.
Automatic Language Detection: The app automatically detects the language of the media file if not specified by the user.
Timestamps: Transcriptions are displayed with precise start and end timestamps.
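
The fallback from a user-specified language to automatic detection can be sketched as below. This is a minimal illustration, not the project's actual code: `resolve_language` is a hypothetical helper, and it assumes the detection step yields a mapping from language code to probability (the shape Whisper's language detection is commonly used with).

```python
from typing import Mapping, Optional


def resolve_language(
    user_language: Optional[str],
    detected_probs: Mapping[str, float],
) -> str:
    """Pick the language to transcribe with.

    A language chosen by the user always wins. Otherwise the most
    probable detected language is used. (Hypothetical helper.)
    """
    if user_language and user_language.strip():
        # Normalise so "EN" and " en " are treated the same.
        return user_language.strip().lower()
    if not detected_probs:
        raise ValueError("no language given and no detection result available")
    # Key with the highest probability.
    return max(detected_probs, key=detected_probs.get)
```

For example, `resolve_language(None, {"en": 0.1, "ja": 0.85})` selects `"ja"`, while `resolve_language("EN", {"ja": 0.85})` keeps the user's choice and returns `"en"`.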

Translation

Multilingual Support: Translate media into multiple languages with Whisper's powerful translation capabilities.
Automatic Source Language Detection: If no input language is provided, the app detects the source language automatically.
Side-by-Side View: When translating, view both the original transcription and its translation side by side.

Export Options

Export your transcription or translation into the following formats:
- TXT: Simple plain text format.
- JSON: Structured data with timestamps.
- SRT: Subtitle format with time codes.
- VTT: Web Video Text Tracks format for video captioning.
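
The four formats listed above differ mainly in how timestamps are written. The sketch below shows one way to produce them from a list of timestamped segments. It assumes each segment is a dict with `start` and `end` in seconds and a `text` string; the function names are illustrative and are not taken from the project's code.

```python
import json


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render seconds as HH:MM:SS<marker>mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_txt(segments) -> str:
    """Plain text: one segment per line, no timestamps."""
    return "\n".join(seg["text"].strip() for seg in segments)


def to_json(segments) -> str:
    """Structured data that keeps the timestamps."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_srt(segments) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)
```

SRT and VTT share the same cue layout; the visible differences in this sketch are the `WEBVTT` header, the cue numbering, and the decimal separator.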

Loading Indicator

Real-time feedback with loading animations during transcription or translation, along with an elapsed time display once the process completes.

Dynamic Frontend

Dynamically loads the list of available Whisper models from the backend.
Provides media preview (video/audio) directly in the browser.
Offers a user-friendly, responsive layout for different screen sizes.

Dockerized for Easy Deployment

The project is containerized with Docker, allowing for straightforward setup and deployment.
Nginx serves the frontend, and FastAPI serves the backend API.

Installation & Setup

Prerequisites

Docker and Docker Compose installed.

Steps to Run the Project Locally

Clone the repository:

```
git clone https://github.com/your-repo/transcribe-translate-app.git
cd transcribe-translate-app
```

Build and start the Docker containers:
```
docker-compose up --build
```
Access the app in your browser:
```
http://localhost:3000
```
The backend API will run on:
```
http://localhost:8000
```
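
Once the containers are running, a quick way to confirm that both services respond is a small probe script. The sketch below uses only the Python standard library. The two URLs are the ones given above; the function names are illustrative, and no specific backend route is assumed, so any HTTP response (even a 404) counts as "reachable".

```python
import urllib.error
import urllib.request

# URLs from the setup steps above.
SERVICES = {
    "frontend": "http://localhost:3000",
    "backend": "http://localhost:8000",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if any HTTP response comes back from `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status (e.g. 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def check_services(services=SERVICES, probe=is_reachable) -> dict:
    """Map each service name to whether its URL responded."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services()` returns a dict such as `{"frontend": True, "backend": False}`, which makes it easy to see which container has not finished starting.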

Whisper Models

The app downloads pre-trained Whisper models (such as base, base.en, and large) and uses them for transcription and translation. The models are stored in a Docker volume, so they persist across container restarts and do not need to be downloaded again.

Known Issues

Performance on Large Files: The application may take a while to process large media files, especially with the larger Whisper models.
Model Download Time: On the first run, downloading the Whisper models can take a while depending on your internet connection.

Future Enhancements

Additional Language Models: Adding more language models for extended support.
Batch Processing: Implementing the ability to transcribe or translate multiple files at once.
UI Improvements: Further improving the responsiveness and design of the frontend.
More Export Formats: Adding support for additional export formats like CSV and PDF.

Contributors

Ong Yu Sheng - Full Stack Developer

Acknowledgments

This project uses OpenAI's Whisper for transcription and translation services. We extend our gratitude to the open-source community for contributing to these fantastic tools.

Download

Source Code
