Transcribe & Translate v1.0.0 - Initial Release
Release Date: 7 September 2024
Overview
This is the initial release of Transcribe & Translate, an open-source project that lets users:
- Transcribe and translate audio and video files.
- Detect the language of uploaded media automatically.
- Export transcriptions and translations in multiple formats (TXT, JSON, SRT, VTT).
- View transcriptions and translations with timestamps.
Version 1.0.0 uses Whisper models to provide high-quality transcriptions and translations, with an easy-to-use React frontend and a FastAPI backend. The application is fully containerized with Docker, which simplifies deployment.
Key Features
Transcription
- Supported Media Types: Audio (MP3, WAV) and Video (MP4, MKV, AVI).
- Model Selection: Choose from multiple preloaded Whisper models (base, base.en, large) to handle various transcription and translation needs.
- Automatic Language Detection: The app automatically detects the language of the media file if not specified by the user.
- Timestamps: Transcriptions are displayed with precise start and end timestamps.
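The options above can be pictured with a small Python sketch that checks an upload against the supported extensions and assembles the settings for a transcription request. The helper names (media_kind, transcribe_options) and the shape of the returned dictionary are assumptions made for illustration, not code from the project. Leaving the language as None follows the convention of the openai-whisper transcribe function, which detects the language when none is given.

```python
from pathlib import Path
from typing import Optional

# Extensions and model names as listed in these release notes.
AUDIO_EXTENSIONS = {".mp3", ".wav"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi"}
AVAILABLE_MODELS = ("base", "base.en", "large")


def media_kind(filename: str) -> str:
    """Return "audio" or "video", or raise ValueError for unsupported files."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported media type: {filename!r}")


def transcribe_options(model_name: str, language: Optional[str] = None) -> dict:
    """Build the settings for one transcription request (hypothetical helper)."""
    if model_name not in AVAILABLE_MODELS:
        raise ValueError(f"Unknown model: {model_name!r}")
    # language=None means "not specified by the user": detection is left to Whisper.
    return {"model": model_name, "task": "transcribe", "language": language}
```

A translation request would differ only in the task value, assuming the backend passes these settings through to Whisper.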
Translation
- Multilingual Support: Translate media into multiple languages with Whisper's powerful translation capabilities.
- Automatic Source Language Detection: If no input language is provided, the app detects the source language automatically.
- Side-by-Side View: When translating, view both the original transcription and its translation side by side.
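One way to build a side-by-side view is to match segments by time rather than by position, since a transcription pass and a translation pass need not split the audio at the same points. The sketch below assumes each segment is a dictionary with start, end, and text keys (the shape Whisper uses for its segments). The side_by_side function is hypothetical and is not necessarily how the app aligns its output.

```python
def overlap(a: dict, b: dict) -> float:
    """Length in seconds of the time range shared by two segments."""
    return max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))


def side_by_side(original: list, translated: list) -> list:
    """Attach each translated segment to the original segment it overlaps most.

    Translated segments that overlap no original segment are dropped.
    """
    rows = [
        {
            "start": seg["start"],
            "end": seg["end"],
            "original": seg["text"].strip(),
            "translation": [],
        }
        for seg in original
    ]
    for seg in translated:
        best = max(
            range(len(original)),
            key=lambda i: overlap(original[i], seg),
            default=None,
        )
        if best is not None and overlap(original[best], seg) > 0:
            rows[best]["translation"].append(seg["text"].strip())
    for row in rows:
        # Join the collected pieces into one string per row.
        row["translation"] = " ".join(row["translation"])
    return rows
```

Each returned row carries the original timestamps plus both texts, which maps directly onto a two-column table in the frontend.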
Export Options
- Export your transcription or translation into the following formats:
  - TXT: Simple plain text format.
  - JSON: Structured data with timestamps.
  - SRT: Subtitle format with time codes.
  - VTT: Web Video Text Tracks format for video captioning.
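To make the difference between the four formats concrete, here is a minimal Python sketch of the conversions. It assumes segments are dictionaries with start and end times in seconds plus a text field. The function names are illustrative only, and the app's actual exporters may differ in detail.

```python
import json


def _timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_txt(segments: list) -> str:
    """Plain text: one segment per line, no timestamps."""
    return "\n".join(seg["text"].strip() for seg in segments) + "\n"


def to_json(segments: list) -> str:
    """Structured data, timestamps included."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_srt(segments: list) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = _timestamp(seg["start"], ",")
        end = _timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list) -> str:
    """WebVTT: a WEBVTT header, then cues with a dot before the milliseconds."""
    cues = []
    for seg in segments:
        start = _timestamp(seg["start"], ".")
        end = _timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

SRT and VTT differ mainly in the header, the cue numbering, and the decimal separator in the time codes, which is why the two writers share one timestamp helper.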
Loading Indicator
- Real-time feedback with loading animations during transcription or translation, along with an elapsed time display once the process completes.
Dynamic Frontend
- The frontend dynamically loads available Whisper models from the backend.
- Provides media preview (video/audio) directly in the browser.
- User-friendly layout with responsive design for different screen sizes.
Dockerized for Easy Deployment
- The project is containerized with Docker, allowing for straightforward setup and deployment.
- Nginx serves the frontend, and FastAPI serves the backend API.
Installation & Setup
Prerequisites
- Docker and Docker Compose installed.
Steps to Run the Project Locally
- Clone the repository:
  git clone https://github.com/your-repo/transcribe-translate-app.git
  cd transcribe-translate-app
- Build and start the Docker containers:
  docker-compose up --build
- Access the app in your browser:
  http://localhost:3000
- The backend API will run on:
  http://localhost:8000
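Because the first start can take a while, it may help to poll both addresses until they respond. The following Python sketch uses only the standard library and is not part of the project. The two URLs come from the steps above, while the function names, attempt count, and delay are arbitrary choices.

```python
import time
import urllib.error
import urllib.request

# Addresses taken from the steps above.
FRONTEND_URL = "http://localhost:3000"
BACKEND_URL = "http://localhost:8000"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the URL, whatever the status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just not with a success status
    except OSError:
        return False  # connection refused, timeout, DNS failure, and similar


def wait_for_services(urls, probe=is_up, attempts=30, delay=2.0) -> bool:
    """Poll until every URL answers, or give up after `attempts` rounds."""
    pending = list(urls)
    for attempt in range(attempts):
        # Keep only the URLs that still do not answer.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_services([FRONTEND_URL, BACKEND_URL]) after starting the containers returns True once both the frontend and the backend answer, and False if they are still unreachable after the last attempt.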
Whisper Models
The app downloads and uses pre-trained Whisper models (such as base, base.en, and large) for transcription and translation. These models are stored in a Docker volume, so they persist between runs and do not need to be downloaded again.
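To check which models are already present in the volume, a script can list the checkpoint files in the mounted directory. The sketch below assumes the volume holds one .pt file per model, which is how the openai-whisper package caches its downloads. The directory path is something you would need to look up in the project's Docker configuration, and cached_models is a hypothetical helper.

```python
from pathlib import Path


def cached_models(model_dir) -> list:
    """Return the names of checkpoint files (*.pt) found in model_dir, sorted."""
    directory = Path(model_dir)
    if not directory.is_dir():
        return []
    # Path.stem drops only the final suffix, so "base.en.pt" becomes "base.en".
    return sorted(path.stem for path in directory.glob("*.pt"))
```

Note that a file name may carry a version suffix rather than the short alias, so the large model can appear under a name such as large-v3 depending on the Whisper version in use.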
Known Issues
- Performance on Large Files: The application may take a while to process large media files, especially with the larger Whisper models.
- Model Download Time: On the first run, downloading the Whisper models can take a while depending on your internet connection.
Future Enhancements
- Additional Language Models: Adding more language models for extended support.
- Batch Processing: Implementing the ability to transcribe or translate multiple files at once.
- UI Improvements: Further improving the responsiveness and design of the frontend.
- More Export Formats: Adding support for additional export formats like CSV and PDF.
Contributors
- Ong Yu Sheng - Full Stack Developer
Acknowledgments
This project uses OpenAI's Whisper for transcription and translation services. We extend our gratitude to the open-source community for contributing to these fantastic tools.