Releases: NotYuSheng/Transcribe-Translate
v1.3.2
Transcribe & Translate v1.1.0 - Feature Update
Release Date: 11 September 2024
Overview
This release, v1.1.0, introduces two major improvements: the ability to handle concurrent transcription and translation requests, and the removal of local file storage during the processing of uploaded media. These changes provide better performance and flexibility while maintaining the app’s core features for transcribing and translating media files.
Key Features
Concurrency
- The backend now supports concurrent processing of multiple requests, allowing the system to handle multiple transcriptions or translations simultaneously. This ensures faster response times and better scalability for users uploading multiple files or for high-traffic environments.
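One common way a FastAPI backend serves concurrent requests around a blocking model call is to offload that call to a thread pool so the event loop stays free. A minimal sketch under that assumption (the function names and the placeholder in place of the real Whisper call are illustrative, not the app's actual code):

```python
# Sketch: handling several transcription requests concurrently by running
# the CPU-bound model call in worker threads (hypothetical helper names).
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # caps simultaneous transcriptions

def transcribe_blocking(data: bytes) -> str:
    # Placeholder for the blocking Whisper call.
    return f"transcript of {len(data)} bytes"

async def transcribe_endpoint(data: bytes) -> str:
    loop = asyncio.get_running_loop()
    # The event loop keeps accepting requests while the model runs in a thread.
    return await loop.run_in_executor(executor, transcribe_blocking, data)

async def main():
    # Two uploads processed concurrently rather than one after the other.
    results = await asyncio.gather(
        transcribe_endpoint(b"x" * 10),
        transcribe_endpoint(b"x" * 20),
    )
    print(results)

asyncio.run(main())
```

The same pattern falls out for free when FastAPI route handlers are declared `def` (it runs them in its own thread pool); the explicit executor above just makes the mechanism visible.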
In-Memory File Processing
- Uploaded media files are now processed directly in memory, without being saved to the local filesystem. This improves processing speed and reduces disk usage, making the application more efficient and suitable for environments with limited storage.
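The idea can be sketched with an in-memory buffer standing in for a saved file; FastAPI's `UploadFile.read()` already yields the raw bytes, so nothing ever needs to touch the filesystem (`process_upload` is a hypothetical helper, not the app's code):

```python
# Sketch: processing an upload entirely in memory, with no temp file on disk.
import io

def process_upload(raw: bytes) -> dict:
    buffer = io.BytesIO(raw)           # in-memory stand-in for a saved file
    size = buffer.getbuffer().nbytes   # inspect the media without disk I/O
    # ...hand `buffer` to the transcription pipeline here...
    return {"bytes": size}

print(process_upload(b"fake mp3 data"))
```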
Download
This release improves the overall performance and efficiency of the Transcribe & Translate application, making it faster and more scalable.
What's Changed
- Dev by @NotYuSheng in #10
Full Changelog: v1.0.0...v1.10
v1.3.1
Full Changelog: v1.3.0...v1.3.1
v1.3.0
What's Changed
- Nginx by @NotYuSheng in #17
- Bump send and express in /frontend by @dependabot in #16
- Bump axios from 0.21.4 to 0.28.0 in /frontend by @dependabot in #15
- Bump rollup from 2.79.1 to 2.79.2 in /frontend by @dependabot in #14
Full Changelog: v1.2.0...v1.3.0
v1.2.0
What's Changed
- Bump body-parser and express in /frontend by @dependabot in #11
- Client fix2 by @NotYuSheng in #13
New Contributors
- @dependabot made their first contribution in #11
Full Changelog: v1.10...v1.2.0
Transcribe & Translate v1.0.0 - Initial Release
Release Date: 7 September 2024
Overview
This is the initial release of Transcribe & Translate, an open-source project that allows users to:
- Transcribe and translate audio and video files.
- Detect the language of uploaded media automatically.
- Export transcriptions and translations in multiple formats (TXT, JSON, SRT, VTT).
- View transcriptions and translations with timestamps.
This release marks v1.0.0, which includes support for Whisper models to provide high-quality transcriptions and translations, with an easy-to-use React frontend and a FastAPI backend. The application is fully containerized with Docker, making it easy to deploy.
Key Features
Transcription
- Supported Media Types: Audio (MP3, WAV) and Video (MP4, MKV, AVI).
- Model Selection: Choose from multiple preloaded Whisper models (`base`, `base.en`, `large`) to handle various transcription and translation needs.
- Automatic Language Detection: The app automatically detects the language of the media file if not specified by the user.
- Timestamps: Transcriptions are displayed with precise start and end timestamps.
Translation
- Multilingual Support: Translate media into multiple languages with Whisper's powerful translation capabilities.
- Automatic Source Language Detection: If no input language is provided, the app detects the source language automatically.
- Side-by-Side View: When translating, view both the original transcription and its translation side by side.
Export Options
- Export your transcription or translation into the following formats:
- TXT: Simple plain text format.
- JSON: Structured data with timestamps.
- SRT: Subtitle format with time codes.
- VTT: Web Video Text Tracks format for video captioning.
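For illustration, the SRT export could be derived from Whisper-style segments roughly as follows. The segment shape (`start`, `end`, `text`) matches what Whisper returns; `to_srt` itself is a hypothetical helper, not the app's actual exporter:

```python
# Sketch: converting timestamped segments to the SRT subtitle format.
def to_srt(segments):
    def ts(t):
        # SRT time codes look like HH:MM:SS,mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        ms = round((s - int(s)) * 1000)
        return f"{int(h):02}:{int(m):02}:{int(s):02},{ms:03}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

print(to_srt([{"start": 0.0, "end": 1.5, "text": "Hello world"}]))
```

The TXT, JSON, and VTT exports follow the same pattern with different framing (VTT uses `.` instead of `,` in the millisecond separator and a `WEBVTT` header).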
Loading Indicator
- Real-time feedback with loading animations during transcription or translation, along with an elapsed time display once the process completes.
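The elapsed-time display can be driven by a simple timer wrapped around the processing call; a minimal sketch (the `timed` helper is hypothetical, not part of the app):

```python
# Sketch: timing a processing call so the frontend can show elapsed time.
import time

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start  # seconds, for the UI display
    return result, elapsed

result, seconds = timed(sum, [1, 2, 3])
print(result, round(seconds, 6))
```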
Dynamic Frontend
- The frontend dynamically loads available Whisper models from the backend.
- Provides media preview (video/audio) directly in the browser.
- User-friendly layout with responsive design for different screen sizes.
Dockerized for Easy Deployment
- The project is containerized with Docker, allowing for straightforward setup and deployment.
- Nginx is used to serve the frontend, and FastAPI for the backend.
Installation & Setup
Prerequisites
- Docker and Docker Compose installed.
Steps to Run the Project Locally
1. Clone the repository:

   ```shell
   git clone https://github.com/your-repo/transcribe-translate-app.git
   cd transcribe-translate-app
   ```

2. Build and start the Docker containers:

   ```shell
   docker-compose up --build
   ```

3. Access the app in your browser: http://localhost:3000

4. The backend API will run on: http://localhost:8000
Whisper Models
The app downloads and uses pre-trained Whisper models (such as `base`, `base.en`, and `large`) for transcription and translation. These models are stored in a Docker volume for persistent storage and efficient use.
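Reusing downloaded models efficiently comes down to a load-once cache: download and load a model the first time it is requested, then hand back the same instance afterwards. A minimal sketch of that pattern (the `get_model` helper and its stub loader are hypothetical; the real app would call `whisper.load_model`):

```python
# Sketch: load-once model cache so each Whisper model is loaded a single time.
_loaded = {}

def get_model(name, loader=lambda n: f"model:{n}"):
    if name not in _loaded:          # download/load only on the first request
        _loaded[name] = loader(name)
    return _loaded[name]             # subsequent calls reuse the same object

print(get_model("base"))
print(get_model("base"))  # served from the cache, no second load
```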
Known Issues
- Performance on Large Files: The application may take a while to process large media files, especially with the larger Whisper models.
- Model Download Time: On the first run, downloading the Whisper models can take a while depending on your internet connection.
Future Enhancements
- Additional Language Models: Adding more language models for extended support.
- Batch Processing: Implementing the ability to transcribe or translate multiple files at once.
- UI Improvements: Further improving the responsiveness and design of the frontend.
- More Export Formats: Adding support for additional export formats like CSV and PDF.
Contributors
- Ong Yu Sheng - Full Stack Developer
Acknowledgments
This project uses OpenAI's Whisper for transcription and translation services. We extend our gratitude to the open-source community for contributing to these fantastic tools.