Transcribe & Translate v1.0.0 - Initial Release

@NotYuSheng NotYuSheng released this 07 Sep 15:38
9976724

Release Date: 7 September 2024


Overview

This is the initial release of Transcribe & Translate, an open-source project that allows users to:

  • Transcribe and translate audio and video files.
  • Detect the language of uploaded media automatically.
  • Export transcriptions and translations in multiple formats (TXT, JSON, SRT, VTT).
  • View transcriptions and translations with timestamps.

This release, v1.0.0, uses Whisper models for transcription and translation, with a React frontend and a FastAPI backend. The application is fully containerized with Docker, which keeps deployment simple.


Key Features

Transcription

  • Supported Media Types: Audio (MP3, WAV) and Video (MP4, MKV, AVI).
  • Model Selection: Choose from multiple preloaded Whisper models (base, base.en, large) to handle various transcription and translation needs.
  • Automatic Language Detection: The app automatically detects the language of the media file if not specified by the user.
  • Timestamps: Transcriptions are displayed with precise start and end timestamps.
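
As an illustration of the supported media types listed above, the following minimal sketch classifies a file by its extension. The helper name `media_kind` and the extension sets are assumptions made for this example; the application itself may validate uploads differently (for instance by MIME type).

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the "Supported Media Types" bullet above.
SUPPORTED_AUDIO = {".mp3", ".wav"}
SUPPORTED_VIDEO = {".mp4", ".mkv", ".avi"}


def media_kind(filename: str) -> Optional[str]:
    """Return "audio", "video", or None for an unsupported extension.

    Hypothetical helper: it only inspects the file name, not the content.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in SUPPORTED_AUDIO:
        return "audio"
    if suffix in SUPPORTED_VIDEO:
        return "video"
    return None
```

A check like this is cheap to run before a file is handed to a Whisper model, so unsupported uploads can be rejected early.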

Translation

  • Multilingual Support: Translate media into multiple languages with Whisper's powerful translation capabilities.
  • Automatic Source Language Detection: If no input language is provided, the app detects the source language automatically.
  • Side-by-Side View: When translating, view both the original transcription and its translation side by side.

Export Options

  • Export your transcription or translation into the following formats:
    • TXT: Simple plain text format.
    • JSON: Structured data with timestamps.
    • SRT: Subtitle format with time codes.
    • VTT: Web Video Text Tracks format for video captioning.

Loading Indicator

  • Real-time feedback with loading animations during transcription or translation, along with an elapsed time display once the process completes.

Dynamic Frontend

  • The frontend dynamically loads available Whisper models from the backend.
  • Provides media preview (video/audio) directly in the browser.
  • User-friendly layout with responsive design for different screen sizes.

Dockerized for Easy Deployment

  • The project is containerized with Docker, allowing for straightforward setup and deployment.
  • Nginx serves the frontend, and FastAPI serves the backend.

Installation & Setup

Prerequisites

  • Docker and Docker Compose installed.

Steps to Run the Project Locally

  1. Clone the repository:

    git clone https://github.com/your-repo/transcribe-translate-app.git
    cd transcribe-translate-app
  2. Build and start the Docker containers:

    docker-compose up --build
  3. Access the app in your browser:

    http://localhost:3000
    
  4. The backend API will run on:

    http://localhost:8000
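
Because the containers can take some time to become ready on the first run, it may help to poll the two addresses above before opening the app. The sketch below uses only the Python standard library; the helper name `wait_for_http` and the timeout values are assumptions for this example, and it treats any HTTP response (even an error status) as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request

# Ignore any system proxy settings, since the targets are local addresses.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until any HTTP response arrives or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is running.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example usage (ports as given in the steps above):
# wait_for_http("http://localhost:3000")  # frontend
# wait_for_http("http://localhost:8000")  # backend API
```

The function returns `False` rather than raising when the deadline passes, so a calling script can decide whether to keep waiting or report a failure.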
    

Whisper Models

The app downloads and uses pre-trained Whisper models (such as base, base.en, and large) for transcription and translation. The models are stored in a Docker volume, so they persist across container restarts and do not need to be downloaded again.


Known Issues

  • Performance on Large Files: The application may take a while to process large media files, especially with the larger Whisper models.
  • Model Download Time: On the first run, downloading the Whisper models can take a while depending on your internet connection.

Future Enhancements

  • Additional Language Models: Adding more language models for extended support.
  • Batch Processing: Implementing the ability to transcribe or translate multiple files at once.
  • UI Improvements: Further improving the responsiveness and design of the frontend.
  • More Export Formats: Adding support for additional export formats like CSV and PDF.

Contributors

  • Ong Yu Sheng - Full Stack Developer

Acknowledgments

This project uses OpenAI's Whisper for transcription and translation services. We extend our gratitude to the open-source community for contributing to these fantastic tools.

