Persian Whisper Speech Recognition

This project uses the Whisper model for automatic speech recognition (ASR) of Persian-language audio files. It transcribes spoken Persian into text, which makes it useful for applications such as language learning and transcription services.

Project Overview

The Persian Whisper Speech Recognition application lets users upload audio files and receive transcriptions in real time. It is built on the Whisper model as provided by Hugging Face's Transformers library and offers a user-friendly interface powered by Gradio.

Challenges Solved

Tokenization: Set up the preprocessing and tokenization step, so that audio inputs are converted into a form the Whisper model can process and the model's output tokens are decoded into text.
Audio Format Support: Enabled support for various audio formats, including .ogg, .wav, and .mp3, to ensure a wide range of usability.
Real-time Transcription: Developed an interactive web app that transcribes uploaded audio in real time, so users see results immediately.
User Interface: Created an intuitive user interface using Gradio, making it accessible for users without technical backgrounds.
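
The transcription step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the checkpoint name `openai/whisper-small` and the helper names `is_supported` and `transcribe` are assumptions, since the README does not say which Whisper checkpoint is used. Passing a file path to the Transformers pipeline also assumes that ffmpeg is installed for decoding compressed formats.

```python
from pathlib import Path

# Formats the README lists as supported.
SUPPORTED_EXTENSIONS = {".ogg", ".wav", ".mp3"}

# Assumption: the README does not name the Whisper checkpoint.
MODEL_NAME = "openai/whisper-small"


def is_supported(audio_path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(audio_path).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe(audio_path: str, model_name: str = MODEL_NAME) -> str:
    """Transcribe a Persian audio file and return the recognized text."""
    if not is_supported(audio_path):
        raise ValueError(f"Unsupported audio format: {audio_path}")

    # Imported lazily so the format check works without the heavy dependencies.
    from transformers import pipeline

    # chunk_length_s lets the pipeline handle recordings longer than 30 seconds.
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        chunk_length_s=30,
    )
    result = asr(
        audio_path,
        generate_kwargs={"language": "persian", "task": "transcribe"},
    )
    return result["text"]
```

Loading the pipeline once and reusing it across requests would be faster in a long-running app; it is created inside the function here only to keep the sketch short.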

Features

Upload audio files in various formats (e.g., .ogg, .wav, .mp3).
Transcribe Persian speech into text.
Download the transcription as a .txt file.
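
One way these features could fit together in Gradio is sketched below. The names `save_transcription` and `build_demo` are hypothetical, and the notebook may wire the interface differently. The sketch assumes `transcribe_fn` is any function that takes an audio file path and returns the transcribed text.

```python
import tempfile


def save_transcription(text: str) -> str:
    """Write the transcription to a UTF-8 .txt file and return its path."""
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        return handle.name


def build_demo(transcribe_fn):
    """Build a Gradio app around a function that maps an audio path to text."""
    import gradio as gr  # imported lazily; only needed when building the UI

    def run(audio_path):
        text = transcribe_fn(audio_path)
        # Return both the text and a file path so the user can download it.
        return text, save_transcription(text)

    return gr.Interface(
        fn=run,
        inputs=gr.Audio(type="filepath", label="Persian audio"),
        outputs=[
            gr.Textbox(label="Transcription"),
            gr.File(label="Download .txt"),
        ],
    )
```

Calling `build_demo(my_transcribe_fn).launch()` would start the web app. Writing the file with an explicit UTF-8 encoding matters here because Persian text is not representable in many platform default encodings.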

Requirements

To run this project, you'll need the following dependencies:

Python 3.7+
torch
transformers
gradio

You can install the required libraries using the following command:

pip install torch transformers gradio
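
After installing, a quick check like the one below can confirm that the listed packages are importable. It is a small sketch using only the standard library; the function names `missing_packages` and `python_is_recent_enough` are made up for illustration.

```python
import importlib.util
import sys

# Packages listed in the Requirements section.
REQUIRED_PACKAGES = ("torch", "transformers", "gradio")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 7)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.7 or newer is required.")
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```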

Usage

Clone this repository:

git clone https://github.com/AmirTahaMim/PersianWhisper
cd PersianWhisper

Run the Jupyter Notebook:

jupyter notebook PersianWhisperGradio.ipynb

Upload your audio file using the Gradio interface.
View the transcription and download it as a .txt file.

Contributing

Contributions are welcome! If you have suggestions for improvements or features, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Hugging Face for providing the Whisper model and Transformers library.
Gradio for the user-friendly interface framework.
