This repository contains the code for lip reading using 3D cross audio-visual Convolutional Neural Networks.
Link to our project report: ref
In this small project, we tried to re-engineer [1] using a similar network architecture, but with our own data and different video and audio preprocessing techniques, as described below. Due to the large computational requirements of the audio and visual preprocessing, we trained the model on a dummy dataset, with random placeholders in place of actual intensity values.
- Download either the VidTIMIT or the BBC Lip Reading in the Wild (LRW) dataset and place it in the `./dataset/` folder.
- To extract the lip region (bounding box) using Histogram of Oriented Gradients: `cd Visual_Preprocessing`, then run `python mouth_cropping_in_video.py` to obtain crops of the mouth region from the video (see the first sketch after this list).
- To run the audio preprocessing: `cd Audio_Preperocessing`, then run `MMSESTSA84.m` in MATLAB, which performs the audio preprocessing using the MMSE-STSA method. An alternative audio preprocessing step, energy-based Voice Activity Detection, is also supported and can be run with `python unsupervised_vad.py` (see the second sketch after this list).
- To train the CNN model, run `python train.py` with the appropriate paths to the audio and video files (see the third sketch after this list).
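The following is a minimal sketch of the HOG-based mouth cropping in the visual preprocessing step, assuming dlib (whose frontal face detector is a HOG + linear-SVM detector) and its 68-point landmark model `shape_predictor_68_face_landmarks.dat`; the input path, padding, and output naming are illustrative and may differ from `mouth_cropping_in_video.py`.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # HOG + linear SVM face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture("sample_video.mp4")   # hypothetical input video
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        landmarks = predictor(gray, face)
        # Points 48-67 of the 68-point model outline the mouth region.
        xs = [landmarks.part(i).x for i in range(48, 68)]
        ys = [landmarks.part(i).y for i in range(48, 68)]
        pad = 10                             # small margin around the lips
        top, left = max(0, min(ys) - pad), max(0, min(xs) - pad)
        crop = frame[top:max(ys) + pad, left:max(xs) + pad]
        cv2.imwrite(f"mouth_{frame_idx:05d}.png", crop)
    frame_idx += 1
cap.release()
```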
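For the audio side, here is a minimal sketch of an energy-based voice activity detector in the spirit of `unsupervised_vad.py`; the frame length, hop, threshold, and file name are assumptions, not values taken from the repository.

```python
import numpy as np
from scipy.io import wavfile

def energy_vad(signal, rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Flag each frame as speech (True) or silence (False) by its log energy."""
    frame_len = int(rate * frame_ms / 1000)
    hop_len = int(rate * hop_ms / 1000)
    signal = signal.astype(np.float64)
    signal /= np.max(np.abs(signal)) + 1e-12        # normalize to [-1, 1]
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    decisions = np.zeros(n_frames, dtype=bool)
    for i in range(n_frames):
        frame = signal[i * hop_len : i * hop_len + frame_len]
        energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
        decisions[i] = energy_db > threshold_db
    return decisions

rate, audio = wavfile.read("sample_audio.wav")      # hypothetical input file
if audio.ndim > 1:
    audio = audio.mean(axis=1)                      # collapse stereo to mono
print(f"{energy_vad(audio, rate).mean():.0%} of frames flagged as speech")
```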
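Finally, to make the "dummy dataset with random placeholders" concrete, here is a minimal sketch of a single 3D convolution over a clip of mouth crops. PyTorch and all layer sizes are assumptions for illustration; this shows only the spatio-temporal tensor shapes involved, not the coupled audio-visual architecture of [1] that `train.py` implements.

```python
import torch
import torch.nn as nn

# A clip of 9 consecutive grayscale mouth crops, 60x100 pixels each:
# shape is (batch, channels, frames, height, width).
clip = torch.randn(1, 1, 9, 60, 100)  # random placeholder, as in our dummy dataset

conv3d = nn.Sequential(
    nn.Conv3d(in_channels=1, out_channels=16, kernel_size=(3, 5, 5)),  # spatio-temporal filter
    nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool spatially, keep temporal resolution
)

features = conv3d(clip)
print(features.shape)  # torch.Size([1, 16, 7, 28, 48])
```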
- [1] Amirsina Torfi, Seyed Mehdi Iranmanesh, Nasser Nasrabadi, and Jeremy Dawson. "3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition." IEEE Access, vol. 5, 2017.