Python code to apply a voice activity detector to a wave file. The voice activity detector is based on the ratio between the energy in the speech band and the total energy.
Requirements:
- numpy
- scipy
- matplotlib
Input audio data is processed as follows:
- Convert stereo to mono
- Move a window of 20ms along the audio data
- Calculate the ratio between the energy of the speech band and the total energy for the window
- If the ratio is more than the threshold (0.6 by default), label the window as speech
- Apply median filter with length of 0.5s to smooth detected speech regions
- Represent speech regions as intervals of time
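The steps above can be sketched roughly as follows. This is a minimal illustration, not the code of the vad module: the function name detect_speech_intervals, its parameters, and the 300-3000 Hz speech band are assumptions made for the example (the band limits are not stated here). Only the 20ms window, the 0.6 threshold and the 0.5s median filter come from the description.

```python
import numpy as np
from scipy.signal import medfilt


def detect_speech_intervals(samples, rate, window_s=0.02, threshold=0.6,
                            smooth_s=0.5, band=(300.0, 3000.0)):
    """Return a list of (start_s, end_s) speech intervals (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    # Convert stereo (n_samples, n_channels) to mono by averaging channels.
    if samples.ndim == 2:
        samples = samples.mean(axis=1)

    # Cut the signal into non-overlapping windows of window_s seconds.
    win = int(round(window_s * rate))
    n_windows = len(samples) // win
    if n_windows == 0:
        return []
    frames = samples[:n_windows * win].reshape(n_windows, win)

    # Energy per frequency bin for each window.
    energy = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win, d=1.0 / rate)
    # Assumed speech band; adjust to match the real detector.
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    # Ratio of speech-band energy to total energy (0 for silent windows).
    total = energy.sum(axis=1)
    ratio = np.divide(energy[:, in_band].sum(axis=1), total,
                      out=np.zeros_like(total), where=total > 0)
    is_speech = (ratio > threshold).astype(float)

    # Median filter over about smooth_s seconds; medfilt needs an odd kernel.
    kernel = int(round(smooth_s / window_s))
    kernel += (kernel + 1) % 2
    is_speech = medfilt(is_speech, kernel_size=kernel)

    # Turn the per-window labels into time intervals in seconds.
    padded = np.concatenate(([0.0], is_speech, [0.0]))
    edges = np.flatnonzero(np.diff(padded))
    starts, ends = edges[0::2], edges[1::2]
    return [(float(s * win / rate), float(e * win / rate))
            for s, e in zip(starts, ends)]
```

The samples and rate arguments could come from scipy.io.wavfile.read, which returns the sample rate and a data array. Non-overlapping windows are used here for simplicity; a sliding window with overlap would follow the same pattern.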
Create object:
- import vad module
- create instance of class VoiceActivityDetector with full path to wave file
- run method to detect speech regions
- optionally, plot original wave data and detected speech region
Example Python script which saves speech intervals to a JSON file:
./detectVoiceInWave.py ./wav-sample.wav ./results.json
Example Python code to plot detected speech regions:
from vad import VoiceActivityDetector
filename = '/Users/user/wav-sample.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()