Face Detection, Face Recognition, and Audio Classification to Detect Cheating Behaviour Using a CNN and a Pretrained Model
Use the dlib library to extract faces from frames captured by OpenCV, and use the face_recognition library to compute facial encodings from both the captured frame and a given student's photo. Compare these facial encodings to detect cheating behaviour.
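The comparison step can be sketched in plain NumPy. In the real pipeline the 128-dimensional encodings come from face_recognition (which matches faces when the Euclidean distance between encodings falls below a tolerance, 0.6 by convention); here synthetic vectors stand in for real encodings, and the threshold value is that conventional default, not a setting taken from this repository:

```python
import numpy as np

def is_same_person(known_encoding, frame_encoding, tolerance=0.6):
    """Compare two facial encodings by Euclidean distance.

    Returns True when the distance between the 128-d encodings falls
    below the tolerance, i.e. the faces likely belong to the same person.
    """
    distance = np.linalg.norm(np.asarray(known_encoding) - np.asarray(frame_encoding))
    return distance < tolerance

# Synthetic 128-d encodings standing in for real face_recognition output:
rng = np.random.default_rng(0)
student = rng.normal(0, 0.05, 128)           # encoding from the student's photo
same = student + rng.normal(0, 0.01, 128)    # tiny perturbation -> same person
other = rng.normal(0, 0.5, 128)              # unrelated encoding -> different person

print(is_same_person(student, same))    # True
print(is_same_person(student, other))   # False
```

A distance above the tolerance on a live frame would be flagged as a possible impersonation attempt.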
Audio signals of 2.5 seconds' duration from 5 categories (Computer Keyboard, Working, Whispering, Speech, and Siren) are transformed into mel-spectrograms: an STFT is applied on overlapping windows (typically 25 ms with a 10 ms stride), the power spectrum is taken, and mel filterbanks are applied. The mel-spectrogram is fed into the CNN, which learns features and yields a model that classifies these sound events effectively. The mel-spectrogram is computed by the Kapre package as a direct input layer of the CNN.
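In the project this transform runs on-GPU as a Kapre input layer; the pipeline it implements (framing, windowing, power spectrum, mel filterbank) can be sketched in plain NumPy. The filterbank construction below follows the standard HTK-style recipe and is illustrative, not Kapre's exact implementation:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def melspectrogram(signal, sr=16000, win_ms=25, hop_ms=10, n_mels=40):
    """STFT on overlapping windows -> power spectrum -> mel filterbank."""
    win = int(sr * win_ms / 1000)     # 25 ms window = 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)     # 10 ms stride = 160 samples
    n_fft = win
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop : i * hop + win] for i in range(n_frames)])
    frames = frames * np.hanning(win)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2     # (frames, bins)
    # Triangular mel filters spanning 0 Hz .. Nyquist, equally spaced in mel.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return power @ fb.T                                   # (frames, n_mels)

# 2.5 s of dummy audio at an assumed 16 kHz rate -> 248 frames x 40 mel bands.
audio = np.random.default_rng(0).normal(size=int(2.5 * 16000))
spec = melspectrogram(audio)
print(spec.shape)   # (248, 40)
```

The resulting (frames, mel bands) matrix is what the CNN sees as its input image.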
The dataset consists of 1486 WAV files of 2.5 seconds each, collected from multiple sources: AudioSet by Google, ESC-50, and self-recorded clips. The data are randomly time-shifted left and right to add variation.
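The time-shift augmentation can be sketched as follows. The zero-padding of the vacated samples and the ±0.1 s shift range are illustrative assumptions, not values taken from the repository:

```python
import numpy as np

def random_time_shift(signal, max_shift, rng):
    """Randomly shift audio left or right by up to max_shift samples,
    zero-padding the gap so the clip duration is preserved."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(signal, shift)
    if shift > 0:
        shifted[:shift] = 0.0    # shifted right: silence at the start
    elif shift < 0:
        shifted[shift:] = 0.0    # shifted left: silence at the end
    return shifted

rng = np.random.default_rng(42)
clip = np.ones(int(2.5 * 16000))   # 2.5 s clip at an assumed 16 kHz rate
augmented = random_time_shift(clip, max_shift=1600, rng=rng)  # up to +-0.1 s
print(augmented.shape)             # same length as the input clip
```

Because the shift is drawn fresh for each training example, the model sees the same event at slightly different positions in time, which discourages it from keying on absolute position in the spectrogram.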
The model performed well during training and cross-validation. It also achieved 99% precision and recall and 95% accuracy on a separate, never-before-seen test set. Run evaluate.py to see the scores.
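For reference, metrics like these can be computed from predicted and true labels as below. This is a minimal macro-averaged sketch, not the actual code of evaluate.py (which may well use sklearn.metrics instead); the toy labels are made up for illustration:

```python
import numpy as np

def scores(y_true, y_pred, n_classes):
    """Macro-averaged precision/recall and overall accuracy from labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    accuracy = float(np.mean(y_true == y_pred))
    return float(np.mean(precisions)), float(np.mean(recalls)), accuracy

# Toy labels (hypothetical, 3 of the 5 classes shown):
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
p, r, a = scores(y_true, y_pred, n_classes=3)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
```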
- Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between
- Build a Deep Audio Classifier with Python and Tensorflow
- Deep Learning for Audio Classification (kapre version)
- Build a Deep CNN Image Classifier with ANY Images
- Audio Classification with Machine Learning (EuroPython 2019)
- Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras