egocentric-fg-speech-detection

Foreground speech detection in egocentric audio. The current pipeline supports 30-second audio files and can be extended to handle longer audio files in the future.

Setup instructions

  1. Clone the GitHub repo:
git clone https://github.com/usc-sail/egocentric-fg-speech-detection.git
cd egocentric-fg-speech-detection
  2. Install Anaconda and the Python dependencies:
conda create -n fgenv python=3.7
conda activate fgenv
pip install -r requirements.txt
  3. Download the pretrained model(s) here. Move them under the pretrained_models directory, and edit config.yaml to point to the path of the pretrained model you want to use.
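If you want to verify that edit, here is a minimal sanity-check sketch (this script is not part of the repository; it assumes config.yaml is a flat key/value mapping and that PyYAML is available, e.g. via requirements.txt):

# check_config.py -- hypothetical sanity check, not part of this repository.
# Loads config.yaml and prints each key/value so you can confirm that the
# pretrained-model entry points at a file that actually exists on disk.
import os
import yaml  # PyYAML

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumes the config is a flat mapping; nested sections would need recursion.
for key, value in cfg.items():
    note = " (file found)" if isinstance(value, str) and os.path.exists(value) else ""
    print(f"{key}: {value}{note}")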

  4. Run inference! You can provide the input audio files in either of two ways:

    • Provide a single path to a directory containing all of the audio files (if they are all in one place)
    • Provide a single text file containing complete paths to the audio files, one per line (if they are spread across directories); one way to generate such a file is sketched below
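A minimal sketch of one way to build that text file (this helper is not part of the repository; the .wav extension and recursive search are assumptions, so adjust them to your data layout):

# make_wav_list.py -- hypothetical helper, not part of this repository.
# Collects absolute paths of .wav files under one or more root directories
# and writes them, one per line, to a text file that foreground.py can read.
import sys
from pathlib import Path

def write_wav_list(roots, out_path="wav_files.txt"):
    paths = []
    for root in roots:
        # rglob walks the directory tree recursively
        paths.extend(str(p.resolve()) for p in Path(root).rglob("*.wav"))
    with open(out_path, "w") as f:
        f.write("\n".join(sorted(paths)) + "\n")
    return len(paths)

if __name__ == "__main__":
    n = write_wav_list(sys.argv[1:])
    print(f"Wrote {n} paths to wav_files.txt")

For example, python make_wav_list.py /data/session1 /data/session2 would produce a wav_files.txt suitable for the first example usage shown further below.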

Foreground label predictions are written to a CSV file in the format <fileid, posterior, label>, where posterior is a confidence score in the range [0, 1] and label is a binary (0/1) value indicating the absence/presence of foreground speech in the audio snippet.
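The predictions can be post-processed with standard CSV tooling. A minimal sketch (not part of the repository; it assumes the three columns listed above with no header row, so adjust if the output differs):

# read_results.py -- hypothetical post-processing sketch, not part of this repository.
# Reads the <fileid, posterior, label> CSV written by foreground.py and prints
# the files flagged as containing foreground speech (label == 1).
import csv
import sys

def load_results(csv_path):
    results = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip malformed or empty lines
            try:
                posterior, label = float(row[1]), int(float(row[2]))
            except ValueError:
                continue  # skip a header row, if one is present
            results.append((row[0], posterior, label))
    return results

if __name__ == "__main__":
    for fileid, posterior, label in load_results(sys.argv[1]):
        if label == 1:
            print(f"{fileid}: foreground speech (posterior={posterior:.2f})")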

  python foreground.py -h
  Usage: foreground.py [-h] [--config CONFIG] audio_paths out_file
  
  Foreground-Speech Detection
  positional arguments:
    audio_paths         text-file containing complete paths to audio-files on each line
                        OR directory containing audio-files
    out_file            Output text-file name
  
  optional arguments:
    -h, --help       show this help message and exit
    --config CONFIG  config file (typically "config.yaml")

Example usage:
        python foreground.py wav_files.txt results.csv
        python foreground.py /path/to/wav/files/ results.csv
        python foreground.py --config config.yaml /path/to/wav/files/ results.csv
        

Model comparison

| Name | Training Dataset | AUC | F1 Score |
| --- | --- | --- | --- |
| LogMel (Hebbar et al.) | DSE | 0.8 | 0.72 |
| SAD emb (Hebbar et al.) | DSE | 0.87 | 0.82 |
| vit-small | DSE | 0.88 | 0.66 |
| vit-small-f128-t6 | DSE+Aging | 0.84 | 0.76 |
