AVSpeech-Filtering

Scripts for filtering the AVSpeech dataset with a vision-and-language transformer. Videos were filtered to keep samples with high audio-visual correspondence, and the filtered data was used for Self-Supervised Visual-Acoustic Matching.
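
As a rough illustration of the selection step, the sketch below keeps clips whose correspondence score clears a cutoff. It is not taken from the scripts in this repository: the function name `filter_clips`, the score dictionary, the clip IDs, and the `0.5` threshold are all hypothetical, and it assumes each clip has already been assigned a single numeric score by the model.

```python
from typing import Dict, List


def filter_clips(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return IDs of clips scoring at or above `threshold`, best first.

    `scores` maps a clip ID to a precomputed correspondence score.
    Both the mapping and the default threshold are illustrative assumptions.
    """
    kept = [(clip_id, score) for clip_id, score in scores.items() if score >= threshold]
    # Sort by descending score so the strongest matches come first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [clip_id for clip_id, _ in kept]


if __name__ == "__main__":
    # Made-up scores, for demonstration only.
    example_scores = {"clip_a": 0.91, "clip_b": 0.12, "clip_c": 0.67}
    print(filter_clips(example_scores, threshold=0.5))
```

In practice the cutoff would be chosen by inspecting the score distribution, which is the kind of analysis a notebook is well suited for.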

Note: GitHub often has trouble rendering Python notebooks, so the analysis notebook can also be viewed here.