This repository contains the code and models for "Simple Pooling Front-ends for Efficient Audio Classification" [ICASSP 2023].
See usage in main.py:
- frontend: mel-spectrogram computation & temporal dimension reduction (using spectral pooling)
- backbone: CNN-based model for audio feature extraction (pre-trained model is available)
- input: [batch, waveform]
- embedding: [batch, T, 512]
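
To illustrate the temporal dimension reduction mentioned above, here is a minimal NumPy sketch of spectral pooling along the time axis of a mel-spectrogram. It is not the code in main.py: the function name `spectral_pool_time`, the `out_frames` parameter, and the [batch, T, n_mels] layout are assumptions made for this example only. The idea is to take a Fourier transform over time, keep only the lowest temporal frequencies, and invert to a shorter sequence.

```python
import numpy as np


def spectral_pool_time(mel, out_frames):
    """Shorten the time axis of `mel` ([batch, T, n_mels]) to `out_frames`.

    Hypothetical helper for illustration; not the repository's API.
    """
    n_frames = mel.shape[1]
    if not 0 < out_frames <= n_frames:
        raise ValueError("out_frames must be in (0, T]")

    # Real FFT over the time axis: [batch, T // 2 + 1, n_mels]
    spec = np.fft.rfft(mel, axis=1)

    # Keep only the low-frequency bins needed for a signal of length out_frames
    kept = spec[:, : out_frames // 2 + 1, :]

    # Inverse FFT to the shorter length: [batch, out_frames, n_mels]
    pooled = np.fft.irfft(kept, n=out_frames, axis=1)

    # Rescale so amplitudes match the input (the FFT pair is unnormalized)
    return pooled * (out_frames / n_frames)


if __name__ == "__main__":
    # Assumed example sizes: 2 clips, 100 frames, 64 mel bins
    mel = np.random.rand(2, 100, 64)
    pooled = spectral_pool_time(mel, out_frames=50)
    print(pooled.shape)
```

In this sketch, halving `out_frames` halves the number of frames passed to the backbone, which is where the efficiency gain would come from.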
@inproceedings{liu2022simple,
title={Simple Pooling Front-ends For Efficient Audio Classification},
author={Liu, Xubo and Liu, Haohe and Kong, Qiuqiang and Mei, Xinhao and Plumbley, Mark D and Wang, Wenwu},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year = {2023}
}