This repository contains the PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing", the Sheffield entry for the first Clarity enhancement challenge (CEC1). The system consists of a Conv-TasNet based denoising module, and a finite-inpulse-response (FIR) filter based amplification module. A differentiable approximation to the Cambridge MSBG model released in the CEC1 is used in the loss function.
Please see the latest more organised version in the Clarity repo: https://github.com/claritychallenge/clarity/tree/main/recipes/cec1/e009_sheffield
To run the training recipe of the amplification module, the MSBG package and PyTorch STOI are needed.
To build the overall system, the Conv-TasNet based denoising module needs to be trained in the first stage, and the scripts are in the recipe_den_convtasnet. The FIR based amplification module is trained in the second stage, and the scripts are in the recipe_amp_fir. The MBSTOI folder contains the MBSTOI implementation from the CEC1 project, with also the DBSTOI implementation.
- [1] Luo Y, Mesgarani N. Conv-tasnet: Surpassing ideal time–frequency magnitude masking for speech separation[J]. IEEE/ACM transactions on audio, speech, and language processing, 2019, 27(8): 1256-1266.
- [2] Andersen A H, de Haan J M, Tan Z H, et al. Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions[J]. Speech Communication, 2018, 102: 1-13.
- [3] C.H.Taal, R.C.Hendriks, R.Heusdens, J.Jensen 'A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech', ICASSP 2010, Texas, Dallas.
If you use this work, please cite:
@article{tutwo,
title={A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing},
author={Tu, Zehai and Zhang, Jisi and Ma, Ning and Barker, Jon},
year={2021},
booktitle={The Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2021)},
}