Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition
DPSL-ASR is a novel method for end-to-end noise-robust speech recognition. It extends our prior work, IFF-Net (Interactive Feature Fusion Network), with dual-path inputs and style learning, and achieves better ASR performance on the RATS Channel-A dataset and the CHiME-4 1-Channel Track dataset.
Left figure: (a) joint SE-ASR approach, (b) IFF-Net baseline, (c) the proposed DPSL-ASR approach.
Right figure: back-end ASR module with style learning and consistency loss in our DPSL-ASR. The dashed lines denote sharing parameters.
If you find DPSL-ASR useful in your research, please use the following BibTeX entries for citation:
@article{hu2022dualpath,
title={Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition},
author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
journal={arXiv preprint arXiv:2203.14838},
year={2022}
}
@article{hu2021interactive,
title={Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition},
author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
journal={arXiv preprint arXiv:2110.05267},
year={2021}
}
Our code implementation is based on ESPnet. You can install it directly from the ESPnet (v0.9.6) folder provided in this repo, or install ESPnet from the official website and then add the files from our repo. Run the command pip install -e . in the ESPnet folder to install ESPnet.
In our folder, the running scripts are at egs2/rats_chA/asr_with_enhancement/{run_rats_chA_dpsl_asr, rats_chA_dpsl_asr}.sh, and the network code is at espnet2/{asr/, enh/, layers/}.
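The installation step and file layout described above can be sketched as a small shell check. This is a minimal sketch, not part of the repo: `ESPNET_ROOT` and `check_layout` are hypothetical names introduced here, and the snippet assumes the ESPnet folder is the current directory unless `ESPNET_ROOT` is set. It only runs `pip install -e .` once the files named in this README are found.

```shell
# Assumption: ESPNET_ROOT is a hypothetical variable pointing at the ESPnet
# folder of this repo; it defaults to the current directory.
ESPNET_ROOT="${ESPNET_ROOT:-.}"

# Report whether each file named in this README exists under the given root.
# Returns 0 only if all of them are present.
check_layout() {
  root="$1"
  rc=0
  for rel in \
    egs2/rats_chA/asr_with_enhancement/run_rats_chA_dpsl_asr.sh \
    egs2/rats_chA/asr_with_enhancement/rats_chA_dpsl_asr.sh \
    espnet2/asr/dpsl_asr.py
  do
    if [ -e "$root/$rel" ]; then
      echo "found:   $rel"
    else
      echo "missing: $rel"
      rc=1
    fi
  done
  return "$rc"
}

if check_layout "$ESPNET_ROOT"; then
  # Editable install, using the command given in this README.
  (cd "$ESPNET_ROOT" && pip install -e .)
else
  echo "Set ESPNET_ROOT to the ESPnet folder before installing." >&2
fi
```

If any file is reported as missing, the files from this repo have probably not yet been added to the ESPnet installation being checked.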
Tips:
- To go over the entire project, please start from the script
egs2/rats_chA/asr_with_enhancement/run_rats_chA_dpsl_asr.sh
- To read the network code only, please start from the script
espnet2/asr/dpsl_asr.py