Demos for the paper submitted to Interspeech 2015: Multi-objective Learning and Mask-based Post-processing for Deep Neural Network based Speech Enhancement
*Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai and Chin-Hui Lee, Fellow, IEEE
E-mail: xuyong62@mail.ustc.edu.cn, *Yong Xu
1. Enhancement for 16 kHz noisy TIMIT speech:
Fig. 1: Comparison of spectrograms of a 16 kHz TIMIT utterance corrupted by Buccaneer1 noise at SNR = 5 dB, showing the proposed DNN (PESQ = 2.815), the DNN baseline (PESQ = 2.585), LogMMSE (PESQ = 2.284), the noisy speech (bottom left, PESQ = 1.591), and the clean speech (bottom right, PESQ = 4.5).
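For readers who want to build a comparable test case, the following is a minimal sketch of how a noisy utterance at a target SNR such as 5 dB can be constructed. It is not the authors' data-preparation code: the helper names mix_at_snr and measure_snr_db are hypothetical, and a sine tone and Gaussian noise stand in for the TIMIT utterance and the Buccaneer1 recording, which are assumed to be loaded separately in practice.

```python
import math
import random


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / (g^2 * P_noise)); solve for the gain g.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (target_snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR of `noisy` relative to `clean`, in dB."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Synthetic stand-ins (assumptions): a 440 Hz tone in place of speech and
# Gaussian noise in place of Buccaneer1, both one second long at 16 kHz.
rng = random.Random(0)
fs = 16000
clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

noisy = mix_at_snr(clean, noise, 5.0)
print(round(measure_snr_db(clean, noisy), 2))
```

The gain is derived from the average power of both signals over the whole utterance, so the resulting SNR is a global figure; a segmental or speech-active-only SNR definition would give a different scaling.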
2. Enhancement for 16 kHz real-world noisy speech:
Fig. 2: Spectrograms of a noisy 16 kHz utterance extracted from the movie Forrest Gump, enhanced by the proposed DNN, the DNN baseline, and LogMMSE, alongside the unprocessed noisy speech.