Demos for the paper submitted to Interspeech 2015: Multi-objective Learning and Mask-based Post-processing for Deep Neural Network based Speech Enhancement

*Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai and Chin-Hui LEE, Fellow, IEEE

*E-mail (Yong Xu): xuyong62@mail.ustc.edu.cn

 

1. Enhancement of 16 kHz noisy TIMIT speech:

Proposed DNN

Baseline DNN

LogMMSE

Noisy

Clean

Fig. 1: Comparison of spectrograms of a 16 kHz TIMIT utterance corrupted by Buccaneer1 noise at an SNR of 5 dB: proposed DNN (PESQ=2.815), baseline DNN (PESQ=2.585), LogMMSE (PESQ=2.284), noisy speech (bottom left, PESQ=1.591) and clean speech (bottom right, PESQ=4.5)
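The Fig. 1 caption refers to an utterance corrupted by noise at a fixed SNR. This page does not say how the mixture was produced, so the following is only a minimal sketch under the common assumption that SNR is the ratio of average clean-speech power to average noise power over the whole utterance. The function names `mix_at_snr` and `measure_snr_db` are hypothetical, and a sine tone and Gaussian noise stand in for the TIMIT utterance and the Buccaneer1 recording, which are not loaded here.

```python
import math
import random


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that the clean-to-noise power ratio equals
    `target_snr_db` (in dB), then add it to `clean`.

    Returns the noisy signal and the scaled noise.
    Assumption: SNR is defined on average power over the full signal.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("signals must have non-zero power")
    # Target noise power is clean_power / 10^(SNR/10); the amplitude gain
    # is the square root of the ratio of target to current noise power.
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    noisy = [x + n for x, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def measure_snr_db(clean, noise):
    """Return 10*log10 of the clean-to-noise energy ratio."""
    clean_energy = sum(x * x for x in clean)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(clean_energy / noise_energy)


# Synthetic stand-ins: one second of a 440 Hz tone and Gaussian noise at 16 kHz.
fs = 16000
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 440.0 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

noisy, scaled_noise = mix_at_snr(clean, noise, 5.0)
achieved_snr_db = measure_snr_db(clean, scaled_noise)
```

With real data, `clean` and `noise` would be equal-length sample sequences read from audio files. Other conventions, such as measuring speech power over active segments only, would give a different noise gain.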

2. Enhancement of 16 kHz real-world noisy speech:

Proposed DNN

Baseline DNN

LogMMSE

Noisy

Fig. 2: Spectrograms of a noisy 16 kHz utterance extracted from the movie Forrest Gump: proposed DNN, baseline DNN, LogMMSE and noisy speech