Supervised speech enhancement using online Group-Sparse Convolutive NMF

In supervised speech enhancement methods based on Non-negative Matrix Factorization (NMF), signals are described as linear combinations of dictionary atoms. To learn dictionary atoms capable of revealing the hidden structure in speech, the long temporal context of speech signals must be considered. In contrast to standard NMF, the convolutive model has the advantage of finding the repeated patterns present in many realistic signals. Learning spectro-temporal atoms that span several consecutive frames requires training on large volumes of data, which places unrealistic demands on computation power and memory. In this paper, a new algorithm based on Convolutive NMF is proposed to automatically identify temporal patterns in speech without these two obstacles. An online approach is adopted to save memory when processing large data sets. To reduce the computational load, a group sparsity constraint is employed. The results show that the proposed online Group-Sparse Convolutive NMF algorithm can significantly increase the PESQ score of the enhanced speech.
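As a rough illustration of the convolutive model described above, the following minimal NumPy sketch reconstructs a magnitude spectrogram from spectro-temporal atoms and evaluates one common form of group sparsity penalty. It is not the paper's algorithm: the function names, the array layout (`W` of shape lags x bins x atoms, `H` of shape atoms x frames), and the choice of an l1/l2 mixed norm over groups of atoms are all assumptions made for illustration, and the online update rules are not shown.

```python
import numpy as np


def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out


def cnmf_reconstruct(W, H):
    """Convolutive NMF model: V_hat = sum over lags t of W[t] @ shift(H, t).

    W: (T, F, K) array, T time lags, F frequency bins, K atoms (assumed layout).
    H: (K, N) array of non-negative activations over N frames.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))


def group_sparsity_penalty(H, groups):
    """l1/l2 mixed norm (an assumed penalty form): for each group of atom
    indices, take the L2 norm of its activations per frame, then sum."""
    return sum(np.linalg.norm(H[idx, :], axis=0).sum() for idx in groups)


# Hypothetical toy sizes: 2 lags, 3 frequency bins, 2 atoms, 4 frames.
rng = np.random.default_rng(0)
W = rng.random((2, 3, 2))
H = rng.random((2, 4))
V_hat = cnmf_reconstruct(W, H)
penalty = group_sparsity_penalty(H, groups=[[0, 1]])
```

With a single lag (T = 1) the reconstruction reduces to the standard NMF product `W[0] @ H`, which is why the convolutive model can be read as a generalization that lets each atom span several consecutive frames.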