Frame-Based Overlapping Speech Detection Using Convolutional Neural Networks

Midia Yousefi, John H. L. Hansen

2020

Naturalistic speech recordings usually contain speech signals from multiple speakers. This phenomenon can degrade the performance of speech technologies because tracing and recognizing individual speakers becomes more complex. In this study, we investigate the detection of overlapping speech on segments as short as 25 ms using Convolutional Neural Networks. We evaluate detection performance with different spectral features and show that pyknogram features outperform other commonly used speech features. The proposed system predicts overlapping speech with an accuracy of 84% and an F-score of 88% on a dataset of mixed speech generated from the GRID dataset.
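
To make the frame-level setup concrete, the following is a minimal sketch of how a signal might be cut into 25 ms frames and how frame-level accuracy and F-score could be computed from binary labels. It is not the authors' implementation. The function names, the 16 kHz sample rate, the use of non-overlapping frames, and the choice of the F1 variant of the F-score are all assumptions made for illustration; the abstract does not specify them.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds.

    Non-overlapping framing is an assumption; trailing samples that do not
    fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def accuracy_and_f_score(y_true, y_pred):
    """Frame-level accuracy and F1 score.

    Labels are assumed to be 1 for overlapping speech and 0 otherwise.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return accuracy, 0.0
    return accuracy, 2.0 * precision * recall / (precision + recall)


# Hypothetical usage: 1000 samples at an assumed 16 kHz rate.
frames = frame_signal(list(range(1000)), sample_rate=16000)
acc, f1 = accuracy_and_f_score([1, 1, 0, 0], [1, 0, 0, 1])
```

At the assumed 16 kHz rate, a 25 ms frame holds 400 samples, so each detection decision rests on a very short stretch of audio. Reporting an F-score alongside accuracy is a common choice when the two classes are not equally frequent.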

Keywords:

Artificial intelligence
Computer science
Pattern recognition
Convolutional neural network
Voice activity detection
Frame based
