When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends

2022 
A typical speaker recognition system consists of two modules: a feature-extractor front-end and a speaker-identification back-end. Although deep neural networks have achieved superior performance for the front-end, their success relies on the availability of large-scale, correctly labeled datasets. Because label noise is unavoidable in speaker recognition datasets, it affects both the front-end and the back-end and degrades recognition performance. In this paper, we first conduct comprehensive experiments to better understand the effects of label noise on both modules. We then propose a simple yet effective training paradigm and loss-correction method to handle label noise in the front-end. Combined with the recently proposed Bayesian estimation of PLDA for noisy labels, the whole system shows strong robustness to label noise. Furthermore, we demonstrate two practical applications of the improved system: one corrects noisy labels based on an utterance's chunk-level predictions, and the other algorithmically filters out high-confidence noisy samples within a dataset. By applying the second application to the NIST SRE04–10 dataset and verifying the filtered utterances through human validation, we find that approximately 1% of the NIST SRE04–10 dataset consists of label errors.
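The two applications described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the majority-vote rule, and the `agree_threshold` parameter are all assumptions introduced here for illustration: given per-chunk speaker posteriors for one utterance, propose a corrected label by majority vote over the chunk-level predictions, and flag the utterance as a high-confidence label error when the chunks overwhelmingly agree on a speaker other than the assigned label.

```python
from collections import Counter

def chunk_level_label_check(chunk_probs, given_label, agree_threshold=0.8):
    """Hypothetical sketch of chunk-level label correction and filtering.

    chunk_probs: list of per-chunk posterior distributions over speakers,
                 e.g. [[0.1, 0.9], [0.2, 0.8], ...].
    given_label: the (possibly noisy) speaker label assigned to the utterance.
    Returns (proposed_label, is_suspect): the majority-vote label over chunks,
    and whether the utterance looks like a high-confidence label error.
    """
    # Per-chunk hard decisions: the most probable speaker for each chunk.
    chunk_preds = [max(range(len(p)), key=p.__getitem__) for p in chunk_probs]
    # Majority vote across chunks proposes a corrected utterance label.
    votes = Counter(chunk_preds)
    proposed, count = votes.most_common(1)[0]
    agreement = count / len(chunk_preds)
    # Flag as a likely label error only when the chunks strongly agree
    # on a speaker different from the assigned label (assumed threshold).
    is_suspect = (proposed != given_label) and (agreement >= agree_threshold)
    return proposed, is_suspect
```

Under this sketch, an utterance whose chunks all point to speaker 1 but which is labeled as speaker 0 would be flagged for filtering or human validation, mirroring the workflow the paper applies to NIST SRE04–10.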