Significance of CMVN for Replay Spoof Detection

2020 
In this paper, the significance of Cepstral Mean and Variance Normalization (CMVN) is investigated for the replay Spoofed Speech Detection (SSD) task. The literature shows that applying CMVN yields significantly better performance on many feature sets, which is counter-intuitive for the replay SSD task. This behaviour is analyzed through experiments on environment-independent and environment-dependent cases, with % Equal Error Rate (EER) as the evaluation metric. The analysis is further supported by the estimated probability density functions (pdfs) of the genuine vs. spoofed speech feature representations. The experiments are performed on the publicly available and statistically meaningful ASVspoof 2017 version-2 dataset using the well-known CQCC-GMM and LFCC-GMM SSD systems. This dataset comprises seven acoustic environments for replay speech. The study reveals that the SSD system performs better with CMVN in the environment-independent case, whereas performance degrades drastically with CMVN in the environment-dependent scenario. In the environment-dependent scenario, CMVN suppresses the transmission channel distortion, which in fact carries the discriminative cues for genuine vs. replay speech, and this degrades performance. In the environment-independent scenario, however, CMVN scales down the variability in the feature space across the different environments, which improves performance.
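The channel-suppression effect described above can be illustrated with a minimal sketch. It is not the paper's pipeline: the features are random stand-ins rather than CQCC or LFCC vectors, the function name `cmvn` and the variable `channel_offset` are hypothetical, and the replay channel is idealized as a fixed additive offset in the cepstral domain (the usual approximation for a stationary convolutional channel). Under those assumptions, per-utterance CMVN removes the offset completely, so the normalized genuine and replayed features become indistinguishable.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Per-utterance CMVN on a (num_frames, num_coeffs) feature matrix.

    Each coefficient is shifted to zero mean and scaled to unit variance
    using statistics computed over the frames of this utterance only.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


rng = np.random.default_rng(0)

# Random stand-in for cepstral features of one genuine utterance.
genuine = rng.normal(size=(200, 20))

# Assumption: the replay channel is a fixed additive offset per coefficient.
channel_offset = rng.normal(size=(1, 20))
replayed = genuine + channel_offset

# Largest per-coefficient difference in means, before and after CMVN.
gap_before = np.abs(genuine.mean(axis=0) - replayed.mean(axis=0)).max()
gap_after = np.abs(cmvn(genuine).mean(axis=0) - cmvn(replayed).mean(axis=0)).max()

print(f"mean gap before CMVN: {gap_before:.3f}")
print(f"mean gap after CMVN:  {gap_after:.2e}")
```

In this idealized setting the offset is the only cue separating the two classes, so normalizing it away mirrors the environment-dependent degradation reported in the abstract. Real replay channels also alter the variance and temporal structure of the features, which this sketch does not model.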