Speaker Verification from Codec-Distorted Speech Through Combination of Affine Transform and Feature Switching

2021 
A high-performance speaker verification system for codec-distorted speech is developed and implemented in this paper. A priori knowledge of the type of speech codec is exploited. A code-excited linear prediction (CELP) based codec, one of the most commonly used codecs in mobile communications, is assumed here. A novel method combining feature switching and an affine transform is developed for the design and implementation of the proposed speaker verification system. During the training phase, the best feature set for each speaker is identified from affine-transformed speech features, making feature selection more robust. Mel-frequency cepstral coefficients (MFCC) and modified power-normalized cepstral coefficients are used as the candidate features for switching. Feature switching is performed both directly at the feature level and indirectly within the i-vector framework. During the testing phase, the best feature set of the claimed speaker is extracted from the codec-distorted speech, and the affine transform is applied to map it back to the training feature space. Speaker verification is then performed on this affine-transformed feature set. Classifiers based on the Gaussian mixture model-universal background model (GMM-UBM) and i-vectors are used for verification. The performance of the proposed system is evaluated on two databases, TIMIT and VoxCeleb1. For both databases and both classifiers, a very low equal error rate is achieved compared with other competitive methods in the literature. Hence, the proposed system is a strong candidate for critical applications such as forensic speaker verification.
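The two core ideas of the abstract, an affine compensation of codec-distorted features and per-speaker feature switching between candidate streams, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the transform parameters `A` and `b`, the synthetic feature matrices, and the per-stream scores are all hypothetical stand-ins (in the paper they would come from training on paired clean/codec-distorted data and from the GMM-UBM or i-vector back end).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two per-utterance feature streams
# (e.g. MFCC and modified PNCC); shape: (frames, dims).
mfcc = rng.normal(size=(200, 13))
pncc = rng.normal(size=(200, 13))

def affine_transform(features, A, b):
    """Map each frame x to A @ x + b, intended to compensate
    codec distortion by projecting test features toward the
    training feature space."""
    return features @ A.T + b

# Hypothetical transform parameters; in the paper these would be
# estimated from paired clean and codec-distorted training data.
A = np.eye(13) * 0.9
b = np.full(13, 0.1)

def switch_features(streams, scores):
    """Feature switching: select the stream whose training-phase
    score is best for the claimed speaker."""
    best = max(scores, key=scores.get)
    return best, streams[best]

streams = {
    "mfcc": affine_transform(mfcc, A, b),
    "pncc": affine_transform(pncc, A, b),
}
# Hypothetical per-speaker scores learned during training.
scores = {"mfcc": 0.82, "pncc": 0.91}

chosen, feats = switch_features(streams, scores)
print(chosen, feats.shape)  # the higher-scoring stream is selected
```

The selected, affine-transformed stream would then be passed to the GMM-UBM or i-vector classifier for the actual verification decision.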