Speaker and Channel Factors in Text-Dependent Speaker Recognition

2016 
We reformulate joint factor analysis so that it can serve as a feature extractor for text-dependent speaker recognition. The new formulation is based on left-to-right modeling with tied-mixture HMMs. It is designed to deal with problems such as the inadequacy of subspace methods for modeling speaker-phrase variability, UBM mismatches that arise from variable phonetic content, and the need to exploit text-independent resources in text-dependent speaker recognition. We pass the features extracted by factor analysis to a trainable backend that plays a role analogous to that of PLDA in the i-vector/PLDA cascade used in text-independent speaker recognition. We evaluate these methods on a proprietary dataset of English and Urdu passphrases collected in Pakistan. By using both text-independent and text-dependent data for training and by fusing the results of multiple front ends at the score level, we achieved equal error rates of around 1.3% on the English portion of this task and around 2% on the Urdu portion.
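The abstract reports its results as equal error rates (EERs) obtained after score-level fusion of several front ends. The sketch below illustrates those two evaluation steps under stated assumptions and is not the authors' code. It computes an EER by sweeping a decision threshold over the trial scores, and it fuses per-trial scores from several systems with a fixed weighted sum. The function names `equal_error_rate` and `fuse_scores`, the example scores, and the fixed weights are all hypothetical. In practice, fusion weights are commonly trained on held-out trials (for example with logistic regression) rather than set by hand.

```python
# Hypothetical sketch: EER of a verification system and linear score-level
# fusion. Names, weights, and scores are illustrative only.

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER (as a fraction) by sweeping thresholds over all scores.

    A trial is accepted when score >= threshold. The miss rate is the fraction
    of target trials rejected, and the false-alarm rate is the fraction of
    non-target trials accepted. The EER is approximated by averaging the two
    rates at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss + fa) / 2
    return best_eer


def fuse_scores(score_lists, weights, bias=0.0):
    """Linear fusion: for each trial, bias + sum_k weights[k] * score_k.

    score_lists holds one list of per-trial scores for each front end,
    with all lists in the same trial order.
    """
    return [bias + sum(w * s for w, s in zip(weights, trial))
            for trial in zip(*score_lists)]


if __name__ == "__main__":
    # Toy scores for a single hypothetical front end.
    targets = [2.0, 1.5, 0.8, 1.2]
    nontargets = [-1.0, 0.0, 0.9, -0.5]
    print(equal_error_rate(targets, nontargets))  # 0.25

    # Fusing two hypothetical front ends with equal weights.
    print(fuse_scores([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))  # [2.0, 3.0]
```

In this toy example, one of four target trials scores below one of four non-target trials. The threshold between those two scores therefore yields a miss rate and a false-alarm rate of 25% each. The abstract's figure of around 1.3% would correspond to a return value of about 0.013.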