A technique for adjusting Gaussian mixture model weights that improves speaker identification performance in the presence of phonemic train/test mismatch.

2011 
Speaker identification is complicated by cases where training material is phonemically deficient. Misclassifications can result either because subsequent test material from that speaker contains primarily the phonemes missing from the training data or because that test material is phonemically most consistent with another talker’s model. This situation can arise in any dialog where, for reasons of brevity and clarity, conventions must be imposed on phraseology. We present here a technique for detecting phonemic deficiencies in a speaker model, and then correcting that model to partially compensate for the biased training data. This technique relies upon a specially constructed universal background model (UBM) from which speaker models are adapted. This UBM is formed by weighting several dozen phoneme GMMs using EM training. As a result, each Gaussian component of the UBM (and of the resulting speaker models) corresponds to a specific phoneme. Analysis of the speaker model weights reveals whether the train...
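The UBM construction described in the abstract, weighting a pool of phoneme GMM components by EM, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians whose means and variances stay fixed while only the mixture weights are re-estimated. The names `em_weights_only`, `phoneme_mass`, and `component_phoneme` are hypothetical and introduced only for illustration. Summing the weights by phoneme label gives one simple view of how much of a model's mass each phoneme holds.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of each frame under each diagonal-covariance Gaussian.

    x: (T, D) frames; means, variances: (K, D). Returns a (T, K) array.
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def em_weights_only(x, means, variances, n_iter=50):
    """Re-estimate mixture weights by EM with means and variances held fixed.

    Assumption: the pooled components come from pre-trained phoneme GMMs,
    so only the weights are free parameters.
    """
    n_components = means.shape[0]
    weights = np.full(n_components, 1.0 / n_components)
    log_p = log_gauss_diag(x, means, variances)  # constant across iterations
    for _ in range(n_iter):
        # E-step: responsibilities, computed stably in the log domain.
        log_r = log_p + np.log(np.maximum(weights, 1e-300))
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each weight is the mean responsibility of its component.
        weights = r.mean(axis=0)
    return weights

def phoneme_mass(weights, component_phoneme, n_phonemes):
    """Total mixture weight per phoneme.

    component_phoneme[k] is the (hypothetical) integer phoneme label of
    component k.
    """
    return np.bincount(component_phoneme, weights=weights,
                       minlength=n_phonemes)
```

For example, calling `phoneme_mass` on the weights of a speaker model and on the weights of the UBM gives two per-phoneme vectors that can be compared side by side. How that comparison is turned into a detection or correction rule is not specified in the text above, so the sketch stops there.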