GMM/SVM N-best speaker identification under mismatch channel conditions

Ilija Zeljkovic, Patrick Haffner, Brian Amento, Jay Gordon Wilpon

2008

Under severe channel mismatch, such as training on far-field speech and testing on telephone data, speaker identification (SID) performance degrades significantly, often below the level needed for practical use. For many SID tasks, however, it is sufficient to return an N-best list of candidate speakers for further human analysis. We investigate N-best SID accuracy under matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH), and cross-channel adaptation using cohorts, we reduced the matched-channel top-1 error rate by over 25% relative to the GMM-UBM baseline and achieved mismatched N-best accuracy comparable to the baseline.
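The two quantities the abstract reports, N-best accuracy and relative error reduction, can be illustrated with a minimal sketch. It assumes that each test utterance has already been scored against every enrolled speaker. The function names, the score matrix, and the error rates are hypothetical toy values chosen for illustration, not the authors' code or data.

```python
import numpy as np

def nbest_accuracy(scores, true_ids, n):
    """Fraction of test utterances whose true speaker is among the
    n highest-scoring enrolled speakers (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    true_ids = np.asarray(true_ids)
    # Column indices of the n largest scores in each row.
    top_n = np.argsort(-scores, axis=1)[:, :n]
    # An utterance counts as a hit if its true speaker appears in its top n.
    hits = (top_n == true_ids[:, None]).any(axis=1)
    return float(hits.mean())

def relative_error_reduction(baseline_error, new_error):
    """Error reduction expressed as a fraction of the baseline error."""
    return (baseline_error - new_error) / baseline_error

# Toy scores (made up): 4 test utterances x 3 enrolled speakers.
scores = [[0.9, 0.1, 0.3],
          [0.2, 0.5, 0.6],
          [0.4, 0.3, 0.1],
          [0.1, 0.2, 0.8]]
true_ids = [0, 1, 1, 2]

top1 = nbest_accuracy(scores, true_ids, 1)
top2 = nbest_accuracy(scores, true_ids, 2)
# Made-up error rates: 20% baseline error reduced to 15%.
reduction = relative_error_reduction(0.20, 0.15)
```

In this toy case, widening the list from one to two candidates recovers the two utterances whose true speaker was ranked second, which is the reason an N-best list can remain useful for human review when top-1 accuracy is poor.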

Keywords:

Speech recognition
Gaussian process
Formant
Speaker recognition
Support vector machine
Word error rate
Histogram
Pattern recognition
Artificial intelligence
Communication channel
Computer science
Speaker identification
Channel error rate
