Automatic speech recognition fusion approach to unsupervised speaker clustering and labeling

Aaron Lawson, Mark C. Huggins, John J. Grieco, S.A. Galligan, David M. Harris

2006

This paper describes a fully unsupervised approach to speaker clustering and labeling that uses automatic speech recognition (ASR) to bootstrap speaker identification (SID). An algorithm combining these two technologies correctly clustered and labeled 299 NATO ship-to-ship transmissions with an accuracy of 89% in an on-line scenario (no a priori training). This fusion approach outperformed ASR alone by 23.6%, and outperformed manually trained VQ-SID and GMM/UBM-SID by 12.7% and 8.6%, respectively. These results demonstrate that, under certain circumstances, unsupervised, self-organizing systems can be more effective than manually trained ones.

Keywords:

Speaker recognition
Fusion
Speech recognition
Speaker diarisation
Cluster analysis
Engineering
Artificial intelligence
Pattern recognition
Bootstrapping (electronics)
Speaker identification
A priori and a posteriori
Mobile radio
