Blind Speaker Clustering Using Phonetic and Spectral Features in Simulated and Realistic Police Interviews

2012 
In this study, we present a novel approach to the blind automatic segmentation of speakers in police interviews. It combines phonetic features such as pitch with statistical pattern recognition of short-term power spectrum features such as Mel Frequency Cepstral Coefficients (MFCCs). The approach requires minimal user intervention and allows the speech of separate speakers to be segmented easily from multi-speaker recordings. It can be of significant benefit in harvesting the speech of a single speaker for use in phonetic and automatic speaker recognition, as well as in gleaning quick intelligence from surveillance recordings. We propose a two-tiered approach to speaker segmentation: the first tier uses discontinuities in the pitch trajectories to identify potential speaker clusters, and the second uses an iterative speaker assignment and training method based on Gaussian mixture models. The approach will be demonstrated using realistic and simulated police witness interviews.

Proposed approach and test databases

The pitch tracks for the voiced segments are extracted from the interview recording using the autocorrelation-based pitch tracker in Praat (Boersma, 1993). Based on discontinuities in the pitch track, we extract ‘zones of reliability’ for the identity of a speaker. A continuous ‘run’ of similar values in the pitch track provides such a zone of reliability, and any significant discontinuity in the pitch track, either in time or in frequency, is used to define a candidate transition point between speakers. These candidate transition points are then used to define clusters, as illustrated in Figure 1a. We use the clusters that contain sufficient information to model potential speakers. A statistical model of each cluster is then compared to all other segments in order to find the most divergent pair of segments.
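The pitch-based first tier described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_pitch_track` and `usable_zones`, the thresholds (`max_gap_s`, `max_jump_semitones`, `min_frames`), and the toy pitch values are all hypothetical, and the input is assumed to be a list of voiced frames as `(time_s, f0_hz)` pairs such as a pitch tracker might export.

```python
import math


def split_pitch_track(frames, max_gap_s=0.25, max_jump_semitones=4.0):
    """Split voiced (time_s, f0_hz) frames into runs of similar pitch.

    A new run starts whenever consecutive voiced frames are separated by
    more than max_gap_s seconds (discontinuity in time) or differ by more
    than max_jump_semitones (discontinuity in frequency). Both thresholds
    are illustrative assumptions, not values from the study.
    """
    zones = []
    current = []
    for time_s, f0_hz in frames:
        if current:
            prev_time, prev_f0 = current[-1]
            gap = time_s - prev_time
            jump = abs(12.0 * math.log2(f0_hz / prev_f0))
            if gap > max_gap_s or jump > max_jump_semitones:
                # Candidate transition point: close the current run.
                zones.append(current)
                current = []
        current.append((time_s, f0_hz))
    if current:
        zones.append(current)
    return zones


def usable_zones(zones, min_frames=3):
    """Keep only runs long enough to model (min_frames is an assumption)."""
    return [zone for zone in zones if len(zone) >= min_frames]


# Toy pitch track: a low-pitched run, a jump in frequency, then a gap in time.
frames = [
    (0.00, 110.0), (0.01, 112.0), (0.02, 111.0), (0.03, 113.0),
    (0.04, 210.0), (0.05, 212.0), (0.06, 208.0),
    (0.50, 115.0), (0.51, 114.0),
]
zones = split_pitch_track(frames)
candidates = usable_zones(zones)
```

On this toy input the track is split at the frequency jump and at the time gap, and the final two-frame run is discarded as too short to model. Measuring the frequency jump in semitones rather than in Hz is a design choice of this sketch, made so that one threshold behaves similarly for low and high voices.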