Speaker Recognition Score-Normalization to Compensate for SNR and Duration

Jørgen E. Harmse, Steven D. Beck, Hirotaka Nakasone

2006

The decision criterion for automatic speaker verification tests is based on minimization of a weighted sum of the miss and false alarm probabilities. These probabilities are derived from an evaluation of claimant and impostor scores using a representative population of recorded speech samples. However, in applications such as forensic speaker verification, the signal quality and the recording conditions of the speech samples are usually unknown and generally not matched to the evaluation conditions for the defined error probabilities [1]. For example, test samples are often of short duration, have significant noise, and are from uncertain channels. It is therefore necessary to normalize the speaker test scores or to adjust detection thresholds in accordance with the recorded signal conditions. Instead of accounting for all possibilities, evaluations were conducted for a few specific joint combinations of signal-to-noise ratio (SNR) and speech duration for both the training and test sets. A composite regression model was developed to predict the necessary adjustments for any measured value of these conditions. In addition, a method is discussed to interpret the normalized scores relative to a set of desired Type I and Type II error probabilities.
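The decision criterion described above, choosing a threshold that minimizes a weighted sum of the miss and false alarm probabilities, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal default weights, and the toy scores are all assumptions made for illustration, and the error probabilities are simple empirical estimates from the claimant and impostor score lists.

```python
def weighted_error(threshold, claimant, impostor, w_miss=0.5, w_fa=0.5):
    """Weighted sum of empirical miss and false alarm rates at a threshold.

    A trial is accepted when its score is >= threshold (assumed convention).
    """
    # Miss: a true claimant scores below the threshold and is rejected.
    p_miss = sum(s < threshold for s in claimant) / len(claimant)
    # False alarm: an impostor scores at or above the threshold and is accepted.
    p_fa = sum(s >= threshold for s in impostor) / len(impostor)
    return w_miss * p_miss + w_fa * p_fa


def best_threshold(claimant, impostor, w_miss=0.5, w_fa=0.5):
    """Return the observed score that minimizes the weighted error.

    Only observed scores are tried as candidates; ties go to the lowest one.
    """
    candidates = sorted(set(claimant) | set(impostor))
    return min(
        candidates,
        key=lambda t: weighted_error(t, claimant, impostor, w_miss, w_fa),
    )


# Toy, made-up scores purely for illustration (not data from the paper).
claimant_scores = [2.0, 3.0, 4.0]
impostor_scores = [-1.0, 0.0, 1.0]
threshold = best_threshold(claimant_scores, impostor_scores)
print(threshold, weighted_error(threshold, claimant_scores, impostor_scores))
```

A threshold chosen this way is only valid for the conditions under which the scores were collected, which is why the abstract argues for adjusting scores or thresholds when the SNR or duration of a test sample differs from the evaluation set.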
