Can voice similarity be assessed using an automatic speaker recognition system?

2021 
How Automatic Speaker Recognition (ASR) systems ‘perceive’ voice similarity is of increasing relevance in forensic phonetics. Assessing perceived voice similarity is fundamental to the construction of voice parades (analogous to visual parades), as it ensures that the comparison between voices is fair to the suspect. Currently, assessing voice similarity is time-consuming and expensive because it involves recruiting naive listener participants. A recent study by Gerlach et al. (2020) showed promising results regarding the relationship between human and machine voice similarity ratings, using an ASR system for a group of Standard Southern British English (SSBE) speakers. The present study further explores the topic by evaluating the correlation between voice similarity ratings from human listeners and from the ASR system across sets of same- and different-accented English speakers. Results corroborate previous findings: the correlation between ASR and human voice similarity ratings is positive and significant, supporting further investigation of the potential use of ASR in voice parade construction.