Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion

2019 
Automatic speaker verification (ASV) systems in practice are highly vulnerable to spoofing attacks. The latest voice conversion technologies can produce perceptually natural-sounding speech that mimics any target speaker. However, perceptual closeness to a speaker's identity may not be enough to deceive an ASV system. In this work, we propose a framework that uses the output scores of an ASV system as feedback to a voice conversion system. The attack framework is a black-box adversary that steals one's voice identity, requiring no knowledge about the ASV system other than its output scores. The target speakers are chosen from the CMU-ARCTIC database, while another 282 speakers from the Wall Street Journal corpus are used as the source speakers in the studies. It is found that the proposed feedback-controlled voice conversion framework produces adversarial samples that are more deceptive than straightforward voice conversion, thereby boosting the impostor scores in ASV experiments. Further, perceptual evaluation studies reveal that the converted speech thus obtained does not deteriorate significantly relative to the baseline voice conversion system.
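The feedback loop described in the abstract can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the ASV scorer, the conversion function, the embeddings, and the gradient-free random search are all stand-ins assumed for demonstration. The only property it preserves is the black-box constraint — the attacker observes nothing but the scalar ASV score and uses it to steer the conversion parameters.

```python
import random

# Toy target-speaker embedding (assumption; real systems use i-vectors,
# x-vectors, or similar high-dimensional speaker representations).
TARGET = [0.3, -0.7, 0.5]

def asv_score(embedding):
    """Black-box ASV stand-in: returns a scalar similarity score.
    The attacker never sees these internals, only the output value."""
    return -sum((e - t) ** 2 for e, t in zip(embedding, TARGET))

def convert(source, params):
    """Toy voice conversion: shifts the source embedding by the
    conversion parameters (a stand-in for a real VC model)."""
    return [s + p for s, p in zip(source, params)]

def feedback_attack(source, iters=200, step=0.1, seed=0):
    """Gradient-free random search: perturb the VC parameters and keep
    any change that raises the observed (black-box) ASV score."""
    rng = random.Random(seed)
    params = [0.0] * len(source)
    best = asv_score(convert(source, params))
    for _ in range(iters):
        cand = [p + rng.uniform(-step, step) for p in params]
        score = asv_score(convert(source, cand))
        if score > best:
            params, best = cand, score
    return params, best

source = [0.0, 0.0, 0.0]          # toy source-speaker embedding
baseline = asv_score(source)       # impostor score before the attack
_, attacked = feedback_attack(source)
print("score improved:", attacked > baseline)
```

The design point this illustrates is that no gradients or model internals are needed: because each candidate is accepted only when the returned score increases, the loop monotonically raises the impostor score using score queries alone, which is what makes the adversary black-box.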