Text-Dependent Closed-Set Two-Speaker Recognition of a Key Phrase Uttered Synchronously by Two Persons

2021 
In this paper, we propose a novel text-dependent speaker recognition system that uses a key phrase uttered synchronously by two persons, i.e., uttered in unison. Hereafter, we refer to this speech as a duo utterance. The proposed system accepts a duo utterance from the enrolled speaker pair but rejects duo utterances from other speaker pairs and utterances from a single speaker. Unlike conventional speaker recognition systems, the proposed system requires a duo utterance for recognition, and it is therefore expected to offer a high level of security. To realize this system, we employ a d-vector and a hidden Markov model (HMM). The d-vector is a feature vector extracted by a speaker identification deep neural network (DNN) that is trained to identify both individual speakers and speaker pairs from frame-level acoustic features. The HMM is widely known to be suitable as the speaker model in text-dependent speaker recognition systems. To evaluate the proposed method, we conducted speaker identification experiments. The experimental results show that the proposed system achieved higher performance than an MFCC-based system. In addition, we analyzed the errors in the experimental results to guide improvements to future systems.
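The accept/reject behavior described above can be illustrated with a minimal sketch of a closed-set decision rule. It assumes that each enrolled class (a single speaker or a speaker pair) already has a model that returns a log-likelihood for the input utterance. The function names `identify` and `accept`, the speaker labels, and the score values are hypothetical and are not taken from the paper.

```python
# Minimal sketch of a closed-set decision over single speakers and speaker pairs.
# Assumption: `scores` maps each enrolled class to the log-likelihood that its
# model (e.g. an HMM over d-vector features) assigns to the input utterance.
# Classes are frozensets so that the pair {A, B} equals the pair {B, A}.

def identify(scores):
    """Return the class with the highest log-likelihood."""
    return max(scores, key=scores.get)


def accept(scores, enrolled_pair):
    """Accept only if the best-scoring class is the enrolled speaker pair."""
    return identify(scores) == enrolled_pair


# Hypothetical scores for one input utterance (illustrative values only).
scores = {
    frozenset({"A"}): -1250.0,       # single speaker A
    frozenset({"B"}): -1310.5,       # single speaker B
    frozenset({"A", "B"}): -1180.2,  # duo utterance by A and B
    frozenset({"A", "C"}): -1225.7,  # duo utterance by A and C
}

best = identify(scores)
accepted = accept(scores, frozenset({"B", "A"}))
```

Because the single-speaker classes compete with the pair classes in the same decision, an utterance by one speaker alone is rejected whenever its single-speaker model scores highest.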