Text-Dependent Closed-Set Two-Speaker Recognition of a Key Phrase Uttered Synchronously by Two Persons

Toshiyuki Ugawa, Satoru Tsuge, Yasuo Horiuchi, Shingo Kuroiwa

2021

In this paper, we propose a novel text-dependent speaker recognition system that uses a key phrase uttered synchronously by two persons, i.e., uttered in unison. Hereafter, we refer to this speech as a duo utterance. The proposed system accepts a duo utterance from the enrolled speaker pair and rejects both duo utterances from other speaker pairs and utterances from a single speaker. It differs from conventional speaker recognition systems in that it requires a duo utterance for recognition. Because both enrolled speakers must utter the key phrase together, the proposed system is expected to provide a high level of security. To realize the system, we employ a d-vector and a hidden Markov model (HMM). The d-vector is a feature vector extracted by a speaker identification deep neural network (DNN) that is trained on frame-level acoustic features to identify speakers and speaker pairs. The HMM is widely known to be suitable as a speaker model for text-dependent speaker recognition. To evaluate the proposed method, we conducted speaker identification experiments. The experimental results show that the proposed system achieved higher performance than an MFCC-based system. In addition, we analyzed the errors in the experimental results to guide improvements in future systems.
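
The accept/reject behavior described above can be illustrated with a minimal sketch. The abstract does not state the decision rule, so this assumes a simple closed-set scheme: every enrolled model (speaker pairs and single speakers) produces one score for the input utterance, for example an HMM log-likelihood over the d-vector sequence, and the utterance is accepted only when the best-scoring model is the enrolled pair. The function names, labels, and score values are hypothetical and are not taken from the paper.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Closed-set identification: return the label of the best-scoring model.

    `scores` maps a model label to a score for one utterance (assumed here
    to be a log-likelihood, so higher is better).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=scores.get)


def accept_duo(scores: Dict[str, float], enrolled_pair: str) -> bool:
    """Accept only if the enrolled speaker pair is the top-scoring model.

    Assumed decision rule: single-speaker models and other pair models
    compete with the enrolled pair, so a solo utterance or a duo utterance
    from another pair is rejected when its own model scores highest.
    """
    return identify(scores) == enrolled_pair


# Hypothetical scores for one input utterance (illustrative values only).
example_scores = {
    "pair:A+B": -41.2,
    "pair:A+C": -47.9,
    "single:A": -52.3,
    "single:B": -55.0,
}

if __name__ == "__main__":
    print(identify(example_scores))
    print(accept_duo(example_scores, "pair:A+B"))
```

Including single-speaker models among the competing classes mirrors the statement that the DNN is trained to identify both speakers and speaker pairs; a deployed system might add a score threshold as well, which the abstract does not discuss.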
