PAPER Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction Improvements of the One-to-Many Eigenvoice Conversion System

Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

2010

SUMMARY We have developed a one-to-many eigenvoice conversion (EVC) system that converts a single source speaker’s voice into an arbitrary target speaker’s voice using an eigenvoice Gaussian mixture model (EV-GMM). The system can effectively build a conversion model for an arbitrary target speaker by adapting the EV-GMM in a text-independent manner, using only a small amount of speech data uttered by that speaker. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) oversmoothing of the converted spectrum causes muffled sounds in the converted speech; and 3) the conversion model is affected by redundant acoustic variations among the many pre-stored target speakers used for building the EV-GMM. To address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that
